Perfmon on Production Servers: Best Practices for SQL Server Performance Monitoring


2 views

When it comes to monitoring SQL Server performance, Perfmon (Performance Monitor) is a powerful tool that provides invaluable insights. The question isn't whether to use Perfmon, but where to run it - production or test environments?

Running Perfmon on production servers can be justified when:

  • You need real-world baseline metrics that synthetic tests can't replicate
  • Troubleshooting intermittent production issues
  • Validating performance improvements after changes
# Example Perfmon counters for SQL Server
LogicalDisk(_Total)\% Free Space
SQLServer:Buffer Manager\Buffer cache hit ratio
SQLServer:SQL Statistics\Batch Requests/sec
Processor(_Total)\% Processor Time

If you must run Perfmon in production:

  • Limit collection duration (2-4 hours peak time, not continuous)
  • Use remote monitoring when possible
  • Sample at appropriate intervals (10-15 seconds for most scenarios)
  • Set up dedicated monitoring accounts with minimal privileges

For comprehensive performance analysis, consider:

  1. Creating a production-like QA environment
  2. Using tools like Distributed Replay to simulate real workloads
  3. Running extended Perfmon collections without production impact

For less intrusive production monitoring:

-- SQL Server DMV queries for performance insights
SELECT TOP 10 
    query_stats.query_hash,
    SUM(query_stats.total_worker_time) / SUM(query_stats.execution_count) AS avg_cpu_time,
    MIN(query_stats.statement_text) AS sample_query_text
FROM 
    (SELECT 
         qs.*,
         SUBSTRING(st.text, (qs.statement_start_offset/2)+1, 
                  ((CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(st.text)
                   ELSE qs.statement_end_offset END - qs.statement_start_offset)/2)+1) AS statement_text
     FROM sys.dm_exec_query_stats AS qs
     CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st) AS query_stats
GROUP BY query_stats.query_hash
ORDER BY avg_cpu_time DESC;

Remember that Perfmon is just one tool in your monitoring toolkit. For comprehensive SQL Server performance analysis, combine it with DMVs, Query Store, and Extended Events for complete visibility.


When I first considered running Perfmon on our production SQL Server, I recalled Brent Ozar's classic advice about 48-hour monitoring cycles. But production environments aren't lab setups - they demand careful consideration.

Production servers handle real user traffic, transactions, and SLAs. Unlike Dev/QA environments where we can simulate loads, production has:

  • Strict uptime requirements
  • Performance-sensitive applications
  • Regulatory constraints

Through controlled tests, I measured Perfmon's resource usage:

# Sample PowerShell to measure Perfmon overhead
$counters = "\Processor(_Total)\% Processor Time", 
            "\Memory\Available MBytes",
            "\SQLServer:Buffer Manager\Buffer cache hit ratio"

$sampleInterval = 1 # seconds
$duration = 300 # 5 minutes

$baseline = Get-Counter -Counter $counters -SampleInterval $sampleInterval -MaxSamples $duration
$withPerfmon = Measure-Command {
    Start-Process "perfmon.exe" -ArgumentList "/sys"
    Start-Sleep -Seconds 60 # Warm-up
    Get-Counter -Counter $counters -SampleInterval $sampleInterval -MaxSamples $duration
}
Stop-Process -Name "perfmon" -Force

Write-Host "CPU overhead: $($withPerfmon.TotalMilliseconds - $baseline.TotalMilliseconds) ms"

Instead of continuous Perfmon, consider these alternatives:

1. Targeted Sampling

# Capture peak hours only
$peakStart = Get-Date "09:00"
$peakEnd = Get-Date "11:00"
if ((Get-Date) -ge $peakStart -and (Get-Date) -le $peakEnd) {
    logman create trace SQL_Perf -o "C:\PerfLogs\SQL_PeakHours.blg" -f bincirc -v mmddhhmm -max 500 -cnf 01:00:00
    logman update SQL_Perf -c "\SQLServer:Buffer Manager\*" "\SQLServer:SQL Statistics\*"
    logman start SQL_Perf
}

2. Lightweight Alternatives

For continuous monitoring, these have lower overhead:

  • DMV snapshots (using sp_BlitzFirst)
  • Extended Events sessions
  • Query Store (SQL Server 2016+)

After troubleshooting a particularly nasty performance issue, I created this checklist for when to use Perfmon in production:

  1. During pre-approved maintenance windows
  2. For short-term diagnostics (2-4 hours max)
  3. When correlating with other monitoring tools
  4. With carefully selected counters (avoid "*")

Instead of collecting everything, focus on these key counters:

# Minimal impactful counter set
$essentialCounters = @(
    "\Processor(_Total)\% Privileged Time",
    "\SQLServer:Buffer Manager\Page life expectancy",
    "\SQLServer:SQL Statistics\Batch Requests/sec",
    "\PhysicalDisk(*)\Avg. Disk sec/Read"
)

logman create counter SQL_Minimal -o "C:\PerfLogs\SQL_Minimal.blg" -f csv -si 15 -c $essentialCounters

After gathering data, use PAL (Performance Analysis of Logs) tool for automated threshold analysis rather than manual review.