When it comes to monitoring SQL Server performance, Perfmon (Performance Monitor) is a powerful tool that provides invaluable insights. The question isn't whether to use Perfmon, but where to run it - production or test environments?
Running Perfmon on production servers can be justified when:
- You need real-world baseline metrics that synthetic tests can't replicate
- Troubleshooting intermittent production issues
- Validating performance improvements after changes
# Example Perfmon counters for SQL Server
LogicalDisk(_Total)\% Free Space
SQLServer:Buffer Manager\Buffer cache hit ratio
SQLServer:SQL Statistics\Batch Requests/sec
Processor(_Total)\% Processor Time
If you must run Perfmon in production:
- Limit collection duration (2-4 hours peak time, not continuous)
- Use remote monitoring when possible
- Sample at appropriate intervals (10-15 seconds for most scenarios)
- Set up dedicated monitoring accounts with minimal privileges
For comprehensive performance analysis, consider:
- Creating a production-like QA environment
- Using tools like Distributed Replay to simulate real workloads
- Running extended Perfmon collections without production impact
For less intrusive production monitoring:
-- SQL Server DMV queries for performance insights
SELECT TOP 10
query_stats.query_hash,
SUM(query_stats.total_worker_time) / SUM(query_stats.execution_count) AS avg_cpu_time,
MIN(query_stats.statement_text) AS sample_query_text
FROM
(SELECT
qs.*,
SUBSTRING(st.text, (qs.statement_start_offset/2)+1,
((CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(st.text)
ELSE qs.statement_end_offset END - qs.statement_start_offset)/2)+1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st) AS query_stats
GROUP BY query_stats.query_hash
ORDER BY avg_cpu_time DESC;
Remember that Perfmon is just one tool in your monitoring toolkit. For comprehensive SQL Server performance analysis, combine it with DMVs, Query Store, and Extended Events for complete visibility.
When I first considered running Perfmon on our production SQL Server, I recalled Brent Ozar's classic advice about 48-hour monitoring cycles. But production environments aren't lab setups - they demand careful consideration.
Production servers handle real user traffic, transactions, and SLAs. Unlike Dev/QA environments where we can simulate loads, production has:
- Strict uptime requirements
- Performance-sensitive applications
- Regulatory constraints
Through controlled tests, I measured Perfmon's resource usage:
# Sample PowerShell to measure Perfmon overhead
$counters = "\Processor(_Total)\% Processor Time",
"\Memory\Available MBytes",
"\SQLServer:Buffer Manager\Buffer cache hit ratio"
$sampleInterval = 1 # seconds
$duration = 300 # 5 minutes
$baseline = Get-Counter -Counter $counters -SampleInterval $sampleInterval -MaxSamples $duration
$withPerfmon = Measure-Command {
Start-Process "perfmon.exe" -ArgumentList "/sys"
Start-Sleep -Seconds 60 # Warm-up
Get-Counter -Counter $counters -SampleInterval $sampleInterval -MaxSamples $duration
}
Stop-Process -Name "perfmon" -Force
Write-Host "CPU overhead: $($withPerfmon.TotalMilliseconds - $baseline.TotalMilliseconds) ms"
Instead of continuous Perfmon, consider these alternatives:
1. Targeted Sampling
# Capture peak hours only
$peakStart = Get-Date "09:00"
$peakEnd = Get-Date "11:00"
if ((Get-Date) -ge $peakStart -and (Get-Date) -le $peakEnd) {
logman create trace SQL_Perf -o "C:\PerfLogs\SQL_PeakHours.blg" -f bincirc -v mmddhhmm -max 500 -cnf 01:00:00
logman update SQL_Perf -c "\SQLServer:Buffer Manager\*" "\SQLServer:SQL Statistics\*"
logman start SQL_Perf
}
2. Lightweight Alternatives
For continuous monitoring, these have lower overhead:
- DMV snapshots (using sp_BlitzFirst)
- Extended Events sessions
- Query Store (SQL Server 2016+)
After troubleshooting a particularly nasty performance issue, I created this checklist for when to use Perfmon in production:
- During pre-approved maintenance windows
- For short-term diagnostics (2-4 hours max)
- When correlating with other monitoring tools
- With carefully selected counters (avoid "*")
Instead of collecting everything, focus on these key counters:
# Minimal impactful counter set
$essentialCounters = @(
"\Processor(_Total)\% Privileged Time",
"\SQLServer:Buffer Manager\Page life expectancy",
"\SQLServer:SQL Statistics\Batch Requests/sec",
"\PhysicalDisk(*)\Avg. Disk sec/Read"
)
logman create counter SQL_Minimal -o "C:\PerfLogs\SQL_Minimal.blg" -f csv -si 15 -c $essentialCounters
After gathering data, use PAL (Performance Analysis of Logs) tool for automated threshold analysis rather than manual review.