Welcome to TechNet Blogs Sign in | Join | Help

Understanding Perfmon Counters while troubleshooting SQL Server Performance Issues

Understanding Performance Counters data while troubleshooting Performance issues in SQL Server

To troubleshoot overall Database system performance issue, analyzing performance counters is the best way to start.    By collecting performance counters during busy period for few days consistently and analyzing those data would give a better idea about overall system problems regarding Memory, CPU, and/or Disk I/O.   Please note, for troubleshooting a particular SQL problem such as a stored procedure or a piece of T-SQL, it is better to look at the query execution plan and SQL Trace data and identify the need of redesigning a query or table indexes.    

 Following are some key performance counters to use while assessing a performance issues on SQL Server. 

Memory:

Memory and Disk I/O complements each other.   Memory issues on the system could affect disk I/O and vice versa.   It is very critical to carefully observe the trend of performance counters data over a long period of time to identify the real problem.     

·         Memory\Available Mbytes

Threshold: A consistent value of less than 20 to 25 percent of installed RAM is an indication of insufficient memory.

Significance: This indicates the amount of physical memory available to processes running on the computer. Note that this counter displays the last observed value only. It is not an average.

 

·         Memory\Page Reads/sec

Threshold: Sustained values of more than five indicate a large number of page faults for read requests.

Significance: This counter indicates that the working set of your process is too large for the physical memory and that it is paging to disk. It shows the number of read operations, without regard to the number of pages retrieved in each operation. Higher values indicate a memory bottleneck.

If a low rate of page-read operations coincides with high values for Physical Disk\% Disk Time and Physical Disk\Avg. Disk Queue Length, there could be a disk bottleneck. If an increase in queue length is not accompanied by a decrease in the pages-read rate, a memory shortage exists.

 

·         Memory\Page Input/Sec:

Threshold: The value should not exceed 15.  Higher the value poor the performance.

Significance:  Pages Input/sec is the rate at which pages are read from disk to resolve hard page faults. Hard page faults occur when a process refers to a page in virtual memory that is not in its working set or elsewhere in physical memory, and must be retrieved from disk.

 

·         Memory\Pages/sec

Threshold: Sustained values higher than five indicate a bottleneck.

Significance: This counter indicates the rate at which pages are read from or written to disk to resolve hard page faults. To determine the impact of excessive paging on disk activity, multiply the values of the Physical Disk\ Avg. Disk sec/Transfer and Memory\ Pages/sec counters.  If the product of these counters exceeds 0.1, paging is taking more than 10 percent of disk access time, which indicates that you need more RAM.  If this occurs over a long period, you probably need more memory.

A high value of Pages/sec indicates that your application does not have sufficient memory. The average of Pages Input/sec divided by average of Page Reads/sec gives the number of pages per disk read. This value should not generally exceed five pages per second. A value greater than five pages per second indicates that the system is spending too much time paging and requires more memory (assuming that the application has been optimized). 

Processor

To measure processor utilization and context switching, you can use the following counters:  

·         Processor\% Processor Time

Threshold: The general figure for the threshold limit for processors is 65 percent.

Significance: This counter is the primary indicator of processor activity. High values many not necessarily be bad. However, if the other processor-related counters are increasing linearly such as % Privileged Time or Processor Queue Length, high CPU utilization may be worth investigating.

 

·         Processor\% Privileged Time

Threshold: A figure that is consistently over 75 percent indicates a bottleneck.

Significance: This counter indicates the percentage of time a thread runs in privileged mode. When your application calls operating system functions (for example to perform file or network I/O or to allocate memory), these operating system functions are executed in privileged mode.

 

·         System\Context Switches/sec

Threshold: As a general rule, context switching rates of less than 5,000 per second per processor are not worth worrying about. If context switching rates exceed 15,000 per second per processor, then there is a constraint.

Significance: Context switching happens when a higher priority thread preempts a lower priority thread that is currently running or when a high priority thread blocks. High levels of context switching can occur when many threads share the same priority level. This often indicates that there are too many threads competing for the processors on the system. If you do not see much processor utilization and you see very low levels of context switching, it could indicate that threads are blocked.  

Disk I/O

To measure disk I/O activity, you can use the following counters:  

·         PhysicalDisk\Avg. Disk Queue Length

Threshold: Should not be higher than the number of spindles plus two.

Significance: This counter indicates the average number of both read and writes requests that were queued for the selected disk during the sample interval.

 

·         PhysicalDisk\Avg. Disk Read Queue Length

Threshold: Should be less than two.

Significance: This counter indicates the average number of read requests that were queued for the selected disk during the sample interval.

 

·         PhysicalDisk\Avg. Disk Write Queue Length

Threshold: Should be less than two.

Significance: This counter indicates the average number of write requests that were queued for the selected disk during the sample interval.

 

·         PhysicalDisk\Avg. Disk sec/Read

Threshold: No specific value.

      Less than 10 ms – very good

      Between 10-20 ms – okay

      Between 20-50 ms – slow, needs attention

      Greater than 50 ms – Serious I/O bottleneck

 

Significance: This counter indicates the average time, in seconds, of a read of data from the disk.

 

·         PhysicalDisk\Avg. Disk sec/Transfer

Threshold: Should not be more than 18 milliseconds.

Significance: This counter indicates the time, in seconds, of the average disk transfer. This may indicate a large amount of disk fragmentation, slow disks, or disk failures. Multiply the values of the Physical Disk\Avg. Disk sec/Transfer and Memory\Pages/sec counters. If the product of these counters exceeds 0.1, paging is taking more than 10 percent of disk access time, so you need more RAM.

 

·         PhysicalDisk\Disk Writes/sec

Threshold: Depends on manufacturer's specification.

Significance: This counter indicates the rate of write operations on the disk.

 

·         Physical Disk: %Disk Time

Threshold:  Greater than 50 percent, it represents an I/O bottleneck

Significance: Represents the percentage of elapsed time that the selected disk drive was busy servicing read or write requests.

 

·         Physical Disk\Avg. Disk Reads/Sec  and Physical Disk\Avg. Disk Writes/Sec

Threshold: It should be less than 85% of the disk capacity

Significance: It represents the rate of read operations on the disk.

When using above counters, you may need to adjust the values for RAID configurations using the following formulas.

Raid 0 -- I/Os per disk = (reads + writes) / number of disks

Raid 1 -- I/Os per disk = [reads + (2 * writes)] / 2

Raid 5 -- I/Os per disk = [reads + (4 * writes)] / number of disks

Raid 10 -- I/Os per disk = [reads + (2 * writes)] / number of disks

For example, you have a RAID-1 system with two physical disks with the following values of the counters.

Disk Reads/sec            80

Disk Writes/sec           70

Avg. Disk Queue Length    5

In that case, you are encountering (80 + (2 * 70))/2 = 110 I/Os per disk and your disk queue length = 5/2 = 2.5 which indicates a border line I/O bottleneck.

Disk I/O issues can be minimized by having larger and sequential I/O activities.    To achieve than you should consider partitioning.    Partitioning can be done in many ways – Database partitioning, Tempdb partitioning, Table partitioning, index partitioning.    At a minimum, it is highly recommended to partition your database in such a way that clustered index, non-clustered index, and transaction logs are each on a separate physical drive and spread across multiple logical filegroups/files within that drive.     

Published Thursday, November 30, 2006 4:34 PM by vishah

Comments

# Links for using Performance Counters (perfmon) while troubleshooting SQL Server related issues

Thursday, November 06, 2008 10:45 AM by SQL Server SSIS and Replication

Understanding Performance Counters data while troubleshooting Performance issues in SQL Server: http://blogs.technet.com/vipulshah/archive/2006/11/30/understanding-perfmon-counters-while-troubleshooting-sql-server-performance-issues.aspx

# Performance Monitor, SQL Counters and Tresholds

Monday, November 17, 2008 5:04 AM by There's Something about SQL!

Whilst surfing on the web last week, I've found this very nice blog from Vipul Shah, which happens to

# re: Understanding Perfmon Counters while troubleshooting SQL Server Performance Issues

Wednesday, December 03, 2008 7:33 AM by acollins74

you mention above PhysicalDisk\Avg. Disk Queue Length should be number of spindles plus two but I believe thats a typo.  I have always gone by the rule its the number of spindles times two.

# re: Understanding Perfmon Counters while troubleshooting SQL Server Performance Issues

Friday, December 19, 2008 8:28 PM by bradmcgehee

Hi. I was reading this blog and noticed that many of the recommendations you make in it are significantly different that from other sources claim. I am curious as to where these numbers come from. Are they based directly on research done at Microsoft, and are they the "official guidelines" from Microsoft for this particular counters? And one more thing, do these recommendations apply to all versions of SQL Server, or to just some versions of SQL Server?

Anonymous comments are disabled
 
Page view tracker