Microsoft Enterprise Platforms Support: Windows Server Core Team
My name is Flavio Muratore and I am a Senior Support Escalation Engineer with the Windows Core Team. One subject we haven’t written much about in the Core team blog is “disk performance”.
Today I would like to talk a little bit about measuring Physical Disk IO Latency with Windows Performance Monitor (perfmon). Most likely you have some experience with Perfmon; it has been around since the NT days. You have probably heard general statements about what counts as acceptable disk latency: “Less than 10 milliseconds is good and more than 20 milliseconds is bad”. Although these rules of thumb are used to simplify analysis, they do not apply in all cases and may lead to incorrect conclusions. Let’s check how this really works so we can understand these numbers.
Summary: The IO latency measured in perfmon includes all the time spent in the hardware layers as well as the time spent in the Microsoft Port Driver queue (Storport.sys for SCSI). If the running processes generate a large Storport queue, the measured latency increases, as IO has to wait before getting dispatched to the hardware layers.
What is disk IO latency? We can define disk IO latency as: A measure of the time delay from the time a disk IO request is created, until the time the disk IO request is completed.
What counters in Windows Performance Monitor show the physical disk latency?
“Physical disk performance object -> Avg. Disk sec/Read counter” - Shows the average read latency.
“Physical disk performance object -> Avg. Disk sec/Write counter” - Shows the average write latency.
“Physical disk performance object -> Avg. Disk sec/Transfer counter” - Shows the combined average for both reads and writes.
The “_Total” instance is an average of the latencies for all physical disks in the computer. Every other instance represents an individual physical disk.
Note: Do not confuse these with Disk Transfers/sec, which is a completely different counter.
Where does the performance data come from? For the “physical disk performance object”, the data is captured at the “Partition Manager” level in the storage stack. Keep in mind Perfmon does not create any performance data per se; it only consumes data provided by other subsystems within Windows.
Where is the Partition Manager in the Storage Stack?
A simplified explanation of the Windows Storage Stack follows. When an application creates an IO request, it sends it to the Windows IO Subsystem (at the top of the stack). The IO will then make its way all the way down the stack (to the Hardware Disk Subsystem) and then come all the way back up. During this process, each layer will perform its function and then hand over the IO to the next layer.
So what are we really measuring with the Physical disk performance object -> Avg. Disk sec/Transfer (or /Read, or /Write) counter? We are measuring all the time spent below the Partition Manager level. When the Partition Manager sends the IO request down the stack, we time stamp it; when the request arrives back, we time stamp it again and calculate the difference. That time difference is the latency.
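The two-timestamp idea can be sketched in a few lines of Python. This is only a toy model and not how Windows implements it: `LatencyProbe`, `send_down` and the `lower_layers` callable are hypothetical names made up for illustration, and the clock is injectable so the example can be checked with fixed timestamps.

```python
import time


class LatencyProbe:
    """Toy stand-in for a measurement point in the stack (hypothetical, not a Windows API)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock          # function returning the current time in seconds
        self.total_seconds = 0.0     # sum of all measured latencies
        self.completed = 0           # number of requests that came back

    def send_down(self, lower_layers, request):
        start = self._clock()                # time stamp on the way down
        result = lower_layers(request)       # everything below the measurement point
        end = self._clock()                  # time stamp when the request arrives back
        self.total_seconds += end - start
        self.completed += 1
        return result

    def avg_sec_per_transfer(self):
        # Average latency in seconds, analogous in spirit to Avg. Disk sec/Transfer.
        return self.total_seconds / self.completed if self.completed else 0.0
```

Anything that happens inside `lower_layers`, including waiting in a queue, ends up in the measured time, which is the point the rest of this post builds on.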
This means we are accounting for the time spent in the following components:
How does disk queuing affect the measured latency in Perfmon? There is only a limited number of IOs a disk subsystem can accept at a given time. The excess IO gets queued until the disk can accept IO again. The time IO spends in the queues below the Partition Manager is included in the Perfmon physical disk latency measurements. As queues grow larger and IO has to wait longer, the measured latency also grows.
There are multiple queues below the Partition Manager level:
Finally, pay special attention to the Port Driver queue (Storport.sys for SCSI). The Port Driver is the last Microsoft component to touch an IO before we hand it off to the vendor-supplied Device Miniport Driver. If the Device Miniport Driver can’t accept any more IO because its queue and/or the hardware queues below are saturated, we will start accumulating IO in the Port Driver queue. The size of the Microsoft Port Driver queue is limited only by the available system memory (RAM) and can grow very large, causing large measured latency. In conclusion: the time the IO spends in the queue is added to the disk latency in perfmon.
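To make the effect of queuing concrete, here is a minimal simulation in Python. It is a sketch under simplifying assumptions that are mine, not the post's: all IOs arrive at the same instant, the hardware can work on a fixed number of IOs at once (`device_slots`), and every IO takes the same `service_time`. It does not model Storport's actual behavior; the function names are hypothetical.

```python
import heapq


def measured_latencies(num_ios, device_slots, service_time):
    """Return per-IO latency (seconds) as seen from above the queue.

    Assumes all IOs arrive at t=0 and each takes service_time in the hardware.
    """
    # Each heap entry is the time at which one device slot becomes free.
    slots = [0.0] * device_slots
    heapq.heapify(slots)
    latencies = []
    for _ in range(num_ios):
        free_at = heapq.heappop(slots)     # IO waits in the queue until a slot frees up
        done_at = free_at + service_time   # then it spends service_time in the hardware
        heapq.heappush(slots, done_at)
        latencies.append(done_at)          # arrival was at t=0, so latency == done_at
    return latencies


def average(values):
    return sum(values) / len(values)


# Same hardware (2 slots, 5 ms per IO), different numbers of outstanding IOs.
light = average(measured_latencies(num_ios=2, device_slots=2, service_time=0.005))
heavy = average(measured_latencies(num_ios=8, device_slots=2, service_time=0.005))
```

With these assumed numbers, `light` comes out at about 5 ms and `heavy` at about 12.5 ms, even though the hardware service time never changed. The difference is purely time spent waiting in the queue.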
To keep the queue under control you have to tune your applications to limit the maximum number of outstanding I/O operations they generate. That’s a subject for another blog post.
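As a rough illustration of what limiting outstanding I/O can look like inside an application, the sketch below caps the number of in-flight operations with a semaphore. It is a generic pattern, not a recommendation from this post or a Windows API: `OutstandingIoLimiter` and `io_fn` are hypothetical names, and real applications would typically use whatever throttling their I/O framework already provides.

```python
import threading


class OutstandingIoLimiter:
    """Caps how many I/O operations may be in flight at once (illustrative only)."""

    def __init__(self, max_outstanding):
        self._slots = threading.BoundedSemaphore(max_outstanding)
        self._lock = threading.Lock()
        self._in_flight = 0
        self.peak_in_flight = 0   # highest number of concurrent operations observed

    def submit(self, io_fn, *args):
        # Block the caller until one of the max_outstanding slots is free.
        with self._slots:
            with self._lock:
                self._in_flight += 1
                self.peak_in_flight = max(self.peak_in_flight, self._in_flight)
            try:
                return io_fn(*args)   # the actual read or write goes here
            finally:
                with self._lock:
                    self._in_flight -= 1
```

Worker threads call `limiter.submit(do_io, request)` instead of issuing the I/O directly. Callers beyond the limit wait inside the application rather than piling up in the queues lower in the stack.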
Reference: For SCSI Disks (FC/RAID) you can enable Storport tracing to measure the latency below the Port Driver level. This does not account for the time spent in the Storport queue or anything above. Essentially this is the lowest level at which we can monitor the latency inside Windows before the IO is handed over to third party components. Check this excellent blog from the NTdebug team for details. “Storport ETW Logging to Measure Requests Made to a Disk Unit” http://blogs.msdn.com/b/ntdebugging/archive/2010/04/22/etw-storport.aspx
Flavio Muratore Senior Support Escalation Engineer Microsoft Enterprise Platforms Support
Interesting post. Is this something that works with Cluster Shared Volumes? For both owner and non-owner node?
Great post!
It was short and well written, haven't seen this simple explanation of these perfmon counters. Would love to see similar short post on other important performance counters. The picture added great value, Great work, txs.
How does a virtualized machine affect the counters? I fight with my VMWare admins repeatedly about how these Perfmon counters are not viable for Windows VMs...
Some counters are fine, but time-based ones need to be treated with suspicion. I was sure there was a document floating around which listed which ones were safe.
This blog post applies both to Windows VMs and to the Hyper-V host.
The special consideration for virtual machines has to do with how VMs calculate time. Because the VMs do not have a hardware real time clock, their time may drift slightly and affect the numbers in the counters. Although this is true, perfmon deals with averages and the results should be useful for a disk performance analysis.
The concept described here will be complemented by a blog I am finishing up on how disk queue works. Stay tuned!
Thank you all for the comments.
what is the latency unit of the counter listed in this article? ms??
The unit is seconds, but the counter has millisecond precision.
Avg. Disk sec/Transfer (Avg. Disk sec/Read, Avg. Disk sec/Write)
Displays the average time the disk transfers took to complete, in seconds. Although the scale is seconds, the counter has millisecond precision, meaning a value of 0.004 indicates the average time for disk transfers to complete was 4 milliseconds.
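For anyone post-processing exported counter values, the conversion is a simple multiplication. A minimal Python sketch follows; the function name `sec_to_ms` is just an illustrative choice.

```python
def sec_to_ms(counter_value_seconds):
    """Convert an Avg. Disk sec/Transfer sample from seconds to milliseconds."""
    return counter_value_seconds * 1000.0


sample = 0.004                        # value as shown by the counter, in seconds
print(f"{sec_to_ms(sample):.1f} ms")  # prints "4.0 ms"
```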
This is the counter in Perfmon used to measure IO latency.
For other counters see: Windows Performance Monitor Disk Counters Explained - blogs.technet.com/.../windows-performance-monitor-disk-counters-explained.aspx
Excellent, just what I was looking for thank you.
Good Stuff, Flavio!! Glad you got learnt something in ELS and sharing it out! :)
At what level do filter-level drivers fit in? Anti-virus software applications tend to introduce those and cause IO latency issues that are difficult to confirm without removing the driver on the assumption it's the problem. Comparing two counters would be a great alternative. I've seen articles using kernrate to compare Safe Mode and Normal Mode execution, but that would require physical or console access to test and couldn't easily be done on a remote VM.
It would be nice if we could limit disk i/o per process and/or per virtual machine like in vmware..
Nice explanation, short & conclusive.
Very nice article. In this article you mentioned about [The size of the Microsoft Port Driver queue is limited only by the available system memory (RAM) and can grow very large, causing large measured latency. ] Can you please elaborate more on that. From SQL Server standpoint, Log Block is size of 64KB and cannot go or grow beyond that as per design. So I want to know how RAM limitation comes into play in regards to your article and statement.
Is it OK when the maximum value of Avg. Disk sec/Transfer gets over 1s during a standard Windows Performance 1 minute test?