Mark Russinovich’s technical blog covering topics such as Windows troubleshooting, technologies and security.
A few weeks ago a poster with the handle dloneranger reported in the 2CPU forums that he experienced reduced network throughput on his Vista system when he played audio or video. Other posters chimed in with similar results, and in the last week attention has been drawn to the behavior by other sites, including Slashdot and ZDNet blogger Adrian Kingsley-Hughes.
Many people have correctly surmised that the degradation in network performance during multimedia playback is directly connected with mechanisms employed by the Multimedia Class Scheduler Service (MMCSS), a feature new to Windows Vista that I covered in my three-part TechNet Magazine article series on Windows Vista kernel changes. Multimedia playback requires a constant rate of media streaming, and playback will glitch or sputter if its requirements aren’t met. The MMCSS service runs in the generic service hosting process Svchost.exe, where it automatically prioritizes the playback of video and audio in order to prevent other tasks from interfering with the CPU usage of the playback software:
When a multimedia application begins playback, the multimedia APIs it uses call the MMCSS service to boost the priority of the playback thread into the realtime range, which covers priorities 16-31, for up to 8ms of every 10ms interval, depending on how much CPU the playback thread requires. Because other threads run at priorities in the dynamic priority range below 16, even very CPU intensive applications won’t interfere with the playback.
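To make the numbers in that paragraph concrete, here is a small Python sketch of the boost budget. It is only a toy model under stated assumptions, not MMCSS code: the constants are taken from the text, and the helper names `boosted_ms` and `unboosted_remainder_ms` are hypothetical.

```python
# Toy model of the boost budget described above; not actual MMCSS code.
PERIOD_MS = 10.0                 # scheduling interval from the text
MAX_BOOST_MS = 8.0               # upper bound on boosted time per interval
REALTIME_RANGE = range(16, 32)   # realtime priorities 16-31


def boosted_ms(required_ms: float) -> float:
    """Time per interval the playback thread runs boosted (hypothetical helper)."""
    if required_ms < 0:
        raise ValueError("required_ms must be non-negative")
    # The boost lasts only as long as the thread needs, capped at 8 ms.
    return min(required_ms, MAX_BOOST_MS)


def unboosted_remainder_ms(required_ms: float) -> float:
    """Time per interval that is not spent at the boosted priority."""
    return PERIOD_MS - boosted_ms(required_ms)


for need in (1.0, 5.0, 9.5):
    print(need, boosted_ms(need), unboosted_remainder_ms(need))
```

Even a playback thread that asks for more than 8ms leaves at least 2ms of every interval in which it competes with other threads without the boost.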
You can see the boost by playing an audio or video clip in Windows Media Player (WMP), running the Reliability and Performance Monitor (Start->Run->Perfmon), selecting the Performance Monitor item, and adding the Priority Current value for all the Wmplayer threads in the Thread object. Set the graph scale to 31 (the highest priority value on Windows) and you’ll easily spot the boosted thread, shown here running at priority 21:
Besides activity by other threads, media playback can also be affected by network activity. When a network packet arrives at a system, it triggers a CPU interrupt, which causes the device driver for the device at which the packet arrived to execute an Interrupt Service Routine (ISR). Other device interrupts are blocked while ISRs run, so ISRs typically do some device book-keeping and then perform the more lengthy transfer of data to or from their device in a Deferred Procedure Call (DPC) that runs with device interrupts enabled. While DPCs execute with interrupts enabled, they take precedence over all thread execution, regardless of priority, on the processor on which they run, and can therefore impede media playback threads.
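The effect of DPCs on a playback thread can be pictured with a very simplified single-CPU model in Python. This is a sketch, not a kernel simulation: it assumes a 10ms playback period, treats all DPC time as coming off the top of each period, and the function name and sample workloads are made up for illustration.

```python
# Toy single-CPU model: DPC time is served first in each period and the
# playback thread only gets what is left. Not a kernel simulation.
PERIOD_MS = 10.0  # assumed playback period


def playback_glitches(dpc_ms_per_period, playback_need_ms):
    """Return the indices of periods in which playback cannot finish its work."""
    glitches = []
    for i, dpc_ms in enumerate(dpc_ms_per_period):
        # DPCs preempt every thread regardless of its priority.
        available = max(PERIOD_MS - dpc_ms, 0.0)
        if available < playback_need_ms:
            glitches.append(i)
    return glitches


light = [0.5, 1.0, 0.5, 1.0]   # hypothetical DPC time per period, light traffic
heavy = [0.5, 9.0, 8.5, 1.0]   # hypothetical DPC time per period, heavy traffic
print(playback_glitches(light, 2.0))  # []
print(playback_glitches(heavy, 2.0))  # [1, 2]
```

In this model a thread-priority boost does not help, because the time is lost to DPCs before any thread is scheduled.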
Network DPC receive processing is among the most expensive, because it includes handing packets to the TCP/IP driver, which can result in lengthy computation. The TCP/IP driver verifies each packet, determines the packet’s protocol, updates the connection state, finds the receiving application, and copies the received data into the application’s buffers. This Process Explorer screenshot shows how CPU usage for DPCs rose dramatically when I copied a large file from another system:
Tests of MMCSS during Vista development showed that, even with thread-priority boosting, heavy network traffic can cause enough long-running DPCs to prevent playback threads from keeping up with their media streaming requirements, resulting in glitching. MMCSS’ glitch-resistant mechanisms were therefore extended to include throttling of network activity. It does so by issuing a command to the NDIS device driver, which is the driver that gives packets received by network adapter drivers to the TCP/IP driver, that causes NDIS to “indicate”, or pass along, at most 10 packets per millisecond (10,000 packets per second).
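A cap of 10 packets per millisecond can be sketched in Python as follows. This is not the NDIS algorithm, whose internals are not described here. It assumes that packets above the cap simply wait in an unbounded backlog, and the function name `throttle` is hypothetical.

```python
# Sketch of a per-millisecond indication cap; NOT the actual NDIS code.
def throttle(arrivals_per_ms, cap_per_ms=10):
    """Given packets arriving in each 1 ms tick, return the packets passed
    along per tick and the backlog left at the end.

    Assumption: excess packets are queued, not dropped.
    """
    backlog = 0
    indicated = []
    for arrived in arrivals_per_ms:
        backlog += arrived
        passed = min(backlog, cap_per_ms)  # never pass more than the cap
        backlog -= passed
        indicated.append(passed)
    return indicated, backlog


print(throttle([25, 0, 5, 40]))  # ([10, 10, 10, 10], 30)
```

With the cap saturated in every tick, 1,000 ticks per second times 10 packets gives the 10,000 packets per second quoted above.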
Because the standard Ethernet frame size is about 1500 bytes, a limit of 10,000 packets per second equals a maximum throughput of roughly 15MB/s. 100Mb networks can handle at most 12MB/s, so if your system is on a 100Mb network, you typically won’t see any slowdown. However, if you have a 1Gb network infrastructure and both the sending system and your Vista receiving system have 1Gb network adapters, you’ll see throughput drop to roughly 15%.
Further, there’s an unfortunate bug in the NDIS throttling code that magnifies throttling if you have multiple NICs. If you have a system with both wireless and wired adapters, for instance, NDIS will process at most 8000 packets per second, and with three adapters it will process a maximum of 6000 packets per second. 6000 packets per second equals 9MB/s, a limit that’s visible even on 100Mb networks.
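The arithmetic in the last two paragraphs can be reproduced with a few lines of Python. The sketch assumes every packet is a full 1500-byte frame and ignores protocol overhead. The per-adapter limits are the ones quoted in the text, and the function names are hypothetical.

```python
# Back-of-the-envelope throughput limits implied by the packet caps above.
FRAME_BYTES = 1500  # approximate standard Ethernet frame size
LIMITS_PPS = {1: 10_000, 2: 8_000, 3: 6_000}  # adapters -> packets per second


def max_throughput_mb_s(pps, frame_bytes=FRAME_BYTES):
    """Upper bound on receive throughput in MB/s for a given packet cap."""
    return pps * frame_bytes / 1_000_000


def link_capacity_mb_s(link_mbit):
    """Raw link capacity in MB/s (8 bits per byte, no overhead)."""
    return link_mbit / 8


for adapters, pps in LIMITS_PPS.items():
    cap = max_throughput_mb_s(pps)
    for link in (100, 1000):
        visible = cap < link_capacity_mb_s(link)
        print(f"{adapters} adapter(s): cap {cap} MB/s, "
              f"{link}Mb link, throttling visible: {visible}")
```

The caps come out at 15, 12 and 9 MB/s for one, two and three adapters, against a raw capacity of 12.5 MB/s for a 100Mb link and 125 MB/s for a 1Gb link.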
I caused throttling to be visible on my laptop, which has three adapters, by copying a large file to it from another system and then starting WMP and playing a song. The Task Manager screenshot below shows how the copy achieves a throughput of about 20%, but drops to around 6% on my 1Gb network after I start playing a song:
You can monitor the number of receive packets NDIS processes by adding the “packets received per second” counter in the Network object to the Performance Monitor view. Below, you can see the packet receive rate change as I ran the experiment. The number of packets NDIS processed didn’t reach the theoretical throttling maximum of 6,000, probably due to handshaking with the remote system.
Despite even this level of throttling, Internet traffic, even on the best broadband connection, won’t be affected. That’s because the multiplicity of intermediate connections between your system and another one on the Internet fragments packets and slows down packet travel, and therefore reduces the rate at which systems transfer data.
The throttling rate Vista uses was derived from experiments that reliably achieved glitch-resistant playback on systems with one CPU on 100Mb networks with high packet receive rates. The hard-coded limit was short-sighted with respect to today’s systems that have faster CPUs, multiple cores and Gigabit networks. In addition to fixing the bug that affects throttling on multi-adapter systems, the networking team is actively working with the MMCSS team on a fix that penalizes network traffic less dramatically while still delivering a glitch-resistant experience.
Stay tuned to my blog for more information.
Mike: I feel exactly your pain. I've been using Vista at home for about 3 months. It feels slow compared to my XP box at work. I don't do many network transfers at home, so I could live with it.
Now I upgraded to Vista at work where I do a lot of network transfers from mapped drives and it's just horrible.
I'll be going back to XP any day now.
Why has this site (main page of Mark's blog) suddenly started asking me for a certificate? I frequented it regularly in the last few days and it hasn't done so. It started today. I use Firefox 2.xx. Sorry for off-topic.
On the whole the many comments above have been quite educational for me and I thank the commenters. I would have welcomed an explanation of why Vista's TCP processing is taking over six times longer to execute than Linux's (consuming 40% of available clock cycles compared to 7%). The absence of an effort to explain it suggests that people simply don't know why. In the end it's the main question.
CAVEAT: I stopped being a computer engineer a while ago, so it is possible that I am missing some obvious things. If anyone would chip in it would be greatly appreciated.
There are several things I don't understand.
1. What is the point of contention? So far it looks like the only contention is computing power. Then this problem should not happen on a multi core processor - one core processes network packets while another core handles multi media. I am being led to believe that either the point of contention is not computing power (but then what is the constraint?) or the multimedia system used a quick hack to cope with the current hardware, and failed to put in place a fix for future hardware.
2. So the multimedia engineer tries to solve the glitch problem. It's a hard problem, he or she thinks a long time about this and comes up with the only answer: throttle the network. Not only that, but she or he puts in a hard coded limit. What I don't understand is why this situation is a surprise at all. According to Larry Osterman's blog it took a team of engineers from Tuesday morning until Friday afternoon to find the root cause. I don't get it!!! I can think of two possible answers: 1) the engineer that made the change left the company and nobody knew about this dependency. 2) components in Windows are so interconnected that a change in one component shows up in a completely different component. For Windows' long term future as well as the prestige of the Windows engineering team I hope the right answer is answer 1: the engineer left the company.
All in all I am left with a bad taste regarding this particular engineering team. They clearly dropped the ball on this one.
Now I want to offer some un-asked advice:
1. fire the engineer who made the design. If the engineer received a code review, fire the code reviewer as well. If said engineers are hot shots then it won't matter to them - she or he will get another job easily. If the engineers are not so hot, then it's better for the team to make room for better qualified professionals. Either way, it will send a strong signal to the rest of the team: think hard about what solutions you give and be more careful when giving code reviews. I remember the last time I got a code review in which I hard coded a value. The hard coded value would have probably stopped working a year after putting the product on the market. I had to come back on the weekend to put in place the proper solution. This is a job, not some kid's game.
2. If there are any college kids reading, here is something I learned in my networking class: in general, when investigating latencies focus on high throughput networks. Low throughput networks are too coarse to identify incidents - it takes more to move the needle on a low throughput network than it does on a high throughput network.
If playing ANY audio will degrade network performance, then let's imagine a scenario where telemedicine software is installed on a Vista machine... the high bandwidth network interface supplies both audio and video, so on Vista it appears the quality of both video AND audio will suffer.
I think hospitals acquiring new PCs must now consider upgrading to XP which doesn't have these problems!
"If anyone would chip in it would be greatly appreciated."
"What is the point of contention? So far it looks like the only contention is computing power. Then this problem should not happen on a multi core processor"
The point of contention is NOT computing power. The point of contention is making sure that every 10ms the audio buffer gets refilled. The problem they noticed what that network packet arrival caused these DPCs to be created that were long running, and were at a higher priority then the audio thread, so the audio thread would be stalled. Remember not everyone has multiple core CPUs, and Vista was designed for everyone.
Secondly, how can you come in with an attidude like that? It took them Tuesday-Friday to come up with the answer? That's incredibly fast. They had to reproduce the initial problem. Then there were probably several meetings over the course of the 4 days to catch up, discuss current findings and make sure everyone was on the same page. Then they had to verify exactly what was taking place, and decide how to approach it. And all of this on top of whatever they needed to do for any other projects that might be going on.
I can't recall where I heard this but I thought I had read somewhere that this wasn't supposed to be hard coded, and it was intended to be in the registry. That's not a design problem, that 's an implementation problem. And hard coding doesn't necessarily mean the value was written directly in the code right there. It could have been some constant in another realm somewhere, and there was a disconnect between the two sub systems for what was actually going on.
As for the employee who wrote this, they might have been on vacation, transferred to another group, or working on other projects. (Most likely not everyone in the networking and MM groups was troubleshooting this.) Even then, they might not have made the link instantaneously.
Should they be fired over this? Absolutely not. It was a mistake. Plain and simple. I'm sure you've made them before. If not, I've got a job for you. I want to pay you 100k a year to invest 200k and double it. What you don't want to? But you never make mistakes, SURELY you can do it.
I experimented with the removal of the dependency of the MMCSS Service. I copied some stuff from one encrypted drive to another (CPU hog). I played some MP3s in the background. This always worked on XP without stuttering. Now, on Vista, the sound gets heavily distorted. I'm on Vista 64 and have all the latest drivers, BIOS and whatnot. It seems like a bad joke. Just read Slashdot. People pump out data on really outdated machines at whatever speed the machine can handle and play MP3 or video at the same time. On Linux. No stuttering.
Brian - thank you for taking the time to respond.
Like I said, I have been out of the field for a while. But...
"The point of contention is NOT computing power. The point of contention is making sure that every 10ms the audio buffer gets refilled. The problem they noticed what that network packet arrival caused these DPCs to be created that were long running, and were at a higher priority then the audio thread, so the audio thread would be stalled."
According to you, the problem is that the audio thread pre-empts the networking thread. To me this implies that the point of contention is the CPU - i.e. computing power.
"Remember not everyone has multiple core CPUs, and Vista was designed for everyone."
Vista is (was?) supposed to be the operating system for the next 10 years. As such it should have planned for technology improvements. If your comment above reflects the actual thinking during Vista design process, then I am at loss for words.
"Secondly, how can you come in with an attidude like that?"
(Left in the original mis-spelled word) It's just my opinion, take it or leave it. But when your media design explicitly throttles the network and then it takes the engineering team the better half of a week to find that out then either:
1. there is a serious disconnect between different teams inside Vista. I am thinking something along the lines of work done by media team affecting the networking team and no communication between them.
2. the Vista design is fragile. I would liken it to a balloon - you push it in one spot and then another random spot expands. You make a change in one spot and it affects another random spot.
3. the engineering team that did the investigation is just not up to speed. I mean how much more clear than this can it get: media playing throttles the network. There are reports that media playing affects network. And nobody knows what is going on. THIS IS LAME!!!!
If any of these cases is true, there is room for improvement for the Vista engineering team. My comment stands.
"I can't recall where I heard this but I thought I had read somewhere that this wasn't supposed to be hard coded, and it was intended to be in the registry. That's not a design problem, that 's an implementation problem."
I hope you realize that your "solution" is nothing more than a band-aid. And I hope you don't push this as sound design - some college kids may be still reading this thread.
"Should they be fired over this? Absolutely not. It was a mistake. Plain and simple."
Let's not fire them. And leave the message that you can seriously screw up and get away with it. But seriously now: I don't know why you would advocate sending this message.
I was thinking of trying something different: instead of firing the engineers that came up with the design, make them work for free until they fix it :). Hey, if they are hot-shots it should take no time at all. Actually, let's expand this idea: engineers should not be paid for the time it takes to fix any bug. There should be an economic penalty for screwing things up.
"I'm sure you've made them before. If not, I've got a job for you. I want to pay you 100k a year to invest 200k and double it. What you don't want to? But you never make mistakes, SURELY you can do it."
Why would I want to take this job? I already make more than what you mention with way smaller targets. Honestly, you would have to up your offer to be taken seriously. Much like some of your other comments...
It's me again.
Larry Osterman's blog has more technical information - look around http://blogs.msdn.com/larryosterman/archive/2007/08/28/windows-vista-sound-causes-network-throughput-slowdowns.aspx#4615267.
Here is the contention: in Vista interrupts run only on CPU0. Both networking and media systems are driven by interrupts so they have to share CPU0. And in Vista media trumps network.
It seems that other OSes are able to distribute interrupts across different cores - see comment in Larry Osterman's blog: http://blogs.msdn.com/larryosterman/archive/2007/08/28/windows-vista-sound-causes-network-throughput-slowdowns.aspx#4619393
Well, I guess Vista can't do everything.
This was an interesting problem to look at. Good geeky start to the Labor Day weekend.
@atglabs: "But what if that thread is swapped with a higher priority thread while IRPTs are disabled? It's a really complicated problem to solve correctly."
I fail to see how you could preempt a thread that has interrupts disabled.
@everyone else: It has been mentioned that disabling MMCSS seems to solve this problem quite well, so if you are being affected by it I would suggest using that as a workaround. Also, experimentation leads me to believe that disabling every service except DHCP and DNSCache leads to the best system performance.
On my system
Athlon X2 4600+
My audio stutters like crazy whenever there is any hdd activity on the primary drive. I've been tearing my hair out trying to sort it out.
A PC with this power on a modern OS should not have issues playing mp3s and using the hdd at the same time!
I can't even begin to understand how anyone could have thought this was a good idea... much less got enough buy in to the idea that it actually ended up as a (mis)feature in Microsoft's new flagship operating system.
For instance, simply having an application like Steam (a common gaming service that you would normally leave minimized in your system tray) that relies on multimedia functionality causes Vista's network throttling to kick in as well. I spent hours trying to figure out why my gigabit NIC was transferring at a whopping 7 MB/s while I wasn't playing audio.
The really sad thing here? Most of the times the throttling will be triggered and make a difference, it would have been better all round to use a bigger buffer instead: when simply playing back ripped CDs, for example, there is absolutely no need for the 10ms response times - except when skipping tracks, when you can simply flush the buffer, you know not just seconds but whole *minutes* in advance what is going to be played, and can buffer accordingly! (Something the iPod exploits to cut power consumption, by reading several minutes' worth of music in a single burst then powering the hard drive down until the next burst is needed, minutes later.)
Yes, for game sound effects you need fast response times: you don't always know in advance when to generate an explosion sound effect, so you need to work with short buffers and fast response times - but for music playing, using 10ms segments is not just pointless, it's a net loss, both in efficiency (you're performing 100 times as many context switches as a 1s buffer would require!) and in the detrimental effects we've already seen on other Windows components.
So, what's the verdict?
How do I fix this?
Wait for SP1?
I am having this Vista network slowdown too. I have tried all of the fixes I have found on the net but had no success. I have a gigabit network adapter but it is only connected at 100Mbps and I get a transfer speed of about 600KB/s.
The thing is that I am not playing any music on my system, I have disabled Multimedia Scheduler Service and Windows Audio, and a load of other services but that did not help.
Is there something using the audio service (without my knowledge) which is causing my speed cap to be enabled?