Mark Russinovich’s technical blog covering topics such as Windows troubleshooting, technologies and security.
A few weeks ago a poster with the handle dloneranger reported in the 2CPU forums that he experienced reduced network throughput on his Vista system when he played audio or video. Other posters chimed in with similar results, and in the last week attention has been drawn to the behavior by other sites, including Slashdot and ZDNet blogger Adrian Kingsley-Hughes.
Many people have correctly surmised that the degradation in network performance during multimedia playback is directly connected with mechanisms employed by the Multimedia Class Scheduler Service (MMCSS), a feature new to Windows Vista that I covered in my three-part TechNet Magazine article series on Windows Vista kernel changes. Multimedia playback requires a constant rate of media streaming, and playback will glitch or sputter if its requirements aren’t met. The MMCSS service runs in the generic service hosting process Svchost.exe, where it automatically prioritizes the playback of video and audio in order to prevent other tasks from interfering with the CPU usage of the playback software:
When a multimedia application begins playback, the multimedia APIs it uses call the MMCSS service to boost the priority of the playback thread into the realtime range, which covers priorities 16-31, for up to 8ms of every 10ms interval, depending on how much CPU the playback thread requires. Because other threads run at priorities in the dynamic priority range, which lies below 16, even very CPU-intensive applications won’t interfere with the playback.
You can see the boost by playing an audio or video clip in Windows Media Player (WMP), running the Reliability and Performance Monitor (Start->Run->Perfmon), selecting the Performance Monitor item, and adding the Priority Current value for all the Wmplayer threads in the Thread object. Set the graph scale to 31 (the highest priority value on Windows) and you’ll easily spot the boosted thread, shown here running at priority 21:
Besides activity by other threads, media playback can also be affected by network activity. When a network packet arrives at the system, it triggers a CPU interrupt, which causes the device driver for the adapter that received the packet to execute an Interrupt Service Routine (ISR). Other device interrupts are blocked while ISRs run, so ISRs typically do some device book-keeping and then perform the more lengthy transfer of data to or from the device in a Deferred Procedure Call (DPC) that runs with device interrupts enabled. Although DPCs execute with interrupts enabled, they take precedence over all thread execution, regardless of priority, on the processor on which they run, and can therefore impede media playback threads.
Network DPC receive processing is among the most expensive, because it includes handing packets to the TCP/IP driver, which can result in lengthy computation. The TCP/IP driver verifies each packet, determines the packet’s protocol, updates the connection state, finds the receiving application, and copies the received data into the application’s buffers. This Process Explorer screenshot shows how CPU usage for DPCs rose dramatically when I copied a large file from another system:
Tests of MMCSS during Vista development showed that, even with thread-priority boosting, heavy network traffic can cause enough long-running DPCs to prevent playback threads from keeping up with their media streaming requirements, resulting in glitching. MMCSS’ glitch-resistant mechanisms were therefore extended to include throttling of network activity. MMCSS does so by issuing a command to the NDIS driver, the driver that hands packets received by network adapter drivers to the TCP/IP driver, that causes NDIS to “indicate”, or pass along, at most 10 packets per millisecond (10,000 packets per second).
Because the standard Ethernet frame size is about 1500 bytes, a limit of 10,000 packets per second equals a maximum throughput of roughly 15MB/s. 100Mb networks can carry at most about 12.5MB/s, so if your system is on a 100Mb network, you typically won’t see any slowdown. However, if you have a 1Gb network infrastructure and both the sending system and your Vista receiving system have 1Gb network adapters, you’ll see throughput capped at roughly 15MB/s, only about 12% of the 125MB/s a gigabit link can deliver.
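The arithmetic above can be checked with a quick back-of-the-envelope calculation. This is just an illustrative sketch; the constants come from the figures in the post, and the function name is my own:

```python
# Throughput implied by the NDIS packet-rate cap described in the post.
FRAME_BYTES = 1500      # typical Ethernet frame payload size
NDIS_CAP_PPS = 10_000   # 10 packets per millisecond

def throttled_throughput_mb_s(packets_per_sec: int,
                              frame_bytes: int = FRAME_BYTES) -> float:
    """Maximum data rate in MB/s given a packets-per-second cap."""
    return packets_per_sec * frame_bytes / 1_000_000

cap = throttled_throughput_mb_s(NDIS_CAP_PPS)   # 15.0 MB/s
fast_ethernet = 100e6 / 8 / 1e6                 # 12.5 MB/s line rate
gigabit = 1000e6 / 8 / 1e6                      # 125.0 MB/s line rate

print(f"throttle cap:     {cap:.1f} MB/s")
print(f"100Mb line rate:  {fast_ethernet:.1f} MB/s (cap barely noticeable)")
print(f"1Gb line rate:    {gigabit:.1f} MB/s (cap leaves ~{cap/gigabit:.0%})")
```

The cap sits just above what Fast Ethernet can deliver anyway, which is why only gigabit users notice it.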
Further, there’s an unfortunate bug in the NDIS throttling code that magnifies throttling if you have multiple NICs. If you have a system with both wireless and wired adapters, for instance, NDIS will process at most 8000 packets per second, and with three adapters it will process a maximum of 6000 packets per second. 6000 packets per second equals 9MB/s, a limit that’s visible even on 100Mb networks.
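The per-adapter reduction can be sketched as follows. Note the linear formula here is an extrapolation from the figures given in the post (one NIC allows 10,000 packets/sec, two allow 8,000, three allow 6,000); it is not the actual NDIS code:

```python
# Hypothetical model of the multi-NIC throttling bug: each additional
# adapter appears to reduce the global cap by 2,000 packets/sec.
def throttle_limit_pps(num_adapters: int) -> int:
    return 10_000 - 2_000 * (num_adapters - 1)

for n in range(1, 4):
    pps = throttle_limit_pps(n)
    print(f"{n} adapter(s): {pps:,} packets/sec "
          f"~= {pps * 1500 / 1e6:.0f} MB/s at 1500-byte frames")
```

At three adapters the implied 9MB/s ceiling drops below the 12.5MB/s a 100Mb link can sustain, which is why the bug is visible even on Fast Ethernet.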
I caused throttling to be visible on my laptop, which has three adapters, by copying a large file to it from another system and then starting WMP and playing a song. The Task Manager screenshot below shows how the copy achieves a network utilization of about 20% on my 1Gb network, but drops to around 6% after I start playing a song:
You can monitor the number of receive packets NDIS processes by adding the “packets received per second” counter in the Network object to the Performance Monitor view. Below, you can see the packet receive rate change as I ran the experiment. The number of packets NDIS processed didn’t reach the theoretical throttling maximum of 6,000, probably due to handshaking with the remote system.
Even with this level of throttling, Internet traffic, even on the best broadband connection, won’t be affected. That’s because the many intermediate hops between your system and another one on the Internet fragment packets and slow down packet travel, which keeps the rate at which systems transfer data well below the throttle.
The throttling rate Vista uses was derived from experiments that reliably achieved glitch-resistant playback on single-CPU systems on 100Mb networks with high packet receive rates. That hard-coded limit was short-sighted with respect to today’s systems, which have faster CPUs, multiple cores and Gigabit networks. In addition to fixing the bug that affects throttling on multi-adapter systems, the networking team is actively working with the MMCSS team on a fix that penalizes network traffic far less dramatically while still delivering a glitch-resistant playback experience.
Stay tuned to my blog for more information.
What I find most discouraging about this isn't the hack to work around what was probably a non-issue, but the fact that <i>copying a file</i> takes 41% of the CPU. What kind of networking stack has that kind of processor overhead?
To me, this seems like overly optimistic resource allocation...
I agree 100% that it needed to happen (and I'm grateful for Mark's detailed response/explanation); however, it seems to carry so much overhead that it degrades other services...
Given the multi-core CPU scenario we live in now, is this optimization even so necessary?
I remember trying to set up MS ISA 2 years ago on an SMP server (in a hurry) and got very poor performance due to "CPU's fighting over controlling the NIC's" (I realize now that you can pin a NIC to a particular CPU)... Is this similar?
Is there any way to pin or isolate these competing processes such that we are not stuck with such low limits to utilization?
thanks again for a great explanation Mark.
What I think a few posters are missing is the fact that this has nothing to do with overall CPU speed. Overall, CPUs are fast and getting faster. This has to do with priority over the course of milliseconds.
Both networking and media playback require instant results, because TCP/IP gets processed when it's received, and because most audio is sampled at over 40 kHz. On a 2 GHz machine, that means it's playing something every 22 microseconds (about once every 45,350 processor cycles, and it takes a few cycles to play something). If it delays too long, or drops a few samples, you hear skipping.
As a result, the concern is that if both network functions and multimedia need the CPU *right now* then you have a collision. That's why MS limited the networking so it only comes in on average once every 100 microseconds, to prevent that.
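The timing figures in this comment can be verified with a few lines. A minimal sketch, assuming 44.1kHz CD-quality audio, the commenter's 2 GHz CPU, and the 10,000 packets/sec throttle from the post:

```python
SAMPLE_RATE_HZ = 44_100   # CD-quality audio sample rate
CPU_HZ = 2e9              # 2 GHz machine from the comment
THROTTLE_PPS = 10_000     # NDIS cap from the post

sample_interval_us = 1e6 / SAMPLE_RATE_HZ     # ~22.7 microseconds/sample
cycles_per_sample = CPU_HZ / SAMPLE_RATE_HZ   # ~45,351 cycles/sample
packet_interval_us = 1e6 / THROTTLE_PPS       # 100 microseconds/packet

print(f"audio sample every {sample_interval_us:.1f} us "
      f"(~{cycles_per_sample:,.0f} cycles)")
print(f"throttled packet at most every {packet_interval_us:.0f} us")
```

So the throttle spaces packet indications out to at least one per 100 microseconds, a window several audio samples wide.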
One addition: Microsoft completely rewrote the TCP/IP stack for Vista, and in doing so they surely made some... hm, mistakes. This might be another reason for the strange behavior.
See http://www.microsoft.com/technet/community/columns/cableguy/cg0905.mspx and lots of other pages.
Actually, multimedia glitching is a pretty well-known problem to heavy audio and video users (i.e. multitrack work, editing, not just playback). A whole alternate universe of drivers has sprung up to deal with this. It's much less of a problem than it used to be, but under heavy loads, it's still an issue.
Sounds like Microsoft tried to be over-proactive about it, and (as others have said) shot themselves in the foot. I wonder if this isn't something that went into Vista back in 2001 as an obvious necessity, and then wasn't looked at toward the end of the release cycle...
Responding specifically to Chris, my understanding is that Vista can't do the kind of things you talk about (segregate functions by processor) even if it would be an effective solution, because they simply can't target dual-core machines yet.
There's a significant adoption lag that Microsoft has to adapt to. Let's briefly look at the gaming world for an example, and then come back to this. Currently, designers/programmers for many games have to make sure the game can run on computers as far back as pre-HT P4s. Therefore, they can't fully optimize for a two-core world across the line yet.
The same situation occurs with Vista. Given the average age of the PC install base, and Vista's minimum CPU requirements (800 MHz single core), they can't optimize for dual-core in that sort of a manner. Also, remember that Vista was designed in a period from 2001-2006. Dual-cores only became available in mid-2005, and weren't really prevalent in most selling models until mid-2006 (release of Conroe in July, and subsequent price war). The Vista RTM was shortly thereafter.
Something as essential as music playback and file copying should not require days of investigation by users. Someone write better docs!
PS - I have a *phone* running a 200 MHz ARM with a piddly little OS and am able to play streaming MP3's on a broadband wireless network with no hassles. And a dual-core, 2 GHz desktop uses 41% CPU to simply copy files???
I did some playing on my XP system and got similar results as Richard. I transferred a copy of Win2KSP4 while I listened to a podcast I had downloaded on my local system. I did not see any decay in the file transfer speed over the LAN. I do not consider Mark a sellout because he reported on this issue. He identified it, he did not condone it. IMHO, this seems like a misplaced performance tweak on MS's part. I hope it goes away in SP1. We still have not moved to Vista; I have intentionally ordered new systems with XP. It works, and I am not willing to turn my shop into a test bed for a first release of an OS.
Hi, I am the IPv6 Program Manager at Microsoft.
Regarding the comment "Unless the hostname has an AAAA record associated with it, the system shouldn't use IPv6 to try and communicate with the host."
Even if the destination has an A and an AAAA record, Vista will prefer IPv4 over Teredo. The order of precedence is IPv6, IPv4, THEN Teredo. So a destination host with an A and AAAA record will always be reached using IPv4, NOT TEREDO.
The only time Teredo would be used is if the destination host ONLY had a AAAA record, and there are darn few of those out there. In other words, leaving IPv6 (and Teredo) enabled on your home PC has absolutely no impact on your networking performance.
Rob: ... <i>copying a file</i> takes 41% of the CPU. What kind of networking stack has that kind of processor overhead?
I observed this kind of behavior when copying files using SMB. The same file copied from the same server using FTP was many times faster.
@Chris re: (I realize that now that you can pin a NIC to a particular CPU)
can you elaborate? I'm looking around and can't find any info - this could be helpful on one of my servers.
I have had a ton of issues with media playback in Vista on both single core and dual core platforms, with and without Aero even when it could easily support it. My solution has been to shutdown as many ancillary services as possible, and there are a lot. Most, if not all of the security services are gone, and that was strictly for my sanity. Many of the disk services and indexing are shutdown. I do actually know where my content is and don't need any help finding it. And then I shutdown some more things that just seemed to be hanging out and not really providing any immediately useful service.

The result is a fairly smooth running system that runs Aero, the sidebar, other applications, and my iTunes videos full screen without glitches. Prior to shutting all of these things down, iTunes videos were unwatchable, streaming internet video (ie simple slideshows) was unwatchable, and the whole media experience was enough to make me beg for XP back, or even 2000 pro.

My disk access has gone from essentially constant to only when I'm actively doing something. Now I have my LED for power that stays on, and my disk activity LED is no longer solid 24/7. Things that ran fine on XP systems with less than half the performance of the current system now run like they should. Moving large files across the Gigabit network also move like they should. Vista still won't play wmv files that run fine on my other XP boxes. Winamp will play them just fine on Vista, but Media Player says there is a problem with my WHQL video card.
Why all of this? There are many other issues with Vista playing media than the one network issue mentioned here, and eliminating many of the ancillary services will go a long way, but not nearly all the way, to solving them. The error codes that Windows Media Player provides are of course useless because there is no description for them. I would have thought that all of the effort going into the driver certification process and application certification process would have resulted in a more stable and better performing system, but alas this has not been the case.
> Despite even this level of throttling, Internet
> traffic, even on the best broadband connection, won’t
> be affected.
This is only a valid analogy for someone downloading a single file off the Internet from a single source.
Throw in P2P-anything and you get huge number of small packets which Vista will happily throttle for you [sigh], regardless of your actual bandwidth.
One problem is you use packets/sec regardless of how 'full' those packets are. Another is that Vista apparently has a horribly inefficient TCP stack... 40% CPU usage for a file copy?
"Games are slower overall because of all the added functionality and features in Vista... even with many services, eyecandy, and programs disabled I still find programs run better in XP (and even better in Linux!). For example, BioShock runs horribly on my computer in Vista with input and audio lag making it unplayable. This didn't surprise me too much as my computer didn't meet the minimum specs in the CPU department. But, in XP, it ran quite acceptably."
- Not to mention all the DRM throughout the audio and video stack. Benefiting who exactly? Not really the consumer; they're arguably better off with XP. DX10 runs a fair bit quicker on XP and Linux (with the "backported" unofficial release); that surely means something is very wrong.