Mark Russinovich’s technical blog covering topics such as Windows troubleshooting, technologies and security.
Windows Vista SP1 includes a number of enhancements over the original Vista release in the areas of application compatibility, device support, power management, security and reliability. You can see a detailed list of the changes in the Notable Changes in Windows Vista Service Pack 1 whitepaper that you can download here. One of the improvements highlighted in the document is the increased performance of file copying for multiple scenarios, including local copies on the same disk, copying files from remote non-Windows Vista systems, and copying files between SP1 systems. How were these gains achieved? The answer is a complex one and lies in the changes to the file copy engine between Windows XP and Vista and further changes in SP1. Everyone copies files, so I thought it would be worth taking a break from the “Case of…” posts and dive deep into the evolution of the copy engine to show how SP1 improves its performance.
Copying a file seems like a relatively straightforward operation: open the source file, create the destination, and then read from the source and write to the destination. In reality, however, the performance of copying files is measured along the dimensions of accurate progress indication, CPU usage, memory usage, and throughput. In general, optimizing one area causes degradation in others. Further, there is semantic information not available to copy engines that could help them make better tradeoffs. For example, if they knew that you weren’t planning on accessing the target of the copy operation, they could avoid caching the file’s data in memory; conversely, if they knew that the file was going to be immediately consumed by another application, or, in the case of a file server, by client systems sharing the files, they would aggressively cache the data on the destination system.
File Copy in Previous Versions of Windows
In light of all the tradeoffs and imperfect information available to it, the Windows file copy engine tries to handle all scenarios well. Prior to Windows Vista, it took the straightforward approach of opening both the source and destination files in cached mode and marching sequentially through the source file reading 64KB (60KB for network copies because of an SMB1.0 protocol limit on individual read sizes) at a time and writing out the data to the destination as it went. When a file is accessed with cached I/O, as opposed to memory-mapped I/O or I/O with the no-buffering flag, the data read or written is stored in memory, at least until the Memory Manager decides that the memory should be repurposed for other uses, including caching the data of other files.
The copy engine relied on the Windows Cache Manager to perform asynchronous read-ahead, which essentially reads the source file in the background while Explorer is busy writing data to a different disk or a remote system. It also relied on the Cache Manager’s write-behind mechanism to flush the copied file’s contents from memory back to disk in a timely manner so that the memory could be quickly repurposed if necessary, and so that data loss is minimized in the face of a disk or system failure. You can see the algorithm at work in this Process Monitor trace of a 256KB file being copied on Windows XP from one directory to another with filters applied to focus on the data reads and writes:
Explorer’s first read operation at event 0 of data that’s not present in memory causes the Cache Manager to perform a non-cached I/O, which is an I/O that reads or writes data directly to the disk without caching it in memory, to fetch the data from disk at event 1, as seen in the stack trace for event 1:
In the stack trace, Explorer’s call to ReadFile is at frame 22 in its BaseCopyStream function and the Cache Manager invokes the non-cached read indirectly by touching the memory mapping of the file and causing a page fault at frame 8.
Because Explorer opens the file with the sequential-access hint (not visible in trace), the Cache Manager’s read-ahead thread, running in the System process, starts to aggressively read the file on behalf of Explorer at events 2 and 3. You can see the read-ahead functions in the stack for event 2:
You may have noticed that the read-ahead reads are initially out of order with respect to the original non-cached read caused by the first Explorer read, which can cause disk head seeks and slow performance, but Explorer stops causing non-cached I/Os when it catches up with the data already read by the Cache Manager and its reads are satisfied from memory. The Cache Manager generally stays 128KB ahead of Explorer during file copies.
At event 4 in the trace, Explorer issues the first write and then you see a sequence of interleaved reads and writes. At the end of the trace the Cache Manager’s write-behind thread, also running in the System process, flushes the target file’s data from memory to disk with non-cached writes.
Vista Improvements to File Copy
During Windows Vista development, the product team revisited the copy engine to improve it for several key scenarios. One of the biggest problems with the engine’s implementation is that for copies involving lots of data, the Cache Manager write-behind thread on the target system often can’t keep up with the rate at which data is written and cached in memory. That causes the data to fill up memory, possibly forcing other useful code and data out, and eventually the target system’s memory becomes a tunnel through which all the copied data flows at a rate limited by the disk.
Another problem they noted was that when copying from a remote system, the file’s contents are cached twice on the local system: once as the source file is read and a second time as the target file is written. Besides causing memory pressure on the client system for files that likely won’t be accessed again, involving the Cache Manager introduces the CPU overhead that it must perform to manage its file mappings of the source and destination files.
A limitation of the relatively small and interleaved file operations is that the SMB file system driver, the driver that implements the Windows remote file sharing protocol, doesn’t have opportunities to pipeline data across high-bandwidth, high-latency networks like WANs. Every time the local system waits for the remote system to receive data, the data flowing across the network drains and the copy pays the latency cost as the two systems wait for each other’s acknowledgement and next block of data.
After studying various alternatives, the team decided to implement a copy engine that tended to issue large asynchronous non-cached I/Os, addressing all the problems they had identified. With non-cached I/Os, copied file data doesn’t consume memory on the local system, hence preserving memory’s existing contents. Asynchronous large file I/Os allow for the pipelining of data across high-latency network connections, and CPU usage is decreased because the Cache Manager doesn’t have to manage its memory mappings; inefficiencies in the original Vista Cache Manager’s handling of large I/Os also contributed to the decision to use non-cached I/Os. They couldn’t make I/Os arbitrarily large, however, because the copy engine needs to read data before writing it, and performing reads and writes concurrently is desirable, especially for copies to different disks or systems. Large I/Os also pose challenges for providing accurate time estimates to the user because there are fewer points to measure progress and update the estimate. The team did note a significant downside of non-cached I/Os, though: during a copy of many small files the disk head constantly moves around the disk, first to a source file, then to destination, back to another source, and so on.
After much analysis, benchmarking and tuning, the team implemented an algorithm that uses cached I/O for files smaller than 256KB in size. For files larger than 256KB, the engine relies on an internal matrix to determine the number and size of non-cached I/Os it will have in flight at once. The number ranges from 2 for files smaller than 2MB to 8 for files larger than 8MB. The size of the I/O is the file size for files smaller than 1MB, 1MB for files up to 2MB, and 2MB for anything larger.
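The thresholds in that paragraph can be captured in a couple of helper functions. This is an illustrative Python sketch, not the engine's actual matrix; in particular, the in-flight count for files between 2MB and 8MB isn't specified in the text, so the middle value below is a guess:

```python
KB = 1024
MB = 1024 * KB

def io_size(file_size):
    """Per-I/O size: the whole file below 1MB, 1MB for files up to
    2MB, and 2MB for anything larger (per the description above)."""
    if file_size < 1 * MB:
        return file_size
    if file_size <= 2 * MB:
        return 1 * MB
    return 2 * MB

def ios_in_flight(file_size):
    """Concurrent non-cached I/Os: 2 for files under 2MB, 8 for files
    over 8MB; the intermediate values come from an internal matrix
    that isn't published, so the middle value here is an assumption."""
    if file_size < 2 * MB:
        return 2
    if file_size > 8 * MB:
        return 8
    return 4  # assumption: the 2MB-8MB values aren't given in the text
```

Remember that files under 256KB never reach these functions at all; they take the cached I/O path.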
To copy a file 16MB file, for example, the engine issues eight 2MB asynchronous non-cached reads of the source file, waits for the I/Os to complete, issues eight 2MB asynchronous non-cached writes of the destination, waits again for the writes to complete, and then repeats the cycle. You can see that pattern in this Process Monitor trace of a 16MB file copy from a local system to a remote one:
While this algorithm is an improvement over the previous one in many ways, it does have some drawbacks. One that occurs sporadically on network file copies is out-of-order write operations, one of which is visible in this trace of the receive side of a copy:
Note how the write operation offsets jump from 327,680 to 458,752, skipping the block at offset 393,216. That skip causes a disk head seek and forces NTFS to issue an unnecessary write operation to the skipped region to zero that part of the file, which is why there are two writes to offset 393,216. You can see NTFS calling the Cache Manager’s CcZeroData function to zero the skipped block in the stack trace for the highlighted event:
A bigger problem with using non-cached I/O is that performance can suffer in publishing scenarios. If you copy a group of files to a file share that represents the contents of a Web site for example, the Web server must read the files from disk when it first accesses them. This obviously applies to servers, but most copy operations are publishing scenarios even on client systems, because the appearance of new files causes desktop search indexing, triggers antivirus and antispyware scans, and queues Explorer to generate thumbnails for display on the parent directory’s folder icon.
Perhaps the biggest drawback of the algorithm, and the one that has caused many Vista users to complain, is that for copies involving a large group of files between 256KB and tens of MB in size, the perceived performance of the copy can be significantly worse than on Windows XP. That’s because the previous algorithm’s use of cached file I/O lets Explorer finish writing destination files to memory and dismiss the copy dialog long before the Cache Manager’s write-behind thread has actually committed the data to disk; with Vista’s non-cached implementation, Explorer is forced to wait for each write operation to complete before issuing more, and ultimately for all copied data to be on disk before indicating a copy’s completion. In Vista, Explorer also waits 12 seconds before making an estimate of the copy’s duration and the estimation algorithm is sensitive to fluctuations in the copy speed, both of which exacerbate user frustration with slower copies.
During Vista SP1’s development, the product team decided to revisit the copy engine to explore ways to improve both the real and perceived performance of copy operations for the cases that suffered in the new implementation. The biggest change they made was to go back to using cached file I/O again for all file copies, both local and remote, with one exception that I’ll describe shortly. With caching, perceived copy time and the publishing scenario both improve. However, several significant changes in both the file copy algorithm and the platform were required to address the shortcomings of cached I/O I’ve already noted.
The one case where the SP1 file copy engine doesn't use caching is for remote file copies, where it prevents the double-caching problem by leveraging support in the Windows client-side remote file system driver, Rdbss.sys. It does so by issuing a command to the driver that tells it not to cache a remote file on the local system as it is being read or written. You can see the command being issued by Explorer in the following Process Monitor capture:
Another enhancement for remote copies is the pipelined I/Os issued by the SMB2 file system driver, srv2.sys, which is new to Windows Vista and Windows Server 2008. Instead of issuing 60KB I/Os across the network like the original SMB implementation, SMB2 issues pipelined 64KB I/Os so that when it receives a large I/O from an application, it will issue multiple 64KB I/Os concurrently, allowing for the data to stream to or from the remote system with fewer latency stalls.
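Conceptually, the SMB2 pipelining described above turns one large application I/O into many 64KB wire I/Os that are in flight concurrently. A toy Python sketch of just the splitting step (the concurrency and protocol mechanics are omitted; none of this is SMB2's actual code):

```python
WIRE_IO = 64 * 1024  # SMB2's per-request size, per the text

def split_large_io(length, wire_io=WIRE_IO):
    """Break one large application I/O into the 64KB pieces that SMB2
    would issue concurrently rather than one at a time."""
    pieces = []
    while length > 0:
        pieces.append(min(wire_io, length))
        length -= pieces[-1]
    return pieces
```

Because the pieces are outstanding simultaneously, the copy pays the network round-trip latency once per batch rather than once per 64KB block.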
The copy engine also issues four initial I/Os of sizes ranging from 128KB to 1MB, depending on the size of the file being copied, which triggers the Cache Manager read-ahead thread to issue large I/Os. The platform change made in SP1 to the Cache Manager has it perform larger I/O for both read-ahead and write-behind. The larger I/Os are only possible because of work done in the original Vista I/O system to support I/Os larger than 64KB, which was the limit in previous versions of Windows. Larger I/Os also improve performance on local copies because there are fewer disk accesses and disk seeks, and it enables the Cache Manager write-behind thread to better keep up with the rate at which memory fills with copied file data. That reduces, though not necessarily eliminates, memory pressure that causes active memory contents to be discarded during a copy. Finally, for remote copies the large I/Os let the SMB2 driver use pipelining. The Cache Manager issues read I/Os that are twice the size of the I/O issued by the application, up to a maximum of 2MB on Vista and 16MB on Server 2008, and write I/Os of up to 1MB in size on Vista and up to 32MB on Server 2008.
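The sizing rule in that last sentence is simple enough to express directly. A hedged sketch (the 2x multiplier and the caps are taken from the text; the function and parameter names are mine):

```python
MB = 1024 * 1024

def read_ahead_io(app_io, server2008=False):
    """SP1 Cache Manager read-ahead: twice the application's I/O size,
    capped at 2MB on Vista or 16MB on Server 2008."""
    return min(2 * app_io, 16 * MB if server2008 else 2 * MB)

def write_behind_io(requested, server2008=False):
    """SP1 write-behind I/Os: up to 1MB on Vista, up to 32MB on
    Server 2008."""
    return min(requested, 32 * MB if server2008 else 1 * MB)
```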
This trace excerpt of a 16MB file copy from one SP1 system to another shows 1MB I/Os issued by Explorer and a 2MB Cache Manager read-ahead, which is distinguished by its non-cached I/O flag:
Unfortunately, the SP1 changes, while delivering consistently better performance than previous versions of Windows, can be slower than the original Vista release in a couple of specific cases. The first is when copying to or from a Server 2003 system over a slow network. The original Vista copy engine would deliver a high-speed copy, but, because of the out-of-order I/O problem I mentioned earlier, trigger pathological behavior in the Server 2003 Cache Manager that could cause all of the server’s memory to be filled with copied file data. The SP1 copy engine changes avoid that, but because the engine issues 32KB I/Os instead of 60KB I/Os, the throughput it achieves on high-latency connections can drop to roughly half of what the original Vista release achieved.
The other case where SP1 might not perform as well as original Vista is for large file copies on the same volume. Since SP1 issues smaller I/Os, primarily to allow the rest of the system to have better access to the disk and hence better responsiveness during a copy, the number of disk head seeks between reads from the source and writes to the destination files can be higher, especially on disks that don’t avoid seeks with efficient internal queuing algorithms.
One final SP1 change worth mentioning is that Explorer makes copy duration estimates much sooner than the original Vista release and the estimation algorithm is more accurate.
File copying is not as easy as it might first appear, but the product team took feedback they got from Vista customers very seriously and spent hundreds of hours evaluating different approaches and tuning the final implementation to restore most copy scenarios to at least the performance of previous versions of Windows and drastically improve some key scenarios. The changes apply both to Explorer copies as well as to ones initiated by applications using the CopyFileEx API and you’ll see the biggest improvements over older versions of Windows when copying files on high-latency, high-bandwidth networks where the large I/Os, SMB2’s I/O pipelining, and Vista’s TCP/IP stack receive-window auto-tuning can literally deliver what would be a ten minute copy on Windows XP or Server 2003 in one minute. Pretty cool.
Regarding Previous Versions, I agree, that's a major selling point of Vista, and something that sets Microsoft ahead of its most significant competitors. I've never understood why versioning hasn't caught on more on other OSes and consider Microsoft's steps in this direction an instance of real leadership. With that said, it could be improved, of course. In fact, Microsoft could do worse than to clone VMS's versioning behavior as precisely as possible for Seven and worry about more innovative improvements subsequently. VMS has had this since, approximately, the stone age, and it's really nice, but other systems (which in pretty much all other ways are much more up-to-date) for some odd reason have never really adopted it, until Vista. OTOH, unless I have misunderstood badly, it's not really an Explorer function (at least, I should hope not) but part of the filesystem driver or somesuch, so there may not be a lot that the Explorer team can do about it, other than handling how versions are represented in the GUI.
Stephan, that's great to hear. Not stopping in the middle of a large copy operation just because one file (usually one you don't actually need; e.g., when copying Documents and Settings for backup purposes on XP you run into locked system housekeeping files that you don't actually need) is uncopyable is a huge improvement. If that's true, I am really looking forward to seeing Vista replace XP. (No, I haven't deployed Vista yet. I'll address that below.) I do have a question, though. Does it give you a separate "Can't do this one, skipping" dialog box for each file that fails, make a list and present you with a single dialog box at the end of the whole operation, or asynchronously populate a visible "skipped file" list in the progress dialog?
As far as being able to actually go ahead and copy the file, that would probably mean moving to an inode-oriented filesystem, which would have a *lot* of implications beyond just being able to make copies of open files. I'm not saying it would ultimately be bad in the long run (indeed, there would be other advantages, e.g., not having to reboot when system files other than the kernel are updated), but it would be a major change for Windows, and major changes always mean bugs and incompatibilities in the short-term, so it's not something that would be done lightly.
We're holding off Vista deployment where I work until at least SP1 is available, because we have had bad experiences in the past with brand-new Microsoft OSes until the service packs start coming out. (XP for instance was not really a viable replacement for 2K until SP1 came out, and it wasn't a really marked improvement until SP2, in my estimation.) But that does not mean there aren't things about Vista that I'm really looking forward to. There are. (UAC, for all the flack it has received, is one of the biggies. Other systems have had something like it for a long time, and while Microsoft's first implementation no doubt needed some adjustments, it is a huge step in the right direction. I am very much looking forward to not having to log in as Administrator periodically just to get LiveUpdate to work correctly.) And holding off deployment for a few months doesn't mean we think Vista is the plague (though I do have some coworkers who think upgrades are the plague). It just means we aren't early adopters. We understand that you never find all the bugs until you deploy to a bunch of real users, and we'd like to let other real users be the guinea pigs, as it were. SP1 is something I consider to be very important, even though I haven't actually used Vista yet (nor seen it, really, except for a few minutes in a Microsoft travelling vehicle at a tech rally), because it brings Vista closer to the point where we *will* want to start deploying it. Microsoft needs to understand that service packs are a key factor in broader uptake of new versions (though they are not the only factor; time is also important).
Thanks Mark, for the great technical explanation. The number of passionate comments to this blog entry are very telling. I've been using Vista for 14 months now on 4 different machines (both x86 and x64) and while the file copy issues remain quite an annoyance, it's the overall lack of OS quality that really disappoints me. I've seen this in other, recently released Microsoft products too. I'll stand up here and suggest that something may be fundamentally broken at Microsoft. I noticed it personally when I was on campus recently for several weeks, as compared to a previous extended visit there 5 years ago.
Something does not sound right. The file-copy problems I experienced were usually copies of a handful of very small files being copied from one directory to another on the same partition taking 10-20 minutes. Usually the copy file dialog box would not show the files starting to copy for 7-8 minutes -- it would just sit and hang. Once the files did start copying, it still took a painfully long time.
When I tested the same copies on the command line (using 4NT, which could make a difference), the files would copy in ~1 second.
The problem was *not* perception. There was a real, serious, problem. I've had to recommend to numerous people that they wait for SP1 before upgrading to Vista.
I've installed SP1 RC1 and I have not experienced those problems, so that's good. But passing them off as simply perceived problems concerns me. Were they actually fixed or not? It seems so, but what was the cause?
I hope the comment regarding 'a slower server 2003 file transfer experience' is changed. Windows Home Server users are already complaining bitterly regarding the abysmal file transfer speeds from Vista. It sounds as though their experience is going to get worse, not better. Not a good way to try and sell what is already a flawed system. File corruption and even slower files isn't a plus in my book.
Copy is slow in some configurations. I think the same thing causing slow router performance in some systems is at fault. I call it 'long wire to router' problems. When the Cat5 cable is short, there's no packet loss and no speed problem. When a longer wire (say 70 meters) is used, the delay caused by the natural 2/3rds-speed-of-light propagation in a copper wire causes the router to lose packets and makes for very poor communication (say 300 bps, vs 100 Mbps). When set to 'half-duplex' the router loses no packets.
I think there may be a possibility that the packet loss in the router is an example of some kind of overrun in the copy process. The destination may have USB, or some other connectors, in the path.
Who gives a rat's ass if they improved the copy performance of Vista by 10% or even 20%? Why does anybody care?
Vista is a SLOW OS.
So it's like getting a 15% better fuel-delivery-line installed on your crappy economy family wagon.
It really makes no difference. It won't convince us to drive around in a smoldering heap of junk.
This article pretty much explains why my WinXP system is soooo slow at copying large files (700MB+) between volumes of the same disk, and what is behind all this clicking noise from HDD seeks that immediately fills the room :)
Furthermore, it explained why, even in this situation, other tasks can access the HDD relatively easily.
But I want to emphasize some other point:
From the end-user standpoint, despite all the effort being made to improve the situation, good old Norton Commander and its derivatives perform 1 000 000 percent better than the default XP/Vista copy engine.
Why, you say?
Because the end user doesn't really care about all these kernel optimisations. The end user cares about comfort, control and predictable results. And what do we see in XP/Vista?
1. When calculating time-to-copy, Explorer just hangs there displaying nothing to the user. Not very comfy. Couldn't the whole TEAM think of FIRST displaying "calculating the time" and THEN starting the routine?
2. After having calculated the total size of all the data to be moved, the system doesn't warn you if you have less space on the destination drive - it just begins the process. Another frustration. And when the free space ends it just tells you "Not enough space" and aborts the operation. See next paragraph.
3. When the copy process is aborted (for example, when copying multiple files and one of them becomes locked) - it just breaks and you have to manually compare directories to see what was copied and what was not. And what do you do now?
4. After the last bit of a huge file goes to the disk you get some small amount of time when Explorer is unable to access the file because it's being checked by antivirus/antispyware/etc. What's the reason to check a file that was already checked when it was accessed by Explorer for reading?
Good old NC overcame most of this with just persistent file selection (non-processed files were not deselected) and smartly designed user dialogs in the middle of the process. And more modern file managers like FAR or Total Commander allow you to use a cached copy where you can set the cache size (for ex. 15% of free memory), which makes copying dozens of small files and copying single large files much easier and faster, as you do sequential reads and writes instead of HDD-head-hopping back and forth.
What's my point? No matter how good a kernel would be - it's of no use to the user without a proper usable interface.
This is why Windows beat Unix, OS/2 and other rivals in UserLand space - it was prettier (while being pretty basic in comparison to those OSes).
Now we see that, while the Windows internals are improving, the externals of the OS are still pretty much rudimentary. And most UI development is directed now not towards comfort and control, but towards "more important" features, like 3D Window Flipping and Sidebar Slideshow from My Pictures.
So, if the whole TEAM works on File Copy - please pass the message to them to spend next week improving the user interface (or to notify the appropriate team). That would increase sales much better than another cache tweak that would anyway go unnoticed by the user because of many other problems.
$20,000+ invested in so-called Microsoft Certified equipment... and I can't copy a file from a local drive to a network drive at more than 100 KILOBYTES per second.
XP copies/loads at over 20 MEGABYTES per second.
Upgrading effectively renders the computer unusable for business purposes.
If Microsoft can't/won't fix this, then I expect a refund on all Vista software our firm has purchased.
Rumor has it that several suits under the federal deceptive trades act are being considered. At least they offer triple damage awards.
Our lost productivity installing, debugging, troubleshooting, backing up, restoring, and re-installing XP or Linux... has cost well over $10,000.
I'd bet hundreds of millions have been wasted by Vista users in similar ways.
There seems to be an awful lot of effort put into avoiding filling up the system's RAM with cached filesystem data. But why is this necessary? Linux systems will happily cache stuff in RAM, but those cache buffers have lower priority than application memory allocations. That way, you never have app code and data forced out of memory just to make room for caches.
With this simple design decision, you no longer need to worry about using too much RAM for cache. Linux has been doing this for years, decades. How hard would it be for Windows to do the same?
It's curious that 3rd party programs like Total Commander are capable of achieving higher speeds when copying a file than explorer.exe (at least in my case). I have a laptop with Windows Vista Business 32-bit and a laptop with Windows XP Professional. When copying a single 700 MB file from XP (FAT32 partition) to Vista using Total Commander (run under Windows Vista), the Task Manager reports network usage to reach about 80-90% (100 Mbps network). When using explorer.exe, the network usage drops to 60% (plus it occasionally "spikes" from 2% to 60%) and the copy process is slower. Additionally, the hard disk heads on the XP laptop "go ape" (it might be the "out-of-order write operations" mentioned in the post). This does not occur when using Total Commander. Thumbnails on network folders are set to off. Setting autotuninglevel to disable under Vista had no effect as well as removing the RDC in Programs and Features.
I would assume that 3rd party programs use CopyFileEx API mentioned to copy the file. So how come Total Commander appears to be faster over network - even after putting aside the fact that it calculates the remaining time faster?
It's great to read some of the thought process behind the design... to a point. The bottom line, though, is that Vista copy performance may be the single most complained about "feature". How about, just for internal MS use, you guys do some benchmarks using different OSes. Compare copying 5,000 files of mixed content and sizes between Vista, Vista SP1, XP SP2, XP SP3, Mac OS X, Win Server 2003, Win Server 2008, etc. See if Vista isn't involved in most of the worst performances!
And Mark, while you're at it, could you please lend a hand to the WHS developers who are STILL unable to keep WHS from corrupting data????
Thanks for this interesting article.
Where can I find more about the increased I/O sizes in Vista?
It's off topic, but I'm hoping larger I/O transfer lengths might translate into being able to access larger record sizes on magnetic tape. Currently, in XP/2003 by default we're limited to 64KB records. Modifying MaximumSGList in the SCSI HBA service parameters gets us to 1MB − 4KB (254 4KB pages). I'm hoping larger I/O transfers mean we can break this limit.
Thanks for article again.
Thanks Mark for the news as well as explaination regarding the improved copy in SP1.
On the subject of copying, I noticed that when I do transfers from my SD or Compact Flash card, there's a possibility of at least one of my photos being corrupted. Initially, I thought it was due to simultaneous usage, but it happens even when it's just doing the copy. Is this a rare problem, or has it been reported by others too?
"biggest change they made was to go back to using cached file I/O again for all file copies, both local"
If this applies also to internally connected SATA HDD's and large (50+ MB) moves/copies there will be a data loss/corruption risk:
Using eSATA PCI/front-port brackets is an increasingly popular way to connect up all the excess internal SATA ports. Who really has 6 internal HDDs? I have 0 internal HDDs personally; all my HDDs go through eSATA using the internal SATA headers.
In tests I did on 2003, it could take up to 15 seconds before all the data was flushed down to disk after the copying was "finished" according to Explorer.
There is no "Safely Remove Hardware" icon at all, even though I have a few external HDDs connected to the internal SATA headers, as there's no way for the OS to tell whether the HDDs are externally or internally connected.
The option to change HDDs to "Optimize for quick removal" is grayed out too, even if hot-plugged.
Now certainly I could use USB to get around this, however that would drop transfer rate from 120 MB/s (ESATA+single modern HDD) to 20 MB/s (USB2).
"So the new bug is that Vista Explorer sometimes does part of what you and I want, for a short time, when it shouldn't be doing that at all."
So SP1 doesn't have any UI setting to make all folders in Explorer look like they do in 2003 by default?
Is there any way to fix this besides modifying the explorer.exe? A registry fix would be fine.