Slow Large File Copy Issues


From time to time, customers call in to report "performance problems" when copying large files from one location to another.  By "performance problems", they mean that the file isn't copying as fast as they expect.  The most common scenario is copying large SQL databases from server to server, but this could just as easily occur with other file types.  More often than not, the customer has tried different methods of copying the file, including Windows Explorer, Copy, XCopy and Robocopy, all with the same results.  So ... what's going on here?

Assuming that you aren't experiencing network issues (and for the purposes of this article, we'll assume a healthy network), the problem lies in the way in which the copy is performed - specifically, buffered versus unbuffered Input/Output (I/O).  So let's quickly define these terms.  Buffered I/O describes the process by which the file system buffers reads from and writes to the disk in the file system cache.  Buffered I/O is intended to speed up future reads and writes to the same file, but it has an associated overhead cost.  It is effective for speeding up access to files that change periodically or are accessed frequently.  There are two buffered I/O functions commonly used in Windows applications such as Explorer, Copy, Robocopy or XCopy:

  • CopyFile() - Copies an existing file to a new file
  • CopyFileEx() - This also copies an existing file to a new file, but it can also call a specified callback function each time a portion of the copy operation is completed, thus notifying the application of its progress via the callback function.  Additionally, CopyFileEx can be canceled during the copy operation.
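
To illustrate the progress-callback and cancellation behavior described for CopyFileEx(), here is a minimal Python sketch of the same pattern. It does not call the Win32 API: the function name copy_with_progress, the callback signature and the chunk size are all hypothetical, and removing the partial destination file on cancel is a choice made for this sketch only.

```python
import os


def copy_with_progress(src, dst, on_progress, chunk_size=1024 * 1024):
    """Copy src to dst in chunks, calling on_progress after each chunk.

    on_progress(copied, total) returns True to continue or False to cancel.
    Returns True if the copy completed, False if it was cancelled.
    (Hypothetical helper; this is not the CopyFileEx API itself.)
    """
    total = os.path.getsize(src)
    copied = 0
    completed = True
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break  # end of file reached
            fout.write(chunk)
            copied += len(chunk)
            # Notify the caller; a False return value cancels the copy.
            if not on_progress(copied, total):
                completed = False
                break
    if not completed:
        # Sketch-only choice: discard the partial destination file.
        os.remove(dst)
    return completed
```

A caller could pass a callback that prints the percentage done and returns False once the user asks to stop.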

So looking at the definition of buffered I/O above, we can see where the perceived performance problems lie - in the file system cache overhead.  Unbuffered I/O (or a raw file copy) is preferred when attempting to copy a large file from one location to another when we do not intend to access the source file after the copy is complete.  This will avoid the file system cache overhead and prevent the file system cache from being effectively flushed by the large file data.  Many applications accomplish this by calling CreateFile() to create an empty destination file, then using the ReadFile() and WriteFile() functions to transfer the data.

  • CreateFile() - The CreateFile function creates or opens a file, file stream, directory, physical disk, volume, console buffer, tape drive, communications resource, mailslot, or named pipe. The function returns a handle that can be used to access an object.
  • ReadFile() - The ReadFile function reads data from a file, and starts at the position that the file pointer indicates. You can use this function for both synchronous and asynchronous operations.
  • WriteFile() - The WriteFile function writes data to a file at the position specified by the file pointer. This function is designed for both synchronous and asynchronous operation.
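
The create/read/write sequence above can be sketched as an explicit loop. This Python version is an illustration under stated assumptions, not the code any of the tools mentioned here actually use: raw_copy and its 4 MB chunk size are hypothetical, and buffering=0 only removes Python's own user-space buffer. It does not bypass the Windows file system cache; on Windows that requires opening the file through CreateFile() with the FILE_FLAG_NO_BUFFERING flag, which in turn requires sector-aligned buffers and transfer sizes.

```python
def raw_copy(src, dst, chunk_size=4 * 1024 * 1024):
    """Copy src to dst with an explicit read/write loop.

    Returns the number of bytes copied. (Hypothetical helper name.)
    """
    copied = 0
    # buffering=0 yields raw file objects, so each read()/write() call
    # goes straight to the operating system without a Python-level buffer.
    with open(src, "rb", buffering=0) as fin, \
         open(dst, "wb", buffering=0) as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break  # end of file reached
            # A raw write may accept fewer bytes than offered,
            # so keep writing until the whole chunk is out.
            view = memoryview(chunk)
            while len(view) > 0:
                written = fout.write(view)
                view = view[written:]
            copied += len(chunk)
    return copied
```

The large chunk size keeps the number of system calls low; the right value for a given disk and network is something to measure rather than assume.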

For copying very large files around the network, my copy utility of choice is ESEUTIL, which is one of the database utilities provided with Exchange.  To get ESEUTIL working on a non-Exchange server, you just need to copy ESEUTIL.EXE and ESE.DLL from your Exchange server to a folder on your client machine.  It's that easy.  There are x86 and x64 versions of ESEUTIL, so make sure you use the right version for your operating system.  The syntax for ESEUTIL is very simple: eseutil /y <srcfile> /d <destfile>.  Of course, since we're using command line syntax, we can use ESEUTIL in batch files or scripts.  ESEUTIL depends on the Visual C++ Runtime Library, which is available as a redistributable package.
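
For scripting, the command line can also be assembled programmatically. The following Python sketch simply mirrors the syntax quoted above (eseutil /y <srcfile> /d <destfile>); the helper names are hypothetical, and it assumes ESEUTIL.EXE is on the PATH or that its full path is passed as exe.

```python
import subprocess


def eseutil_copy_command(src, dst, exe="eseutil"):
    """Build the argument list for: eseutil /y <srcfile> /d <destfile>."""
    return [exe, "/y", src, "/d", dst]


def run_eseutil_copy(src, dst, exe="eseutil"):
    """Run the copy; raises CalledProcessError on a non-zero exit code.

    Assumes ESEUTIL.EXE and ESE.DLL are available on this machine.
    """
    return subprocess.run(eseutil_copy_command(src, dst, exe), check=True)
```

Passing the arguments as a list rather than a single string avoids quoting problems with paths that contain spaces.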

Addendum:  The XCOPY /J switch was added in Win7/2008R2.

http://technet.microsoft.com/en-us/library/cc771254(WS.10).aspx

/j

Copies files without buffering. Recommended for very large files. This parameter was introduced in Windows Server® 2008 R2.

- Aaron Maxwell



  • What role would IRPStackSize play in such scenarios? I know that tweaking this registry value usually works around similar problems.

  • This is an interesting post from the performance team blog ... bottom line, if you're copying a really

  • you can also use esefile.exe from the same place on the Exchange Server.

  • Has anybody done any testing to see how much quicker eseutil would be compared to robocopy?

  • Great Article Aaron!

    I think this is really good to point out, because when you are doing large file copies using anything with buffered I/O, you can actually exhaust your resources.

    Buffered I/O will consume paged pool (1 KB of paged pool is required for each megabyte (MB) of file size that is opened for buffered I/O).

    So the bigger your file copy, the more paged pool you will consume, and eventually your copy might fail if the resource runs out before the copy finishes.

    That's why it's good to use tools such as ESEUTIL that don't use CopyFile or CopyFileEx for large file copies.
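
Taking the commenter's figure of 1 KB of paged pool per MB of file at face value, the scaling is easy to work out. The short Python sketch below is only that arithmetic; the function name is hypothetical and the ratio is the comment's rule of thumb, not a value verified here.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def estimated_paged_pool_bytes(file_size_bytes, pool_bytes_per_mib=KIB):
    """Estimate paged pool use from the 1 KB per MB rule of thumb."""
    return file_size_bytes / MIB * pool_bytes_per_mib


if __name__ == "__main__":
    for size_gib in (2, 250):
        pool_mib = estimated_paged_pool_bytes(size_gib * GIB) / MIB
        print(f"{size_gib} GiB file -> about {pool_mib:g} MiB of paged pool")
```

Under that assumption the paged pool cost grows linearly with file size, so it matters far more for database-sized files than for files of a couple of gigabytes.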

  • I tested it with BKF files of approx. 2 GB.

    Copy and paste took 3 minutes.

    eseutil took over 6 minutes :(

  • I like wget the best and it's gpl'd

    http://www.gnu.org/software/wget/index.html#downloading

  • Every Exchange Administrator knows what ESEUTIL does. It can be used to defragment the Exchange databases

  • Great Article,

    Keep in mind that the ESEUTIL file copy option is available in Exchange Server 2003; Exchange 2000 Server does not have it.

    regards,

    Mauro Kirsch

    mkirsch@br.ibm.com

  • I was able to copy a 250 gig database file in just over an hour using ESEUTIL.  The same copy took 4 hours using robocopy.  What a great tool!

  • This does one file at a time I assume? I can't specify an entire directory, for instance an oracle directory where there are 70 or 80 database files, all roughly 2 gig per DB part.

  • Kevin:

    ESEUTIL does one file at a time.  Not a big problem though; just connect it to the for command.

    Ex: for %f in (d:\source\directory\*.MDF) do ESEUTIL /Y "%f"

  • Aaron:

    You really really REALLY oughta suggest this to the robocopy maintainers!

    $0.02 ;)

  • This is CRAZY!!! I need to get work done. I'm a pro photographer and move large amounts of files around on internal, external (USB and 1394) and network drives. WinXP would move files on internal drives as fast as the drives could go. Vista (the albatross) cuts my file transfer speeds to between 1/4 and 1/10 XP speeds. WHY does Microsoft do this crazy stuff??? I have spent days researching and optimizing Vista to try to get it to where XP was as an out-of-the-box product, and Vista is still doing an excellent job of thwarting all of my attempts at productivity. Are you trying to get me to abandon Microsoft entirely and go Linux or Apple?

  • Holy CRAP!  I can't thank you enough for this.  I thought for a long time that my Promise SX4000 IDE RAID5 was my performance killer.  I tend to transfer large DVD ISOs.  

    Using copy, I got about 5MB/sec writes over gigabit.  Server computer was essentially useless until the copy completed.

    Using eseutil, I get about *40*MB/sec!    

    I have been searching for the cause of this issue for DAYS and finally stumbled on your post.  I fought with the gigabit network card, and system cache tweaking, all to no avail.  

    Computer is still pretty bogged down during the transfer.  The system cache jumps to 750MB (out of the system's 1GB) usage which is the probable culprit.  But now it takes less than 1/8th the time to move the data, so the whole situation is a lot more tolerable.  I eventually figured out that the problem only was with a network file copy and not a local transfer.