A blog by Jose Barreto, a member of the File Server team at Microsoft.
All messages posted to this blog are provided "AS IS" with no warranties, and confer no rights.
Information on unreleased products is subject to change without notice.
Dates related to unreleased products are estimates and are subject to change without notice.
The content of this site consists of personal opinions and might not represent the views of Microsoft Corporation.
The information contained in this blog represents my view on the issues discussed as of the date of publication.
You should not consider older, out-of-date posts to reflect my current thoughts and opinions.
© Copyright 2004-2012 by Jose Barreto. All rights reserved.
1. Introduction
Every once in a while, someone tells me they believe they have a performance problem with their Scale-Out File Server. When I dig a little further, I very often find that file copies are being used to measure storage performance for all kinds of scenarios.
This blog post is about why file copies are not a good metric for evaluating storage performance. It also covers other tools you could use instead. As usual, my focus here is on copies that use SMB3 file shares or File Server Clusters.
2. Why people use file copies to measure storage performance
First of all, it’s important to understand why people use file copies to measure performance. There are actually quite a few reasons:
2.1. It’s so easy…
File copies are a simple thing to do. You can even do it from File Explorer.
2.2. It’s so pretty…
File Explorer now has a nice visualization of the file copy operation, including a chart showing bandwidth as the copy progresses.
2.3. I have lots of large VHDX files sitting around ready to copy
We now have many large VHDX files readily available for testing file copies, and they will take a while to transfer. When someone is looking for a simple way to generate IOs against a storage subsystem, they might be tempted to simply copy those files in or out.
2.4. Someone told me it was a good idea
There are a lot of blog posts and demo videos out there about how fast file copies are in this new storage system or this new protocol. I might have done that myself a few times (sorry about that).
3. Why using file copies to measure storage performance is not a good idea
Using file copies to measure storage performance has a number of issues, and it can lead you to incorrect conclusions. Here are a few reasons why:
3.1. Copy might not be optimized
To get the most out of your storage subsystem, you need to queue up enough IO requests to keep the entire system busy end-to-end. For instance, when using hard disk drives, we like to watch the queue depth performance counters to make sure we maintain at least two IOs queued for each HDD in the system.
To take advantage of this, the Windows copy engine issues 8 asynchronous writes of 1MB each by default when copying large files. However, when copying lots of small files (with a fast storage backend), you are less likely to build up the required IO queue depth unless you queue more IOs and use multiple threads.
To make matters worse, most file copy programs copy only one file at a time, waiting for each file copy to finish before starting the next one. This serialization results in much lower storage performance than the system can actually deliver when more requests are queued in parallel.
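To put rough numbers on the queue depth argument, Little's law relates outstanding IOs, IO size and latency: throughput ≈ outstanding IOs × IO size ÷ latency. The sketch below assumes the same 10 ms per-IO latency in both cases, so that only queue depth and IO size differ, and assumes a single 64KB IO in flight for the small-file case. The 8 × 1MB figure and the two-IOs-per-HDD rule of thumb come from the text above.

```python
# Back-of-the-envelope throughput estimate based on Little's law:
#   throughput = outstanding IOs * IO size / per-IO latency
# Assumes latency stays constant as the queue grows; in practice it rises
# once the disks saturate, so treat the results as upper bounds.

MB = 1024 * 1024
KB = 1024


def throughput_mb_per_s(outstanding_ios: int, io_size_bytes: int, latency_s: float) -> float:
    """Estimated throughput in MB/s for a steady queue of outstanding IOs."""
    return outstanding_ios * io_size_bytes / latency_s / MB


def target_queue_depth(hdd_count: int, ios_per_hdd: int = 2) -> int:
    """Rule of thumb from the text: keep at least two IOs queued per HDD."""
    return hdd_count * ios_per_hdd


if __name__ == "__main__":
    latency = 0.010  # assumed 10 ms per IO, identical for both cases
    # Large file: the copy engine keeps 8 asynchronous 1MB writes in flight.
    print(f"large file, 8 x 1MB in flight: {throughput_mb_per_s(8, 1 * MB, latency):.1f} MB/s")
    # Small files copied one at a time: assume a single 64KB IO in flight.
    print(f"small file, 1 x 64KB in flight: {throughput_mb_per_s(1, 64 * KB, latency):.1f} MB/s")
    # A hypothetical 24-HDD system would want at least 48 IOs queued.
    print(f"target queue depth for 24 HDDs: {target_queue_depth(24)}")
```

The same disks and the same latency produce throughput that differs by two orders of magnitude, purely because of how much work is kept in flight.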
The SMB protocol can queue up multiple asynchronous requests, but that only happens if the application (the file copy program, in this case) takes advantage of it. ROBOCOPY is one program that does: it can use multiple threads to copy multiple files at once.
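The difference between serial and parallel copying is easy to see in code. The sketch below is not how ROBOCOPY is implemented. It is a minimal Python illustration of the same idea behind multithreaded copying: keep several file copies (and therefore several IOs) in flight at once. It assumes a flat source folder, and the function name `copy_tree_parallel` and the default of 8 workers are choices made for this example.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_tree_parallel(src: Path, dst: Path, workers: int = 8) -> int:
    """Copy every regular file directly under src into dst using a thread pool.

    Returns the number of files copied. Each worker runs an independent
    shutil.copyfile call, so up to `workers` copies are in flight at once,
    instead of waiting for one file to finish before starting the next.
    """
    dst.mkdir(parents=True, exist_ok=True)
    files = [p for p in src.iterdir() if p.is_file()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every copy and re-raises any exception from a worker.
        list(pool.map(lambda p: shutil.copyfile(p, dst / p.name), files))
    return len(files)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "src"), Path(tmp, "dst")
        src.mkdir()
        for i in range(100):  # lots of small files, where serial copies hurt most
            (src / f"file{i:03}.bin").write_bytes(b"x" * 4096)
        print(copy_tree_parallel(src, dst), "files copied")
```

As the text notes, this only helps when there are many files. A single large file is still one copy operation.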
More importantly, file copy behavior is not a good stand-in for other workloads. You will find that it differs from a SQL Server OLTP database or a Hyper-V Virtual Desktop deployment, both in IO sizes and in the number of IOs that are queued up.
3.2. Every copy has two sides
People often forget that a file copy measures the performance of both the source storage and the destination storage.
If you’re copying from a single slow disk at the source to 10 fast disks in a striped volume on the other end, the speed at which you can read from that source will determine the overall speed of the file copy process.
You’re basically measuring the performance of the slower of the two sides.
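In other words, the observed copy rate is roughly the minimum of the stages the data passes through. The trivial sketch below uses made-up numbers: a slow source disk and a fast striped destination. It also includes an optional network term, because the network is one more stage in a copy over a file share.

```python
def effective_copy_rate(source_read_mb_s: float, dest_write_mb_s: float,
                        network_mb_s: float = float("inf")) -> float:
    """A copy can go no faster than the slowest stage it passes through."""
    return min(source_read_mb_s, dest_write_mb_s, network_mb_s)


if __name__ == "__main__":
    # Hypothetical numbers: one slow source disk, a fast 10-disk striped destination.
    print(effective_copy_rate(source_read_mb_s=150, dest_write_mb_s=1500))  # 150
```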
3.3. Offloads and caching
Specific technologies such as Offloaded Data Transfers (ODX) and SMB file copy acceleration (COPYCHUNK) can accelerate certain kinds of file copies, and it can be hard to determine when they apply.
File copies will attempt to use buffered IOs, which can be accelerated by regular Windows file caching. On the other hand, buffered file copies will not be accelerated by cluster CSV caching, which only accelerates unbuffered IOs.
As with item 3.1, these are examples of how file copy performance will not match the performance of other types of workload, like an OLTP database or a VDI deployment.
3.4. CA will not cache
Continuously Available (CA) file shares are commonly used to provide Hyper-V and SQL Server with a storage solution that will transparently fail over in case of node failure.
In order to deliver on its availability promise, CA file shares will make sure that every write goes straight to disk (write-through) and no writes are cached only in RAM. That’s how SMB can recover from the failure of a file server node at any time without data loss.
While this is great for resiliency, it will slow down file copies, particularly if you are copying a single large file.
It’s important to note that most server workloads (including SQL Server and Hyper-V) will always use write-through regardless of CA, so while file copies are affected by the write-through behavior change caused by CA, most server workloads are not.
You can read more about this at http://blogs.technet.com/b/filecab/archive/2013/12/05/to-scale-out-or-not-to-scale-out-that-is-the-question.aspx.
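A rough way to get a feel for the write-through penalty on a local disk is to compare plain buffered writes with writes that are flushed to stable storage one chunk at a time. This is only an analogue. It uses `os.fsync` after each 1MB chunk as a stand-in for write-through, which is not how SMB CA shares implement it. (A Windows application would normally request write-through with the FILE_FLAG_WRITE_THROUGH flag when opening the file.) Timings depend heavily on the disk and its cache settings, so none are implied here.

```python
import os
import tempfile
import time

CHUNK = b"\0" * (1024 * 1024)  # 1MB writes, like the copy engine's large-file IOs


def write_file(path: str, chunks: int, write_through: bool) -> float:
    """Write `chunks` x 1MB to path and return the elapsed seconds.

    With write_through=True, flush and fsync after every chunk so each write
    must reach stable storage before the next one starts (an approximation
    of write-through). Otherwise writes may complete in the OS cache.
    """
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(chunks):
            f.write(CHUNK)
            if write_through:
                f.flush()              # push Python's buffer to the OS
                os.fsync(f.fileno())   # ask the OS to commit it to the device
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for wt in (False, True):
            path = os.path.join(tmp, f"wt_{wt}.bin")
            secs = write_file(path, chunks=64, write_through=wt)
            print(f"write_through={wt}: {secs:.3f} s")
```

On most disks, expect the write-through run to be slower. That gap is the cost a buffered file copy pays on a CA share, while server workloads that already write through see little difference.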
4. Better ways to measure performance
4.1. Run the actual workload
The best way to measure performance is to run the actual workload that you are targeting.
If you’re configuring some storage that will be used for an OLTP database, install SQL Server, run an OLTP workload and see how many transactions per second you can get out of it.
If you’re creating a solution for static web sites running inside a VM, install Hyper-V, create a VM configured with IIS, set up a few static web sites and see how they handle multiple clients accessing the static web content.
4.2. Run a workload simulator
When it’s not possible or practical to run the actual workload, you can at least run a synthetic workload simulator.
There are many of these simulators out there that mimic the actual behavior of the application and allow you to simulate a large number of clients accessing your machine.
For instance, to simulate running SQL Server databases, you might want to try the DiskSpd tool (see http://aka.ms/DiskSpd).
DiskSpd is actually fairly flexible and can go beyond simulating just SQL Server behavior. I even created a specific blog post on how to use DiskSpd, which you can find at http://blogs.technet.com/b/josebda/archive/2014/10/13/diskspd-powershell-and-storage-performance-measuring-iops-throughput-and-latency-for-both-local-disks-and-smb-file-shares.aspx.
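As a starting point, here is a small Python helper that assembles a DiskSpd command line for a random, mixed read/write test. The parameter values are assumptions loosely resembling an OLTP pattern: 8K IOs, 30% writes, 4 threads with 8 outstanding IOs each, a 10G test file and a 60-second run. The UNC path is hypothetical. Check the DiskSpd documentation and the post linked above for the full list of options.

```python
from typing import List


def diskspd_args(target: str, block_size: str = "8K", duration_s: int = 60,
                 threads: int = 4, outstanding: int = 8, write_pct: int = 30,
                 file_size: str = "10G") -> List[str]:
    """Build a DiskSpd command line for a random read/write test.

    Defaults are illustrative assumptions (small random IOs, mixed reads and
    writes); tune them to match the workload you actually care about.
    """
    return [
        "diskspd.exe",
        f"-c{file_size}",    # create the test file with this size
        f"-b{block_size}",   # IO size
        f"-d{duration_s}",   # test duration in seconds
        f"-t{threads}",      # threads per target
        f"-o{outstanding}",  # outstanding IOs per thread (queue depth)
        "-r",                # random IO
        f"-w{write_pct}",    # percentage of writes
        "-Sh",               # disable software caching and hardware write caching
        "-L",                # collect latency statistics
        target,
    ]


if __name__ == "__main__":
    args = diskspd_args(r"\\fileserver\share\testfile.dat")  # hypothetical share
    print(" ".join(args))
    # To actually run it (requires diskspd.exe on the PATH):
    # import subprocess; subprocess.run(args, check=True)
```

Because the total queue depth is threads × outstanding IOs, you can vary those two values to see how the storage behaves as more IOs are kept in flight.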
4.3. Keep your simulations as real as possible
While it’s more practical to use workload simulators, you should try to stay as close as possible to the actual solution you will deploy.
For instance, if you are planning to deploy 4 SQL Servers in a Hyper-V-based private cloud, you should measure storage performance by actually creating 4 VMs and running your DiskSpd simulation inside each one.
That is a much better simulation than just running 4 instances of DiskSpd in the host, since the IO pattern of 4 instances of SQL Server running on bare metal will be different from the IO pattern of four VMs each running one instance of SQL Server.
5. If your workload is actually file copies
All that aside, there’s a chance that what you are actually trying to test is a file copy workload. That is, you actually have a production scenario where you will be transferring files. In that case (and only in that case), here are a few tips to optimize that specific scenario.
5.1. Check both sides of the copy
Remember to optimize both the source and the destination storage subsystems. As mentioned before, you will only be as fast as the weakest link in the chain. You might want to redesign your storage solution so that source and destination have better performance or are “closer” to each other.
5.2. Use the right copy tool
Most file copy tools, like the EXPLORER.EXE GUI, the COPY command in the shell, the XCOPY.EXE tool and the PowerShell Copy-Item cmdlet, are not optimized for performance. They are single-threaded, one-file-at-a-time solutions that will do the job but are not designed to transfer files as fast as possible.
The best file copy tool included in Windows is actually ROBOCOPY.EXE. It includes very useful options like /MT (use multiple threads to copy multiple files at once) and /J (copy using unbuffered I/O, which is recommended for large files).
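If you script your copies, a small helper like the one below can assemble a ROBOCOPY command line with the /MT and /J options described above, plus /E to include subdirectories. This is a sketch, and the function name and paths are hypothetical.

```python
from typing import List


def robocopy_args(source: str, dest: str, threads: int = 8,
                  unbuffered: bool = False, recurse: bool = True) -> List[str]:
    """Build a ROBOCOPY command line.

    /E     copies subdirectories, including empty ones.
    /MT:n  copies n files at once (ROBOCOPY accepts 1 to 128; 8 is its default).
    /J     uses unbuffered IO, recommended for large files.
    """
    if not 1 <= threads <= 128:
        raise ValueError("ROBOCOPY /MT accepts 1 to 128 threads")
    args = ["robocopy", source, dest]
    if recurse:
        args.append("/E")
    args.append(f"/MT:{threads}")
    if unbuffered:
        args.append("/J")
    return args


if __name__ == "__main__":
    args = robocopy_args(r"D:\Library", r"\\fileserver\share\Library", threads=16, unbuffered=True)
    print(" ".join(args))
    # import subprocess
    # result = subprocess.run(args)
    # if result.returncode >= 8:  # exit codes below 8 mean success
    #     raise RuntimeError("ROBOCOPY reported failures")
```

One design note: ROBOCOPY returns a bitmask exit code where values below 8 indicate success (for example, 1 means files were copied). A wrapper should therefore test `returncode >= 8` for failure instead of using `check=True`.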
That tool got some love from the Performance Fundamentals team at Microsoft and it’s usually much faster than anything else in Windows.
It’s important to note that even ROBOCOPY with the /MT option won’t help if you’re copying a single file. Like most other file copy programs, it uses a common file copy API instead of custom code.
5.3. Offload with COPYCHUNK
If it’s an option for you, put source and destination of your copy on the same file server to leverage the built-in SMB COPYCHUNK. This optimization is part of the SMB protocol and basically avoids sending data over the wire if the source and destination are on the same machine.
You can read about it at http://msdn.microsoft.com/en-us/library/cc246475.aspx (yes, this has been there since SMB1 and it’s still there in SMB3).
Note that COPYCHUNK only applies if the source and destination shares are on the same file server and if the file size is at least 64KB.
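The two conditions above can be expressed as a simple check. This sketch rests on simplifying assumptions. The `SharePath` type is hypothetical, and comparing server names as strings ignores aliases, IP addresses and cluster network names that could point to the same file server. In practice, the copy engine and the server determine at copy time whether COPYCHUNK is used.

```python
from dataclasses import dataclass

COPYCHUNK_MIN_SIZE = 64 * 1024  # 64KB threshold stated in the text


@dataclass
class SharePath:
    server: str  # file server name, e.g. "fs1" (hypothetical)
    share: str   # share name, e.g. "library" (hypothetical)


def copychunk_applies(src: SharePath, dst: SharePath, file_size: int) -> bool:
    """Rough check: same file server on both sides and a file of at least 64KB."""
    same_server = src.server.lower() == dst.server.lower()
    return same_server and file_size >= COPYCHUNK_MIN_SIZE


if __name__ == "__main__":
    lib, vms = SharePath("fs1", "library"), SharePath("FS1", "vms")
    print(copychunk_applies(lib, vms, 40 * 1024**3))  # True: same server, large file
```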
5.3. Offload with ODX
If your file server uses a SAN back-end, consider using Offloaded Data Transfers (ODX). This T10 standard improves performance by using a system of tokens to avoid transferring the actual data over the wire.
It works only if the source and destination paths live on the same SAN (or on storage that is otherwise connected in the back-end). This also works with SMB file shares (SMB basically lets the request pass down to the underlying storage subsystem).
ODX support was introduced in Windows Server 2012 and requires specific support from your SAN vendor. You can read about it at http://msdn.microsoft.com/en-us/library/windows/hardware/dn265439.aspx.
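To make the token idea less abstract, here is a toy in-memory model. It is not the real interface. On Windows, ODX is driven by offload read and offload write requests (FSCTL_OFFLOAD_READ and FSCTL_OFFLOAD_WRITE) that the storage stack passes down to the array. The class and method names below are hypothetical. The model shows the data flow: the host only ever handles a small token, while the bytes move inside the array.

```python
import secrets
from typing import Dict, Tuple


class ToySan:
    """Hypothetical storage array that supports token-based offloaded copies."""

    def __init__(self) -> None:
        self.luns: Dict[str, bytearray] = {}
        self._tokens: Dict[str, Tuple[str, int, int]] = {}

    def offload_read(self, lun: str, offset: int, length: int) -> str:
        """Return an opaque token representing a data range; no data leaves the array."""
        token = secrets.token_hex(16)
        self._tokens[token] = (lun, offset, length)
        return token

    def offload_write(self, token: str, lun: str, offset: int) -> int:
        """Copy the range behind the token inside the array and return the bytes copied.

        In this toy, a token can be used only once.
        """
        src_lun, src_off, length = self._tokens.pop(token)
        data = self.luns[src_lun][src_off:src_off + length]
        self.luns[lun][offset:offset + length] = data
        return length


if __name__ == "__main__":
    san = ToySan()
    san.luns["lun1"] = bytearray(b"VHDX" * 1024)            # source volume (4096 bytes)
    san.luns["lun2"] = bytearray(len(san.luns["lun1"]))     # empty destination volume
    token = san.offload_read("lun1", 0, 4096)               # host receives only this token
    copied = san.offload_write(token, "lun2", 0)            # the array moves the data itself
    print(copied, san.luns["lun2"][:4])                     # 4096 bytearray(b'VHDX')
```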
5.4. Create a non-CA file share
If your file server is clustered, you can use SMB Continuously Available file shares that allow you to lose any node of the cluster at any time without impact to the applications. The file clients and file servers will automatically recover through a process we call SMB Transparent Failover.
However, this requires that every write be written through to the storage (instead of potentially being cached). Most server workloads (like Hyper-V and SQL Server) already have this unbuffered IO behavior, but not file copies. So, CA has the potential of slowing down file copy operations, which are normally done with buffered IOs.
If you want to trade reliability for performance during file copies, you can create a file share with the Continuous Availability property turned off (it’s on by default on all clustered file shares).
In that case, if there is a failover during a file copy, you might get an error and the copy might be aborted. But if you don’t have any failovers, the copy will go faster.
For server workloads like Hyper-V and SQL Server, turning off CA will not make things any faster, but you will lose the ability to fail over transparently.
Note that you can create two shares pointing to the same folder, one without CA for file copy operations only and one with CA for regular server workloads. Having those two shares might have the side effect of confusing your management software and your file server administrators.
5.5. Use SMB Multichannel
If you can afford some extra hardware, consider adding a second network interface of the same type and speed to leverage SMB Multichannel (using multiple network paths simultaneously).
This was introduced in Windows Server 2012 (along with SMB3) and you must have it on both sides of the copy to be effective.
SMB Multichannel might be able to help with many scenarios, including a single large file copy when you are constrained by bandwidth or IOPS.
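One quick way to reason about whether a second NIC will help is to compare the aggregate network ceiling with what the storage can sustain. The sketch makes several simplifying assumptions: nominal link speeds, decimal megabytes, no protocol overhead, and SMB Multichannel spreading the traffic evenly across both NICs. The storage figure is hypothetical. If the storage, not the network, is the bottleneck, a second NIC will not raise the ceiling.

```python
from typing import List


def link_mb_per_s(gbps: float) -> float:
    """Convert a nominal link speed in Gbit/s to MB/s (decimal), ignoring protocol overhead."""
    return gbps * 1000 / 8


def multichannel_ceiling(nic_gbps: List[float], storage_mb_s: float) -> float:
    """Upper bound for a copy: aggregate NIC bandwidth, capped by storage throughput."""
    return min(sum(link_mb_per_s(g) for g in nic_gbps), storage_mb_s)


if __name__ == "__main__":
    # Hypothetical: storage that can sustain 2000 MB/s behind 10GbE NICs.
    print(multichannel_ceiling([10], 2000))      # 1250.0
    print(multichannel_ceiling([10, 10], 2000))  # 2000
```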
Also check if you have a second port on your NIC that is not wired to the switch, which might be an even easier upgrade (you will still need some extra cables and switch ports to make it happen).
You can learn more about SMB Multichannel at http://blogs.technet.com/b/josebda/archive/2012/05/13/the-basics-of-smb-multichannel-a-feature-of-windows-server-2012-and-smb-3-0.aspx.
When using SMB Multichannel in a file server cluster, be sure to use multiple subnets, as described at http://blogs.technet.com/b/josebda/archive/2012/11/12/windows-server-2012-file-server-tip-use-multiple-subnets-when-deploying-smb-multichannel-in-a-cluster.aspx.
5.6. Windows Server 2012 R2 Update
There are specific copy-related improvements in the Windows Server 2012 R2 Update released in April 2014. That update is especially important if you are using a Continuously Available file share as the destination of your file copy. You can find information on how to obtain the update at http://technet.microsoft.com/en-us/library/dn645472.aspx.
By the way, we are constantly evolving Windows Server and the Scale-Out File Server. We release updates regularly and keeping up with them will give you the best results.
5.7. File copy for Hyper-V VM provisioning
One special case of file copy is related to the provisioning of virtual machines from a template. Typically you keep a “Library Share” with your VHDX files and you copy from this library to the deployment folder, where the VHDX will be associated with a running virtual machine.
You can avoid this by using differencing VHDX files, or you can use some interesting tricks (like live VHDX file re-parenting, introduced in Windows Server 2012) to optimize your VM provisioning.
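Differencing disks avoid the copy because the child disk stores only the blocks that change, and reads of anything else fall through to the parent. The toy model below illustrates that idea. It is not the VHDX on-disk format, and the `ToyDisk` class is hypothetical.

```python
from typing import Dict, Optional


class ToyDisk:
    """Hypothetical block device; a child with a parent behaves like a differencing disk.

    Writes land only in the child. Reads fall back to the parent for blocks the
    child has never written, so provisioning a new VM does not require copying
    the parent (template) disk.
    """

    def __init__(self, parent: Optional["ToyDisk"] = None) -> None:
        self.parent = parent
        self.blocks: Dict[int, bytes] = {}

    def write(self, block: int, data: bytes) -> None:
        self.blocks[block] = data

    def read(self, block: int) -> bytes:
        if block in self.blocks:
            return self.blocks[block]
        if self.parent is not None:
            return self.parent.read(block)
        return b"\0"  # never-written block on a base disk


if __name__ == "__main__":
    template = ToyDisk()
    template.write(0, b"OS")
    vm1 = ToyDisk(parent=template)  # instant: nothing is copied
    vm1.write(1, b"vm1 data")
    print(vm1.read(0), vm1.read(1), len(vm1.blocks))  # b'OS' b'vm1 data' 1
```

In this model, re-parenting amounts to changing the `parent` reference. The real feature also has to keep the on-disk data consistent while the VM keeps running.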
You can find more details about your options at http://blogs.technet.com/b/josebda/archive/2012/03/20/windows-server-8-beta-hyper-v-over-smb-quick-provisioning-a-vm-on-an-smb-file-share.aspx.
5.8. System Center Virtual Machine Manager 2012 R2
If you’re using SCVMM for provisioning your VMs from a library, it’s highly recommended that you upgrade to SCVMM 2012 R2.
Before that release, SCVMM used the slower BITS protocol to transfer files from the library to their final destination. In the 2012 R2 release, VMM uses a new way to copy files, which will leverage things like SMB Multichannel, SMB COPYCHUNK and ODX offloads to significantly improve performance.
You can find some details on how VMM deploys virtual machines at http://technet.microsoft.com/en-us/library/hh368991.aspx (the bottom of the page includes the details on Fast File Copy).
6. Conclusion
If you’re using copies to measure performance of anything except file copies, I hope this post made it clear that’s not a good idea and convinced you to use other methods of storage performance testing.
If you’re actually trying to optimize file copies, I hope you were able to find at least one or two useful tips here.
Feel free to share your own file copy experiences and tips using the comment section below.
Note: This post was updated on 10/26/2014 to use DiskSpd instead of SQLIO.
Great post Jose! Well done here.
This post really serves a greater purpose than it was intended for, in that it provides a very concise reference to a multitude of options and insights for accessing and copying files in different situations for those of us who are not aware of them.
Thanks for taking the time to put this together.
Excellent post. We often encounter customers who mistakenly use file copies to compare one hardware platform or configuration to another. This will definitely help convince them it is not the appropriate methodology. Thanks for the write-up!
What about file copies to VMs running on the SOFS? Is that a good test? Because if we have virtualized file servers (i.e. file servers as VMs), then file copy speeds are a very important factor.
@M_M2009 If your workload is to copy to or from a file server running inside a VM, then that is a good test. If your workload is a home folders workload, you might want to consider a simulator like FSCT (File Server Capacity Tool). Note that for these types of workloads, you should be using a classic file server, not a scale-out file server. More at http://blogs.technet.com/b/filecab/archive/2013/12/05/to-scale-out-or-not-to-scale-out-that-is-the-question.aspx
Nothing about using things such as Intel IOMeter, Atto Disk Bench or even just perf mon disk counters? Or is that coming in a follow up post ... ;)
When doing file copies, I copy to the NUL device. This eliminates the write delay of your local system.
In general, SQLIO behaves somewhat like other micro-benchmarking tools like IOMeter.
I mention only SQLIO because that tool is provided by Microsoft, but the ones you mention are also fine.
@Mfisch If you test by copying large files to NUL, you're testing only large reads from your storage subsystem. If your workload is all about only doing large reads, then copy is a good stand-in for your workload. Most of the workloads I see have a mix of different IO sizes and a mix of reads/writes.
Thanks for this great article!
Jose, we have an issue with copying files to a Windows 2012 File Server Cluster (using "Continuous Availability" file shares)
Client OS: Windows 8.1 Enterprise (x64) incl. all Updates.
If we have a mapped drive (with drive-letter) for this share connected - everything is fine!
But if we address the file share via UNC path names (e.g. in File Explorer), then it takes about 1 minute until the window for this "copy-job" appears and the file transfer starts!
If we disable "Continuous Availability" on the share, the window appears immediately and the file transfer starts.
Is this a normal behaviour for "Continuous Availability" or perhaps a bug?
What's going on in the background if this feature is enabled?
I've also found that due to Windows caching, you can exhaust your available RAM while trying to copy very large files.
KUDOS to you! Great post, simple and to the point.
I have so much to say about this post, but so little time! I wish this post were around about 8 months ago--it would have saved me several months of frustration while getting our storage solution up and running. I opened a ticket with Microsoft Support, but they seem unaware of this performance characteristic. In fact, they actually changed the workload I was testing (fixed VHDX creation) specifically to file copy performance as the test load for the duration of the ticket. Hopefully this post will help address
It wasn't until I went back and reread your blog posts that I stumbled upon #12 in http://blogs.technet.com/b/josebda/archive/2014/01/25/troubleshooting-file-server-networking-issues-in-windows-server-2012-r2.aspx. I had read it previously, but did not make the connection between performance during large file copies and performance during fixed VHDX creation, nor did the cap of 150 MB/s match what I was seeing, which was a max of about 55 MB/s. The SQLIO throughput tests I ran had excellent results, but as I started the first pre-production workload test--creating a fixed VHDX file for a new VM for testing--and saw the capped performance, I had to stop the rollout until the issue was sorted. For the same operation, I was getting very close to the theoretical hardware peak of ~600 MB/s through local access and pretty close even on a traditional file share, so it just didn't make any sense why it was capped at ~55 MB/s through the SMB3 SOFS share. When I read your article, it seemed like the only explanation that made any sense--that there was actually nothing wrong with the system, that 55 MB/s must be the speed limit for that particular operation on this hardware, and that it didn't necessarily mean that the VMs on the system would also perform poorly. I went back and tested file copy performance with ROBOCOPY and the /MT switch, and sure enough, the performance was significantly better.
However, I still find this behavior to be confusing in a couple ways. One, it doesn't compare well against tests of other storage solutions, or even for a single disk in the case of a large file sequential write. If the issue is getting queue depth higher, wouldn't the same large-file-copy issue affect those other storage platforms as well? Our previous iSCSI storage system (similar number of spindles) capped at ~115 MB/s for a single file transfer, so I figured the SOFS system (40 Gbps Infiniband and SAS JBODs) would push much higher since we were no longer limited by GbE connections. Also, the performance hit in this case--from ~600 MB/s to 55 MB/s--seemed just too dramatic. Another thing I find problematic about it is that not every file operation has an optimization/workaround like the ROBOCOPY /MT switch. My original performance issue was with fixed VHDX creation, but there are other similar scenarios like storage migration, converting a VHD to VHDX, etc. that don't have a /MT counterpart. Sometimes the performance issue means the difference between making a maintenance window or not.
The Microsoft Support ticket was closed as "by design," and a Microsoft Partner engineer said that the behavior was "to protect the overall health and performance of the cluster." Are these performance limitations something to expect as inherent to the SOFS solution in general, or is it something that can be addressed in future releases?
I also wanted to share some performance tips and workarounds that I've used.
Creating a fixed VHD/VHDX file: create it in advance using the DISKPART (CREATE VDISK) tool on the SOFS server (C:\ClusterStorage). You should get close to the rated speed of your hardware.
Converting a VHD to VHDX: As far as I can tell, the Hyper-V Manager tool (the "Copy the contents of the specified virtual hard disk" option in the New Virtual Hard Disk Wizard) seems to be significantly better than PowerShell's Convert-VHD. I was seeing speeds of ~30-60 MB/s using PowerShell, but ~50-120 MB/s using Hyper-V Manager (initially around 60 MB/s, but averaging closer to 100 MB/s over time).
Copying a VM for import: initiate the copy command on the destination SOFS server rather than on the source. When migrating from a Hyper-V 2008 R2 cluster (on a different storage solution in this case), copying from the Hyper-V node to the SOFS share was averaging below 30 MB/s. When initiating the copy on the SOFS node (copying from the C:\ClusterStorage folder on the Hyper-V node to the C:\ClusterStorage folder on the SOFS node), it was consistently at ~115 MB/s (limited by the GbE connection).