The Storage Team Blog about file services and storage features in Windows and Windows Server.
Hi folks, Ned here again. If Shakespeare had run Windows, Hamlet would be a play about configuring failover clusters for storage. Today I discuss the Scale-Out File Server configuration and why general use file server clustered shares may still have a place in your environment. These two types of Windows file servers have different goals and capabilities that come with trade-offs; it’s critical to understand them when designing a solution for you or your customer. We’ve not always done a great job explaining the differences between these two file servers, and this post aims to clear things up.
It's not enough to speak, but to speak true. So let’s get crackin’.
We released Scale-Out File Server (SOFS) in Windows Server 2012. SOFS adds highly available, active-active, file data access for application data workloads to Windows Server clusters through SMB, Continuous Availability (CA), and Cluster Shared Volumes (CSV). CA file shares ensure - amongst other things - that when you connect through SMB 3, the server synchronously writes through to the disk for data integrity in the event of a node failure. In other words, it makes sure that your files are consistent and safe even if the power goes out.
You get the option when you configure the File Server role in a Windows Server failover cluster:
SOFS has some other key benefits:
Claus Joergensen has a good blog post on these capabilities, and if you want to set up a test environment, Jose Barreto is the man with the step-by-step plans.
All of this has a single customer in mind: application data accessed via SMB, like Hyper-V virtual machine disks and SQL database files. With your hypervisor running on one cluster and your storage on another cluster, you can manage, and scale, each aspect of the stack as separate high-performance entities, without the need for expensive SAN fabrics. Truly awesome stuff that sounds like a great fit for any high availability scenario.
For those who want to use SOFS for regular user shares, though: proceed with caution.
Inside Microsoft, we define “Information Worker” as the standard business user scenario. In other words, a person sitting at their physical client or virtual desktop session, and connecting to file servers to access unstructured data. This means SMB shares filled with home folders, roaming user profiles, redirected folders, departmental data, and common shared data; decades of documents, spreadsheets, and PDFs.
Ooh, there’s leftover cake in the break room!
The typical file operations from IW users are very different when compared to application data like Hyper-V or SQL. IW workloads are metadata heavy (operations like opening files, closing files, creating new files, or renaming existing files). IW operations also involve a great many files, with plenty of copies and deletes, and of course, tons of editing. Even though individual users aren’t doing much, file servers have many users. These operations may involve masses of opens, writes, and closes, and often on files without pre-allocated space. This can mean frequent VDL extension, which means many trips to the disk and back, all over SMB.
Right away, you can see that going through a CA-enabled share to get the data integrity guarantee can affect performance compared to previous releases of Windows Server, which had no CA shares and therefore offered no such guarantee. Continuous Availability requires that data be written through to the disk to ensure integrity in the event of a node failure in SOFS, so every write is synchronous and any buffering helps only on subsequent reads, not writes. A user who needs to copy many big files to a file server - such as by adding them to a redirected My Documents folder - can see significantly slower performance on CA shares. A user who spent a week working from home and returns with their offline files cache brimming will see slower uploads to CA shares.
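If you want a feel for what "every write is synchronous" costs, here is a minimal local sketch in Python. It is only an analogy: it uses `os.fsync` on a local temp file to stand in for write-through, and it does not touch SMB or a CA share at all. The function name `timed_copy` and the chunk sizes are made up for illustration.

```python
import os
import tempfile
import time


def timed_copy(chunks, write_through):
    """Write chunks to a new temp file; return (elapsed_seconds, bytes_written).

    With write_through=True, every chunk is flushed and fsync'ed before the
    next one is written, a rough local stand-in for a write-through share.
    """
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
                if write_through:
                    f.flush()             # push Python's buffer to the OS
                    os.fsync(f.fileno())  # ask the OS to push it to the disk
        elapsed = time.perf_counter() - start
        return elapsed, os.path.getsize(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    data = [b"x" * 64 * 1024 for _ in range(200)]  # about 12.5 MB in 64 KB writes
    buffered, _ = timed_copy(data, write_through=False)
    synced, _ = timed_copy(data, write_through=True)
    print(f"buffered: {buffered:.3f}s  write-through: {synced:.3f}s")
```

As with my own tests, ignore the absolute numbers and look at the relative difference on your hardware.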
Nothing is broken here – this is just a consequence of how IW workloads operate. A big VHDX or SQL database file also sees slower creation time through a CA share, but it’s largely a cost paid once, because the files have plenty of pre-allocated space to use up, and subsequent IO costs are comparatively much lower. We also optimize SMB for them, such as with SMB Direct’s handling of 8K IOs.
To demonstrate this, I performed a few small-scale tests in my gross test environment. Don’t worry too much about the raw numbers; just focus on the relative performance differences.
Note: to set this up, see Jose’s demo here. My only big change was to use one node instead of three, so I had more resources in my very gross test environment.
Important: again, this could be faster in absolute terms on your systems with similar data, as my test system is very gross. It could also be slower if your server is quite busy, has crusty drivers installed, is on a choked-out network, etc.
MS Office 2013’s big three – Word, Excel, and PowerPoint – performed well with both CA and non-CA shares and don’t have notable performance differences in my tests even when editing and saving individual files that were hundreds of MB in size. This is because later versions of Office operate very asynchronously, using local temporary files rather than forcing the user to wait on remote servers. On a remote 210MB PPTX, the save times on an edited file were nearly identical, so I didn’t bother posting any results.
Office’s good performance is less likely in other user applications; MS Office has been at this game for 22 years. One internal test application I used to generate files had non-CA performance similar to the synthetic file creation test above. However, when the same tool ran against a CA share, it was 8.6 times slower, because of how it continuously asked the server to allocate more space for the file and kept paying the synchronous write-through cost. There’s no way to know what the more “write-through inefficient” apps are until you find out in testing.
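Since testing is the only way to find those apps, a tiny harness helps. The sketch below, in Python, times the same workload against two directories and reports the slowdown. Everything named here is hypothetical: `grow_in_small_steps` is a stand-in for an app that keeps extending a file in small increments (it is not my internal test tool), and the two directories would be whatever non-CA and CA share paths you have mounted in your lab.

```python
import os
import time


def grow_in_small_steps(directory, total_bytes=1 << 20, step=4096):
    """Hypothetical workload: grow a file by reopening and appending small blocks."""
    path = os.path.join(directory, "growth_test.bin")
    written = 0
    try:
        while written < total_bytes:
            with open(path, "ab") as f:
                f.write(b"\0" * step)
            written += step
        return os.path.getsize(path)
    finally:
        if os.path.exists(path):
            os.remove(path)


def time_workload(workload, directory, repeats=3):
    """Return the best wall-clock time of workload(directory) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        workload(directory)
        best = min(best, time.perf_counter() - start)
    return best


def compare(workload, baseline_dir, candidate_dir, repeats=3):
    """Time one workload against two directories and report the ratio."""
    base = time_workload(workload, baseline_dir, repeats)
    cand = time_workload(workload, candidate_dir, repeats)
    slowdown = cand / base if base > 0 else float("inf")
    return {"baseline_s": base, "candidate_s": cand, "slowdown": slowdown}
```

Point `baseline_dir` at a folder on a non-CA share and `candidate_dir` at the same kind of folder on a CA share, then swap in a workload that resembles your real application. The harness only measures; it makes no claim about what ratio you should expect.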
Important: even general-purpose file server clusters have CA set on their shares by default when created via the cluster admin tool, Server Manager, or New-SmbShare. You should consider removing that setting if you require performance over data write-through integrity on shares on clusters. On non-clustered file servers, you cannot enable CA.
This is conceivably useful even with SOFS and application data workloads: for instance, you could create two shares to the same folder. One is for Hyper-V to mount VHDXs remotely, and one is to copy VHDXs to that share when configuring new VMs, such as through SCVMM.
Final important note: make sure you install (at a minimum) KB2883200 on your Windows Server 2012 R2 servers and Windows 8.1 clients; it makes copying to shares a little faster. Better yet, stay up to date on your file server by using this list of currently available hotfixes for the File Services technologies in Windows Server 2012 and in Windows Server 2012 R2
The performance issues are actually manageable; many users probably won’t notice any write-through impact, depending on their work patterns. The real issue here is that Scale-Out requires CSV, and this paints your environment into a corner, because many IW applications do not support that file system.
At first, you configure files on a scale-out cluster share and it works fine. But a year later, when you decide you need more file server capabilities like Work Folders, Dynamic Access Control, File Classification Infrastructure, and FSRM file quotas and screens – you are blocked.
Let’s go to the big board.
1 Only works if CA is enabled on shares
2 Not recommended on Scale-Out File Servers.
3 Not recommended on general use file servers.
4 Requires NTFS
5 CSC is less compatible with CA shares than the other IW technologies, due to how it decides a share is offline combined with the SMB 3 client. This means that Offline Files will stay online even if the user no longer has access to the share, for 3-6 minutes.
Ultimately, this means that if you, your boss, or your customer decides “after that recent audit, we need to use DAC+FCI for more manageable security and we definitely need to screen out MP3 files and Grumpy Cat meme pics”, you will be forced to recreate the entire configuration using NTFS and general use file server clusters. This does not sound pleasant, especially when you now have to shift around terabytes of data.
Moreover, let’s not forget about down-level clients like Windows 7; any CA shares require SMB 3.0 or later and older clients connecting to them cannot use SOFS features. While a Windows 7 or Vista client can connect to a CA share, you need Windows 8 or later to use the CA feature.
As for XP? It cannot connect to a CA share at all. This doesn’t matter though, because you already got rid of XP. Right?
Finally, though, is the big question: if you accept the performance overhead, what does continuous availability provided by SOFS buy you with IW workloads?
The answer: little.
Many end-user applications don’t need the guarantees of continuous availability that SQL and Hyper-V demand in their workload. Your IW applications like Office and Windows Explorer are often quite resistant to the short-term server outages during traditional cluster failover. MS Office especially – it has lived for years in a world of unreliable networking; it uses temp files, works offline, and retries constantly without telling the user if there are intermittent problems contacting a file on a share.
The bottom line is that Word and all its friends will be just fine using traditional general use shares on clusters. Make sure that before you go down the scale-out route in a particular cluster design, it’s the right approach for the task.
If you caught all the pseudo-Shakespeare references in this article, post the count in the comments and win a fabulous No-Prize!
Until next time,
- Ned “Exit, pursued by a bear” Pyle
This is perhaps the most valuable Microsoft Blog I have ever read. Please keep writing in this style!
High quality post. Full of sound and fury!
Thank you, Ned. This is very helpful and timely as I'm in the testing and validation stage of our SoFS storage solution.
My performance results are all over the map--some astoundingly great and some bafflingly slow. As it is now, I'm very hesitant to put this into production, because I really can't tell what are expected results and what aren't. Your post here makes me wonder if at least some of what I'm seeing is to be expected.
- Two-node 2012 R2 Hyper-V cluster
<-> Infiniband RDMA NICs
- Two-node 2012 R2 SoFS cluster
<-> LSI SAS controllers
- 3 JBOD enclosures with a mix of 4TB HGST SAS HDDs* and 200 GB STEC SAS SSDs*
*MPIO is globally set to Least Blocks
I've created shares of mirrored spaces using virtual disks from 1 to 8 columns, and the throughput when creating a VHDX is consistently awful--it's always about 47 MB/s maximum, regardless how many columns. If I create the same file from the SoFS coordinator node (pointed to the UNC path, though), the throughput is near 200 MB/s for even a single disk, and 500-600 MB/s for 3 columns. Should I really expect performance to be so poor? The performance difference for the examples you listed never exceeded 1.5x, but I'm seeing worse than 12x (and potentially worse still if I tried 8 columns).
When I connect to the CAP share from a (separate) 2008 R2 Hyper-V cluster (using a 1 GbE connection--there are no RDMA NICs in that cluster) and create a VHDX file, I get ~110 MB/s. That makes me think something is quite wrong, but I don't know what or why.
Additionally, in numerous SQLIO tests on HDD mirrored spaces, the SoFS solution often outperforms our other two SAN environments, sometimes by far ...with the exception of 8 KB sequential write tests. I think it's important to make sure the HDD environment is working properly before enhancing it with flash, so I've been testing HDDs alone, and then adding SSDs later. The HDD-only numbers seem low to me, but how do I know whether the performance is at expected levels? Regardless the number of columns, they hover around 7-8 MB/s at a queue depth of 2 (8-12 threads), whereas our other HDD-based SANs are consistently around 60-100 MB/s (1 GbE connections only) regardless of queue depth or number of threads. Even at a queue depth of 16, our SoFS solution doesn't push more than 30 MB/s at 3 columns and 46 MB/s at 8 columns.
(The 8 KB random write numbers also seem relatively low, but none of the other SAN environments seem to do well at that, either. Flash helps significantly here, but I'm having trouble finding the right size for the WBC. The performance numbers--IOPS, MB/s, and especially latency--completely plummet, even much worse than without the WBC, in certain scenarios. Presumably the WBC gets full, but the behavior here is not good--it seems simply to stop accepting writes until the cache is written to disk, resulting in latency numbers exceeding 60s in some cases.)
Is what I'm seeing normal? Or is something wrong with my setup? The 2008 R2 VHDX creation test makes me think it's the latter.
Thanks guys. :)
That is very interesting, Ryan. I have some further questions and want to also get some thoughts from our Spaces team here, can you email us at email@example.com? Once we figure everything out we can reply on the comment. :)
Thank you very much, Ned. I just sent a message to firstname.lastname@example.org and will be happy to answer any questions. I'm grateful for your guidance!
Not a single unnecessary phrase...talk about precise communication! A genius and a Marine! Guess it wouldn't be the first time...