Geo Replication in Hybrid Cloud Storage
The excitement around software-defined networking (SDN) this year has had a domino effect on the rest of the IT infrastructure industry and spawned many discussions about its future, including the implications for companies like Cisco, EMC and VMware. A couple of days ago, Christos Karamanolis from VMware published a blog post saying he thinks 2013 will be the year for software-defined storage (SDS). That got me thinking.
I don't know about 2013 being the year for SDS, but I suspect 2013 will be the year of SDN and SDS hype and confusion. It's bad enough having one marketing battle royale (SDN), but having two of them at the same time will drive many of us crazy. I shudder to think where the whole thing will stop. SD Zombies?
Here's how I see things shaping up for SDS next year:
On December 5th Microsoft announced a price reduction for Windows Azure Storage. One of the more noticeable aspects of the announcement was the breakdown of storage costs between Geo Replicated Storage and Locally Redundant Storage. To summarize, Geo Replicated Storage costs approximately 28% more for the additional service of replicating your data to a remote secondary Azure data center. Once you understand the details of how Azure Storage works, you see that Geo Replicated Storage keeps six copies of your data: three locally and three remotely. This is an extremely robust design where an awful lot has to go wrong to lose data, and it is part of the reason Windows Azure Storage has such an excellent track record.
If you are considering a Hybrid Cloud Storage solution using StorSimple Cloud-integrated Storage (CiS) and Windows Azure Storage, my advice is that you plan to use Geo Replicated Storage. The additional 28% price premium for Geo Replication is a small amount to pay for remote replication with automated failover. If you compare the cost for Azure's Geo Replication with other forms of data replication that conservatively double the cost of storage, it is an incredible bargain.
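To make the cost argument concrete, this short Python sketch compares the options using only the ratios stated in this post: a roughly 28% premium for Geo Replicated Storage, three versus six stored copies, and conventional replication modeled as doubling the storage cost. The per-GB price is a hypothetical placeholder, not a published Azure rate, so only the relative numbers mean anything.

```python
# Back-of-the-envelope cost comparison. Only the ratios come from the post;
# LRS_PRICE_PER_GB_MONTH is a hypothetical placeholder, not an Azure price.
LRS_PRICE_PER_GB_MONTH = 0.10          # hypothetical $/GB-month
GEO_PREMIUM = 0.28                     # ~28% more for geo replication
CONVENTIONAL_REPLICATION_FACTOR = 2.0  # "conservatively double the cost"

LRS_COPIES = 3  # three copies in the primary data center
GRS_COPIES = 6  # three local copies plus three in the secondary data center


def compare(gb: float) -> dict[str, float]:
    """Monthly cost of each option for `gb` gigabytes."""
    lrs = gb * LRS_PRICE_PER_GB_MONTH
    return {
        "lrs": lrs,
        "grs": lrs * (1 + GEO_PREMIUM),
        "conventional": lrs * CONVENTIONAL_REPLICATION_FACTOR,
    }


def cost_per_copy(gb: float) -> dict[str, float]:
    """Monthly cost divided by the number of copies actually stored."""
    costs = compare(gb)
    return {
        "lrs": costs["lrs"] / LRS_COPIES,
        "grs": costs["grs"] / GRS_COPIES,
    }


if __name__ == "__main__":
    for name, cost in compare(1000).items():
        print(f"{name:>12}: ${cost:,.2f} per month")
    for name, cost in cost_per_copy(1000).items():
        print(f"{name:>12}: ${cost:,.2f} per copy per month")
```

Under these assumptions, geo replication buys twice as many copies for 1.28 times the price, so the cost per stored copy actually drops.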
So here is how the connections and data flows work with CiS-powered Hybrid Cloud Storage. Thanks to Avkash Chauhan, who posted about this previously in his blog; the graphic below came from there.
When the on-premises CiS system uploads data to Azure Storage, three copies of the data are written to separate domains within the primary data center and an acknowledgement is sent back to the on-premises CiS system. Some time afterwards, possibly several minutes later, the data is replicated to the secondary data center, where another three copies are written to three different domains. This happens transparently in the background, without involving the CiS system in any way.
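The write path described above can be modeled as a synchronous local phase followed by an asynchronous remote phase. This toy in-memory Python model is only an illustration under that assumption: the class and method names are hypothetical, dictionaries stand in for storage domains, and it ignores failures, ordering and failover, all of which the real Azure Storage service has to handle.

```python
from collections import deque
from dataclasses import dataclass, field

COPIES_PER_SITE = 3  # three copies per data center, as described above


@dataclass
class DataCenter:
    name: str
    # One dict per storage domain; each maps an object key to its data.
    domains: list[dict[str, bytes]] = field(
        default_factory=lambda: [{} for _ in range(COPIES_PER_SITE)]
    )

    def write_all_domains(self, key: str, data: bytes) -> None:
        for domain in self.domains:
            domain[key] = data

    def copies(self, key: str) -> int:
        return sum(key in domain for domain in self.domains)


class GeoReplicatedStore:
    """Toy model: synchronous local copies, asynchronous remote copies."""

    def __init__(self) -> None:
        self.primary = DataCenter("primary")
        self.secondary = DataCenter("secondary")
        self._pending: deque[tuple[str, bytes]] = deque()

    def put(self, key: str, data: bytes) -> str:
        # Write all three primary copies before acknowledging the client.
        self.primary.write_all_domains(key, data)
        # Queue the object for background geo replication.
        self._pending.append((key, data))
        return "ACK"

    def replicate_pending(self) -> int:
        # Runs later, in the background; the client is not involved.
        replicated = 0
        while self._pending:
            key, data = self._pending.popleft()
            self.secondary.write_all_domains(key, data)
            replicated += 1
        return replicated


store = GeoReplicatedStore()
print(store.put("volume1/snapshot-001", b"block data"))  # ACK
print(store.primary.copies("volume1/snapshot-001"))      # 3
print(store.secondary.copies("volume1/snapshot-001"))    # 0
store.replicate_pending()
print(store.secondary.copies("volume1/snapshot-001"))    # 3
```

The property that matters is the gap between `put()` returning and `replicate_pending()` running: the client already holds its acknowledgement while the secondary site still has zero copies.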
With CiS-powered Hybrid Cloud Storage, uploads to Windows Azure Storage occur when nightly CiS Cloud Snapshots are taken, and also when inactive data is tiered to Azure Storage. Under normal conditions, the amount of traffic between CiS on-premises and Azure Storage is negligible. The exceptions are the initial Cloud Snapshot of a volume, when all of the volume's data is uploaded, and DR scenarios, when a lot of data may need to be downloaded from Azure Storage to CiS. If you are concerned about the bandwidth Hybrid Cloud Storage traffic might consume, CiS provides scheduled bandwidth throttling, which many of our customers use to ensure they have all the bandwidth they need for other production applications. Geo Replication between Azure data centers does not consume bandwidth between the customer site and the primary Azure data center, so there is no need to avoid Geo Replication to conserve bandwidth.
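Scheduled throttling boils down to choosing a bandwidth cap from the day of the week and the time of day. The Python sketch here is a hypothetical policy; it does not reproduce CiS's actual configuration interface, and the window, the limits and the helper names are all assumptions. It also includes a rough estimate of how long a large upload, such as an initial Cloud Snapshot, would take at a given cap.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class ThrottleWindow:
    weekdays: frozenset[int]  # 0 = Monday ... 6 = Sunday
    start: time               # inclusive
    end: time                 # exclusive; assumes start < end (no midnight wrap)
    limit_mbps: float


# Hypothetical policy: keep cloud traffic low during weekday business hours.
SCHEDULE = (
    ThrottleWindow(frozenset(range(5)), time(8, 0), time(18, 0), 10.0),
)
DEFAULT_LIMIT_MBPS = 100.0  # hypothetical cap outside every window


def limit_at(moment: datetime,
             schedule: tuple[ThrottleWindow, ...] = SCHEDULE,
             default: float = DEFAULT_LIMIT_MBPS) -> float:
    """Return the bandwidth cap (Mbps) in force at `moment`."""
    for window in schedule:
        if (moment.weekday() in window.weekdays
                and window.start <= moment.time() < window.end):
            return window.limit_mbps
    return default


def upload_hours(gigabytes: float, mbps: float) -> float:
    """Hours to move `gigabytes` at a sustained `mbps` (decimal units, no overhead)."""
    megabits = gigabytes * 8 * 1000
    return megabits / mbps / 3600


print(limit_at(datetime(2012, 12, 10, 9, 30)))  # Monday morning -> 10.0
print(limit_at(datetime(2012, 12, 10, 20, 0)))  # Monday evening -> 100.0
print(round(upload_hours(500, 100), 1))         # 11.1
```

With these made-up numbers, pushing 500 GB at a sustained 100 Mbps takes roughly 11 hours before protocol overhead, which is why the initial Cloud Snapshot is the case worth planning a throttling schedule around.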
When you think about the economics of cloud storage, make sure to include the incredible value of Geo Replication.
The term "hybrid cloud" has been defined many different ways. At Microsoft, hybrid cloud refers to data center functionality that spans on-premises and cloud service boundaries. At least that's how I understand it now, after having been part of the company for a few weeks. To clarify my perspective, my view of cloud is focused narrowly on IaaS functionality and the things that are likely to appeal to data center types. In this context, hybrid cloud services will augment what customers are already doing on-premises, with the cloud offloading tasks and workloads that are under-served on-premises. Where data center operations are concerned, the cloud represents a new kind of enterprise plug-in. If you think this sounds like poppycock, keep reading, because I'll tell you how it is already being used this way every day by a growing number of companies.
One of the misunderstandings people have about enterprise cloud storage is that it must be similar to consumer file sharing apps like Dropbox, Box or Microsoft's own SkyDrive. To begin with, much of enterprise storage is block-based, so if you are going to offload enterprise storage you need to provide block-level functionality. As for file sharing, data center managers are less interested in sharing corporate data than in securing it. BTW, I fully expect to get comments here about the great virtues of file sharing for enterprises. Rest assured, there are probably few companies that use the cloud for file sharing as much as Microsoft does internally with SkyDrive and SharePoint, but that's not what I'm discussing in this post.
StorSimple developed a technology called Cloud-integrated Storage (CiS), implemented as a SAN appliance that acts as a hybrid cloud storage plug-in for enterprise storage. CiS packages and indexes blocks along with accompanying metadata and stores them in the cloud. These block packages may be generated by snapshots, by archives that need to be stored for an extended period of time, or from dormant unstructured data that is no longer being accessed and can be vacated to reclaim on-premises storage capacity. Customers use this technology every day because their backup systems are under-serving them, their archiving processes are too cumbersome, or they don't want to use tier 1 storage for data that is no longer active. The thing that is a little hard for some to understand about CiS is that data transfers to the cloud are all automated, requiring no effort from system and storage administrators.
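The post does not describe CiS's package format, so this Python sketch only illustrates the general idea of packaging blocks with an index: blocks keyed by logical block address are grouped into fixed-size packages, and an index records which package and offset hold each block, plus a checksum. The package size, the naming scheme and the `IndexEntry` fields are hypothetical.

```python
import hashlib
from dataclasses import dataclass

BLOCKS_PER_PACKAGE = 4  # hypothetical; real packages would be much larger


@dataclass(frozen=True)
class IndexEntry:
    package_id: str  # which cloud object holds the block
    offset: int      # byte offset of the block inside that object
    length: int
    sha256: str      # checksum to verify the block after download


def package_blocks(volume: str, blocks: dict[int, bytes]):
    """Group blocks (logical block address -> data) into packages.

    Returns (packages, index): packages maps package_id -> bytes to upload,
    and index maps each block address -> IndexEntry (the block metadata).
    """
    packages: dict[str, bytes] = {}
    index: dict[int, IndexEntry] = {}
    addresses = sorted(blocks)
    for n, start in enumerate(range(0, len(addresses), BLOCKS_PER_PACKAGE)):
        package_id = f"{volume}/pkg-{n:06d}"
        payload = bytearray()
        for lba in addresses[start:start + BLOCKS_PER_PACKAGE]:
            data = blocks[lba]
            index[lba] = IndexEntry(package_id, len(payload), len(data),
                                    hashlib.sha256(data).hexdigest())
            payload += data
        packages[package_id] = bytes(payload)
    return packages, index


blocks = {lba: bytes([lba]) * 512 for lba in range(10)}
packages, index = package_blocks("vol1", blocks)
print(sorted(packages))  # ['vol1/pkg-000000', 'vol1/pkg-000001', 'vol1/pkg-000002']
print(index[5].package_id, index[5].offset)  # vol1/pkg-000001 512
```

Packaging many small blocks into fewer, larger objects keeps the number of cloud requests down, while the index still allows any single block to be located and verified.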
The other key to understanding the plug-in nature of CiS is that the ability to access and download data from cloud storage is also transparent because data in the cloud is either viewable in an online file system or mountable as a snapshot the same way local snapshots are mounted for restoring older versions of files. I'll explain how that all works in future blog posts, but for now I'll say it's a function of the metadata system in CiS.
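One generic way a metadata layer can make data location transparent is a block map that records, for each logical block, whether it is stored on-premises or under a cloud object key. This Python sketch assumes that design; it is not StorSimple's implementation. Dictionaries stand in for local disk and Azure Storage, the names are hypothetical, and object keys are content hashes so that a snapshot's cloud objects are never overwritten by later writes.

```python
import hashlib
from collections.abc import Mapping
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Optional


def object_key(data: bytes) -> str:
    # Content-addressed key: a key never points at different data.
    return hashlib.sha256(data).hexdigest()


@dataclass
class HybridVolume:
    local: dict[int, bytes] = field(default_factory=dict)  # on-premises blocks
    cloud: dict[str, bytes] = field(default_factory=dict)  # stand-in for Azure Storage
    # Metadata: block address -> cloud key, or None if the block is only local.
    block_map: dict[int, Optional[str]] = field(default_factory=dict)

    def write(self, lba: int, data: bytes) -> None:
        self.local[lba] = data
        self.block_map[lba] = None

    def _upload(self, data: bytes) -> str:
        key = object_key(data)
        self.cloud.setdefault(key, data)
        return key

    def tier_to_cloud(self, lba: int) -> None:
        """Vacate an inactive block to the cloud, freeing local capacity."""
        self.block_map[lba] = self._upload(self.local.pop(lba))

    def read(self, lba: int) -> bytes:
        """Callers see the same interface wherever the block lives."""
        key = self.block_map[lba]
        return self.local[lba] if key is None else self.cloud[key]

    def cloud_snapshot(self) -> Mapping[int, str]:
        """Ensure every block is in the cloud; return a read-only block map."""
        snap: dict[int, str] = {}
        for lba, key in self.block_map.items():
            snap[lba] = key if key is not None else self._upload(self.local[lba])
        return MappingProxyType(snap)

    def read_snapshot(self, snap: Mapping[int, str], lba: int) -> bytes:
        """'Mount' an older snapshot by reading through its block map."""
        return self.cloud[snap[lba]]


vol = HybridVolume()
vol.write(0, b"v1 of a file")
vol.write(1, b"dormant data")
vol.tier_to_cloud(1)
snap = vol.cloud_snapshot()
vol.write(0, b"v2 of a file")
print(vol.read(1))                 # b'dormant data'
print(vol.read(0))                 # b'v2 of a file'
print(vol.read_snapshot(snap, 0))  # b'v1 of a file'
```

Reads go through `read()` whether or not a block has been tiered, and a Cloud Snapshot taken before an overwrite still returns the old data through `read_snapshot()`, much like mounting an older local snapshot to restore a previous version of a file.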
Cloud-integrated Storage really is different. It breaks the mold for enterprise storage by seamlessly integrating on-premises enterprise storage with Windows Azure Storage services and the incredibly valuable Geo Redundant Storage it provides. CiS doesn't do everything you might want it to, but the things it does well are revolutionary.
Note: this blog post is the short version of a white paper published by StorSimple; the original white paper is available as a PDF.