Hybrid Cloud Storage

Integrating cloud storage services with on-premises enterprise storage

The rise of cloud-integrated storage and EMC's ViPR


David Isenberg wrote his famous and controversial paper, The Rise of the Stupid Network, in 1997. It's a short and historically interesting read. If you have never read it, follow the link and read it now; it will take you less than 10 minutes. If you want the CliffsNotes version, the gist of his paper is copied below:

JUST DELIVER THE BITS, STUPID

A new network "philosophy and architecture," is replacing the vision of an Intelligent Network.  The vision is one in which the public communications network would be engineered for "always-on" use, not intermittence and scarcity.  It would be engineered for intelligence at the end-user's device, not in the network.  And the network would be engineered simply to "Deliver the Bits, Stupid," not for fancy network routing or "smart" number translation.

Fundamentally, it would be a Stupid Network.

I've thought about corollaries in storage for many years. Networks and storage are very different, though. Storage is tightly coupled with data management in a way that networks never will be. Data management takes intelligence to make sure everything is put in its optimal place, where it can be accessed again in compliance with corporate governance, legal requirements and workers' expectations. Networks don't have these sorts of long-term consequences, so apples-to-apples comparisons aren't very useful.

But that doesn't mean there aren't ways to eliminate unnecessary aspects of storage and lower costs enormously. As soon as data protection and management can be done without specialized storage equipment, that equipment will be eliminated. Cloud storage changes things radically for the storage industry, especially inventions like StorSimple's cloud-integrated storage (CiS) and a solution like Microsoft's hybrid cloud storage. But StorSimple was a startup and Microsoft isn't a storage company, so it wasn't going to become obvious that sweeping changes were afoot until a major storage vendor came along to make it happen.

That's where EMC's ViPR software comes in. EMC refers to it as software-defined storage, which was predictable, but necessary for them. FWIW, Greg Schulz does a great job going through what was announced on his StorageIO blog.

One of the things ViPR provides is an out-of-band virtualization layer, described in Greg's blog, that opens the door to using less-expensive, stupid storage and protecting the data on it with some other global, intelligent system. This sort of design has never been very successful, and it will be interesting to see if EMC can make it work this time.

The aspects of ViPR that are most interesting are its cloud elements - those that are expected initially and those that have been strongly hinted at, including:

  • It runs as a VSA (virtual storage appliance), which means it is a storage controller that runs as a virtual machine, including as a virtual machine in the cloud.
  • It will include access to object storage as a back end, which is how "real" storage works in the cloud, unlike AWS' EBS.
  • It can use cloud APIs, which is obviously a cloud thing.

If EMC wants their technology to run on the cloud, and it's clear they do, they need all three of these things. For instance, consider remote replication to the cloud - how would the data replicated to the cloud be stored there? On a piece of hardware? No. Using storage network/device commands? No. To what target, then? The back end of a hypothetical EMC VSA in the cloud would use object storage services and cloud APIs. There is no other practical way to do it. They could have a VSA that uses iSCSI to a facility like EBS, but that would be like putting the contents of a container ship on rowboats. So, a VSA that accesses object storage services using cloud APIs is the only way. It is a clear signal that ViPR will be their version of CiS. They probably won't call it that, but that's beside the point.
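To make the idea of a VSA with an object storage back end a little more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not how ViPR or any EMC product works: `InMemoryObjectStore` is a hypothetical stand-in for a cloud object service, and `ObjectBackedVolume`, the key layout and the 4 KB block size are all made up for the example.

```python
from typing import Dict, Optional

BLOCK_SIZE = 4096  # bytes per logical block (an assumption for this sketch)


class InMemoryObjectStore:
    """Hypothetical stand-in for a cloud object service: put/get by key."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> Optional[bytes]:
        return self._objects.get(key)


class ObjectBackedVolume:
    """Presents fixed-size blocks to a host and stores each one as an object."""

    def __init__(self, name: str, store: InMemoryObjectStore) -> None:
        self.name = name
        self.store = store

    def _key(self, lba: int) -> str:
        # One object per logical block address, e.g. "vol1/block-000000000007".
        return f"{self.name}/block-{lba:012d}"

    def write_block(self, lba: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        self.store.put(self._key(lba), data)

    def read_block(self, lba: int) -> bytes:
        data = self.store.get(self._key(lba))
        # Blocks that were never written read back as zeros, like a thin volume.
        return data if data is not None else bytes(BLOCK_SIZE)
```

The point of the sketch is that the only operations the back end needs are whole-object puts and gets; nothing in it depends on a disk, a LUN or a storage network command.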

The important question is what happens to data protection after ViPR is made fully cloud-capable. Once you start using cloud services for data protection, a few things immediately become obvious:

  • You don't need separate data protection equipment any more because you are using a cloud service
  • You can actually use incremental-forever data protection schemes
  • You want to use primary dedupe and compression to reduce the amount of cloud traffic required
  • You maintain a hybrid cloud metadata system that identifies all data whether it's on premises or in the cloud
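Here is a rough Python sketch of how those pieces can fit together. It is my own simplified illustration, not a description of any vendor's implementation: `CloudTier` is a hypothetical in-memory stand-in for an object storage service, the fixed 4 KB chunk size is an assumption, and the per-snapshot manifest plays the part of a very small metadata catalog.

```python
import hashlib
import zlib
from typing import Dict, List

CHUNK_SIZE = 4096  # fixed-size chunks (an assumption for this sketch)


class CloudTier:
    """Hypothetical stand-in for an object service keyed by chunk fingerprint."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.bytes_uploaded = 0

    def has(self, key: str) -> bool:
        return key in self.objects

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data
        self.bytes_uploaded += len(data)

    def get(self, key: str) -> bytes:
        return self.objects[key]


def snapshot(volume: bytes, cloud: CloudTier) -> List[str]:
    """Upload only chunks the cloud has never seen; return the manifest."""
    manifest: List[str] = []
    for offset in range(0, len(volume), CHUNK_SIZE):
        chunk = volume[offset:offset + CHUNK_SIZE]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if not cloud.has(fingerprint):
            # Dedupe first, then compress what actually has to travel.
            cloud.put(fingerprint, zlib.compress(chunk))
        manifest.append(fingerprint)
    return manifest


def restore(manifest: List[str], cloud: CloudTier) -> bytes:
    """Rebuild a full volume image from any snapshot's manifest."""
    return b"".join(zlib.decompress(cloud.get(fp)) for fp in manifest)
```

Every call to `snapshot` after the first one is effectively an incremental: unchanged chunks are already in the cloud, so only new fingerprints are uploaded, yet each manifest can still restore a complete image on its own. That is the incremental-forever idea, and it is also why dedupe and compression cut the cloud traffic.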

Those are all things that hybrid cloud storage from Microsoft does today, by the way, but that's beside the point too. What's interesting is what will happen to EMC's sizeable data protection business - how will it be converted to cloud solutions, and what value can they add that enhances cloud storage services? The technologies they need for hybrid cloud data protection are already mostly in place, and there will undoubtedly be a transformation for Data Domain products in the years to come, but these are the sorts of things they need to figure out over time.

It's going to be a slow transition for the storage industry, but EMC has done what it usually does - it made the first bold move and is laying the groundwork for what's to come.  It will be interesting to watch how the rest of the storage industry responds.

Comments
  • Great post Marc, balanced, objective...

    Cheers

    gs

  • Thanks Greg - and thanks for your excellent posts on ViPR

  • Thanks for the link to the "Stupid Network" paper, though after reading it I can't make up my mind whether an SDN is "Stupid" because most of the intelligence in the physical infrastructure has been abstracted to a control layer that is loosely tied to it, or whether it's an "Intelligent" network because of its programmability (assuming you say the SDN includes both the physical infrastructure and the controller).

    Either way there are lots of analogues to the features in the intelligent network and the intelligent storage array -

    e.g.

    A network was limited to a 64Kbit communication link and had things like echo-cancelling to improve perceived quality of service

    A traditional disk array is limited by the non-deterministic I/O latencies of rotating media and has things like sophisticated write caching and read-ahead algorithms to improve perceived quality of service.

    This brings up an interesting question: are modern disk storage arrays designed to live in a world of abundance? To that I think I'd have to answer no, because for the most part they live in a world where IOPS/GB of disk is becoming an increasingly scarce resource. Adding SSD as a cache helps, but in many cases the resulting IOPS density isn't much better than it was back in the days of the 72GB disk, which is good, but not the kind of exponential increase towards a hyper-abundance of resources that we've seen in network bandwidth.

    I think there is a case for "Just store the damn bits" when it comes to storage class memory, though now our constraint moves back towards efficient capacity management.

    All of this suggests that for the time being (5+ years?), we will continue to live in a world in which certain critical resources remain scarce (including human intelligence/time), which requires machine-resident intelligence to get the most efficiency from those resources. Where that intelligence resides, and how tightly it is tied to physical infrastructure, may be a function of how much efficiency you want to drive.

    Interesting times ahead, don't you think ?

    John Martin - @JohnMartinIT

  • Yes, very interesting times, indeed John!  I can't make up my mind either with respect to SDN and stupid networks. It's stimulating to think about the skeptical views of technology in the context of where we are headed. Nicholas Carr's "Does IT Matter?" is another terrific piece of technology skepticism that is worth considering in the rush towards data analytics.

    Your comment about machine-resident intelligence is straight to the point: the abundance of processing power can be used to manage capacity through technologies like dedupe and various encoding schemes. Their effectiveness depends on the workloads and application behaviors. In cases where data disappears forever 3 months after it's created, intelligence can do amazing things. In cases where data is constantly being examined, not so much.

    This gets back to the rush to define requirements for things like big data where some people are saying that real time is a hard and fast requirement.  The thing they probably don't realize is that an ill-informed decision like that can saddle them with having to live with management practices that suck away lots of other resources.  If, on the other hand, close to real time is acceptable, they might be able to have their cake and eat it too where storage effectiveness and costs are concerned.  
