Well, dealing with the creative responses of my co-workers after the first video was so much fun -- comments like "The man may excel at computational data storage and analysis, but he needs to learn a thing or two about whiteboard real estate" -- so why stop? The answer is: we already taped the second one, and the marketing team really doesn't understand the concept of sunk costs.
In the new video we spend some time talking about the benefits of the replication-based storage model over backups for protecting your data.
Basically the issue comes down to a couple of key benefits of any replication-based strategy:
1) Since you are replicating continuously, there is no intrusive, discontinuous process that has to run on a regular schedule. More importantly, with replication you only need to copy each piece of content ONCE; with a backup model you have to copy every piece again each time you do a full backup. This means the cost and complexity of backup become unsupportable once mailboxes get very large. Replication copies the mail only once, so it is continuous, and its cost and complexity are proportional to delivery rates, NOT to how long each mail is kept.
2) Because the copies are fully up to date all the time and are true, validated replicas, the time it takes to restore (i.e. get the backup up and running) is much shorter in a replication model.
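The cost argument in point 1 can be made concrete with a little back-of-envelope arithmetic. The sketch below is purely illustrative -- the user counts, delivery rates, and mailbox sizes are assumptions, not measurements -- but it shows why backup volume grows with retention while replication volume does not:

```python
# Back-of-envelope comparison: data moved per week by weekly full
# backups vs. continuous replication, as mailboxes grow.
# All figures are illustrative assumptions.

def weekly_backup_gb(mailbox_gb: float, mailboxes: int) -> float:
    """A weekly full backup re-copies every mailbox in its entirety,
    so the volume scales with mailbox size (i.e. with retention)."""
    return mailbox_gb * mailboxes

def weekly_replication_gb(daily_delivery_gb: float, mailboxes: int) -> float:
    """Replication copies each message once as it arrives, so weekly
    volume tracks the delivery rate, not the mailbox size."""
    return daily_delivery_gb * 7 * mailboxes

users = 1000
delivery_per_user_gb = 0.05          # assume ~50 MB of new mail per user per day

for mailbox_gb in (1, 10, 100):      # mailbox sizes as retention grows
    backup = weekly_backup_gb(mailbox_gb, users)
    repl = weekly_replication_gb(delivery_per_user_gb, users)
    print(f"{mailbox_gb:>3} GB mailboxes: backup moves {backup:,.0f} GB/week, "
          f"replication moves {repl:,.0f} GB/week")
```

With these numbers, the backup volume grows 100x as mailboxes grow from 1 GB to 100 GB, while the replication volume stays flat at roughly 350 GB/week for the whole organization.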
There are more benefits of replication that we didn't cover -- things like being much more secure against logical corruption in the storage stack (because the write paths are so different on the primary and the replicas) and the ability to spread the replicas across a continent for a true disaster recovery benefit. But we can go into more detail on those issues another time.
At this point, I suppose some might be wondering if classic backups are good for anything. Well some people might very well think that. I couldn't possibly comment. At least right now... But I'd like to hear what you think!
REALLY liked how you set out the replication models against traditional backups. However, I guess the thing I still worry about is the trust in the replicated copy, given that
a) replication could break due to a network-based event, and
b) you're betting the farm on a replication model that, as you rightly stated, could get an IT pro fired for losing data.
To me it boils down to trust, and building trust in the technology. The SQL guys have had log shipping for years, but they're still backing up -- what does that say about trusting log replication in general?
I'm looking forward to MORE videos please!
Another nice talk.
I guess like last time I feel it was a bit brief.
My concern in recommending replication as a backup strategy is the compliance and retention side of things. I often get requests like “Who sent this email to whom, and we need a copy… Oh, and yes, it happened last year.”
Should I be thinking of journaling/archiving to solve this, and should I then back up my journals/archives?
Thanks for the Videos, I am also looking forward to more.
My only comment here (more probably for commenters than Perry :)), apart from complimenting Perry on a great presentation, is that a common mistake made by many companies is mixing two concepts, Data Protection and Service Continuity... and even sometimes mixing backups with journaling, archiving and legal hold... Each of those things may or may not be important depending on the case and the customer... In the UK we will have a messaging operations day next month, and my session will be based on Backup vs Backupless, where we will discuss this in depth; after it has been presented and screened for any confidential info, it will be posted on my blog!
Looking forward for more stuff Perry!
Good topic for discussion. I agree with your cost model and the benefits outlined.
Replication for Data Protection is a technology that most companies are (or should be) aware of, even if it is local replication using RAID. Replication to another site or array using SAN based replication or Exchange 2010 High Availability technologies just takes it to the next level.
However replication isn't the whole package for "Data Protection", as the technology inherently just replicates the current state. When a user logs a request along the lines of "I just hard-deleted the entire content of my Inbox, can you please get it back?" you realise replication isn't going to save you. Oops, I shouldn't have stopped doing backups.
Replication also doesn't protect your data if it replicates a corruption in the database.
Replication is a great tool, but one of many you should use. It would be great if you could continue with the other "tools" required to complete the Data Protection toolbox.
I'd be interested to hear your recommendations on how to deal with scenarios like the one OldMan has described: how to recover items that someone only realised they deleted last week/month/year, or how to recover emails from a mailbox that we know existed 4 years ago.
The answer might involve journalling everything, preventing deletion, etc. But how would you design the system now, if you thought you might have those kinds of requests 3 years into the future?
OldMan and Koolhand, you are right: replication is just one component of the full data protection model.
Replication protects you from hardware/software/datacenter failures.
As indicated you need protection mechanisms for accidental/malicious item deletion. I highly recommend reading my two blog articles on this subject to see how this is provided within Exchange 2010 - http://msexchangeteam.com/archive/2009/09/25/452632.aspx and http://msexchangeteam.com/archive/2010/04/26/454733.aspx.
You may also need protection from administrative error, automation errors, rogue administrators, or store logical corruption events - all of which require some type of point-in-time backup mechanism. Do you need to protect against all of these? No. It boils down to risk, your acceptance of risk, and how you can mitigate it in other ways. Some of these threats can be met with lagged database copies. Some may require isolation (e.g. a rogue administrator).
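For readers who haven't used lagged copies: in Exchange 2010 the replay lag is set when the database copy is added. A rough illustration is below -- the server and database names are placeholders, and the lag values are just an example, not a recommendation:

```powershell
# Add a database copy whose log replay lags 3 days behind the active
# copy, giving a point-in-time window for recovering from logical
# corruption. "DB01" and "MBX02" are placeholder names.
Add-MailboxDatabaseCopy -Identity "DB01" -MailboxServer "MBX02" `
    -ReplayLagTime 3.00:00:00 -TruncationLagTime 0.00:00:00
```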
As for how do I go back 3 years? Well, if the requirements at the time of the design were to only go back 90 days, then you cannot, and no other system/solution would be able to do that either (you simply can't retroactively make it happen). In essence, you need to know your requirements at design time and have documented SLAs around the recovery scenarios, specifying the what, how, and when around data recovery. Could that mean the SLAs and design are changed at a later date? Yes. But if you don't know the requirements or SLAs, you cannot design the solution to meet those unknown goals.
As far as compliance goes, that is another layer that has to be considered in the data protection space. Knowing the regulations (corporate or legal) and what must be retained and for how long will define your strategy. To take the example used above - proving how a user received an item - that requires a journaling solution.
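To make the journaling point concrete: in Exchange 2010 an organization-wide journal rule can capture messages to a dedicated journaling mailbox. A minimal sketch is below -- the rule name and journal address are placeholders, and whether this fits depends on your licensing and compliance requirements:

```powershell
# Journal every message in the organization to a journaling mailbox.
# "Org-wide journal" and the address are placeholder values.
New-JournalRule -Name "Org-wide journal" `
    -JournalEmailAddress "journal@contoso.com" `
    -Scope Global -Enabled $true
```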
Hope this helps.
Great questions -- I'm working on another video which answers many of these questions. It should be posted in a week or 2.
Having attended Pedro's presentation at MS UK along with other MS Exchange field engineers, I have the feeling that for compliance reasons I may still need a backup solution. The biggest issue for me is that Exchange 2010 won't allow me to assign archive mailboxes to a different user. So when user A leaves, I'm required to at least disable the account, or more likely delete it due to IT Security requirements - but I can't move all their stuff to an archive mailbox and allocate it to another account. I'm wondering if I could just increase the mailbox retention time to match the period of our corporate retention policy, but I haven't done the math. Nor am I sure the cross-mailbox search would see mailboxes that are deleted but still in the mailbox dumpster. For these reasons I'll need to look at journalling or DPM.
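On the retention-window idea in the comment above: Exchange 2010 does let you extend deleted item retention per mailbox, as in this sketch (the mailbox identity and the 365-day window are illustrative placeholders; check whether the window and storage cost fit your policy before relying on it):

```powershell
# Extend deleted item retention to 365 days and enable single item
# recovery, so items survive even a hard delete for that window.
# "UserA" is a placeholder mailbox identity.
Set-Mailbox -Identity "UserA" -RetainDeletedItemsFor 365.00:00:00 `
    -SingleItemRecoveryEnabled $true
```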