I have been asked to do a post on Windows Azure SQL Databases (formerly known as SQL Azure), but I think it might be more useful to discuss some of the wider issues so I put together a few FAQs that I often get asked.
Why use a public cloud service like Azure to store data?
Probably not to save money as you may have already invested in on premises data centres to store your data. Accountants love moving costs from capital to running costs, but most of us like to own things be it cars, houses or server so having someone else own your data seems a lot like losing control. Also there are often good reason to keep the data near the users particularly if you are doing BI.
There are good use cases or else there would be no market for this. The obvious one is that you have an application running on a cloud service that needs data to service it, perhaps nothing more complex than your online store backed by a database with orders, promotion details products prices and so on. This data might not even persist there – you’ll pull it down every night to load into your warehouse to get a holistic view of your business performance.
Some business don’t have all their users in place or their data. Uses might be working at home on customer site and rather then being limited by your outbound connectivity as a point of failure you could put your data and applications into a cloud service accessible form anywhere there is internet.
Sometimes it is the aggregation and collection of data that’s the problem as in shipping, manufacturing and geographic surveys where lots or remote sensors collect data which needs to be aggregated.
There are also startups chancing their arm on a new idea by running everything in the cloud on a laptop in a bedsit in docklands(I have actually met an outfit doing exactly this).
Other business models are built on sharing sets of data with others, like the ratings agencies.
Then there’s the new big thing in data management Big Data. This is simply data that is too big for your data centre to handle unless you are a government or an oil company, either because its to big, coming at you at more than your network can handle, or of a type that you can’t really process even if you have got it. Social media is an obvious example and the trick here is to take the processing to the data and reduce it down to a digestible form e.g. sentiment analysis.
What about Privacy?
As a Microsoft employee I am probably not the right person to convince you that your data will be kept private but here’s a couple of things,
You can of course encrypt your data with your own certificates, setup private networks, use ADFS to setup trust between what is on Azure and your own infrastructure.
Microsoft’s official Windows Azure Privacy Statement can be found here
What about compliance?
The most common problem in this error is data protection and the laws in the UK and Europe that limit what can be done with personal data. Microsoft policy on this is here in the section on the EU Data Protection Directive. The other common ask is around compliance for Payment Card Industry (PCI). Azure itself isn’t PCI compliance because its the application itself that will pass this tests. For example Zoura a Microsoft partner have developed a solution on Azure which is PCI compliant.
How should I store my data?
The NoSQL debate is actually older than relational databases, so I am not going to cover it here, except to say that if you are using Azure you absolutely don’t have to use a SQL database as part of your solution. For example you can just use the cheaper Azure Tables and Blob Storage to provide a NoSQL solution, depending on what you want to store. As with any application you might develop to run in a datacentre you simply use the right tools for the job.
Big data as I mentioned before has its own special problems and Apache Hadoop is a set of technologies which addresses these by federating the data and the computation performed on it. HortonWorks and Microsoft are working developing Hadoop for Azure and I have a post on this here.
How do I backup my data
In the case of there is no backup a SQL database there is no backup per se. When you create a SQL database you are actually creating 2 extra copies of it on other servers and any changes be that data changes or design changes are automatically replicated to the other copies. However that doesn’t cover user error so you’ll either want to use replication services to copy the database locally to you own servers or another SQL Database at periodic intervals.
What the difference SQL Databases and running SQL Server in a virtual machine on Azure
SQL databases, are Platform as a Service (PaaS), which means Microsoft controls the SQL Server environment, the underlying OS etc. and all you worry about is creating database and managing the data inside them. However that loss of control also means you loose some features notably the other bits of SQL Server you may want use such as analysis services. note you can run Reporting Services in Azure as PaaS with the restriction that the only source of data for the reports are SQL databases.
If you run SQL Server in a virtual machine then its exactly the same as running it inside your datacentre, you have to configure it, manage it, patch it, etc. so more control but more work for you. There is a gallery of virtual machines for you to use as templates e.g. SQL Server 2008 R2 running on Windows Server 2008 R2 or you can create your own.
Every 90 days or so Azure gets revised and new features appear so this post has a pretty limited shelf life so come back and check for updates!