I caught a really good presentation from Chris Kranz (www.wafl.co.uk and @ckranz), Principal Solutions Architect responsible for storage and virtualisation projects at leading systems integrator Kelway (UK) Ltd, at the Leeds VM user group last month. Chris first encountered virtualisation technologies back in 2005 with early releases of VMware ESX; he has since gained the highest level of certification with VMware and continues to expand his knowledge of Microsoft and Citrix virtualisation technologies.

His talk was on virtualising tier-1 SQL Server applications, and he has kindly translated it into this guest post. Take it away, Chris…

There are some real challenges around virtualisation at this stage, but (un)fortunately most of them stem from being over-cautious about the technology now available, or from holding on to concepts and fears from when virtualisation first burst into the x86 market some years ago.

Many people have now started the journey into virtualisation, and many have some experience of virtualising smaller applications. This has been hugely successful and a favoured route while getting to grips with the technology: grab the “low-hanging fruit” and virtualise it first, learn from the mistakes, then move up the service stack. The challenge is that many people stall at around 70-80% virtualised, because this is where the business-critical and performance-sensitive applications begin. Performance and capacity issues encountered during the initial virtualisation are dealt with as part of the normal learning curve, but they leave behind concern and business hesitance about tier-1 applications, especially among SQL DBAs!

However, an important point to understand is that virtualising tier-1 applications should not be approached in the same way as virtualising the rest of the infrastructure! Tier-1 is business critical, so why consolidate it? Platform costs are rarely an issue, as these systems pay for themselves through the business services they deliver, and it is hard to put a proper cost on their availability because the business depends on them to function. So we need new rules for virtualising tier-1, different from those for standard virtualisation:

  • Encapsulation of any workload is going to make it easier to protect, easier to replicate, and easier to upgrade!
  • Dedicate resources as they are required. There is no issue with a 1:1 consolidation ratio for tier-1 applications if you require a 100% guarantee of resources. But don’t forget that the hypervisor itself requires resources!
  • Ensure that the sizing is correct: not just CPU and memory, but disk and network too! These are all critical areas, and ones a tier-1 application will use differently from other virtualisation candidates.
  • Make sure the design is validated and you do your maths! Double-check any calculations supplied, and don’t make assumptions unless you have a way of validating them later (see the sketch after this list).
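
By way of illustration, a minimal T-SQL sketch for gathering those sizing inputs from an existing SQL Server (an editorial example, not part of Chris’s talk); sys.master_files reports sizes in 8 KB pages, so convert to MB:

```sql
-- Minimal sketch: list data and log file sizes per database as sizing inputs.
-- sys.master_files reports size in 8 KB pages, so convert to MB.
SELECT
    DB_NAME(database_id) AS database_name,
    type_desc            AS file_type,     -- ROWS (data) or LOG
    name                 AS logical_name,
    size * 8 / 1024      AS size_mb
FROM sys.master_files
ORDER BY database_name, file_type;
```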

A very important statement to drill home: “Good practices are industry recommendations. Best practices are validated by yourself, based on your services.” Don’t believe everything you read; validate your design for yourself. That includes this article: test my recommendations, and everyone else’s, to prove that they work for you. Public Enemy said it best: “Don’t believe the hype”.

Check with your vendors and resellers about licensing and support. Many software vendors are now fully on board with virtualisation: Microsoft has the Server Virtualisation Validation Program (SVVP), and VMware has an ISV portal specifically designed to encourage ISVs to get VMware approval. Hassle your ISV if they haven’t certified yet! CPUs can be restricted in a virtual environment, so you could actually make licence cost savings, but make sure you do your research! There’s no point in me covering the specifics of licensing here, as the vendors seem to change their minds quite often, and a different licensing model may be more effective for you once virtualised than the one you use today.

Virtualisation efficiency has improved phenomenally over the years. Six years ago, when I first got into virtualisation, the overheads were 20% or greater; today, with better CPU efficiency, hardware offloading and general hypervisor improvements, the overhead should be less than 5% on CPU and memory, and IO throughput is large enough to satisfy even the most resource-intensive applications. The advantage of virtualisation is that you can choose between scale-up and scale-out with fairly simple management and control. Both have their own arguments and styles for making your infrastructure agile, although a scale-out approach tends to be favoured as it creates a more modular infrastructure. Remember this is just one point of view: work out what works best for you and validate!

 

| Scale-Up Approach | Scale-Out Approach |
| --- | --- |
| Multiple databases per VM | Single database per VM |
| Fewer VMs | Better workload isolation |
| Greater single point of failure | Easier change management |
| Larger VMs (SMP overheads) | Load balancing more effective |
| OS bottlenecks | Faster migrations |
| Less flexibility for load balancing | Greater performance due to multiple bus and IO planes |
| Greater impact on maintenance | Quicker provisioning |

While you’re analysing the infrastructure, consider moving the servers to a scale-out design when virtualised. This makes adoption quicker, as you can migrate single instances and applications rather than entire database servers, and it minimises downtime and the impact of configuration issues during migration, because the affected services are greatly reduced. Another nice feature of virtualisation is being able to create virtual appliances: you can tie a database server to an application server and package the two up together. This is really compelling when you come to consider moving to a shared infrastructure, as you can pick up individual applications and services and move them around much more flexibly.

While talking scale-up / scale-out, it is important to ask: do you really need multiple CPUs, and do you understand the overhead SMP can cause? You may never question that SQL requires multiple CPUs, but do the other VMs sharing the same hardware require them? Whenever you move from one CPU to two, you add the overhead of CPU scheduling: every vCPU needs to be scheduled at the same time, and if that many cores are not free, the VM is put into a wait state and its transactions pause. And don’t forget the hypervisor: a SQL VM with 8 vCPUs on a dual-socket, quad-core host will be in direct competition with the hypervisor, because the SQL VM will be requesting all the CPUs at once. This matters at the planning stage, as you don’t want to design in that sort of contention. Remember that with Windows 2008, CPU and memory can be hot-added to virtual machines. I have seen application servers run quicker as single-CPU servers because of the CPU scheduling overhead on a larger shared infrastructure. Do your testing, and don’t give away SMP unless it is proven to give a performance improvement.
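
To check whether a SQL VM is actually suffering from CPU scheduling pressure, one common DBA rule of thumb (not specific to virtualisation, so validate it for yourself) is to look at signal waits: the time spent queuing for a CPU after a resource wait has finished. A minimal T-SQL sketch:

```sql
-- Minimal sketch: signal waits as a share of total waits. Signal wait time is
-- time spent waiting for a CPU once a resource becomes available; a high
-- percentage (often quoted as >25%) suggests CPU scheduling pressure.
SELECT
    CAST(100.0 * SUM(signal_wait_time_ms) / NULLIF(SUM(wait_time_ms), 0)
         AS DECIMAL(5, 2)) AS signal_wait_pct
FROM sys.dm_os_wait_stats;
```

If signal waits dominate, revisit the vCPU count before reaching for more hardware.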

Remember to read and follow the Microsoft best practices for SQL deployments. These don’t change dramatically just because it’s a VM; you should still follow the basic rules defined by Microsoft. Evaluate your load and run your performance planning while the SQL server is under intensive load: it’s no use taking capacity reports for a five-day business week if the SQL server does a huge amount of batch processing and backups over the weekend! While you’re sizing the solution, make sure you consider the back-end storage requirements and the connectivity media; 1GbE iSCSI may not be able to deliver if you are serving storage over 4Gb FC today, and this can have a huge impact on performance if it’s done wrong! Also consider defragmenting your SQL databases (http://support.microsoft.com/kb/943345). This can bring a big performance benefit to your SQL server, but consult your storage administrator first, as it can have a major (negative) impact on the SAN! Monitor all aspects of the existing implementation, including disk performance and queues; high queues could show an existing bottleneck that may be hiding other configuration issues. Make sure you always have dual paths to the storage, and use full multi-pathing software when it is available.
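
Before defragmenting, measure first. A minimal T-SQL sketch using the standard fragmentation DMV; the 10% threshold is the commonly quoted starting point, not a hard rule:

```sql
-- Minimal sketch: report logical fragmentation per index in the current
-- database. Common guidance: REORGANIZE between ~10% and ~30%, REBUILD above
-- that, but run it off-peak and talk to your storage administrator first.
SELECT
    OBJECT_NAME(ips.object_id)       AS table_name,
    i.name                           AS index_name,
    ips.avg_fragmentation_in_percent AS frag_pct
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id
   AND i.index_id  = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 10
  AND i.name IS NOT NULL              -- skip heaps
ORDER BY frag_pct DESC;
```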

When sizing and placing the storage, don’t forget that multiple virtual disks on a single datastore or LUN will not give you physical IO separation, and you may need that to separate database files from logs. Build the back-end storage so that you are using as many spinning disks as possible, or, if you are lucky enough to have SSD or Flash, make full use of what is available! If you use multiple data files for each of your databases (which can be a very good idea, to spread the database load across multiple storage areas), make sure the data files are equal in size. SQL uses a proportional-fill algorithm that favours allocations in files with more free space, so if the files are all equal in size, allocations should be spread equally across them. It should go without saying that you should pre-size the data files, for both databases and logs. Auto-grow can have a huge impact on performance and on data fragmentation (which affects all future read performance), so if you haven’t sized the data files effectively at the start, grow the database manually to prevent too many file extensions! If you do use auto-grow, set the growth increment large enough to minimise the number of extensions a database has to make. Design log files the same way: pre-size them for the expected load (typically 10% of the database size). A good rule of thumb is to create a separate TempDB data file for each CPU in the system, so size for the number of vCPUs; testing has shown the performance improvements start to diminish above four files, so treat four TempDB data files as a ceiling (even on 8- or 16-CPU systems).
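
To make that concrete, here is a hedged sketch of pre-sizing TempDB for a 4-vCPU VM: four equally sized data files so the proportional-fill algorithm spreads allocations evenly, with growth increments large enough to keep file extensions rare. The logical names tempdev/templog are SQL Server’s defaults, but the paths, sizes and growth values are placeholders only; size them for your own workload:

```sql
-- Sketch only: one TempDB data file per vCPU (ceiling of four, as above),
-- all equally sized for proportional fill. Paths and sizes are placeholders.
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 2048MB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb ADD FILE
    (NAME = tempdev2, FILENAME = 'T:\tempdb2.ndf', SIZE = 2048MB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb ADD FILE
    (NAME = tempdev3, FILENAME = 'T:\tempdb3.ndf', SIZE = 2048MB, FILEGROWTH = 512MB);
ALTER DATABASE tempdb ADD FILE
    (NAME = tempdev4, FILENAME = 'T:\tempdb4.ndf', SIZE = 2048MB, FILEGROWTH = 512MB);
-- Pre-size the log as well rather than relying on auto-grow.
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, SIZE = 1024MB, FILEGROWTH = 256MB);
```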

Monitoring performance is probably the most important aspect of virtualisation, especially after you have virtualised; it helps you pre-empt configuration issues and plan for performance growth and requirements. Some key areas to monitor (the counter names below come from the hypervisor’s performance charts rather than Windows PerfMon, so map them to your platform’s equivalents) are the following…

  • CPU
    • Ready
    • Usage
  • Memory
    • Active
    • Swapin
    • Swapout
  • Storage
    • Commands
    • deviceWriteLatency
    • deviceReadLatency
    • kernelWriteLatency
    • kernelReadLatency
  • Network
    • packetsRx
    • packetsTx

These should give you a good overview of how the system is performing. Also check performance at the hypervisor level and make sure you don’t have any contention issues. A traditional, simple indicator of CPU contention is clock skew: if the guest’s clock drifts more than you expect, it is often because the system clock relies on CPU cycles the VM isn’t being given.

Virtualising SQL should not be a dangerous project. If you approach it carefully, with a clear plan of what you want to achieve and how you want to achieve it, it will be a success! You don’t need an expensive infrastructure or lengthy consultancy engagements; with the information above, you should be able to approach this and succeed.

The last thing I’ll leave you with is this: validate your own designs and develop your own best practices! My recommendations are only good practices!