Hyper-V is a very cool technology. It's also a very complex technology, with a lot going on under the covers. When I deliver sessions on Hyper-V and talk about the architecture, it's a difficult concept to cover quickly, so hopefully this post will go some way towards explaining the Hyper-V architecture in an easy-to-remember, easy-to-digest way!
So, let's start with the microkernelised approach to virtualisation. The diagram on the left shows the Hyper-V hypervisor. What is the hypervisor? Well, it's a layer of code that runs natively on the hardware - it's the lowest thing on your system, and it's responsible for owning the hardware and for hardware and resource partitioning. The main difference between this and a monolithic hypervisor such as VMware's ESX is the location of the drivers and of some core operating system components. As you can see, in Hyper-V the drivers live in the partitions themselves, rather than in the hypervisor. This means we can massively reduce the size of the hypervisor. In fact, Hyper-V's hypervisor is around 600 KB in size. That is a small bit of code. Interestingly enough, it is entirely Microsoft code.
That's right. It's 100% Microsoft - no third-party code has gone into this platform. There are no operating system components in the hypervisor. It's just a thin sliver of reliable, secure code. Now, certain people in the industry have questioned Microsoft's driver model, based on previous experiences with Windows drivers and the fact that pretty much anyone could knock up a driver and have it 'accepted' by Windows. I'd have to disagree when it comes to x64 drivers and Windows Server 2008. Microsoft have worked incredibly hard to provide mechanisms and structured programmes for device manufacturers to produce top-quality drivers, and sure, some will slip through the net, but as the mechanism evolves, this will be reduced further and further. When it comes to x64 drivers, they all have to be signed, which means they go through even more testing to ensure they are of good quality, and we've made loads of guidelines available for driver writers (if that's what they are called!) here: http://www.microsoft.com/whdc/driver/64bitguide.mspx so it's not as though they have to find their own way around the topic. Anyway, I digress.
You may or may not have had the chance to read the 'Blue Pill' whitepaper, which describes a hypervisor rootkit. Its name is a reference to The Matrix: in the movie, if you take the blue pill, you remain in the computerised world, continue to be controlled by the computer, and have no idea you are being controlled. The concept here is that if someone were to take control of your hypervisor - it being the lowest element on your system - the elements above it would not know they were being controlled and would find it very difficult to detect. With this in mind, keeping core operating system bits out of the hypervisor, and keeping it trim, clean and secure, also brings a strong level of reliability.
So, that's enough about the hypervisor as such, but let's look at the rest of the architecture:
So, when it comes to the hardware, it needs to have Intel-VT or AMD-V hardware-assisted virtualisation technology, it needs to be x64 (not Itanium), and you also need to enable the 'No Execute' bit in the BIOS. This is used to create a more secure environment. The diagram above shows an installation before enabling Hyper-V. What you see on the left-hand side of the diagram could be a full GUI installation of Windows Server 2008, or it could be a Server Core installation. Advantages of Server Core include a smaller footprint, lower attack surface and reduced patching, to name but a few. When we enable Hyper-V:
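To make the prerequisites concrete, here's a minimal sketch of that checklist as code. The feature names are illustrative placeholders, not real APIs; an actual check would query CPUID (the VMX/SVM and NX bits) via platform-specific tooling.

```python
# The three hardware prerequisites described above, as a simple checklist.
# Keys are hypothetical labels for illustration only.
REQUIRED = (
    "x64",                # 64-bit CPU (Itanium is not supported)
    "hw_virtualisation",  # Intel-VT or AMD-V
    "no_execute_bit",     # NX/XD enabled in the BIOS
)

def can_enable_hyper_v(features: dict) -> bool:
    """Return True only when every prerequisite is present and enabled."""
    return all(features.get(name, False) for name in REQUIRED)

# A machine missing any one prerequisite cannot enable the Hyper-V role:
print(can_enable_hyper_v({"x64": True, "hw_virtualisation": True,
                          "no_execute_bit": False}))  # -> False
```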
The hypervisor now slides under the OS and becomes the lowest part of your system. Remember, it's a very thin, secure hypervisor - like a thin layer of veneer on the hardware. Enabling Hyper-V also brings the bits in the purple boxes:
The VM Worker Processes are individual processes spawned for each virtual machine, and are designed to handle things like emulation...
As you can see, I've added 2 virtual machines (child partitions) here to explain VM Worker Processes a little further. You can see that the 2nd VM is a non-hypervisor-aware OS, which means that the hardware it sees needs to be emulated. This is the old IO model, used in Virtual Server and Virtual PC. Why is emulation good? It's good because it emulates known hardware - hardware such as the 440BX chipset motherboard, a DEC Ethernet controller card, and more. These are pieces of hardware that are very standard in the industry, and that nearly every operating system under the sun understands, Microsoft or otherwise! The downfall of emulation is the cost from an IO perspective. Say there's an app running in the emulated VM - Excel, for instance - and it wants to save a 100 KB file down to the hard drive. What happens is that Excel tries to write down into Kernel Mode - it's not aware that it's running on a hypervisor and thinks it has direct access to the hardware. So, what we have to do is trap that request, bring it over to the Parent partition, into Kernel Mode, up into User Mode, into the VM Worker Process, and that's where the emulation happens. Now, to give a crude estimate, a 100 KB write takes about 80 traverses from User Mode on the Child Partition, down into Kernel Mode, up into Kernel Mode on the Parent, up into User Mode and the VM Worker Process, and back again. So, inevitably, there's a performance hit with this type of virtualisation, but, on the flip side, you get a broad range of operating system support, as below:
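That trap-and-traverse path can be sketched as a toy cost model. The "80 traverses per 100 KB" figure is the crude estimate quoted above, not a measured value, and the linear scaling is my own simplifying assumption for illustration.

```python
# Toy model of the emulated I/O path: each write from a non-enlightened
# guest is trapped and bounced between the child partition and the
# parent's VM Worker Process, crossing user/kernel boundaries repeatedly.

HOPS_PER_100KB = 80  # crude estimate from the post, not a benchmark

def emulated_write_traverses(size_kb: int) -> int:
    """Approximate mode transitions for a write of size_kb, assuming
    the cost scales linearly with the amount of data written."""
    return HOPS_PER_100KB * size_kb // 100

print(emulated_write_traverses(100))  # -> 80, the 100 KB Excel save
print(emulated_write_traverses(500))  # -> 400, larger writes cost more
```

Each of those traverses is a user/kernel-mode switch, which is exactly the overhead the VSP/VSC path described next avoids.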
If you now look at the other Child Partition (in this case, with Windows Server 2003 / 2008 listed as the OS), this VM is not using emulation. It's using the VSP, VSC and VMBus architecture. These are the Virtual Service Providers and Virtual Service Clients, and they communicate over the high-speed in-memory bus, VMBus, that we've created. It's 100% in memory, not physically tangible in any way, and it's been designed for IO traffic. So, effectively, the VM writes directly to a driver (which is a VSC), and this information transfers directly to the VSP over VMBus (jumping back and forth a few times), and then onto the hardware below.
So, to expand on the VSP/VSC relationship: as you can see, we have the Parent partition on the left and the hypervisor-aware Child Partition on the right, each split into User (top) and Kernel (bottom) modes. The orange/yellowy bits you can see are all Hyper-V related. On the right-hand side, the application tries to do a write, via the Windows File System, Volume, Partition and on to the Disk. If you're using emulation, you'd keep going down, across and up to the VM Worker Processes (not using VMBus), and it would go back and forth, back and forth, before it makes its way to the StorPort MiniPort driver and down to the hardware.
Now, with VSP/VSC, you work your way down, the request hits the VSC, and it goes across the VMBus to the VSP - still in Kernel Mode; there has been no need to go into User Mode to handle this. Every time you cross between User and Kernel Mode, you take a performance hit, and because emulation goes back and forth quite a few times, it takes quite a few performance hits. This doesn't happen in the VSP/VSC world. Once the data is over at the VSP, it writes directly to disk and down to the hardware. Very fast and very performant. So, you get a very performant guest operating system, provided it knows it's running on the hypervisor. This does reduce the number of operating system choices by quite a way - right now, Windows Server 2008 and 2003 (with SP2), XP SP3 and Vista SP1.
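The enlightened path above can be sketched conceptually. This is only a stand-in for the real shared-memory ring buffer - class and field names here are invented for illustration - but it shows the key idea: the VSC hands the request straight to the VSP without any user/kernel-mode transitions in the parent.

```python
from collections import deque

class VMBus:
    """Toy stand-in for the in-memory channel between a VSC and a VSP.
    The real VMBus is a shared-memory ring buffer; a deque is used here
    purely to illustrate the enqueue/dequeue flow."""

    def __init__(self):
        self.channel = deque()

    def send(self, request):
        self.channel.append(request)   # VSC side: enqueue the I/O request

    def receive(self):
        return self.channel.popleft()  # VSP side: dequeue and service it

bus = VMBus()
bus.send({"op": "write", "size_kb": 100})  # guest's VSC issues a write
request = bus.receive()                    # parent's VSP picks it up
print(request["op"])                       # -> write; handled entirely in kernel mode
```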
What we're also hearing from customers is that you'd like a single platform to virtualise not only Microsoft OSes, but also Linux OSes, and that's where partnerships with companies like Citrix and Novell come in.
So, as you can see, we have a 3rd type of Child Partition, namely the Xen-enabled Linux VM. This could be Novell SUSE SLES 10 SP1, as an example. This VM is running the Linux kernel, and we've worked closely with the relevant organisations to write the relevant VSCs and Hypercall Adaptors, which ensure that calls for hardware made by Xen-enabled Linux VMs are handled in the most optimal way, rather than pushed down the emulation route described earlier. This means certain Xen-enabled Linux VMs really will be first-class citizens on the Hyper-V platform. I'm sure, as time goes by, you'll find even more OSes come along with the ability to take advantage of the VSP/VSC architecture, as it really is the way to go :-)
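Conceptually, a hypercall adaptor is a translation layer: calls a Xen-enabled kernel makes are mapped onto their Hyper-V equivalents instead of falling back to emulated devices. The call names below are entirely made up for illustration; the real interfaces are defined by the Xen and Hyper-V hypercall specifications.

```python
# Illustrative-only mapping from Xen-style calls to Hyper-V-style calls.
# None of these identifiers are real API names.
XEN_TO_HYPERV = {
    "xen_disk_write": "hv_storage_write",
    "xen_net_send": "hv_network_send",
}

def adapt_hypercall(xen_call: str) -> str:
    """Translate a Xen-style call to its Hyper-V equivalent; anything
    without an enlightened mapping falls back to the slow emulated path."""
    return XEN_TO_HYPERV.get(xen_call, "emulated_device_path")

print(adapt_hypercall("xen_disk_write"))  # -> hv_storage_write
print(adapt_hypercall("xen_legacy_op"))   # -> emulated_device_path
```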
That's about it - hopefully that's helped you understand the architecture, I know it's helped me by getting it off my chest!
Matt, a 600 KB hypervisor is great... But aren't all of the VMs reliant upon VM1 being available? E.g. if the kernel on VM1 traps, then VM1 crashes along with all of the other VMs on the machine?
You're dead right - VM1, being the parent, is a single point of failure, so it's a good job that the team have built a Server OS, i.e. the parent, to be as stable as possible.
The recommendation will be to run Hyper-V as a role on its own, so don't have the server doing a load of other roles, such as being a web server, a domain controller and so on. The other recommendation - and this is what we see as the main one - will be to use the Server Core installation of Windows Server 2008 and enable Hyper-V in that environment. This way, you have a very small footprint, low attack surface, reduced overhead and reduced patching. We then envisage people managing the system remotely, from another machine or from a centralised management platform like Virtual Machine Manager.
So, in summary, the parent partition is the single point of failure (at least for now), but the Windows team have created a very stable OS, already on SP1 code, and using the Server Core install will ensure that the parent partition is doing the exact job you want it to do, in a secure, reliable and scalable way.
Hope that helps,
600K for Server Core + hypervisor? That would be amazing. Please tell me it's true..
Unfortunately not - it's 600k for just the Hypervisor. The Parent Partition will vary in size depending on whether you choose to deploy Server Core or Full Windows Server 2008. Server Core will ensure a minimal footprint, and you can even strip out the other roles that you aren't using on the Server Core box to reduce size still further.
Your blog is very explanatory. Appreciate your time.
In the VSC/VSP world, the VSP writes to the hardware directly. What is the hypervisor's role in that scenario?
Awesome job... now things are much clearer
My understanding of the VSP/VSC/hypervisor relationship in the scenario you describe is that the VSC (Virtual Service Client) communicates with the VSP (Virtual Service Provider) in the Parent Partition over VMBus, and I believe that VMBus does not provide a mechanism for partitions to communicate directly with the hypervisor, but it does rely on the hypervisor to establish the communication channels between partitions. Once communication has been established, I believe that the VSP communicates directly with the hardware, rather than with the hypervisor itself.
You may find this video very useful: http://edge.technet.com/Media/Hyper-V-how-it-works-Interview-with-PMs-Part-1/?tapm=A89S35A08 - look at about 4 minutes into the video.
Can an IHV provide both VSP/VSC components to optimise performance? Can IHVs take advantage of this knowledge when designing the hardware?
That's a great question, and in all honesty, I'm not 100% sure. Certainly, the closest we come to an IHV tuning components for virtualisation is Intel-VT or AMD-V, where they are building specific hypervisor awareness and capability into the chipsets. This is now moving forward into memory, with Extended and Nested Page Tables.
I'm sure this is something that vendors are on board with now and going forward, and I'm sure we'll see even more native performance for a VM as a result.
Excellent explanation of the whole process. Thanks Matt.