When dealing with modern IT infrastructure, chances are high that you have already come into contact with virtualized systems, be it by running a virtual machine on your Linux or Windows computer with software like VirtualBox, by using virtual networks such as a VPN, or by working with container-based virtualization as provided by tools such as Docker.
The broad term virtualization covers a lot of ground. In this article I focus on platform and container-based virtualization, with the goal of drawing a basic distinction between different virtualization techniques while introducing some key concepts to build upon.
The general term virtualization refers to the act of creating an abstraction (a virtual version) of an IT resource, which can be hardware as well as software. The virtualized component can then be used interchangeably with its physical counterpart.
Virtualization is commonly used to partition an IT resource into separate, self-contained and isolated workloads, which can lead to better and more cost-efficient resource usage. One example is running multiple virtualized web servers on top of a physical server in a data center: when a single workload does not use the hardware's full potential, the free capacity can be used by other workloads.
Hardware Virtualization
Hardware virtualization, also known as platform virtualization, aims to create a virtual computing environment, usually called a virtual machine (VM). The virtual machine provides an abstraction layer between the underlying hardware resources and the software running inside it. A common use case is running a Linux operating system inside a virtual machine on a computer running Windows.
In this context there are two important terms to know: guest and host. Guest refers to the virtual machine and the software running in it, while host refers to the physical machine on which the virtual machine runs.
To control the lifecycle of a virtual machine, a specific software component is required: the hypervisor, also called virtual machine monitor (VMM). It creates virtual machines and provides the abstraction layer between the host and the guest systems.
Hypervisors are commonly grouped into two types: type-1 and type-2.
A type-1 hypervisor is also referred to as a bare-metal hypervisor: it does not need a separate operating system to run on. Instead, a type-1 hypervisor effectively acts as its own operating system, which usually allows for better performance. However, without the broad driver support of a general-purpose operating system like Windows or Linux, running this type of hypervisor requires that the hypervisor itself supports the underlying hardware. An example of this type is VMware ESXi.
A type-2 hypervisor runs in the context of another operating system, the host operating system. While the host operating system controls access to the underlying hardware, the hypervisor's sole responsibility is to provide an abstraction layer between the host operating system and the guest virtual machines. Common examples are classic desktop tools like VirtualBox or VMware Workstation.
It should be noted that the classic distinction between type-1 and type-2 hypervisors can become blurry. Take for example KVM (Kernel-based Virtual Machine), a virtualization solution for Linux. As the name implies, it uses a kernel module that turns the Linux kernel itself into a hypervisor running directly on the underlying hardware. At the same time, a fully functional host operating system is still running, and each virtual machine created with KVM is represented as a regular Linux process on the host.
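To make the "kernel module plus regular process" picture concrete: on Linux, the KVM module exposes the device file /dev/kvm, and a userspace program such as QEMU opens it and drives virtual machines through ioctl calls. The following is a minimal sketch, assuming a Linux host, that only asks the kernel for the KVM API version. The constant is taken from <linux/kvm.h>; the function name kvm_api_version is our own.

```python
import fcntl
import os

# From <linux/kvm.h>: KVM_GET_API_VERSION is _IO(KVMIO, 0x00) with KVMIO = 0xAE,
# which encodes to 0xAE00 (no data transfer, no argument size).
KVM_GET_API_VERSION = 0xAE00


def kvm_api_version(device="/dev/kvm"):
    """Return the KVM API version reported by the kernel module, or None if
    the device is missing or cannot be opened by the current user."""
    try:
        fd = os.open(device, os.O_RDWR | os.O_CLOEXEC)
    except OSError:
        return None  # module not loaded, no hardware support, or no permission
    try:
        # With an integer argument (here the default 0), fcntl.ioctl returns
        # the integer result of the ioctl system call.
        return fcntl.ioctl(fd, KVM_GET_API_VERSION)
    except OSError:
        return None
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(kvm_api_version())
```

On a host where KVM is usable this returns 12, the stable API version the kernel has reported for many years. The process running this code is an ordinary Linux process; a VMM like QEMU goes on to issue further ioctls on the same device to create and run virtual CPUs.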
Types of Hardware Virtualization
Hardware virtualization itself can be divided into several types as well:
- Full Virtualization
- Hardware Assisted Virtualization
- Paravirtualization
- Hybrid Virtualization
Full Virtualization
In this form of hardware virtualization, the guest system is completely unaware that it is running in a virtualized environment, so no modifications to the guest system are required. Although it is possible without hardware support (for example through binary translation), full virtualization today is almost always implemented with hardware assisted virtualization.
Hardware Assisted Virtualization
This type of virtualization is a form of full virtualization and requires the host CPU to provide special instruction set extensions that speed up virtualization. For the x86-64 architecture, which is common today, Intel and AMD provide their own extensions: Intel VT-x and AMD-V.
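On Linux, whether the CPU advertises one of these extensions is visible in /proc/cpuinfo: x86 kernels list the feature flag vmx for Intel VT-x and svm for AMD-V. A small sketch, assuming that file layout; the function names are hypothetical, and the flag only reflects what the kernel reports, so a hypervisor may still be unable to use the extension, for example if it is disabled in the firmware.

```python
def hardware_virtualization_extension(cpuinfo_text):
    """Return the hardware virtualization extension advertised in
    /proc/cpuinfo-style text, or None if neither flag is present."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels list CPU features on lines whose key is exactly "flags";
        # all cores normally report the same flags, so the first line suffices.
        if sep and key.strip() == "flags":
            flags = set(value.split())
            if "vmx" in flags:
                return "Intel VT-x"
            if "svm" in flags:
                return "AMD-V"
            return None
    return None


def host_extension(path="/proc/cpuinfo"):
    """Check the current host; returns None if the file cannot be read."""
    try:
        with open(path) as f:
            return hardware_virtualization_extension(f.read())
    except OSError:
        return None


if __name__ == "__main__":
    print(host_extension())
```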
With hardware assistance, virtualization can be set up so that a subset of the instructions issued by the guest system is executed directly on the underlying hardware. This avoids costly round trips to the hypervisor while still maintaining the host's security. The latter point is crucial: as stated above, virtualization is used to run multiple isolated workloads on one physical resource. Security measures therefore need to be in place that prevent virtual machines from arbitrarily interacting with other virtual machines on the same host, or from tampering with the host itself.
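The split between directly executed and intercepted instructions can be pictured with a deliberately simplified model: ordinary instructions run natively, while privileged ones trap into the hypervisor (a "VM exit"), which applies their effect only to the guest's virtual state. The mnemonics and the PRIVILEGED set below are hypothetical; real hardware decides which events cause exits through per-VM control structures (VMCS on Intel, VMCB on AMD), which this toy does not model.

```python
from dataclasses import dataclass, field

# Hypothetical mnemonics for illustration only. Which instructions are privileged
# is defined by the real ISA (e.g. loading the page-table base register on x86).
PRIVILEGED = {"write_cr3", "hlt", "out"}


@dataclass
class VirtualCPU:
    virtual_state: dict = field(default_factory=dict)  # guest-visible copy of privileged state
    direct: int = 0  # instructions executed natively on the hardware
    exits: int = 0   # traps ("VM exits") into the hypervisor


def hypervisor_handle(vcpu, op, arg):
    """Emulate a privileged instruction against the guest's *virtual* state only,
    so the guest never touches the real machine state or other VMs."""
    vcpu.exits += 1
    vcpu.virtual_state[op] = arg


def run_guest(vcpu, program):
    """Run a list of (mnemonic, operand) pairs, trapping only privileged ones."""
    for op, arg in program:
        if op in PRIVILEGED:
            hypervisor_handle(vcpu, op, arg)  # costly round trip through the hypervisor
        else:
            vcpu.direct += 1  # stands in for native execution: no hypervisor involved


if __name__ == "__main__":
    cpu = VirtualCPU()
    run_guest(cpu, [("add", 1), ("mov", 2), ("write_cr3", 0x1000), ("add", 3)])
    print(cpu.direct, cpu.exits, cpu.virtual_state)  # 3 1 {'write_cr3': 4096}
```

The fewer exits a workload causes, the closer it runs to native speed, which is exactly the performance argument made above; the isolation argument corresponds to the hypervisor never forwarding a privileged operation to the real machine state.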
Paravirtualization
In paravirtualization, the guest system is aware that it runs in a virtualized environment. A paravirtualization-enabled hypervisor provides a special interface that the guest has to use to communicate directly with the hypervisor it is running on. This can result in better performance, as the hypervisor's interface can be tailored more specifically to the needs of a virtualized system. The downside of this approach is that arbitrary guest systems cannot be run; the guest system always needs to implement the paravirtualization interface. One example of a hypervisor with support for paravirtualization is Xen.
Hybrid Virtualization
Hybrid virtualization is a mix of full virtualization and paravirtualization. In this scenario, the guest system uses paravirtualization for certain hardware drivers while relying on full virtualization for other features. The benefit is that the guest system does not need to be completely paravirtualized, and the combination can also result in superior performance, as pointed out by Lin et al., 2012.
Virtualization vs. Emulation
This section distinguishes virtualization from the related but different concept of emulation. Before describing how emulation differs from virtualization, it helps to introduce the term Instruction Set Architecture (ISA).
Instruction Set Architecture (ISA)
The ISA is a formal specification of a processor and defines the interface between hardware and software. It therefore contains everything you need to know to develop software for a given processor architecture. This includes, among other things, fundamental data types, registers, operation modes and the instruction set, i.e. the actual commands a processor understands.
The difference between the instruction set and the instruction set architecture is that the former is solely about commands and their parameters, while the latter also covers architecture and behavior: How is the stack organized? How does the CPU react to interrupts?
Because the ISA is a specification, its implementations can differ. An implementation does not even have to be realized in hardware.
One of the most common ISAs today is the x86 architecture (with its 64-bit variant x86-64), which dominates the laptop, desktop and server markets. Another example is the ARM ISA, which can be found in almost every modern smartphone.
If you are interested in what an ISA specification looks like and how complex it can be, take a look at the Intel x86 Architectures Software Developer's Manual.
Emulation
The general definition of emulation is enabling one computer system to behave like another computer system; the terms imitating or mimicking are sometimes used in this context as well. This approach differs from virtualization: while emulation tries to mimic a physical hardware device as a whole, virtualization usually only virtualizes or simulates enough hardware components to run the guest system without modifications, with the goal of executing as much as possible directly on the host hardware to speed up the process.
One interesting aspect of emulation is that you usually want to replicate every little detail of a given computer system, including odd behavior and even bugs. One example is the Dolphin Emulator, an emulator for the Nintendo GameCube and Nintendo Wii gaming consoles. It mimics the original systems so faithfully that even bugs occurring on the real hardware can be triggered inside the emulator (see the Dolphin issue listed in the references for a detailed description).
Of course, emulating a whole computer system requires replicating the behavior of different components like memory, I/O devices and the CPU. CPU emulation is usually the most complicated part of an emulator and essentially consists of replicating the ISA. Running an executable on a system with a different ISA than the one it was built for requires an emulator that can translate between the two ISAs. With emulation it is therefore possible, for example, to run software originally developed for the ARM architecture on an x86-64 system.
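The simplest form of CPU emulation is interpretation: a fetch-decode-execute loop that carries out each guest instruction in software. The sketch below interprets a made-up 8-bit accumulator ISA; the opcodes LOAD, ADD, JNZ and HALT are assumptions for illustration, not any real architecture. Real emulators additionally model memory, I/O devices and timing, and often rely on binary translation rather than pure interpretation for speed.

```python
def run(program, max_steps=10_000):
    """Interpret a program for a made-up 8-bit accumulator ISA.

    Each instruction is an (opcode, operand) pair. Returns the final registers.
    """
    regs = {"acc": 0, "pc": 0}
    for _ in range(max_steps):  # guard against programs that never halt
        if regs["pc"] >= len(program):
            break
        op, arg = program[regs["pc"]]  # fetch
        regs["pc"] += 1
        # decode and execute
        if op == "LOAD":
            regs["acc"] = arg & 0xFF
        elif op == "ADD":
            # wrap around like an 8-bit hardware register would: replicating
            # such details is part of faithfully emulating an ISA
            regs["acc"] = (regs["acc"] + arg) & 0xFF
        elif op == "JNZ":  # jump to instruction index `arg` if acc is not zero
            if regs["acc"] != 0:
                regs["pc"] = arg
        elif op == "HALT":
            break
        else:
            raise ValueError(f"illegal instruction {op!r}")
    return regs


if __name__ == "__main__":
    countdown = [("LOAD", 3), ("ADD", -1), ("JNZ", 1), ("HALT", 0)]
    print(run(countdown))  # {'acc': 0, 'pc': 4}
```

Every guest instruction costs several host instructions here (dictionary lookups, comparisons, masking), which illustrates why emulation is so much slower than running code natively.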
Emulation usually comes with a large performance penalty, because the guest's instructions have to be translated into instructions of the host ISA. Full virtualization, in contrast, aims to execute as many instructions as possible from a virtual machine directly on the host hardware, maximizing performance while still providing full isolation. Consequently, virtualization can only be used if the guest system targets the same ISA as the host system. Taking the example from above: software developed for the ARM architecture cannot be run on x86-64-based systems through virtualization.
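The rule above boils down to comparing the guest's ISA with the host's. A minimal sketch using Python's platform.machine() to identify the host; the alias table and the function names are hypothetical, and the comparison is deliberately simplified (it ignores, for instance, that 32-bit x86 guests can run virtualized on x86-64 hosts).

```python
import platform

# Different tools report the same ISA under different names.
_ALIASES = {"amd64": "x86_64", "x64": "x86_64", "arm64": "aarch64"}


def normalize_isa(name):
    name = name.strip().lower()
    return _ALIASES.get(name, name)


def execution_strategy(guest_isa, host_isa=None):
    """Return 'virtualization' if the guest ISA matches the host ISA,
    otherwise 'emulation' (the guest's instructions must be translated)."""
    host = normalize_isa(host_isa or platform.machine())
    return "virtualization" if normalize_isa(guest_isa) == host else "emulation"


if __name__ == "__main__":
    print(platform.machine(), execution_strategy("aarch64"))
```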
Container-based Virtualization
Container-based virtualization is also called operating system virtualization. The term refers to using operating system capabilities to create containers as isolated processes within the running host environment. The software component that provides the features needed for this is the kernel (the core component of an operating system).
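On Linux, the isolation of container processes is built mainly from kernel namespaces (with cgroups limiting resource usage). Each namespace a process belongs to is visible as a symbolic link under /proc/<pid>/ns. A small sketch, assuming a Linux host with /proc mounted, that lists them; the function name is hypothetical. Running it on the host and inside a container shows that the container is simply a process placed into different namespaces.

```python
import os


def namespace_ids(pid="self"):
    """Map each Linux namespace type of a process to its identifier,
    e.g. 'net' -> 'net:[<inode number>]'. Processes sharing a namespace see
    the same identifier. Returns {} if /proc/<pid>/ns is not available."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        entries = sorted(os.listdir(ns_dir))
    except OSError:
        return result
    for name in entries:
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # some entries can be unreadable, e.g. for other users' processes
    return result


if __name__ == "__main__":
    for name, ident in namespace_ids().items():
        print(f"{name:20} {ident}")
```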
Software running inside the container communicates directly with the kernel of the host and therefore has to be able to run on the operating system and the CPU architecture of the host. Executing Windows-based software as a container on a Linux-based host is therefore not possible.
Software isolated in a container does not get a complete virtualized hardware environment but uses the host's hardware directly. This avoids much of the performance overhead that comes with virtual machines. Because no complete operating system and its infrastructure (e.g. device drivers) has to boot, containers can start within milliseconds. Moreover, container images are usually much smaller than virtual machine images: an image for a single application can be trimmed down to roughly the size of the application itself, which makes these images much faster and easier to handle.
While container-based virtualization is a rather old concept, it only became popular in the mainstream with tooling like Docker. With more recent developments such as the establishment of the Open Container Initiative (OCI) for standardization and the growth of container orchestration projects like Kubernetes, this form of virtualization has had a huge impact on how modern cluster and cloud computing platforms have evolved.
Other Forms of Virtualization
There are other forms of virtualization not covered in this article, for example:
- Network virtualization to create virtual network devices or virtual networks on top of a physical network infrastructure
- Storage virtualization to abstract storage presented to the user from the underlying physical storage resources
- Virtual hard disk drives
These kinds of virtualization can be combined with the concepts introduced in this article. For example, hypervisor software like VirtualBox operates on a virtual hard disk drive that is stored on the host system in one or multiple files (depending on the file format used). Among other things, these files contain the actual data of the virtual machine (the operating system installation, applications, etc.). The virtual hard disk is used to present hard disks of a certain geometry to the guest system. Whenever the guest system reads from or writes to its hard disks, the hypervisor redirects the corresponding requests to the virtual hard disk file.
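The redirection described above is easiest to see with the simplest possible image layout, a raw flat file in which guest sector n lives at byte offset n × 512. VirtualBox's own formats (such as VDI, VMDK or VHD) add headers and block-allocation tables on top of this idea, so the class below is a hedged sketch of the principle, not of any of those formats; the names RawVirtualDisk, read_sector and write_sector are hypothetical.

```python
import os
import tempfile

SECTOR_SIZE = 512  # assumption: classic 512-byte sectors


class RawVirtualDisk:
    """Hypothetical fixed-size virtual disk stored as one flat host file:
    guest sector n lives at byte offset n * SECTOR_SIZE in that file."""

    def __init__(self, path, sectors):
        self.path = path
        self.sectors = sectors
        # Create (or overwrite) the image; on most filesystems this yields a sparse file.
        with open(path, "wb") as f:
            f.truncate(sectors * SECTOR_SIZE)

    def _offset(self, lba):
        if not 0 <= lba < self.sectors:
            raise IndexError(f"sector {lba} is outside a disk of {self.sectors} sectors")
        return lba * SECTOR_SIZE

    def read_sector(self, lba):
        """What the hypervisor does when the guest reads sector `lba`."""
        with open(self.path, "rb") as f:
            f.seek(self._offset(lba))
            return f.read(SECTOR_SIZE)

    def write_sector(self, lba, data):
        """What the hypervisor does when the guest writes sector `lba`."""
        if len(data) != SECTOR_SIZE:
            raise ValueError("this sketch only accepts whole-sector writes")
        with open(self.path, "r+b") as f:
            f.seek(self._offset(lba))
            f.write(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        disk = RawVirtualDisk(os.path.join(tmp, "disk.img"), sectors=2048)  # 1 MiB disk
        disk.write_sector(7, b"\x55" * SECTOR_SIZE)
        print(disk.read_sector(7)[:4], os.path.getsize(disk.path))  # b'UUUU' 1048576
```

Opening the file per request keeps the sketch short; a real hypervisor keeps the image open and also has to handle caching, flushing and concurrent requests.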
References
- ARM ISA
- CPU operation modes
- CPU registers
- Dolphin Emulator - Issue describing an original hardware bug reproduced by the emulator
- Dolphin Emulator
- Intel VT-x
- Intel x86 Architectures Software Developer’s Manual
- Kernel of an operating system
- Open Container Initiative (OCI)
- Optimizing virtual machines using hybrid virtualization - Lin et al., 2012
- VMware ESXi
- VMware Workstation
- Virtual Private Network (VPN)
- Xen hypervisor
- x86 ISA
- x86-64 ISA