Virtualisation

Warning

This chapter is work-in-progress and may contain vastly incomplete information.

This section provides a guide to setting up virtual, isolated environments for users. First, however, we discuss the different types of environment that exist and their upsides and downsides.

Types of Virtualisation

What is Virtualisation?

Roughly speaking, the idea of virtualisation is to provide an isolated environment (from now on called the Guest) in which code runs whose trust level is less than or equal to that of the surrounding system (from now on called the Host). More abstractly, virtualisation provides a (more or less) well-defined Interface for a Guest to operate on. This interface may be PC hardware, the POSIX API or the System Call interface of the Linux kernel.

Seen in this light, one could consider the Linux kernel a type of virtualisation itself: each process running under the kernel is provided with a well-defined interface. The processes cannot interact with each other except by well-defined means, and they do not need to care about the hardware specifics.

Full Virtualisation

Paravirtualisation

Containers

Containers are really a special form of Paravirtualisation in which the virtualised interface is the kernel of the Host operating system. In the case of Linux containers, this is, obviously, the Linux kernel.

I will go into a bit more detail on Linux here. On a Linux system, processes are already well separated, so some argue that this is all the isolation you need. However, there are shared resources which benefit from being isolated too, which is why Linux Namespaces were invented. A Namespace is basically an isolated instance of a resource space, visible only to the processes which live in that Namespace.

There are namespaces for Cgroups (which by themselves are a powerful tool to control the resource usage of processes), Inter-Process Communication, Networking, the Filesystem tree (the Mount namespace), Process IDs, User and Group IDs, and finally basic system information like the Hostname (the UTS namespace).

By starting a process in a new namespace of each of these kinds, we essentially get an isolated Guest Linux system which can interact with the Host system only to a limited extent. In practice it is not quite that simple, though, which is why there are tools which manage this type of container.
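To make this concrete: libvirt, which is introduced later in this section, is one such tool, and its LXC driver should set up the namespaces for a container from an XML definition. The following is a minimal sketch under a few assumptions: the container name `container1`, the root filesystem path and the ID range starting at 100000 are hypothetical, and the root filesystem must already exist and be owned by the mapped IDs. The `<idmap>` element requests a User namespace, and `<privnet/>` requests a private Network namespace even though no network interfaces are defined.

```xml
<domain type='lxc'>
  <name>container1</name>
  <memory unit='KiB'>524288</memory>
  <os>
    <!-- run an init process inside the container -->
    <type>exe</type>
    <init>/sbin/init</init>
  </os>
  <!-- map container UID/GID 0..65535 to unprivileged IDs on the Host
       (hypothetical range); this enables a User namespace -->
  <idmap>
    <uid start='0' target='100000' count='65536'/>
    <gid start='0' target='100000' count='65536'/>
  </idmap>
  <features>
    <!-- always create a private Network namespace -->
    <privnet/>
  </features>
  <devices>
    <!-- root filesystem of the container (hypothetical path) -->
    <filesystem type='mount'>
      <source dir='/var/lib/libvirt/containers/container1'/>
      <target dir='/'/>
    </filesystem>
    <console type='pty'/>
  </devices>
</domain>
```

Such a definition would be loaded with `virsh -c lxc:/// define FILE`, analogous to the network commands used further below.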

Trade-offs, or Deciding on a Type of Virtualisation

We have now discussed multiple forms of virtualisation. Each has its merits and its purpose, but for a specific use case we have to decide which type to use. So here is a quick summary of what you can do with each type of virtualisation.

Feature	Full Virtualisation	Paravirtualisation	Linux Containers
Can support “any” Guest OS	yes	no (1)	no
Free choice of supported Guest OS by user	yes	yes	no
Overhead	very high	medium	low
Isolation level	highest (3)	high	it depends (2)

Notes:

(1) Paravirtualisation can support many OSes, but it requires explicit support in the Guest OS. So it is unlikely that you will be able to run your copy of MS-DOS from the 1990s on virtio devices.
(2) The isolation level highly depends on the configuration of the individual container.
(3) Even with Full Virtualisation, the isolation between Guest and Host, as well as between different Guests, may be broken by (often performance-enhancing) technologies employed on the Host, for example Kernel Samepage Merging (KSM).
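Some of these Host-side technologies can be switched off per Guest. With libvirt (introduced later in this section), the following fragment inside a domain definition asks the hypervisor not to share this Guest's memory pages, i.e. to exclude it from Kernel Samepage Merging. This is a sketch only: it costs the memory that merging would otherwise save, and it addresses just this one technology, not every possible channel between Guests.

```xml
<!-- placed directly inside the <domain> element of a Guest definition -->
<memoryBacking>
  <!-- disable shared (merged) memory pages for this Guest -->
  <nosharepages/>
</memoryBacking>
```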

So, as a rule of thumb: if you have the resources and need strong isolation (for example, to host virtual machines for arbitrary third-party users), use Full or Paravirtualisation. If you need to isolate individual services or simply need to provide a specific environment for a service to run in, Containers are the tool of choice. For anything in between, for example if you need strong isolation but do not have the resources for Paravirtualisation, you will have to weigh accepting less isolation against saving resources elsewhere.

In general, the trade-off is between resources saved and isolation achieved. More isolation means more resources need to be expended, simply because more things need to be emulated instead of re-used.

The remainder of this section will deal with setting up virtual machines and containers on a modern Linux system.

Introduction to libvirt

libvirt is, in my opinion, the tool for managing both Containers and QEMU/KVM Virtual Machines on Linux. Before we go into details, let us first describe the low-level tools that actually create the virtualised environments on Linux.

Tools for Virtualisation on Linux

For Full and Paravirtualisation, there is QEMU/KVM. KVM stands for Kernel-based Virtual Machine and, as you might guess from the name, uses kernel and hardware support to achieve fast virtualisation where this is supported for the specific Host and Guest platform combination. QEMU is a frontend and also a set of implementations for Full and Paravirtualisation of different platforms. There are full emulators for MIPS, PowerPC, SPARC, ARM and of course x86, but in general you are better off using the Paravirtualisation support based on KVM. KVM makes use of the platform's hardware-assisted virtualisation extensions (such as Intel VT-x or AMD-V), if available. As discussed, this greatly reduces the overhead induced by virtualisation.

For Containers, there is LXC. It manages the creation of namespaces and Cgroups to isolate the Linux-based operating systems running in your containers. There is also tooling to bootstrap different distributions.

While the LXC user interface is slightly more convenient than that of QEMU/KVM, I prefer to have all my virtualisation managed by a single entity. This helps with setting up coherent and reusable networking and firewalling, which is why I advocate the use of libvirt.

By the way, tools like LXC and QEMU, which actually run the virtualised Guest operating system, are called Hypervisors. There are other Hypervisors for Linux, but they are either at least partially proprietary or simply based on LXC or QEMU, which is why I will not go into detail on those.

What is libvirt?

libvirt is a set of tools and libraries which manage different Hypervisors through a single, XML-based interface. Virtual Machines (called Domains in libvirt terminology), networks between them and the Host, and other virtualisation-related resources are all managed through XML definitions, which are well documented on the libvirt website.

Networking with libvirt

For basic use cases like simple NAT or routing, the network capabilities of libvirt will suffice. However, port forwarding in a NAT scenario is not available in libvirt; it needs to be implemented manually with iptables rules, which is discussed in the section on Firewalling.
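For comparison with the routed network used in the use case below, a NAT network could be sketched like this. The network name `nat-example`, the bridge name `virbr-nat` and the 192.168.100.0/24 range are hypothetical; libvirt sets up masquerading for outgoing traffic, but any port forwarding to a Guest still has to be added by hand.

```xml
<network>
  <name>nat-example</name>
  <!-- NAT: Guests reach the outside world through the Host's address -->
  <forward mode='nat'/>
  <bridge name='virbr-nat' stp='on' delay='0'/>
  <ip address='192.168.100.1' prefix='24'>
    <dhcp>
      <range start='192.168.100.100' end='192.168.100.199'/>
    </dhcp>
  </ip>
</network>
```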

Example Use Cases

Note

On Debian 8 (Jessie), the packages required for libvirt with QEMU/KVM can be installed via:

apt install libvirt0 libvirt-bin qemu-kvm

So now that we have discussed the available tooling, let us go through a few “simple” use cases.

Setting Up a Network with libvirt

Requirements

You have the 203.0.113.176/29 IPv4 network (8 addresses) and the 2001:db8:e2f3:e12d::/64 IPv6 network (see The Internet Protocol) routed to your Host.
You want to add virtual machines and/or containers to that network and assign them addresses.

Required steps

First, create a libvirt network in which the machines can live. We do not need NAT here, so forwarding is trivial. A network could look like this (we assume that eth0 is the Host's network interface to the internet):

<network>
  <name>for-all-vms</name>
  <!-- use forwarding mode here, no NAT required -->
  <forward dev='eth0' mode='route'>
    <interface dev='eth0'/>
  </forward>
  <!-- this defines the name for the bridge interface
       used for the guests -->
  <bridge name='guests1' stp='on' delay='0'/>
  <ip address='203.0.113.177' prefix='29'>
    <dhcp>
      <range start='203.0.113.178' end='203.0.113.182'/>
    </dhcp>
  </ip>
  <ip family='ipv6' address='2001:db8:e2f3:e12d::1' prefix='64'>
  </ip>
  <ip family='ipv6' address='fe80::1' prefix='64'>
  </ip>
</network>

We can define, start and autostart this network with libvirt by running (assuming the network is in a file called for-all-vms.xml):

# virsh net-define 'for-all-vms.xml'
## note: `for-all-vms` is the name defined in the <name/> element,
## it has nothing to do with the filename!
# virsh net-start 'for-all-vms'
# virsh net-autostart 'for-all-vms'

You should now see a network like this:

# brctl show guests1
bridge name  bridge id               STP enabled     interfaces
guests1              8000.52540025b71a       yes             guests1-nic

# ip address show dev guests1
3: guests1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 52:54:00:25:b7:1a brd ff:ff:ff:ff:ff:ff
    inet 203.0.113.177/29 brd 203.0.113.183 scope global guests1
       valid_lft forever preferred_lft forever
    inet6 2001:db8:e2f3:e12d::1/64 scope global tentative
       valid_lft forever preferred_lft forever
    inet6 fe80::1/64 scope link tentative
       valid_lft forever preferred_lft forever

# ip route show dev guests1
203.0.113.176/29  proto kernel  scope link  src 203.0.113.177

As you can see, libvirt took care of setting up the bridge interface, its addresses and its routes for us. It has even defined iptables rules for forwarding:

# iptables-save
*mangle
:PREROUTING ACCEPT [389:27040]
:INPUT ACCEPT [389:27040]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [239:26500]
:POSTROUTING ACCEPT [239:26500]
-A POSTROUTING -o guests1 -p udp -m udp --dport 68 -j CHECKSUM --checksum-fill
COMMIT
*filter
:INPUT ACCEPT [389:27040]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [248:27444]
-A INPUT -i guests1 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -i guests1 -p tcp -m tcp --dport 53 -j ACCEPT
-A INPUT -i guests1 -p udp -m udp --dport 67 -j ACCEPT
-A INPUT -i guests1 -p tcp -m tcp --dport 67 -j ACCEPT
-A FORWARD -d 203.0.113.176/29 -i eth0 -o guests1 -j ACCEPT
-A FORWARD -s 203.0.113.176/29 -i guests1 -o eth0 -j ACCEPT
-A FORWARD -i guests1 -o guests1 -j ACCEPT
-A FORWARD -o guests1 -j REJECT --reject-with icmp-port-unreachable
-A FORWARD -i guests1 -j REJECT --reject-with icmp-port-unreachable
-A OUTPUT -o guests1 -p udp -m udp --dport 68 -j ACCEPT
COMMIT

# ip6tables-save
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [6:636]
-A INPUT -i guests1 -p udp -m udp --dport 547 -j ACCEPT
-A INPUT -i guests1 -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -i guests1 -p tcp -m tcp --dport 53 -j ACCEPT
-A FORWARD -d fe80::/64 -i eth0 -o guests1 -j ACCEPT
-A FORWARD -s fe80::/64 -i guests1 -o eth0 -j ACCEPT
-A FORWARD -d 2001:db8:e2f3:e12d::/64 -i eth0 -o guests1 -j ACCEPT
-A FORWARD -s 2001:db8:e2f3:e12d::/64 -i guests1 -o eth0 -j ACCEPT
-A FORWARD -i guests1 -o guests1 -j ACCEPT
-A FORWARD -o guests1 -j REJECT --reject-with icmp6-port-unreachable
-A FORWARD -i guests1 -j REJECT --reject-with icmp6-port-unreachable
COMMIT

This is quite handy, because now we have a network we can use for any virtual machine or container.
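If each Guest should always receive the same IPv4 address, static DHCP reservations can be added to this network. The sketch below is a replacement for the IPv4 `<ip>` element of the for-all-vms definition; the MAC address, the host name `vm1` and the chosen address are hypothetical, and the MAC must match the one in the Guest's domain definition. The change can be made with `virsh net-edit for-all-vms`; the network may need to be restarted before it takes effect. A reservation only controls what the DHCP server hands out; it does not stop a Guest from configuring another address by hand.

```xml
<ip address='203.0.113.177' prefix='29'>
  <dhcp>
    <range start='203.0.113.178' end='203.0.113.182'/>
    <!-- hypothetical Guest: always gets 203.0.113.178 for this MAC address -->
    <host mac='52:54:00:0e:7c:01' name='vm1' ip='203.0.113.178'/>
  </dhcp>
</ip>
```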

Host Virtual Machines for Third-Party Users

Requirements

You have the network from the previous use-case set up.
Each individual machine should be addressable by a single IPv4 and a single IPv6 address.
Users should be able to pick their own operating system, as long as it runs on x86_64.
Users should be able to start/stop/reset their virtual machine at will.
Users should be able to get a virtual console to debug when the network setup on their machine is broken or during install.

Required steps

First, talk to your users about what they need. Important questions to ask your users, and then yourself (do you want to allow that specific behaviour on your infrastructure?), are:

Which Operating System do they want to run?
How many virtual machines? And for each virtual machine:
- How many CPU cores?
- How much Memory?
- How much Disk Space?
- How many network interfaces?
  - How shall these network interfaces be connected?
Do they need Nested Virtualisation (i.e. do they want to run Paravirtualised Guests inside their Guests)?
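Should you decide to allow nested virtualisation, two things are needed: nesting must be enabled in the Host's KVM module (the `nested` parameter of `kvm_intel`, or of `kvm_amd` on AMD), and the Guest's virtual CPU must expose the virtualisation extension. Assuming an Intel Host, the `<cpu>` element of the template below could be extended as sketched here; on AMD Hosts the feature is called `svm` instead of `vmx`.

```xml
<cpu mode='host-model'>
  <model fallback='allow'/>
  <!-- expose Intel VT-x to the Guest so it can run KVM itself;
       use name='svm' on AMD Hosts -->
  <feature policy='require' name='vmx'/>
</cpu>
```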

Once you have answers to these questions, you can start writing a template for the virtual machines. It could look like this:

<domain type='kvm'>
  <name>MACHINENAME</name>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <vcpu placement='static'>8</vcpu>
  <resource>
    <partition>/machine/USERNAME</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-2.1'>hvm</type>
    <bios useserial='yes'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-model'>
    <model fallback='allow'/>
  </cpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <pm>
    <suspend-to-mem enabled='yes'/>
    <suspend-to-disk enabled='yes'/>
  </pm>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/vg_main/LOGICAL_VOLUME'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <boot order='1'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/CDIMAGE'/>
      <backingStore/>
      <target dev='hda' bus='ide'/>
      <readonly/>
      <boot order='2'/>
    </disk>
    <interface type='bridge'>
      <mac address='52:54:00:0e:7c:MACSUFFIX'/>
      <source bridge='guests1'/>
      <target dev='INTERFACENAME'/>
      <model type='virtio'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <graphics type='spice' port='PORT' autoport='no' passwd='PASSWORD'>
      <listen type='address' address='127.0.0.1'/>
    </graphics>
  </devices>
</domain>

A few notes:

The amount of memory (8 GiB) and the number of virtual CPUs (8) in the template are just examples; choose whatever values you agreed on with the user.
The ALL CAPS placeholders (MACHINENAME, USERNAME, LOGICAL_VOLUME, CDIMAGE, MACSUFFIX, INTERFACENAME, PORT and PASSWORD) need to be replaced for each VM.
The template uses an LVM logical volume per Guest as backing store; you may use any other type of storage supported by libvirt.
CDIMAGE should be a bootable image which starts the installer of the chosen operating system.
For installation, you will need to make it possible for users to connect to 127.0.0.1:PORT on the VM Host. You can achieve that by granting them restricted SSH access for port forwarding. I would not trust SPICE or VNC to run unencrypted over the public internet, so this seems like a viable option. SSH access will come in handy for users to manage their VMs anyway.
The network definition as-is does not prevent any kind of ARP or DHCP spoofing. Do not use it with malicious Guests.
This is a Work-in-Progress.
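As a mitigation for the spoofing problem, libvirt can attach network filters (nwfilter) to a Guest's interface. The sketch below is the template's `<interface>` element with the `clean-traffic` filter that ships with libvirt, which is intended to prevent MAC, IP and ARP spoofing; the IP value is a hypothetical example of the Guest's assigned address. The filter is built around IPv4 and ARP, so it may also block the Guest's IPv6 traffic and does not necessarily cover every DHCP-related attack; check the libvirt nwfilter documentation and test it before relying on it.

```xml
<interface type='bridge'>
  <mac address='52:54:00:0e:7c:MACSUFFIX'/>
  <source bridge='guests1'/>
  <target dev='INTERFACENAME'/>
  <model type='virtio'/>
  <!-- libvirt-supplied filter against MAC, IP and ARP spoofing -->
  <filterref filter='clean-traffic'>
    <!-- the only IPv4 address this Guest may use (hypothetical value) -->
    <parameter name='IP' value='203.0.113.178'/>
  </filterref>
</interface>
```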
