diff --git a/docs/kbase.html.in b/docs/kbase.html.in
index c156414c41df79c5dc1e27aad2569c4c674d5d89..db84b95b60975a1c8d24a6c0accc8700b5c1508c 100644
--- a/docs/kbase.html.in
+++ b/docs/kbase.html.in
@@ -29,6 +29,10 @@
Backing chain management
Explanation of how disk backing chain specification impacts libvirt's behaviour and basic troubleshooting steps of disk problems.
+
+Security with QEMU passthrough
+Examination of the security protections used for QEMU and how they need
+configuring to allow use of QEMU passthrough with host files/devices.
diff --git a/docs/kbase/qemu-passthrough-security.rst b/docs/kbase/qemu-passthrough-security.rst
new file mode 100644
index 0000000000000000000000000000000000000000..5f761cbfcb92001843eb4478220d8dd2ee589f49
--- /dev/null
+++ b/docs/kbase/qemu-passthrough-security.rst
@@ -0,0 +1,157 @@
+=============================
+QEMU command-line passthrough
+=============================
+
+.. contents::
+
+Libvirt aims to provide explicit modelling of virtualization features in
+the domain XML document schema. QEMU has a very broad range of features
+and not all of these can be mapped to elements in the domain XML. Libvirt
+would like to reduce the gap to QEMU; with finite resources, however, there
+will always be cases which aren't covered by the domain XML schema.
+
+
+XML document additions
+======================
+
+To deal with the problem, libvirt introduced support for command-line
+passthrough of QEMU arguments. This is achieved by supporting a custom
+XML namespace, under which some QEMU driver specific elements are defined.
+
+The canonical place to declare the namespace is on the top level ``<domain>``
+element. At the very end of the document, arbitrary command-line arguments
+can now be added, using the namespace prefix ``qemu:``
+
+::
+
+   <domain type='qemu' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
+     <name>QEMUGuest1</name>
+     <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
+     ...
+     <qemu:commandline>
+       <qemu:arg value='-newarg'/>
+       <qemu:arg value='parameter'/>
+     </qemu:commandline>
+   </domain>
+
+Note that when an argument takes a value, e.g. ``-newarg parameter``, the argument
+and the value must be passed as separate ``<qemu:arg>`` entries.
+
+Instead of declaring the XML namespace on the top level ``<domain>`` it is also
+possible to declare it at the point of use, which is more convenient for humans
+writing the XML documents manually. So the following example is functionally
+identical:
+
+::
+
+   <domain type='qemu'>
+     <name>QEMUGuest1</name>
+     <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
+     ...
+     <commandline xmlns="http://libvirt.org/schemas/domain/qemu/1.0">
+       <arg value='-newarg'/>
+       <arg value='parameter'/>
+     </commandline>
+   </domain>
+
+Note that when querying the XML from libvirt, it will have been translated into
+the canonical syntax once more with the namespace on the top level element.
+
+Security confinement / sandboxing
+=================================
+
+When libvirt launches a QEMU process it makes use of a number of security
+technologies to confine QEMU and thus protect the host from malicious VM
+breakouts.
+
+When configuring security protection, however, libvirt generally needs to know
+exactly which host resources the VM is permitted to access. It gets this
+information from the domain XML document. This only works for elements in the
+regular schema; the arguments used with command-line passthrough are completely
+opaque to libvirt.
+
+As a result, if command-line passthrough is used to expose a file on the host
+to QEMU, the security protections will activate and either kill QEMU or deny it
+access.
+
+There are two strategies for dealing with this problem: either figure out what
+steps are needed to grant QEMU access to the device, or disable the security
+protections. The former is harder, but more secure, while the latter is simple.
+
+Granting access per VM
+----------------------
+
+* SELinux - the file on the host needs an SELinux label that will grant access
+  to QEMU's ``svirt_t`` policy.
+
+  - Read-only access - use the ``virt_content_t`` label
+  - Shared, write access - use the ``svirt_image_t:s0`` label (i.e. no
+    Multi-Category Security (MCS) value appended)
+  - Exclusive, write access - use the ``svirt_image_t:s0:MCS`` label for the VM.
+    The MCS is auto-generated at boot time, so this may require re-configuring
+    the VM to have a fixed MCS label
+
+* Discretionary Access Control (DAC) - the file on the host needs to be
+  readable/writable to the ``qemu`` user or ``qemu`` group. This can be done
+  by changing the file ownership to ``qemu``, or relaxing the permissions to
+  allow world read, or adding file ACLs to allow access to ``qemu``.
+
+* Namespaces - a private ``mount`` namespace is used for QEMU by default
+  which populates a new ``/dev`` with only the device nodes needed by QEMU.
+  There is no way to augment the set of device nodes ahead of time.
+
+* Seccomp - libvirt launches QEMU with its built-in seccomp policy enabled with
+  ``obsolete=deny``, ``elevateprivileges=deny``, ``spawn=deny`` and
+  ``resourcecontrol=deny`` settings active. There is no way to change this
+  policy on a per VM basis.
+
+* Cgroups - a custom cgroup is created per VM and this will either use the
+  ``devices`` controller or a ``BPF`` rule to whitelist a set of device nodes.
+  There is no way to change this policy on a per VM basis.
+
+Disabling security protection per VM
+------------------------------------
+
+Some of the security protections can be disabled per VM:
+
+* SELinux - in the domain XML the ``<seclabel>`` model can be changed to
+  ``none`` instead of ``selinux``, which will make the VM run unconfined.
+
+* DAC - in the domain XML a ``<seclabel>`` element with the ``dac`` model can
+  be added, configured with a user / group account of ``root`` to make QEMU run
+  with full privileges.
+
+* Namespaces - there is no way to disable this per VM.
+
+* Seccomp - there is no way to disable this per VM.
+
+* Cgroups - there is no way to disable this per VM.
+
+Disabling security protection host-wide
+---------------------------------------
+
+As a last resort it is possible to disable security protection host-wide, which
+will affect all virtual machines. These settings are all made in
+``/etc/libvirt/qemu.conf``:
+
+* SELinux - set ``security_default_confined = 0`` to make QEMU run unconfined by
+  default, while still allowing explicit opt-in to SELinux for VMs.
+
+* DAC - set ``user = "root"`` and ``group = "root"`` to make QEMU run as the root
+  account.
+
+* SELinux, DAC - set ``security_driver = []`` to entirely disable both the
+  SELinux and DAC security drivers.
+
+* Namespaces - set ``namespaces = []`` to disable use of the ``mount``
+  namespaces, causing QEMU to see the normal fully populated ``/dev``.
+
+* Seccomp - set ``seccomp_sandbox = 0`` to disable use of the Seccomp sandboxing
+  in QEMU.
+
+* Cgroups - set ``cgroup_device_acl`` to include the desired device node, or
+  ``cgroup_controllers = [...]`` to exclude the ``devices`` controller.
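+
+As a rough illustration of the host-wide settings listed above, a
+``/etc/libvirt/qemu.conf`` fragment might look like the following. This is a
+sketch under stated assumptions, not a recommended configuration:
+``/dev/example0`` is a hypothetical device node, and the remaining
+``cgroup_device_acl`` entries are assumed to mirror the defaults shown in the
+comments of the installed ``qemu.conf``, which can differ between libvirt
+versions. In practice only the settings needed for the passthrough in question
+should be changed.
+
+::
+
+   # Run QEMU unconfined by default; individual VMs may still opt in to SELinux
+   security_default_confined = 0
+
+   # Disable the private mount namespace so QEMU sees the host's /dev
+   namespaces = []
+
+   # Disable QEMU's built-in seccomp sandbox
+   seccomp_sandbox = 0
+
+   # Setting this list replaces the built-in default, so the usual entries
+   # are repeated here (assumed defaults) before the extra node is added
+   cgroup_device_acl = [
+       "/dev/null", "/dev/full", "/dev/zero",
+       "/dev/random", "/dev/urandom",
+       "/dev/ptmx", "/dev/kvm",
+       "/dev/example0"   # hypothetical device exposed via passthrough
+   ]
+
+Changes to ``qemu.conf`` typically take effect only after the libvirt daemon
+is restarted, and apply to virtual machines started after that point.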