1. 18 July 2013 (1 commit)
    • qemu: Set cpuset.cpus for domain process · a39f69d2
      Authored by Osier Yang
      When either the "cpuset" of <vcpu> is specified, or the "placement"
      of <vcpu> is "auto", setting only cpuset.mems may cause the guest
      to fail to start. E.g. ("placement" of both <vcpu> and <numatune>
      is "auto"):
      
      1) Related XMLs
        <vcpu placement='auto'>4</vcpu>
        <numatune>
          <memory mode='strict' placement='auto'/>
        </numatune>
      
      2) Host NUMA topology
        % numactl --hardware
        available: 8 nodes (0-7)
        node 0 cpus: 0 4 8 12 16 20 24 28
        node 0 size: 16374 MB
        node 0 free: 11899 MB
        node 1 cpus: 32 36 40 44 48 52 56 60
        node 1 size: 16384 MB
        node 1 free: 15318 MB
        node 2 cpus: 2 6 10 14 18 22 26 30
        node 2 size: 16384 MB
        node 2 free: 15766 MB
        node 3 cpus: 34 38 42 46 50 54 58 62
        node 3 size: 16384 MB
        node 3 free: 15347 MB
        node 4 cpus: 3 7 11 15 19 23 27 31
        node 4 size: 16384 MB
        node 4 free: 15041 MB
        node 5 cpus: 35 39 43 47 51 55 59 63
        node 5 size: 16384 MB
        node 5 free: 15202 MB
        node 6 cpus: 1 5 9 13 17 21 25 29
        node 6 size: 16384 MB
        node 6 free: 15197 MB
        node 7 cpus: 33 37 41 45 49 53 57 61
        node 7 size: 16368 MB
        node 7 free: 15669 MB
      
      3) cpuset.cpus will be set as follows (from the debug log):
      
      2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 :
      Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.cpus'
      to '0-63'
      
      4) The advisory nodeset returned by querying numad (from the debug log):
      
      2013-05-09 16:50:17.295+0000: 417: debug : qemuProcessStart:3614 :
      Nodeset returned from numad: 1
      
      5) cpuset.mems will be set as follows (from the debug log):
      
      2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 :
      Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.mems'
      to '0-7'
      
      I.e., the domain process's memory is restricted to the first NUMA
      node, yet it can use all of the CPUs. This is likely to make the
      domain process fail to start, because the kernel cannot allocate
      memory under the "strict" memory policy.
      
      % tail -n 20 /var/log/libvirt/qemu/toy.log
      ...
      2013-05-09 05:53:32.972+0000: 7318: debug : virCommandHandshakeChild:377 :
      Handshake with parent is done
      char device redirected to /dev/pts/2 (label charserial0)
      kvm_init_vcpu failed: Cannot allocate memory
      ...
      Signed-off-by: Peter Krempa <pkrempa@redhat.com>
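
      As an illustration of the fix's shape (not the actual patch, which
      goes through libvirt's virCgroup helpers), a standalone sketch that
      copies one NUMA node's CPU list from sysfs into a cgroup's
      cpuset.cpus, so the CPU set matches the advisory nodeset instead of
      the all-CPUs default:

        /* sketch: pin a cgroup's CPUs to the CPUs of one NUMA node */
        #include <stdio.h>

        static int pin_cgroup_to_node(const char *cgroup_dir, int node)
        {
            char src[256], dst[256], cpulist[1024];
            FILE *in, *out;

            /* e.g. /sys/devices/system/node/node1/cpulist -> "32,36,..." */
            snprintf(src, sizeof(src),
                     "/sys/devices/system/node/node%d/cpulist", node);
            snprintf(dst, sizeof(dst), "%s/cpuset.cpus", cgroup_dir);

            if (!(in = fopen(src, "r")))
                return -1;
            if (!fgets(cpulist, sizeof(cpulist), in)) {
                fclose(in);
                return -1;
            }
            fclose(in);

            if (!(out = fopen(dst, "w")))
                return -1;
            fputs(cpulist, out);        /* instead of the default "0-63" */
            return fclose(out) == 0 ? 0 : -1;
        }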
  2. 11 July 2013 (1 commit)
  3. 10 July 2013 (1 commit)
  4. 08 July 2013 (1 commit)
  5. 26 June 2013 (1 commit)
  6. 05 June 2013 (1 commit)
  7. 23 May 2013 (1 commit)
  8. 21 May 2013 (2 commits)
  9. 20 May 2013 (4 commits)
  10. 17 May 2013 (1 commit)
  11. 13 May 2013 (2 commits)
    • Fix starting domains when kernel has no cgroups support · bbe97ae9
      Authored by Jim Fehlig
      Found that I was unable to start existing domains after updating
      to a kernel with no cgroups support:
      
        # zgrep CGROUP /proc/config.gz
        # CONFIG_CGROUPS is not set
        # virsh start test
        error: Failed to start domain test
        error: Unable to initialize /machine cgroup: Cannot allocate memory
      
      virCgroupPartitionNeedsEscaping() correctly returns errno (ENOENT) when
      attempting to open /proc/cgroups on such a system, but it was being
      dropped in virCgroupSetPartitionSuffix().
      
      Change virCgroupSetPartitionSuffix() to propagate errors returned by
      its callees.  Also check for ENOENT in qemuInitCgroup() when determining
      if cgroups support is available.
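
      In sketch form, the corrected error handling might look like this
      (hypothetical helper names, not libvirt's actual code):

        #include <errno.h>
        #include <stdio.h>

        /* probe for kernel cgroup support, propagating errno */
        static int cgroups_available(void)
        {
            FILE *fp = fopen("/proc/cgroups", "r");
            if (!fp)
                return -errno;  /* -ENOENT when CONFIG_CGROUPS is not set */
            fclose(fp);
            return 0;
        }

        /* caller (cf. qemuInitCgroup): ENOENT means "run without cgroups" */
        static int init_cgroup(void)
        {
            int rc = cgroups_available();
            if (rc == -ENOENT)
                return 0;       /* no cgroup support: skip setup, don't fail */
            return rc;          /* 0 on success; any other -errno is fatal */
        }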
    • qemu: Allow the scsi-generic device in cgroup · 6eb42e38
      Authored by Han Cheng
      This adds the scsi-generic device to the device controller's
      whitelist, so that it is allowed to be used by the qemu process.
      Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com>
      Signed-off-by: Osier Yang <jyang@redhat.com>
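
      At the cgroup level, such a whitelist entry amounts to a rule in
      devices.allow; a sketch (the cgroup path is illustrative, and
      scsi-generic devices use char major 21):

        #include <stdio.h>

        static int allow_scsi_generic(const char *cgroup_dir)
        {
            char path[256];
            FILE *fp;

            snprintf(path, sizeof(path), "%s/devices.allow", cgroup_dir);
            if (!(fp = fopen(path, "w")))
                return -1;
            /* "c 21:* rwm" = all char devices with major 21, i.e. /dev/sg* */
            fprintf(fp, "c 21:* rwm\n");
            return fclose(fp) == 0 ? 0 : -1;
        }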
  12. 04 May 2013 (1 commit)
  13. 02 May 2013 (1 commit)
    • virutil: Move string related functions to virstring.c · 7c9a2d88
      Authored by Michal Privoznik
      The source code base needs to be adapted as well. Some files
      include virutil.h just for the string-related functions (there, the
      include is substituted with the new header), some include virutil.h
      without any need (there, the include is removed), and some require
      both.
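
      The mechanical change in each affected file is of this shape
      (illustrative):

        /* before: string helpers pulled in via the catch-all header */
        #include "virutil.h"

        /* after: include only the header the file actually needs */
        #include "virstring.h"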
  14. 30 April 2013 (2 commits)
    • qemu: put usb cgroup setup in common function · 811143c0
      Authored by Laine Stump
      The USB-specific cgroup setup had been inserted inline in
      qemuDomainAttachHostUsbDevice and qemuSetupCgroup, but now there is
      a common cgroup setup function called for all hostdevs, so it makes
      sense to put the USB-specific setup there and just rely on that
      function being called.
      
      The one thing I'm uncertain of here (and a reason for not pushing
      until after release) is that previously hostdev->missing was checked
      only when starting a domain (and cgroup setup for the device skipped
      if missing was true), but with this consolidation, it is now checked
      in the case of hotplug as well. I don't know if this will have any
      practical effect (does it make sense to hotplug a "missing" usb
      device?)
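
      A sketch of the consolidated check (the struct and helper are
      hypothetical stand-ins for libvirt's hostdev types, not the actual
      patch):

        struct hostdev {
            int type;       /* USB, PCI, ... */
            int missing;    /* nonzero if the USB device wasn't found */
        };

        static int setup_hostdev_cgroup(const struct hostdev *dev)
        {
            /* previously only the domain-startup path checked this;
             * after the consolidation the hotplug path hits it too */
            if (dev->missing)
                return 0;   /* skip cgroup ACL setup for absent devices */
            /* ... allow the device node in the devices controller ... */
            return 0;
        }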
    • qemu: add vfio devices to cgroup ACL when appropriate · 6e13860c
      Authored by Laine Stump
      PCI device assignment using VFIO requires read/write access by the
      qemu process to /dev/vfio/vfio, and to /dev/vfio/nn, where "nn" is
      the VFIO group number that the assigned device belongs to (it can
      be found with the function virPCIDeviceGetVFIOGroupDev).
      
      /dev/vfio/vfio can be accessible to any guest without danger
      (according to vfio developers), so it is added to the static ACL.
      
      The group device must be dynamically added to the cgroup ACL for each
      vfio hostdev in two places:
      
      1) for any devices in the persistent config when the domain is started
         (done during qemuSetupCgroup())
      
      2) at device attach time for any hotplug devices (done in
         qemuDomainAttachHostDevice)
      
      The group device must be removed from the ACL when a device is
      hot-unplugged (in qemuDomainDetachHostDevice()).
      
      Note that USB devices are already doing their own cgroup setup and
      teardown in the hostdev-usb specific function. I chose to make the new
      functions generic and call them in a common location though. We can
      then move the USB-specific code (which is duplicated in two locations)
      to this single location. I'll be posting a followup patch to do that.
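
      Illustratively, allowing a group device boils down to resolving its
      major:minor and writing a devices-controller rule (a sketch; the
      cgroup path is assumed, and the group number would come from
      virPCIDeviceGetVFIOGroupDev):

        #include <stdio.h>
        #include <sys/stat.h>
        #include <sys/sysmacros.h>

        static int allow_vfio_group(const char *cgroup_dir, unsigned int group)
        {
            char dev[64], acl[256];
            struct stat sb;
            FILE *fp;

            snprintf(dev, sizeof(dev), "/dev/vfio/%u", group);
            if (stat(dev, &sb) < 0 || !S_ISCHR(sb.st_mode))
                return -1;

            snprintf(acl, sizeof(acl), "%s/devices.allow", cgroup_dir);
            if (!(fp = fopen(acl, "w")))
                return -1;
            /* read/write/mknod on exactly this group's char device */
            fprintf(fp, "c %u:%u rwm\n",
                    major(sb.st_rdev), minor(sb.st_rdev));
            return fclose(fp) == 0 ? 0 : -1;
        }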
  15. 23 April 2013 (1 commit)
  16. 22 April 2013 (1 commit)
    • Change default resource partition to /machine · aed49863
      Authored by Daniel P. Berrange
      After discussions with systemd developers it was decided that
      a better default policy for resource partitions is to have
      3 default partitions at the top level
      
         /system  - system services
         /machine - virtual machines / containers
         /user    - user login sessions
      
      This ensures that the default policy isolates guests from
      user login sessions & system services, so a misbehaving
      guest can't consume 100% of the CPU when other things are
      contending for it.
      
      Thus we change the default partition from /system to
      /machine.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
  17. 16 April 2013 (5 commits)
    • Remove non-functional code for setting up non-root cgroups · 767596bd
      Authored by Daniel P. Berrange
      The virCgroupNewDriver method had a 'bool privileged' param.
      If a false value was ever passed in, it would simply not
      work, since non-root users don't have any privileges to create
      new cgroups. Just delete this broken code entirely and make
      the QEMU driver skip cgroup setup in non-privileged mode.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
    • Change default cgroup layout for QEMU/LXC and honour XML config · db44eb1b
      Authored by Daniel P. Berrange
      Historically QEMU/LXC guests have been placed in a cgroup layout
      that is
      
         $LOCATION-OF-LIBVIRTD/libvirt/{qemu,lxc}/$VMNAME
      
      This is bad for a number of reasons
      
       - The cgroup hierarchy gets very deep, which seriously
         impacts kernel performance due to cgroups scalability
         limitations.
      
       - It is hard to set up cgroup policies which apply across
         services and virtual machines, since all VMs are underneath
         the libvirtd service.
      
      To address this, the default cgroup location is changed
      to
      
          /system/$VMNAME.{lxc,qemu}.libvirt
      
      This puts virtual machines at the same level in the hierarchy
      as system services, allowing consistent policy to be setup
      across all of them.
      
      This also honours the new resource partition location from the
      XML configuration, for example
      
        <resource>
          <partition>/virtualmachines/production</partition>
        </resource>
      
      will result in the VM being placed at
      
          /virtualmachines/production/$VMNAME.{lxc,qemu}.libvirt
      
      NB: with the exception of the default /system path, which is
      intended to always exist, libvirt will not attempt to auto-create
      the partitions named in the XML. It is the responsibility of the
      admin/app to configure the partitions. Later libvirt APIs will
      provide a way to do this.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
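
      A sketch of the resulting path construction (escaping of reserved
      names is omitted and the helper is illustrative):

        #include <stdio.h>

        static void vm_cgroup_path(char *buf, size_t len,
                                   const char *partition, /* NULL -> default */
                                   const char *vmname,
                                   const char *driver)    /* "qemu" or "lxc" */
        {
            if (!partition)
                partition = "/system";  /* the default as of this commit */
            snprintf(buf, len, "%s/%s.%s.libvirt", partition, vmname, driver);
        }

        /* vm_cgroup_path(buf, sizeof(buf), "/virtualmachines/production",
         *                "web01", "qemu")
         *   -> "/virtualmachines/production/web01.qemu.libvirt"       */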
    • Add a new virCgroupNewPartition for setting up resource partitions · aa8604dd
      Authored by Daniel P. Berrange
      A resource partition is an absolute cgroup path, ignoring the
      current process placement. Expose a virCgroupNewPartition API
      for constructing such cgroups.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
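
      The distinction in sketch form (the mount point is assumed, and the
      /proc/self/cgroup parse is simplified, ignoring co-mounted
      controllers):

        #include <stdio.h>
        #include <string.h>

        #define CPUSET_MNT "/sys/fs/cgroup/cpuset"

        /* relative: append to this process's own cpuset placement */
        static void relative_path(char *buf, size_t len, const char *child)
        {
            char line[512];
            const char *placement = "/";
            FILE *fp = fopen("/proc/self/cgroup", "r");

            if (fp) {
                while (fgets(line, sizeof(line), fp)) {
                    char *p = strstr(line, ":cpuset:");
                    if (p) {
                        p += strlen(":cpuset:");
                        p[strcspn(p, "\n")] = '\0';
                        placement = p;
                        break;
                    }
                }
            }
            snprintf(buf, len, CPUSET_MNT "%s/%s", placement, child);
            if (fp)
                fclose(fp);
        }

        /* partition: an absolute path under the mount, placement ignored */
        static void partition_path(char *buf, size_t len, const char *part)
        {
            snprintf(buf, len, CPUSET_MNT "%s", part);
        }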
    • Rename virCgroupForXXX to virCgroupNewXXX · 04c18d25
      Authored by Daniel P. Berrange
      Rename all the virCgroupForXXX methods to use the form
      virCgroupNewXXX since they are all constructors. Also
      make sure the output parameter is the last one in the
      list, and annotate all pointers as non-null. Fix up
      all callers, and make sure they use true/false not 0/1
      for the boolean parameters.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
    • Store a virCgroupPtr instance in qemuDomainObjPrivatePtr · 632f78ca
      Authored by Daniel P. Berrange
      Instead of calling virCgroupForDomain every time we need the
      virCgroupPtr instance, just do it once at VM startup and cache a
      reference to the object in qemuDomainObjPrivatePtr until shutdown
      of the VM. Removing the virCgroupPtr from the QEMU driver state
      also means we don't have stale mount info if someone mounts the
      cgroups filesystem after libvirtd has been started.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
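
      The caching pattern in sketch form (the types are hypothetical
      stand-ins for qemuDomainObjPrivatePtr and virCgroupPtr):

        #include <stdlib.h>

        typedef struct { char *path; } cgroup_t;

        typedef struct {
            cgroup_t *cgroup;  /* created once at VM startup, cached here */
        } qemu_domain_private_t;

        /* resolves cgroup mounts once, at creation time */
        static cgroup_t *cgroup_new(const char *vmname)
        {
            (void)vmname;
            return calloc(1, sizeof(cgroup_t));
        }

        static cgroup_t *domain_get_cgroup(qemu_domain_private_t *priv,
                                           const char *vmname)
        {
            /* no repeated virCgroupForDomain-style lookups: the handle
             * (and the mount info captured inside it) lives for the
             * whole VM lifetime */
            if (!priv->cgroup)
                priv->cgroup = cgroup_new(vmname);
            return priv->cgroup;
        }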
  18. 13 April 2013 (1 commit)
  19. 08 April 2013 (1 commit)
  20. 05 April 2013 (1 commit)
    • Don't create dirs in cgroup controllers we don't want to use · 56f27b3b
      Authored by Daniel P. Berrange
      Currently when getting an instance of virCgroupPtr we will
      create the path in all cgroup controllers. Only at the virt
      driver layer are we attempting to filter controllers. This
      is bad because the mere act of creating the dirs in the
      controllers can have a functional impact on the kernel,
      particularly for performance.
      
      Update the virCgroupForDriver() method to accept a bitmask
      of controllers to use. Only create dirs in the controllers
      that are requested. When creating cgroups for domains,
      respect the active controller list from the parent cgroup.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
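
      A sketch of the bitmask-driven creation (controller names and
      layout simplified):

        #include <stdio.h>
        #include <sys/stat.h>

        enum {
            CTRL_CPU     = 1 << 0,
            CTRL_CPUSET  = 1 << 1,
            CTRL_MEMORY  = 1 << 2,
            CTRL_DEVICES = 1 << 3,
        };

        static const char *ctrl_name[] = { "cpu", "cpuset",
                                           "memory", "devices" };

        static int cgroup_create(const char *group, int controllers)
        {
            char path[256];
            int i;

            for (i = 0; i < 4; i++) {
                if (!(controllers & (1 << i)))
                    continue;   /* don't touch controllers we don't use */
                snprintf(path, sizeof(path), "/sys/fs/cgroup/%s/%s",
                         ctrl_name[i], group);
                if (mkdir(path, 0755) < 0)
                    return -1;  /* (EEXIST handling omitted) */
            }
            return 0;
        }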
  21. 20 March 2013 (1 commit)
  22. 28 February 2013 (2 commits)
    • Don't try to add non-existent devices to ACL · 7f544a4c
      Authored by Daniel P. Berrange
      The QEMU driver has a list of device nodes that are whitelisted
      for all guests. The kernel has recently started returning an
      error if you try to whitelist a device which does not exist.
      This causes a warning in libvirt's logs and an audit error for
      any missing devices, e.g.:
      
      2013-02-27 16:08:26.515+0000: 29625: warning : virDomainAuditCgroup:451 : success=no virt=kvm resrc=cgroup reason=allow vm="vm031714" uuid=9d8f1de0-44f4-a0b1-7d50-e41ee6cd897b cgroup="/sys/fs/cgroup/devices/libvirt/qemu/vm031714/" class=path path=/dev/kqemu rdev=? acl=rw
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
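
      In sketch form, the guard amounts to checking for the device node
      before writing the ACL entry (the paths are illustrative; a rule
      like "c 10:232 rwm" would correspond to /dev/kvm):

        #include <stdio.h>
        #include <unistd.h>

        static int allow_device_if_present(const char *cgroup_dir,
                                           const char *dev,
                                           const char *rule)
        {
            char path[256];
            FILE *fp;

            if (access(dev, F_OK) < 0)
                return 0;   /* e.g. /dev/kqemu on modern kernels: skip */

            snprintf(path, sizeof(path), "%s/devices.allow", cgroup_dir);
            if (!(fp = fopen(path, "w")))
                return -1;
            fprintf(fp, "%s\n", rule);
            return fclose(fp) == 0 ? 0 : -1;
        }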
    • Avoid spamming logs with cgroups warnings · 279336c5
      Authored by Daniel P. Berrange
      The code for putting the emulator threads in a separate cgroup
      would spam the logs with warnings
      
      2013-02-27 16:08:26.731+0000: 29624: warning : virCgroupMoveTask:887 : no vm cgroup in controller 3
      2013-02-27 16:08:26.731+0000: 29624: warning : virCgroupMoveTask:887 : no vm cgroup in controller 4
      2013-02-27 16:08:26.732+0000: 29624: warning : virCgroupMoveTask:887 : no vm cgroup in controller 6
      
      This is because it has only created child cgroups for 3 of the
      controllers, but was trying to move the processes from all the
      controllers. The fix is to only try to move threads in the
      controllers we actually created. Also remove the warning and
      make it return a hard error to avoid such lazy callers in the
      future.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
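
      A sketch of the fix, reusing the controller-bitmask idea from the
      sketch under commit 56f27b3b above (layout simplified):

        #include <stdio.h>

        static int cgroup_move_task(const char *group, int created_mask,
                                    int pid)
        {
            static const char *ctrl_name[] = { "cpu", "cpuset",
                                               "memory", "devices" };
            char path[256];
            FILE *fp;
            int i;

            for (i = 0; i < 4; i++) {
                if (!(created_mask & (1 << i)))
                    continue;   /* no child group here: nothing to move */
                snprintf(path, sizeof(path), "/sys/fs/cgroup/%s/%s/tasks",
                         ctrl_name[i], group);
                if (!(fp = fopen(path, "w")))
                    return -1;  /* hard error, not a warning */
                fprintf(fp, "%d\n", pid);
                if (fclose(fp) != 0)
                    return -1;
            }
            return 0;
        }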
  23. 22 February 2013 (1 commit)
  24. 06 February 2013 (2 commits)
  25. 05 February 2013 (1 commit)
    • Introduce a virQEMUDriverConfigPtr object · b090aa7d
      Authored by Daniel P. Berrange
      Currently the virQEMUDriverPtr struct contains a wide variety
      of data with varying access needs. Move all the static config
      data into a dedicated virQEMUDriverConfigPtr object. The only
      locking requirement is to hold the driver lock while obtaining
      an instance of virQEMUDriverConfigPtr. Once a reference is held
      on the config object, it can be used entirely without locks
      since it is immutable.
      
      NB, not all APIs correctly hold the driver lock while getting
      a reference to the config object in this patch. This is safe
      for now since the config is never updated on the fly. Later
      patches will address this fully.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
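
      The lock-then-ref pattern in sketch form (the names are hypothetical
      stand-ins for virQEMUDriverPtr and virQEMUDriverConfigPtr):

        #include <pthread.h>
        #include <stdlib.h>

        typedef struct {
            int refs;           /* protected by the driver lock */
            int max_processes;  /* ...immutable config fields... */
        } driver_config_t;

        typedef struct {
            pthread_mutex_t lock;
            driver_config_t *config;
        } driver_t;

        static driver_config_t *driver_get_config(driver_t *drv)
        {
            pthread_mutex_lock(&drv->lock);  /* hold the lock only to ref */
            drv->config->refs++;
            pthread_mutex_unlock(&drv->lock);
            return drv->config;              /* usable without any lock */
        }

        static void config_unref(driver_t *drv, driver_config_t *cfg)
        {
            pthread_mutex_lock(&drv->lock);
            if (--cfg->refs == 0)
                free(cfg);
            pthread_mutex_unlock(&drv->lock);
        }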
  26. 10 January 2013 (1 commit)
    • maint: fix comment typo · 70345318
      Authored by Eric Blake
      While OOM can have knock-on effects that trash a system, generally
      the first symptom is one of memory thrashing.
      
      * src/qemu/qemu_cgroup.c (qemuSetupCgroup): Reword slightly.
  27. 08 January 2013 (1 commit)
    • qemu: Relax hard RSS limit · 3c83df67
      Authored by Michal Privoznik
      Currently, if there is no hard memory limit defined for a domain,
      libvirt tries to calculate one based on the domain definition and
      a magic equation, and sets it at domain startup. The rationale was
      that if there is a memory leak or exploit in qemu, we should
      prevent the host system from thrashing. However, the equation was
      too tight, as it did not reflect what the kernel counts into the
      memory used by a process. Since many hosts have swap, nobody
      noticed anything, because when the hard memory limit is reached
      the process can continue allocating memory in swap. However, if
      there is no swap on the host, the process gets killed by the OOM
      killer. In our case, that is the qemu process.
      
      To prevent this, we need to relax the hard RSS limit. Moreover, we
      should reflect more precisely how the kernel accounts memory to a
      process. That is, even kernel caches are counted within the memory
      used by a process (within cgroups, at least). Hence the magic
      equation has to be changed:
      
        limit = 1.5 * (domain memory + total video memory) + (32MB for cache
                per each disk) + 200MB
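
      The new equation as a sketch in code (values in KiB, which is how
      libvirt stores memory sizes; the constants come straight from the
      message above):

        static unsigned long long
        default_mem_limit_kib(unsigned long long domain_mem_kib,
                              unsigned long long video_mem_kib,
                              unsigned int ndisks)
        {
            /* 1.5 * (domain memory + total video memory)
             *   + 32 MiB of cache headroom per disk + 200 MiB slack */
            return 3 * (domain_mem_kib + video_mem_kib) / 2 +
                   32ULL * 1024 * ndisks +
                   200ULL * 1024;
        }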
  28. 21 December 2012 (1 commit)