1. 18 July 2013 (1 commit)
    • qemu: Set cpuset.cpus for domain process · a39f69d2
      Authored by Osier Yang
      When either the "cpuset" of <vcpu> is specified, or the "placement"
      of <vcpu> is "auto", setting only cpuset.mems may cause the guest
      to fail to start. E.g. ("placement" of both <vcpu> and <numatune>
      is "auto"):
      
      1) Related XMLs
        <vcpu placement='auto'>4</vcpu>
        <numatune>
          <memory mode='strict' placement='auto'/>
        </numatune>
      
      2) Host NUMA topology
        % numactl --hardware
        available: 8 nodes (0-7)
        node 0 cpus: 0 4 8 12 16 20 24 28
        node 0 size: 16374 MB
        node 0 free: 11899 MB
        node 1 cpus: 32 36 40 44 48 52 56 60
        node 1 size: 16384 MB
        node 1 free: 15318 MB
        node 2 cpus: 2 6 10 14 18 22 26 30
        node 2 size: 16384 MB
        node 2 free: 15766 MB
        node 3 cpus: 34 38 42 46 50 54 58 62
        node 3 size: 16384 MB
        node 3 free: 15347 MB
        node 4 cpus: 3 7 11 15 19 23 27 31
        node 4 size: 16384 MB
        node 4 free: 15041 MB
        node 5 cpus: 35 39 43 47 51 55 59 63
        node 5 size: 16384 MB
        node 5 free: 15202 MB
        node 6 cpus: 1 5 9 13 17 21 25 29
        node 6 size: 16384 MB
        node 6 free: 15197 MB
        node 7 cpus: 33 37 41 45 49 53 57 61
        node 7 size: 16368 MB
        node 7 free: 15669 MB
      
      3) cpuset.cpus will be set as follows (from the debug log):
      
      2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 :
      Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.cpus'
      to '0-63'
      
      4) The advisory nodeset returned by querying numad (from the debug log):
      
      2013-05-09 16:50:17.295+0000: 417: debug : qemuProcessStart:3614 :
      Nodeset returned from numad: 1
      
      5) cpuset.mems will be set as follows (from the debug log):
      
      2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 :
      Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.mems'
      to '0-7'
      
      I.e., the domain process's memory is restricted to the first NUMA
      node, yet it can use all of the CPUs. This is likely to make the
      domain process fail to start, because the kernel cannot allocate
      memory under the "strict" memory policy.
      
      % tail -n 20 /var/log/libvirt/qemu/toy.log
      ...
      2013-05-09 05:53:32.972+0000: 7318: debug : virCommandHandshakeChild:377 :
      Handshake with parent is done
      char device redirected to /dev/pts/2 (label charserial0)
      kvm_init_vcpu failed: Cannot allocate memory
      ...
      Signed-off-by: Peter Krempa <pkrempa@redhat.com>
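
      As an illustration of the fix's shape (not the actual patch, which
      goes through libvirt's virCgroup helpers), a standalone sketch that
      copies one NUMA node's CPU list from sysfs into a cgroup's
      cpuset.cpus, so the CPU set matches the advisory nodeset instead of
      the all-CPUs default:

        /* sketch: pin a cgroup's CPUs to the CPUs of one NUMA node */
        #include <stdio.h>

        static int pin_cgroup_to_node(const char *cgroup_dir, int node)
        {
            char src[256], dst[256], cpulist[1024];
            FILE *in, *out;

            /* e.g. /sys/devices/system/node/node1/cpulist -> "32,36,..." */
            snprintf(src, sizeof(src),
                     "/sys/devices/system/node/node%d/cpulist", node);
            snprintf(dst, sizeof(dst), "%s/cpuset.cpus", cgroup_dir);

            if (!(in = fopen(src, "r")))
                return -1;
            if (!fgets(cpulist, sizeof(cpulist), in)) {
                fclose(in);
                return -1;
            }
            fclose(in);

            if (!(out = fopen(dst, "w")))
                return -1;
            fputs(cpulist, out);        /* instead of the default "0-63" */
            return fclose(out) == 0 ? 0 : -1;
        }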
  2. 11 July 2013 (1 commit)
  3. 10 July 2013 (1 commit)
  4. 08 July 2013 (1 commit)
  5. 26 June 2013 (1 commit)
  6. 05 June 2013 (1 commit)
  7. 23 May 2013 (1 commit)
  8. 21 May 2013 (2 commits)
  9. 20 May 2013 (4 commits)
  10. 17 May 2013 (1 commit)
  11. 13 May 2013 (2 commits)
    • Fix starting domains when kernel has no cgroups support · bbe97ae9
      Authored by Jim Fehlig
      Found that I was unable to start existing domains after updating
      to a kernel with no cgroups support:
      
        # zgrep CGROUP /proc/config.gz
        # CONFIG_CGROUPS is not set
        # virsh start test
        error: Failed to start domain test
        error: Unable to initialize /machine cgroup: Cannot allocate memory
      
      virCgroupPartitionNeedsEscaping() correctly returns errno (ENOENT) when
      attempting to open /proc/cgroups on such a system, but it was being
      dropped in virCgroupSetPartitionSuffix().
      
      Change virCgroupSetPartitionSuffix() to propagate errors returned by
      its callees.  Also check for ENOENT in qemuInitCgroup() when determining
      if cgroups support is available.
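
      In sketch form, the corrected error handling might look like this
      (hypothetical helper names, not libvirt's actual code):

        #include <errno.h>
        #include <stdio.h>

        /* probe for kernel cgroup support, propagating errno */
        static int cgroups_available(void)
        {
            FILE *fp = fopen("/proc/cgroups", "r");
            if (!fp)
                return -errno;  /* -ENOENT when CONFIG_CGROUPS is not set */
            fclose(fp);
            return 0;
        }

        /* caller (cf. qemuInitCgroup): ENOENT means "run without cgroups" */
        static int init_cgroup(void)
        {
            int rc = cgroups_available();
            if (rc == -ENOENT)
                return 0;       /* no cgroup support: skip setup, don't fail */
            return rc;          /* 0 on success; any other -errno is fatal */
        }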
    • qemu: Allow the scsi-generic device in cgroup · 6eb42e38
      Authored by Han Cheng
      This adds the scsi-generic device to the device controller's
      whitelist, so that it is allowed to be used by the qemu process.
      Signed-off-by: Han Cheng <hanc.fnst@cn.fujitsu.com>
      Signed-off-by: Osier Yang <jyang@redhat.com>
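
      At the cgroup level, such a whitelist entry amounts to a rule in
      devices.allow; a sketch (the cgroup path is illustrative, and
      scsi-generic devices use char major 21):

        #include <stdio.h>

        static int allow_scsi_generic(const char *cgroup_dir)
        {
            char path[256];
            FILE *fp;

            snprintf(path, sizeof(path), "%s/devices.allow", cgroup_dir);
            if (!(fp = fopen(path, "w")))
                return -1;
            /* "c 21:* rwm" = all char devices with major 21, i.e. /dev/sg* */
            fprintf(fp, "c 21:* rwm\n");
            return fclose(fp) == 0 ? 0 : -1;
        }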
  12. 04 May 2013 (1 commit)
  13. 02 May 2013 (1 commit)
    • virutil: Move string related functions to virstring.c · 7c9a2d88
      Authored by Michal Privoznik
      The source code base needs to be adapted as well. Some files
      include virutil.h just for the string-related functions (there, the
      include is substituted with the new header), some include virutil.h
      without any need (there, the include is removed), and some require
      both.
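
      The mechanical change in each affected file is of this shape
      (illustrative):

        /* before: string helpers pulled in via the catch-all header */
        #include "virutil.h"

        /* after: include only the header the file actually needs */
        #include "virstring.h"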
  14. 30 April 2013 (2 commits)
    • qemu: put usb cgroup setup in common function · 811143c0
      Authored by Laine Stump
      The USB-specific cgroup setup had been inserted inline in
      qemuDomainAttachHostUsbDevice and qemuSetupCgroup, but now there is
      a common cgroup setup function called for all hostdevs, so it makes
      sense to put the USB-specific setup there and just rely on that
      function being called.
      
      The one thing I'm uncertain of here (and a reason for not pushing
      until after release) is that previously hostdev->missing was checked
      only when starting a domain (and cgroup setup for the device skipped
      if missing was true), but with this consolidation, it is now checked
      in the case of hotplug as well. I don't know if this will have any
      practical effect (does it make sense to hotplug a "missing" usb
      device?)
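
      A sketch of the consolidated check (the struct and helper are
      hypothetical stand-ins for libvirt's hostdev types, not the actual
      patch):

        struct hostdev {
            int type;       /* USB, PCI, ... */
            int missing;    /* nonzero if the USB device wasn't found */
        };

        static int setup_hostdev_cgroup(const struct hostdev *dev)
        {
            /* previously only the domain-startup path checked this;
             * after the consolidation the hotplug path hits it too */
            if (dev->missing)
                return 0;   /* skip cgroup ACL setup for absent devices */
            /* ... allow the device node in the devices controller ... */
            return 0;
        }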
    • qemu: add vfio devices to cgroup ACL when appropriate · 6e13860c
      Authored by Laine Stump
      PCI device assignment using VFIO requires read/write access by the
      qemu process to /dev/vfio/vfio, and to /dev/vfio/nn, where "nn" is
      the VFIO group number that the assigned device belongs to (it can
      be found with the function virPCIDeviceGetVFIOGroupDev).
      
      /dev/vfio/vfio can be accessible to any guest without danger
      (according to vfio developers), so it is added to the static ACL.
      
      The group device must be dynamically added to the cgroup ACL for each
      vfio hostdev in two places:
      
      1) for any devices in the persistent config when the domain is started
         (done during qemuSetupCgroup())
      
      2) at device attach time for any hotplug devices (done in
         qemuDomainAttachHostDevice)
      
      The group device must be removed from the ACL when a device is
      hot-unplugged (in qemuDomainDetachHostDevice()).
      
      Note that USB devices are already doing their own cgroup setup and
      teardown in the hostdev-usb specific function. I chose to make the new
      functions generic and call them in a common location though. We can
      then move the USB-specific code (which is duplicated in two locations)
      to this single location. I'll be posting a followup patch to do that.
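
      Illustratively, allowing a group device boils down to resolving its
      major:minor and writing a devices-controller rule (a sketch; the
      cgroup path is assumed, and the group number would come from
      virPCIDeviceGetVFIOGroupDev):

        #include <stdio.h>
        #include <sys/stat.h>
        #include <sys/sysmacros.h>

        static int allow_vfio_group(const char *cgroup_dir, unsigned int group)
        {
            char dev[64], acl[256];
            struct stat sb;
            FILE *fp;

            snprintf(dev, sizeof(dev), "/dev/vfio/%u", group);
            if (stat(dev, &sb) < 0 || !S_ISCHR(sb.st_mode))
                return -1;

            snprintf(acl, sizeof(acl), "%s/devices.allow", cgroup_dir);
            if (!(fp = fopen(acl, "w")))
                return -1;
            /* read/write/mknod on exactly this group's char device */
            fprintf(fp, "c %u:%u rwm\n",
                    major(sb.st_rdev), minor(sb.st_rdev));
            return fclose(fp) == 0 ? 0 : -1;
        }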
  15. 23 April 2013 (1 commit)
  16. 22 April 2013 (1 commit)
    • Change default resource partition to /machine · aed49863
      Authored by Daniel P. Berrange
      After discussions with systemd developers it was decided that
      a better default policy for resource partitions is to have
      3 default partitions at the top level
      
         /system  - system services
         /machine - virtual machines / containers
         /user    - user login sessions
      
      This ensures that the default policy isolates guests from
      user login sessions & system services, so a misbehaving
      guest can't consume 100% of the CPU when other things are
      contending for it.
      
      Thus we change the default partition from /system to
      /machine.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
  17. 16 April 2013 (5 commits)
    • Remove non-functional code for setting up non-root cgroups · 767596bd
      Authored by Daniel P. Berrange
      The virCgroupNewDriver method had a 'bool privileged' param.
      If a false value was ever passed in, it would simply not
      work, since non-root users don't have any privileges to create
      new cgroups. Just delete this broken code entirely and make
      the QEMU driver skip cgroup setup in non-privileged mode.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
    • Change default cgroup layout for QEMU/LXC and honour XML config · db44eb1b
      Authored by Daniel P. Berrange
      Historically QEMU/LXC guests have been placed in a cgroup layout
      that is
      
         $LOCATION-OF-LIBVIRTD/libvirt/{qemu,lxc}/$VMNAME
      
      This is bad for a number of reasons
      
       - The cgroup hierarchy gets very deep, which seriously
         impacts kernel performance due to cgroups scalability
         limitations.
      
       - It is hard to set up cgroup policies which apply across
         services and virtual machines, since all VMs are underneath
         the libvirtd service.
      
      To address this, the default cgroup location is changed
      to
      
          /system/$VMNAME.{lxc,qemu}.libvirt
      
      This puts virtual machines at the same level in the hierarchy
      as system services, allowing consistent policy to be setup
      across all of them.
      
      This also honours the new resource partition location from the
      XML configuration, for example
      
        <resource>
          <partition>/virtualmachines/production</partition>
        </resource>
      
      will result in the VM being placed at
      
          /virtualmachines/production/$VMNAME.{lxc,qemu}.libvirt
      
      NB: with the exception of the default /system path, which is
      intended to always exist, libvirt will not attempt to auto-create
      the partitions named in the XML. It is the responsibility of the
      admin/app to configure the partitions. Later libvirt APIs will
      provide a way to do this.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
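
      A sketch of the resulting path construction (escaping of reserved
      names is omitted and the helper is illustrative):

        #include <stdio.h>

        static void vm_cgroup_path(char *buf, size_t len,
                                   const char *partition, /* NULL -> default */
                                   const char *vmname,
                                   const char *driver)    /* "qemu" or "lxc" */
        {
            if (!partition)
                partition = "/system";  /* the default as of this commit */
            snprintf(buf, len, "%s/%s.%s.libvirt", partition, vmname, driver);
        }

        /* vm_cgroup_path(buf, sizeof(buf), "/virtualmachines/production",
         *                "web01", "qemu")
         *   -> "/virtualmachines/production/web01.qemu.libvirt"       */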
    • Add a new virCgroupNewPartition for setting up resource partitions · aa8604dd
      Authored by Daniel P. Berrange
      A resource partition is an absolute cgroup path, ignoring the
      current process placement. Expose a virCgroupNewPartition API
      for constructing such cgroups.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
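
      The distinction in sketch form (the mount point is assumed, and the
      /proc/self/cgroup parse is simplified, ignoring co-mounted
      controllers):

        #include <stdio.h>
        #include <string.h>

        #define CPUSET_MNT "/sys/fs/cgroup/cpuset"

        /* relative: append to this process's own cpuset placement */
        static void relative_path(char *buf, size_t len, const char *child)
        {
            char line[512];
            const char *placement = "/";
            FILE *fp = fopen("/proc/self/cgroup", "r");

            if (fp) {
                while (fgets(line, sizeof(line), fp)) {
                    char *p = strstr(line, ":cpuset:");
                    if (p) {
                        p += strlen(":cpuset:");
                        p[strcspn(p, "\n")] = '\0';
                        placement = p;
                        break;
                    }
                }
            }
            snprintf(buf, len, CPUSET_MNT "%s/%s", placement, child);
            if (fp)
                fclose(fp);
        }

        /* partition: an absolute path under the mount, placement ignored */
        static void partition_path(char *buf, size_t len, const char *part)
        {
            snprintf(buf, len, CPUSET_MNT "%s", part);
        }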
    • Rename virCgroupForXXX to virCgroupNewXXX · 04c18d25
      Authored by Daniel P. Berrange
      Rename all the virCgroupForXXX methods to use the form
      virCgroupNewXXX since they are all constructors. Also
      make sure the output parameter is the last one in the
      list, and annotate all pointers as non-null. Fix up
      all callers, and make sure they use true/false not 0/1
      for the boolean parameters.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
    • Store a virCgroupPtr instance in qemuDomainObjPrivatePtr · 632f78ca
      Authored by Daniel P. Berrange
      Instead of calling virCgroupForDomain every time we need the
      virCgroupPtr instance, just do it once at VM startup and cache a
      reference to the object in qemuDomainObjPrivatePtr until shutdown
      of the VM. Removing the virCgroupPtr from the QEMU driver state
      also means we don't have stale mount info if someone mounts the
      cgroups filesystem after libvirtd has been started.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
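
      The caching pattern in sketch form (the types are hypothetical
      stand-ins for qemuDomainObjPrivatePtr and virCgroupPtr):

        #include <stdlib.h>

        typedef struct { char *path; } cgroup_t;

        typedef struct {
            cgroup_t *cgroup;  /* created once at VM startup, cached here */
        } qemu_domain_private_t;

        /* resolves cgroup mounts once, at creation time */
        static cgroup_t *cgroup_new(const char *vmname)
        {
            (void)vmname;
            return calloc(1, sizeof(cgroup_t));
        }

        static cgroup_t *domain_get_cgroup(qemu_domain_private_t *priv,
                                           const char *vmname)
        {
            /* no repeated virCgroupForDomain-style lookups: the handle
             * (and the mount info captured inside it) lives for the
             * whole VM lifetime */
            if (!priv->cgroup)
                priv->cgroup = cgroup_new(vmname);
            return priv->cgroup;
        }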
  18. 13 April 2013 (1 commit)
  19. 08 April 2013 (1 commit)
  20. 05 April 2013 (1 commit)
    • Don't create dirs in cgroup controllers we don't want to use · 56f27b3b
      Authored by Daniel P. Berrange
      Currently when getting an instance of virCgroupPtr we will
      create the path in all cgroup controllers. Only at the virt
      driver layer are we attempting to filter controllers. This
      is bad because the mere act of creating the dirs in the
      controllers can have a functional impact on the kernel,
      particularly for performance.
      
      Update the virCgroupForDriver() method to accept a bitmask
      of controllers to use. Only create dirs in the controllers
      that are requested. When creating cgroups for domains,
      respect the active controller list from the parent cgroup.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
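
      A sketch of the bitmask-driven creation (controller names and
      layout simplified):

        #include <stdio.h>
        #include <sys/stat.h>

        enum {
            CTRL_CPU     = 1 << 0,
            CTRL_CPUSET  = 1 << 1,
            CTRL_MEMORY  = 1 << 2,
            CTRL_DEVICES = 1 << 3,
        };

        static const char *ctrl_name[] = { "cpu", "cpuset",
                                           "memory", "devices" };

        static int cgroup_create(const char *group, int controllers)
        {
            char path[256];
            int i;

            for (i = 0; i < 4; i++) {
                if (!(controllers & (1 << i)))
                    continue;   /* don't touch controllers we don't use */
                snprintf(path, sizeof(path), "/sys/fs/cgroup/%s/%s",
                         ctrl_name[i], group);
                if (mkdir(path, 0755) < 0)
                    return -1;  /* (EEXIST handling omitted) */
            }
            return 0;
        }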
  21. 20 March 2013 (1 commit)
  22. 28 February 2013 (2 commits)
    • Don't try to add non-existent devices to ACL · 7f544a4c
      Authored by Daniel P. Berrange
      The QEMU driver has a list of device nodes that are whitelisted
      for all guests. The kernel has recently started returning an
      error if you try to whitelist a device which does not exist.
      This causes a warning in libvirt's logs and an audit error for
      any missing devices, e.g.:
      
      2013-02-27 16:08:26.515+0000: 29625: warning : virDomainAuditCgroup:451 : success=no virt=kvm resrc=cgroup reason=allow vm="vm031714" uuid=9d8f1de0-44f4-a0b1-7d50-e41ee6cd897b cgroup="/sys/fs/cgroup/devices/libvirt/qemu/vm031714/" class=path path=/dev/kqemu rdev=? acl=rw
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
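
      In sketch form, the guard amounts to checking for the device node
      before writing the ACL entry (the paths are illustrative; a rule
      like "c 10:232 rwm" would correspond to /dev/kvm):

        #include <stdio.h>
        #include <unistd.h>

        static int allow_device_if_present(const char *cgroup_dir,
                                           const char *dev,
                                           const char *rule)
        {
            char path[256];
            FILE *fp;

            if (access(dev, F_OK) < 0)
                return 0;   /* e.g. /dev/kqemu on modern kernels: skip */

            snprintf(path, sizeof(path), "%s/devices.allow", cgroup_dir);
            if (!(fp = fopen(path, "w")))
                return -1;
            fprintf(fp, "%s\n", rule);
            return fclose(fp) == 0 ? 0 : -1;
        }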
    • Avoid spamming logs with cgroups warnings · 279336c5
      Authored by Daniel P. Berrange
      The code for putting the emulator threads in a separate cgroup
      would spam the logs with warnings
      
      2013-02-27 16:08:26.731+0000: 29624: warning : virCgroupMoveTask:887 : no vm cgroup in controller 3
      2013-02-27 16:08:26.731+0000: 29624: warning : virCgroupMoveTask:887 : no vm cgroup in controller 4
      2013-02-27 16:08:26.732+0000: 29624: warning : virCgroupMoveTask:887 : no vm cgroup in controller 6
      
      This is because it has only created child cgroups for 3 of the
      controllers, but was trying to move the processes from all the
      controllers. The fix is to only try to move threads in the
      controllers we actually created. Also remove the warning and
      make it return a hard error to avoid such lazy callers in the
      future.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
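
      A sketch of the fix, reusing the controller-bitmask idea from the
      sketch under commit 56f27b3b above (layout simplified):

        #include <stdio.h>

        static int cgroup_move_task(const char *group, int created_mask,
                                    int pid)
        {
            static const char *ctrl_name[] = { "cpu", "cpuset",
                                               "memory", "devices" };
            char path[256];
            FILE *fp;
            int i;

            for (i = 0; i < 4; i++) {
                if (!(created_mask & (1 << i)))
                    continue;   /* no child group here: nothing to move */
                snprintf(path, sizeof(path), "/sys/fs/cgroup/%s/%s/tasks",
                         ctrl_name[i], group);
                if (!(fp = fopen(path, "w")))
                    return -1;  /* hard error, not a warning */
                fprintf(fp, "%d\n", pid);
                if (fclose(fp) != 0)
                    return -1;
            }
            return 0;
        }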
  23. 22 February 2013 (1 commit)
  24. 06 February 2013 (2 commits)
  25. 05 February 2013 (1 commit)
    • Introduce a virQEMUDriverConfigPtr object · b090aa7d
      Authored by Daniel P. Berrange
      Currently the virQEMUDriverPtr struct contains a wide variety
      of data with varying access needs. Move all the static config
      data into a dedicated virQEMUDriverConfigPtr object. The only
      locking requirement is to hold the driver lock while obtaining
      an instance of virQEMUDriverConfigPtr. Once a reference is held
      on the config object, it can be used entirely without locks
      since it is immutable.
      
      NB, not all APIs correctly hold the driver lock while getting
      a reference to the config object in this patch. This is safe
      for now since the config is never updated on the fly. Later
      patches will address this fully.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
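
      The lock-then-ref pattern in sketch form (the names are hypothetical
      stand-ins for virQEMUDriverPtr and virQEMUDriverConfigPtr):

        #include <pthread.h>
        #include <stdlib.h>

        typedef struct {
            int refs;           /* protected by the driver lock */
            int max_processes;  /* ...immutable config fields... */
        } driver_config_t;

        typedef struct {
            pthread_mutex_t lock;
            driver_config_t *config;
        } driver_t;

        static driver_config_t *driver_get_config(driver_t *drv)
        {
            pthread_mutex_lock(&drv->lock);  /* hold the lock only to ref */
            drv->config->refs++;
            pthread_mutex_unlock(&drv->lock);
            return drv->config;              /* usable without any lock */
        }

        static void config_unref(driver_t *drv, driver_config_t *cfg)
        {
            pthread_mutex_lock(&drv->lock);
            if (--cfg->refs == 0)
                free(cfg);
            pthread_mutex_unlock(&drv->lock);
        }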
  26. 10 January 2013 (1 commit)
    • maint: fix comment typo · 70345318
      Authored by Eric Blake
      While OOM can have knock-on effects that trash a system, generally
      the first symptom is one of memory thrashing.
      
      * src/qemu/qemu_cgroup.c (qemuSetupCgroup): Reword slightly.
  27. 08 January 2013 (1 commit)
    • qemu: Relax hard RSS limit · 3c83df67
      Authored by Michal Privoznik
      Currently, if there is no hard memory limit defined for a domain,
      libvirt tries to calculate one based on the domain definition and
      a magic equation, and sets it at domain startup. The rationale was
      that if there is a memory leak or exploit in qemu, we should
      prevent the host system from thrashing. However, the equation was
      too tight, as it did not reflect what the kernel counts into the
      memory used by a process. Since many hosts have swap, nobody
      noticed anything, because when the hard memory limit is reached
      the process can continue allocating memory in swap. However, if
      there is no swap on the host, the process gets killed by the OOM
      killer. In our case, that is the qemu process.
      
      To prevent this, we need to relax the hard RSS limit. Moreover, we
      should reflect more precisely how the kernel accounts memory to a
      process. That is, even kernel caches are counted within the memory
      used by a process (within cgroups, at least). Hence the magic
      equation has to be changed:
      
        limit = 1.5 * (domain memory + total video memory) + (32MB for cache
                per each disk) + 200MB
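
      The new equation as a sketch in code (values in KiB, which is how
      libvirt stores memory sizes; the constants come straight from the
      message above):

        static unsigned long long
        default_mem_limit_kib(unsigned long long domain_mem_kib,
                              unsigned long long video_mem_kib,
                              unsigned int ndisks)
        {
            /* 1.5 * (domain memory + total video memory)
             *   + 32 MiB of cache headroom per disk + 200 MiB slack */
            return 3 * (domain_mem_kib + video_mem_kib) / 2 +
                   32ULL * 1024 * ndisks +
                   200ULL * 1024;
        }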
  28. 21 December 2012 (1 commit)