1. 22 Apr, 2013 1 commit
    • Change default resource partition to /machine · aed49863
      Daniel P. Berrange committed
      After discussions with systemd developers it was decided that
      a better default policy for resource partitions is to have
      3 default partitions at the top level
      
         /system   - system services
         /machine - virtual machines / containers
         /user    - user login session
      
      This ensures that the default policy isolates guests from
      user login sessions & system services, so a misbehaving
      guest can't consume 100% of the CPU when other things are
      contending for it.
      
      Thus we change the default partition from /system to
      /machine.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
  2. 16 Apr, 2013 5 commits
    • Remove non-functional code for setting up non-root cgroups · 767596bd
      Daniel P. Berrange committed
      The virCgroupNewDriver method had a 'bool privileged' param.
      If a false value was ever passed in, it would simply not
      work, since non-root users don't have any privileges to create
      new cgroups. Just delete this broken code entirely and make
      the QEMU driver skip cgroup setup in non-privileged mode.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
    • Change default cgroup layout for QEMU/LXC and honour XML config · db44eb1b
      Daniel P. Berrange committed
      Historically QEMU/LXC guests have been placed in a cgroup layout
      that is
      
         $LOCATION-OF-LIBVIRTD/libvirt/{qemu,lxc}/$VMNAME
      
      This is bad for a number of reasons
      
       - The cgroup hierarchy gets very deep which seriously
         impacts kernel performance due to cgroups scalability
         limitations.
      
       - It is hard to set up cgroup policies which apply across
         services and virtual machines, since all VMs are underneath
         the libvirtd service.
      
      To address this the default cgroup location is changed to
      be
      
          /system/$VMNAME.{lxc,qemu}.libvirt
      
      This puts virtual machines at the same level in the hierarchy
      as system services, allowing consistent policy to be set up
      across all of them.
      
      This also honours the new resource partition location from the
      XML configuration, for example
      
        <resource>
          <partition>/virtualmachines/production</partition>
        </resource>
      
      will result in the VM being placed at
      
          /virtualmachines/production/$VMNAME.{lxc,qemu}.libvirt
      
      NB, with the exception of the default path, /system, which
      is intended to always exist, libvirt will not attempt to
      auto-create the partitions named in the XML. It is the
      responsibility of the admin/app to configure the partitions.
      Later libvirt APIs will provide a way to do this.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
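The layout described in this commit can be illustrated with a minimal sketch. It is not libvirt's implementation: the helper name `vm_cgroup_path` and its parameters are hypothetical, and it only builds the path string, assuming the default partition is /system as stated in this commit message.

```python
import posixpath

# Default partition named in the commit message above (assumed constant name).
DEFAULT_PARTITION = "/system"


def vm_cgroup_path(vm_name, driver, partition=None):
    """Build $PARTITION/$VMNAME.{lxc,qemu}.libvirt for a guest.

    Hypothetical helper that mirrors the layout described above;
    it does not create anything on disk.
    """
    if driver not in ("qemu", "lxc"):
        raise ValueError(f"unsupported driver: {driver!r}")
    part = partition or DEFAULT_PARTITION
    if not part.startswith("/"):
        raise ValueError("a resource partition must be an absolute path")
    return posixpath.join(part, f"{vm_name}.{driver}.libvirt")
```

With no partition in the XML, `vm_cgroup_path("web01", "qemu")` falls back to the default; passing `/virtualmachines/production` places the guest under that partition instead.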
    • Add a new virCgroupNewPartition for setting up resource partitions · aa8604dd
      Daniel P. Berrange committed
      A resource partition is an absolute cgroup path, ignoring the
      current process placement. Expose a virCgroupNewPartition API
      for constructing such cgroups.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
    • Rename virCgroupForXXX to virCgroupNewXXX · 04c18d25
      Daniel P. Berrange committed
      Rename all the virCgroupForXXX methods to use the form
      virCgroupNewXXX since they are all constructors. Also
      make sure the output parameter is the last one in the
      list, and annotate all pointers as non-null. Fix up
      all callers, and make sure they use true/false not 0/1
      for the boolean parameters.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
    • Store a virCgroupPtr instance in qemuDomainObjPrivatePtr · 632f78ca
      Daniel P. Berrange committed
      Instead of calling virCgroupForDomain every time we need
      the virCgroupPtr instance, just do it once at VM startup
      and cache a reference to the object in qemuDomainObjPrivatePtr
      until shutdown of the VM. Removing the virCgroupPtr from
      the QEMU driver state also means we don't have stale mount
      info if someone mounts the cgroups filesystem after libvirtd
      has been started.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
  3. 13 Apr, 2013 1 commit
  4. 08 Apr, 2013 1 commit
  5. 05 Apr, 2013 1 commit
    • Don't create dirs in cgroup controllers we don't want to use · 56f27b3b
      Daniel P. Berrange committed
      Currently when getting an instance of virCgroupPtr we will
      create the path in all cgroup controllers. Only at the virt
      driver layer are we attempting to filter controllers. This
      is bad because the mere act of creating the dirs in the
      controllers can have a functional impact on the kernel,
      particularly for performance.
      
      Update the virCgroupForDriver() method to accept a bitmask
      of controllers to use. Only create dirs in the controllers
      that are requested. When creating cgroups for domains,
      respect the active controller list from the parent cgroup.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
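The bitmask approach in this commit can be sketched as follows. This is an illustration under assumptions, not the virCgroupForDriver() code: the `Controller` flag values, the function names, and the per-controller directory layout are all hypothetical.

```python
import enum
import os


class Controller(enum.IntFlag):
    """Hypothetical controller bits; libvirt's real enum may differ."""
    CPU = 1
    CPUACCT = 2
    CPUSET = 4
    MEMORY = 8
    DEVICES = 16
    FREEZER = 32
    BLKIO = 64


def controller_dirs(root, group, wanted, parent_active=None):
    """List the directories to create: one per requested controller.

    If the parent's active mask is given, controllers the parent does
    not use are dropped, so a child never exceeds its parent.
    """
    if parent_active is not None:
        wanted &= parent_active
    return [os.path.join(root, c.name.lower(), group)
            for c in Controller if c & wanted]


def create_cgroup(root, group, wanted, parent_active=None):
    """Create only the requested directories and return their paths."""
    paths = controller_dirs(root, group, wanted, parent_active)
    for path in paths:
        os.makedirs(path, exist_ok=True)
    return paths
```

Splitting path selection from directory creation keeps the filtering logic checkable without touching a real cgroup mount.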
  6. 20 Mar, 2013 1 commit
  7. 28 Feb, 2013 2 commits
    • Don't try to add non-existent devices to ACL · 7f544a4c
      Daniel P. Berrange committed
      The QEMU driver has a list of device nodes that are whitelisted
      for all guests. The kernel has recently started returning an
      error if you try to whitelist a device which does not exist.
      This causes a warning in the libvirt logs and an audit error for
      any missing devices, e.g.
      
      2013-02-27 16:08:26.515+0000: 29625: warning : virDomainAuditCgroup:451 : success=no virt=kvm resrc=cgroup reason=allow vm="vm031714" uuid=9d8f1de0-44f4-a0b1-7d50-e41ee6cd897b cgroup="/sys/fs/cgroup/devices/libvirt/qemu/vm031714/" class=path path=/dev/kqemu rdev=? acl=rw
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
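The commit message does not spell out the fix, but the title suggests skipping device nodes that are absent before touching the ACL. A minimal sketch of that idea, with a hypothetical helper name and an injectable existence check so it can be exercised without real device nodes:

```python
import os


def split_existing_devices(candidates, exists=os.path.exists):
    """Split candidate device paths into (present, missing).

    Only the `present` list would be handed to the devices ACL;
    `exists` is injectable purely to make the sketch testable.
    """
    present = [path for path in candidates if exists(path)]
    missing = [path for path in candidates if not exists(path)]
    return present, missing
```

In the log excerpt above, /dev/kqemu would end up in `missing` on a host without that node, so no allow rule would be attempted for it.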
    • Avoid spamming logs with cgroups warnings · 279336c5
      Daniel P. Berrange committed
      The code for putting the emulator threads in a separate cgroup
      would spam the logs with warnings
      
      2013-02-27 16:08:26.731+0000: 29624: warning : virCgroupMoveTask:887 : no vm cgroup in controller 3
      2013-02-27 16:08:26.731+0000: 29624: warning : virCgroupMoveTask:887 : no vm cgroup in controller 4
      2013-02-27 16:08:26.732+0000: 29624: warning : virCgroupMoveTask:887 : no vm cgroup in controller 6
      
      This is because it has only created child cgroups for 3 of the
      controllers, but was trying to move the processes from all the
      controllers. The fix is to only try to move threads in the
      controllers we actually created. Also remove the warning and
      make it return a hard error to avoid such lazy callers in the
      future.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
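The behaviour described here (move tasks only in the controllers that were actually created, and fail hard otherwise) can be modelled with a small sketch. Groups are represented as plain dicts mapping a controller name to a set of PIDs; that representation and the names `CgroupError` and `move_tasks` are assumptions for illustration, not libvirt's virCgroupMoveTask.

```python
class CgroupError(Exception):
    """Raised instead of logging a warning (hypothetical error type)."""


def move_tasks(src, dst, controllers):
    """Move all PIDs from src to dst in the given controllers only.

    Asking for a controller that either group lacks is a hard error,
    checked up front so a failed call leaves both groups untouched.
    """
    controllers = list(controllers)
    for ctrl in controllers:
        if ctrl not in src or ctrl not in dst:
            raise CgroupError(f"no cgroup in controller {ctrl!r}")
    for ctrl in controllers:
        dst[ctrl] |= src[ctrl]
        src[ctrl].clear()
```

A caller that created the child group in three controllers passes exactly those three, for example `move_tasks(vm, emulator, emulator.keys())`.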
  8. 22 Feb, 2013 1 commit
  9. 06 Feb, 2013 2 commits
  10. 05 Feb, 2013 1 commit
    • Introduce a virQEMUDriverConfigPtr object · b090aa7d
      Daniel P. Berrange committed
      Currently the virQEMUDriverPtr struct contains a wide variety
      of data with varying access needs. Move all the static config
      data into a dedicated virQEMUDriverConfigPtr object. The only
      locking requirement is to hold the driver lock, while obtaining
      an instance of virQEMUDriverConfigPtr. Once a reference is held
      on the config object, it can be used completely lockless since
      it is immutable.
      
      NB, not all APIs correctly hold the driver lock while getting
      a reference to the config object in this patch. This is safe
      for now since the config is never updated on the fly. Later
      patches will address this fully.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
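The locking rule in this commit (take the driver lock only to obtain the config reference, then read the immutable object without locks) can be sketched as below. The class and field names are hypothetical, Python's garbage collector stands in for explicit reference counting, and `set_config` exists only to illustrate why holders of an old reference stay safe; per the note above, the real config is not updated on the fly in this patch.

```python
import threading
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DriverConfig:
    """Immutable snapshot of static settings (hypothetical fields)."""
    cgroup_controllers: tuple = ("cpu", "memory")
    max_files: int = 1024


class Driver:
    def __init__(self, config):
        self._lock = threading.Lock()
        self._config = config

    def get_config(self):
        # The lock is held only while the reference is fetched.
        with self._lock:
            return self._config

    def set_config(self, new_config):
        # Swap in a whole new object; existing snapshots are untouched.
        with self._lock:
            self._config = new_config
```

A reader calls `cfg = driver.get_config()` once and then uses `cfg` freely; a later swap via `replace(cfg, max_files=2048)` never changes what that reader sees.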
  11. 10 Jan, 2013 1 commit
    • maint: fix comment typo · 70345318
      Eric Blake committed
      While OOM can have knock-on effects that trash a system, generally
      the first symptom is one of memory thrashing.
      
      * src/qemu/qemu_cgroup.c (qemuSetupCgroup): Reword slightly.
  12. 08 Jan, 2013 1 commit
    • qemu: Relax hard RSS limit · 3c83df67
      Michal Privoznik committed
      Currently, if there's no hard memory limit defined for a domain,
      libvirt tries to calculate one, based on the domain definition and a
      magic equation, and sets it upon domain startup. The rationale was
      that, if there's a memory leak or exploit in qemu, we should prevent
      the host system from thrashing. However, the equation was too tight,
      as it didn't reflect what the kernel counts as memory used by a
      process. Since many hosts have swap, nobody noticed anything,
      because if the hard memory limit is reached, the process can
      continue allocating memory from swap. However, if there is no swap on
      the host, the process gets killed by the OOM killer. In our case,
      that is the qemu process.
      
      To prevent this, we need to relax the hard RSS limit. Moreover, we
      should reflect more precisely how the kernel accounts memory to a
      process. That is, even the kernel caches are counted within the
      memory used by a process (within cgroups at least). Hence the magic
      equation has to be changed:
      
        limit = 1.5 * (domain memory + total video memory) + (32MB for cache
                per each disk) + 200MB
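As a worked example of the relaxed equation, the sketch below evaluates it for one guest. The function name and the sample guest (4096 MB of memory, 16 MB of video memory, 2 disks) are made up for illustration; only the formula itself comes from the commit message.

```python
def hard_limit_mb(domain_mb, video_mb, n_disks):
    """limit = 1.5 * (domain memory + total video memory)
             + (32MB for cache per each disk) + 200MB
    """
    return 1.5 * (domain_mb + video_mb) + 32 * n_disks + 200


# 1.5 * (4096 + 16) = 6168, plus 2 * 32 = 64, plus 200
example = hard_limit_mb(4096, 16, 2)
```

For this guest the limit comes to 6432 MB, roughly 57% above the configured guest memory.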
  13. 21 Dec, 2012 5 commits
  14. 18 Dec, 2012 1 commit
  15. 29 Nov, 2012 1 commit
  16. 02 Nov, 2012 1 commit
  17. 24 Oct, 2012 1 commit
    • qemu: Keep the affinity when creating cgroup for emulator thread · bb81021b
      Osier Yang committed
      When the cpu placement mode is "auto", libvirt sets the affinity
      for the domain process with the advisory nodeset from numad.
      However, creating the cgroup for the domain process (called the
      emulator thread in some contexts) later overrides that by pinning
      it to all available pCPUs.
      
      How to reproduce:
      
        * Configure the domain with "auto" placement for <vcpu>, e.g.
          <vcpu placement='auto'>4</vcpu>
        * % virsh start dom
        * % cat /proc/$dompid/status
      
      Though the emulator cgroup causes conflicts, we can't simply
      prohibit creating it, as other tunables are still useful, such
      as "emulator_period", which is used by the API
      virDomainSetSchedulerParameter. So this patch doesn't prohibit
      creating the emulator cgroup, but inherits the nodeset from numad
      and resets the affinity for the domain process.
      
      * src/qemu/qemu_cgroup.h: Modify definition of qemuSetupCgroupForEmulator
                                to accept the passed nodeset
      * src/qemu/qemu_cgroup.c: Set the affinity with the passed nodeset
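The last reproduction step reads /proc/$dompid/status; on Linux the relevant field there is `Cpus_allowed_list`. A small sketch for extracting it, with hypothetical helper names, so the allowed CPUs can be compared against the nodeset advised by numad:

```python
def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,8' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        low, _, high = part.partition("-")
        cpus.update(range(int(low), int(high or low) + 1))
    return cpus


def allowed_cpus(status_text):
    """Return the Cpus_allowed_list field of a /proc/<pid>/status dump."""
    for line in status_text.splitlines():
        if line.startswith("Cpus_allowed_list:"):
            return parse_cpu_list(line.split(":", 1)[1])
    raise ValueError("no Cpus_allowed_list field found")
```

In practice the input would come from reading `/proc/<pid>/status` for the domain's PID; if the result covers every host CPU although numad advised a smaller set, the affinity has been overridden as described.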
  18. 20 Oct, 2012 2 commits
    • blockjob: remove unused parameters after previous patch · 67aea3fb
      Eric Blake committed
      Minor cleanup made possible by previous simplifications.
      
      * src/qemu/qemu_cgroup.h (qemuSetupDiskCgroup)
      (qemuTeardownDiskCgroup): Alter signature.
      * src/qemu/qemu_cgroup.c (qemuSetupDiskCgroup)
      (qemuTeardownDiskCgroup, qemuSetupCgroup): Update all uses.
      * src/qemu/qemu_hotplug.c (qemuDomainDetachPciDiskDevice)
      (qemuDomainDetachDiskDevice): Likewise.
      * src/qemu/qemu_driver.c (qemuDomainAttachDeviceDiskLive)
      (qemuDomainChangeDiskMediaLive)
      (qemuDomainSnapshotCreateSingleDiskActive)
      (qemuDomainSnapshotUndoSingleDiskActive): Likewise.
    • storage: use cache to walk backing chain · 38c4a9cc
      Eric Blake committed
      We used to walk the backing file chain at least twice per disk,
      once to set up cgroup device whitelisting, and once to set up
      security labeling.  Rather than walk the chain every iteration,
      which possibly includes calls to fork() in order to open root-squashed
      NFS files, we can exploit the cache of the previous patch.
      
      * src/conf/domain_conf.h (virDomainDiskDefForeachPath): Alter
      signature.
      * src/conf/domain_conf.c (virDomainDiskDefForeachPath): Require caller
      to supply backing chain via disk, if recursion is desired.
      * src/security/security_dac.c
      (virSecurityDACSetSecurityImageLabel): Adjust caller.
      * src/security/security_selinux.c
      (virSecuritySELinuxSetSecurityImageLabel): Likewise.
      * src/security/virt-aa-helper.c (get_files): Likewise.
      * src/qemu/qemu_cgroup.c (qemuSetupDiskCgroup)
      (qemuTeardownDiskCgroup): Likewise.
      (qemuSetupCgroup): Pre-populate chain.
  19. 17 Oct, 2012 1 commit
    • qemu: Pin the emulator when only cpuset is specified · ba63d8f7
      Martin Kletzander committed
      According to our recent changes (clarifications), we should be pinning
      qemu's emulator processes using the <vcpu> 'cpuset' attribute in case
      there is no <emulatorpin> specified.  This however doesn't work
      entirely as expected and this patch should resolve all the remaining
      issues.
  20. 11 Oct, 2012 1 commit
  21. 21 Sep, 2012 1 commit
  22. 18 Sep, 2012 2 commits
  23. 12 Sep, 2012 1 commit
  24. 06 Sep, 2012 1 commit
    • qemu: don't pin all the cpus · 9f86fb93
      Martin Kletzander committed
      This is another fix for the emulator-pin series. When going through
      the cputune pinning settings, the current code tries to pin all
      the CPUs, even when not all of them are specified. This causes an
      error in the subsequent function which, of course, cannot find the
      cpu to pin. Since it's enough to pass the correct VCPU ID to the
      function, the fix is trivial.
  25. 31 Aug, 2012 1 commit
    • qemu: Don't ignore CPU tuning config if required cgroups are missing · 774eb45b
      Jiri Denemark committed
      When the domain XML contains any of the elements for setting up CPU
      scheduling parameters (period, quota, emulator_period, or
      emulator_quota), we need the cpu cgroup to enforce the configuration.
      However, the existing code would just silently ignore such settings if
      either cgroups were not available at all or the cpu cgroup was not
      available. Moreover, APIs for manipulating CPU scheduler parameters
      were already failing if the cpu cgroup was not available. This patch
      makes the cpu cgroup mandatory for all domains that use CPU scheduling
      elements in their XML.
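The rule this commit describes can be sketched as a small check. It is an illustration only: the function name, the dict-based representation of the cputune settings, and the error type are assumptions, while the four element names come from the commit message.

```python
CPU_TUNING_ELEMENTS = ("period", "quota", "emulator_period", "emulator_quota")


def require_cpu_cgroup(cputune, cpu_cgroup_available):
    """Fail instead of silently ignoring CPU tuning settings.

    `cputune` maps element names to values (hypothetical representation
    of the parsed XML). Returns True when the cpu cgroup is needed.
    """
    needed = any(cputune.get(name) is not None for name in CPU_TUNING_ELEMENTS)
    if needed and not cpu_cgroup_available:
        raise RuntimeError("cpu cgroup is required for CPU tuning settings")
    return needed
```

A domain with no tuning elements still starts without the cpu cgroup, while one that sets, say, `quota` is refused rather than started with the setting dropped.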
  26. 29 Aug, 2012 1 commit
    • qemu: Fix starting domains with no cpu cgroup · 0c7cca36
      Jiri Denemark committed
      If cgroups are enabled in general but cpu cgroup is disabled in
      qemu.conf or not mounted at all, libvirt would refuse to start any
      domain even though scheduler parameters are not set in domain XML.
      
      This patch makes cpu cgroup mandatory only for domains that actually
      want to use it.
  27. 27 Aug, 2012 1 commit
    • qemu: fix regression with pinning · 16ebec2b
      Martin Kletzander committed
      Commit 4b03d591 changed the pinning
      behavior in a way that makes some machines non-startable.
      
      The comment mentioning that we cannot control each vcpu when there is
      no VCPU <-> PID mapping available is true. However, this isn't
      necessarily an error, because it can be caused by an old QEMU without
      support for the "query-cpus" command, as well as by software-emulated
      machines that don't create more than one process.
  28. 22 Aug, 2012 1 commit