1. 22 May, 2012 4 commits
    • E
      virBuffer: add way to trim back extra text · cdb87b1c
      Eric Blake committed
      I'm tired of writing:
      
      bool sep = false;
      while (...) {
          if (sep)
             virBufferAddChar(buf, ',');
          sep = true;
          virBufferAdd(buf, str);
      }
      
      This makes it easier, allowing one to write:
      
      while (...)
          virBufferAsprintf(buf, "%s,", str);
      virBufferTrim(buf, ",", -1);
      
      to trim any remaining comma.
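
The append-then-trim idea can be sketched on a plain char array. This is only an illustration: trim_suffix and join_with_commas are hypothetical helpers written for this example, and they are not the virBuffer API or the implementation of virBufferTrim.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not libvirt's virBufferTrim: drop `suffix` from the
 * end of the NUL-terminated string `s` if it is present. Returns 1 if
 * something was trimmed, 0 otherwise. */
int trim_suffix(char *s, const char *suffix)
{
    assert(s != NULL && suffix != NULL);
    size_t len = strlen(s);
    size_t slen = strlen(suffix);

    if (slen == 0 || slen > len || strcmp(s + len - slen, suffix) != 0)
        return 0;
    s[len - slen] = '\0';
    return 1;
}

/* Join `n` strings with commas by always appending "item," and trimming
 * the final comma once, instead of tracking a `sep` flag in the loop.
 * Appending stops early if `out` is too small. */
void join_with_commas(char *out, size_t outsz,
                      const char *const *items, size_t n)
{
    assert(out != NULL && outsz > 0);
    size_t used = 0;

    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(out + used, outsz - used, "%s,", items[i]);
        if (w < 0 || (size_t)w >= outsz - used)
            break; /* output was truncated; stop appending */
        used += (size_t)w;
    }
    trim_suffix(out, ",");
}
```

The loop body stays a single statement, and the empty-list case needs no special handling because there is no comma to trim.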
      
      * src/util/buf.h (virBufferTrim): Declare.
      * src/util/buf.c (virBufferTrim): New function.
      * tests/virbuftest.c (testBufTrim): Test it.
      cdb87b1c
    • W
      storage backend: Add RBD (RADOS Block Device) support · 74951ead
      Wido den Hollander committed
      This patch adds support for a new storage backend with RBD support.
      
      RBD is the RADOS Block Device and is part of the Ceph distributed storage
      system.
      
      It comes in two flavours: Qemu-RBD and Kernel RBD. This storage backend only
      supports Qemu-RBD, thus limiting the use of this storage driver to Qemu only.
      
      To function, this backend relies on librbd and librados being present on the
      local system.
      
      The backend also supports Cephx authentication for safe authentication with
      the Ceph cluster.
      
      For storing credentials it uses the built-in secret mechanism of libvirt.
      Signed-off-by: Wido den Hollander <wido@widodh.nl>
      74951ead
    • E
      build: fix unused variable after last patch · b8e6021e
      Eric Blake committed
      The previous commit (2cb0899e) left a dead variable behind.
      
      * src/libxl/libxl_driver.c (libxlClose): Drop dead variable.
      b8e6021e
    • D
      Fix potential events deadlock when unref'ing virConnectPtr · 2cb0899e
      Daniel P. Berrange committed
      When the last reference to a virConnectPtr is released by
      libvirtd, it was possible for a deadlock to occur in the
      virDomainEventState functions. The virDomainEventStatePtr
      holds a reference on virConnectPtr for each registered
      callback. When removing a callback, the virUnrefConnect
      function is run. If this causes the last reference on the
      virConnectPtr to be released, then virReleaseConnect can
      be run, which in turn calls qemudClose. This function has
      a call to virDomainEventStateDeregisterConn which is intended
      to remove all callbacks associated with the virConnectPtr
      instance. This will try to grab a lock on virDomainEventState
      but this lock is already held. Deadlock ensues.
      
      Thread 1 (Thread 0x7fcbb526a840 (LWP 23185)):
      
      Since each callback associated with a virConnectPtr holds a
      reference on virConnectPtr, it is impossible for the qemudClose
      method to be invoked while any callbacks are still registered.
      Thus the call to virDomainEventStateDeregisterConn must in fact
      be a no-op. Thus it is possible to just remove all trace of
      virDomainEventStateDeregisterConn and avoid the deadlock.
      
      * src/conf/domain_event.c, src/conf/domain_event.h,
        src/libvirt_private.syms: Delete virDomainEventStateDeregisterConn
      * src/libxl/libxl_driver.c, src/lxc/lxc_driver.c,
        src/qemu/qemu_driver.c, src/uml/uml_driver.c: Remove
        calls to virDomainEventStateDeregisterConn
      2cb0899e
  2. 21 May, 2012 1 commit
    • S
      nwfilter: Add support for ipset · a3f3ab4c
      Stefan Berger committed
      This patch adds support for the recent ipset iptables extension
      to libvirt's nwfilter subsystem. Ipset allows maintaining 'sets'
      of IP addresses, ports and other packet parameters and allows for
      faster lookup (on the order of O(1) vs. O(n)) and rule evaluation
      to achieve higher throughput than what can be achieved with
      individual iptables rules.
      
      On the command line iptables supports ipset using
      
      iptables ... -m set --match-set <ipset name> <flags> -j ...
      
      where 'ipset name' is the name of a previously created ipset and
      flags is a comma-separated list of up to 6 flags. Flags use 'src' and 'dst'
      for selecting IP addresses, ports etc. from the source or
      destination part of a packet. So a concrete example may look like this:
      
      iptables -A INPUT -m set --match-set test src,src -j ACCEPT
      
      Since ipset management is quite complex, the idea was to leave ipset 
      management outside of libvirt but still allow users to reference an ipset.
      The user would have to make sure the ipset is available once the VM is
      started so that the iptables rule(s) referencing the ipset can be created.
      
      Using XML to describe an ipset in an nwfilter rule would then look as
      follows:
      
        <rule action='accept' direction='in'>
          <all ipset='test' ipsetflags='src,src'/>
        </rule>
      
      The two parameters on the command line are also the two distinct XML attributes
      'ipset' and 'ipsetflags'.
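
As a rough sketch of how the two attributes map onto the iptables match shown above, the following validates the flag list (up to 6 comma-separated 'src'/'dst' entries) and formats the match portion of the rule. ipset_flags_count and format_ipset_match are hypothetical names for this example and are not taken from libvirt's nwfilter code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_IPSET_FLAGS 6

/* Return the number of flags (1..6) if `flags` is a comma-separated list
 * of "src"/"dst" entries, or -1 if it is malformed or too long. */
int ipset_flags_count(const char *flags)
{
    assert(flags != NULL);
    int count = 0;
    const char *p = flags;

    for (;;) {
        if (strncmp(p, "src", 3) != 0 && strncmp(p, "dst", 3) != 0)
            return -1;
        p += 3;
        if (++count > MAX_IPSET_FLAGS)
            return -1;
        if (*p == '\0')
            return count;
        if (*p != ',')
            return -1;
        p++;
    }
}

/* Format the match portion of an iptables rule from the two values.
 * Returns 0 on success, -1 on invalid flags or if `out` is too small. */
int format_ipset_match(char *out, size_t outsz,
                       const char *set, const char *flags)
{
    assert(out != NULL && set != NULL && flags != NULL);
    if (ipset_flags_count(flags) < 0)
        return -1;
    int w = snprintf(out, outsz, "-m set --match-set %s %s", set, flags);
    return (w < 0 || (size_t)w >= outsz) ? -1 : 0;
}
```

Whether the named ipset actually exists is deliberately not checked here, matching the idea that ipset management stays outside of libvirt.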
      
      FYI: Here is the man page for ipset:
      
      https://ipset.netfilter.org/ipset.man.html
      
      Regards,
          Stefan
      a3f3ab4c
  3. 18 May, 2012 5 commits
  4. 17 May, 2012 2 commits
    • M
      qemu: Don't delete USB device on failed qemuPrepareHostdevUSBDevices · 9c484e3d
      Michal Privoznik committed
      If qemuPrepareHostdevUSBDevices fails, it rolls back the devices it added
      to the driver's list of used devices. It may also fail because
      the device is already in use, and in that case it rolls back as well.
      Therefore, don't try to remove a USB device manually if the function
      fails. We do still want to remove the device if any operation
      performed afterwards fails.
      9c484e3d
    • D
      Add a virLogMessage alternative taking va_list args · e7df360d
      Daniel P. Berrange committed
      Allow the logging APIs to be called with a va_list for format
      args, instead of requiring var-args usage.
      
      * src/util/logging.h, src/util/logging.c: Add virLogVMessage
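
The general C pattern behind such a pair of entry points looks roughly like this. The names log_vmessage and log_message and their parameters are made up for the sketch; they do not reflect the actual signatures of virLogMessage or virLogVMessage.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* va_list flavour: does the real formatting work. Hypothetical stand-in
 * for the pattern, not libvirt's virLogVMessage. Returns the number of
 * characters written, or -1 if `out` is too small. */
int log_vmessage(char *out, size_t outsz, const char *prefix,
                 const char *fmt, va_list ap)
{
    assert(out != NULL && outsz > 0 && prefix != NULL && fmt != NULL);
    int n = snprintf(out, outsz, "%s: ", prefix);
    if (n < 0 || (size_t)n >= outsz)
        return -1;
    int m = vsnprintf(out + n, outsz - (size_t)n, fmt, ap);
    if (m < 0 || (size_t)m >= outsz - (size_t)n)
        return -1;
    return n + m;
}

/* Var-args flavour: a thin wrapper that forwards to the va_list one. */
int log_message(char *out, size_t outsz, const char *prefix,
                const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    int ret = log_vmessage(out, outsz, prefix, fmt, ap);
    va_end(ap);
    return ret;
}
```

With the va_list variant available, another variadic function can forward its own arguments to the logger, which is not possible when only the var-args form exists.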
      e7df360d
  5. 16 May, 2012 19 commits
    • E
      build: fix recent syntax-check breakage · 3337ba6d
      Eric Blake committed
      The use of readlink() in lxc_container.c is intentional; we don't
      want an absolute pathname there.
      
      * src/util/cgroup.h (VIR_CGROUP_SYSFS_MOUNT): Indent properly.
      * cfg.mk (exclude_file_name_regexp--sc_prohibit_readlink): Add
      exemption.
      3337ba6d
    • M
      qemu: Rollback on used USB devices · 2f5fdc88
      Michal Privoznik committed
      One of our latest USB device handling patches,
      05abd150, introduced a regression.
      We first create a temporary list of all USB devices that
      are to be used by the domain that is just starting up. Then we iterate
      over it and check whether each device is already in the global list of
      currently assigned devices (activeUsbHostdevs). If not, we add it there
      and continue with the next iteration. But if a device from the temporary
      list is either taken already or adding it to activeUsbHostdevs fails,
      we remove all devices in the temporary list from activeUsbHostdevs.
      Therefore, if a device is already taken, we remove it from
      activeUsbHostdevs even though we should not. Thus, the next time we
      allow the device to be assigned to another domain.
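
One way to avoid this kind of over-eager rollback is to undo only the entries that the current call added. The sketch below uses a hypothetical dev_list of integer IDs in place of activeUsbHostdevs. It illustrates the idea and is not the code from this patch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACTIVE 16

/* Hypothetical stand-in for the driver's list of assigned devices. */
struct dev_list {
    int ids[MAX_ACTIVE];
    size_t count;
};

/* Return the index of `id` in the list, or -1 if it is not present. */
int dev_list_find(const struct dev_list *l, int id)
{
    for (size_t i = 0; i < l->count; i++)
        if (l->ids[i] == id)
            return (int)i;
    return -1;
}

/* Append `id`; returns -1 if the list is full. */
int dev_list_add(struct dev_list *l, int id)
{
    if (l->count >= MAX_ACTIVE)
        return -1;
    l->ids[l->count++] = id;
    return 0;
}

/* Remove `id` if present; element order is not preserved. */
void dev_list_remove(struct dev_list *l, int id)
{
    int idx = dev_list_find(l, id);

    if (idx < 0)
        return;
    l->count--;
    l->ids[idx] = l->ids[l->count];
}

/* Claim every device in `wanted` for a starting domain. On failure, undo
 * only the entries this call added (wanted[0..i-1]); a device that was
 * already active belongs to someone else and must stay in the list. */
int claim_devices(struct dev_list *active, const int *wanted, size_t n)
{
    assert(active != NULL && (wanted != NULL || n == 0));
    size_t i;

    for (i = 0; i < n; i++) {
        if (dev_list_find(active, wanted[i]) >= 0 ||
            dev_list_add(active, wanted[i]) < 0)
            goto rollback;
    }
    return 0;

rollback:
    while (i > 0)
        dev_list_remove(active, wanted[--i]);
    return -1;
}
```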
      2f5fdc88
    • D
      Fix build compat with older libselinux for LXC · 7ba66ef2
      Daniel P. Berrange committed
      Most versions of libselinux do not contain the function
      selinux_lxc_contexts_path() that the security driver
      recently started using for LXC. We must add a conditional
      check for it in configure and then disable the LXC security
      driver for builds where libselinux lacks this function.
      
      * configure.ac: Check for selinux_lxc_contexts_path
      * src/security/security_selinux.c: Disable LXC security
        if selinux_lxc_contexts_path() is missing
      7ba66ef2
    • D
      Remount cgroups controllers after setting up new /sys in LXC · a8c0b2fe
      Daniel P. Berrange committed
      Normal practice is for cgroups controllers to be mounted at
      /sys/fs/cgroup. When setting up a container, /sys is mounted
      with a new sysfs instance, thus we must re-mount all the
      cgroups controllers. The complexity is that we must mount
      them in the same layout as the host OS, i.e. if 'cpu' and 'cpuacct'
      were mounted at the same location in the host we must preserve
      this in the container. Also, if any controllers are co-located
      we must set up symlinks from the individual controller name to
      the co-located mount point.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
      a8c0b2fe
    • D
      Trim /proc & /sys subtrees before mounting new instances · c529b47a
      Daniel P. Berrange committed
      Both /proc and /sys may have sub-mounts in them from the host
      OS. We must explicitly unmount them all before mounting the
      new instance over that location. If we don't then /proc/mounts
      will show the sub-mounts as existing, even though nothing will
      be able to access them, due to the over-mount.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
      c529b47a
    • D
      Avoid LXC pivot root if the root source is still / · c16b4c43
      Daniel P. Berrange committed
      If the LXC config has a filesystem
      
        <filesystem>
           <source dir='/'/>
           <target dir='/'/>
        </filesystem>
      
      then there is no need to go down the pivot root codepath.
      We can simply use the existing root as needed.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
      c16b4c43
    • D
      Mount fresh instance of sysfs/selinux in LXC · e8639920
      Daniel P. Berrange committed
      Currently to make sysfs readonly, we remount the existing
      instance and then bind it readonly. Unfortunately this means
      sysfs is still showing device objects wrt the host OS namespace.
      We need it to reflect the container namespace, so we must mount
      a completely new instance of it. Do the same for selinuxfs since
      there is no benefit to bind mounting & this lets us simplify
      the code.
      
      * src/lxc/lxc_container.c: Mount fresh sysfs instance
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
      e8639920
    • D
      Convert the LXC driver to use the security driver API for mount options · 8dd5794f
      Daniel Walsh committed
      Instead of hardcoding use of SELinux contexts in the LXC driver,
      switch over to using the official security driver API.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
      8dd5794f
    • D
      Add security driver APIs for getting mount options · abf2ebbd
      Daniel Walsh committed
      Some security drivers require special options to be passed to
      the mount system call. Add a security driver API for handling
      this data.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
      abf2ebbd
    • D
      Add support for LXC specific SELinux configuration · 6844cead
      Daniel Walsh committed
      The SELinux policy for LXC uses a different configuration file
      than the traditional svirt one. Thus we need to load
      /etc/selinux/targeted/contexts/lxc_contexts which contains
      something like this:
      
       process = "system_u:system_r:svirt_lxc_net_t:s0"
       file = "system_u:object_r:svirt_lxc_file_t:s0"
       content = "system_u:object_r:virt_var_lib_t:s0"
      
      cleverly designed to be parsable by virConfPtr
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
      6844cead
    • D
      Use private data struct in SELinux driver · fa5e68ff
      Daniel Walsh committed
      Currently the SELinux driver stores its state in a set of global
      variables. This switches it to use a private data struct instead.
      This will enable different instances to have their own data.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
      fa5e68ff
    • D
      Don't enable the AppArmor security driver with LXC · cf36c23b
      Daniel Walsh committed
      The AppArmor driver does not currently have support for LXC,
      so ensure that when probing, it claims to be disabled.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
      cf36c23b
    • D
      Pass the virt driver name into security drivers · 73580c60
      Daniel Walsh committed
      To allow the security drivers to apply different configuration
      information per hypervisor, pass the virtualization driver name
      into the security manager constructor.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
      73580c60
    • J
      qemu: Add support for -no-user-config · 63b42436
      Jiri Denemark committed
      Thanks to this new option we are now able to use modern CPU models (such
      as Westmere) defined in an external configuration file.
      
      The qemu-1.1{,-device} data files for qemuhelptest are filled in with
      qemu-1.1-rc2 output for now. I will update those files with real
      qemu-1.1 output once it is released.
      63b42436
    • D
      Set a sensible default master start port for ehci companion controllers · 03b804a2
      Daniel P. Berrange committed
      The uhci1, uhci2, uhci3 companion controllers for ehci1 must
      have a master start port set. Since this value is predictable
      we should set it automatically if the app does not supply it.
      03b804a2
    • D
      Fix logic for assigning PCI addresses to USB2 companion controllers · 1ebd52cb
      Daniel P. Berrange committed
      Currently each USB2 companion controller gets put on a separate
      PCI slot. Not only is this wasteful of PCI slots, but it is not
      in compliance with the spec for USB2 controllers. The master
      ehci1 and all companion controllers should be in the same slot,
      with ehci1 in function 7, and uhci1-3 in functions 0-2 respectively.
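
The function layout described above can be written down as a small lookup. The short model strings used here ("ehci1", "uhci1", ...) are illustrative labels for this sketch and may not match the model names used in the domain XML or in qemu_command.c.

```c
#include <assert.h>
#include <string.h>

/* Return the PCI function for a USB2 controller within the shared slot:
 * ehci1 goes in function 7 and uhci1..uhci3 in functions 0..2.
 * Returns -1 for any other model name. Model strings are illustrative. */
int usb2_pci_function(const char *model)
{
    assert(model != NULL);
    if (strcmp(model, "ehci1") == 0)
        return 7;
    if (strcmp(model, "uhci1") == 0)
        return 0;
    if (strcmp(model, "uhci2") == 0)
        return 1;
    if (strcmp(model, "uhci3") == 0)
        return 2;
    return -1;
}
```

All four controllers then share one slot number and differ only in the function value returned here.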
      
      * src/qemu/qemu_command.c: Special case handling of USB2 controllers
        to apply correct pci slot assignment
      * tests/qemuxml2argvdata/qemuxml2argv-usb-ich9-ehci-addr.args,
        tests/qemuxml2argvdata/qemuxml2argv-usb-ich9-ehci-addr.xml: Expand
        test to cover automatic slot assignment
      1ebd52cb
    • D
      Fix virDomainDeviceInfoIsSet() to check all struct fields · 2c195fdb
      Daniel P. Berrange committed
      The virDomainDeviceInfoIsSet API was only checking if an
      address or alias was set in the struct. Thus if only a
      rom bar setting / filename, boot index, or USB master
      value was set, they could be accidentally dropped when
      formatting XML.
      2c195fdb
    • D
      Remove redundant trailing slash in user dir paths · b3567ef3
      Daniel P. Berrange committed
      Callers of virGetUser{Config,Runtime,Cache}Directory all
      append further path components. We should not be
      adding a trailing slash to the returned path, otherwise we
      get paths containing '//'.
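
A caller-side view of the problem, as a minimal sketch: join_path is a hypothetical helper (not a libvirt function) that appends one component to a directory and tolerates a trailing slash on the directory, so the result never contains '//' at the join point.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "dir/name" into `out`, ignoring any trailing
 * '/' characters on `dir` (a lone "/" is kept as the root). Returns 0 on
 * success, -1 if `out` is too small. */
int join_path(char *out, size_t outsz, const char *dir, const char *name)
{
    assert(out != NULL && dir != NULL && name != NULL);
    size_t len = strlen(dir);

    /* Drop trailing slashes, but keep the first character of "/" */
    while (len > 1 && dir[len - 1] == '/')
        len--;

    /* No extra separator when dir is exactly the root directory */
    const char *sep = (len > 0 && dir[len - 1] == '/') ? "" : "/";
    int w = snprintf(out, outsz, "%.*s%s%s", (int)len, dir, sep, name);
    return (w < 0 || (size_t)w >= outsz) ? -1 : 0;
}
```

Returning the directory without a trailing slash in the first place, as this commit does, makes such defensive handling unnecessary for callers that simply format "%s/%s".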
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
      b3567ef3
    • D
      Allow stack traces to be included with log messages · 54856395
      Daniel P. Berrange committed
      Sometimes it is useful to see the callpath for log messages.
      This change enhances the log filter syntax so that stack traces
      can be shown by setting '1:+NAME' instead of '1:NAME'.
      
      This results in output like:
      
      2012-05-09 14:18:45.136+0000: 13314: debug : virInitialize:414 : register drivers
      /home/berrange/src/virt/libvirt/src/.libs/libvirt.so.0(virInitialize+0xd6)[0x7f89188ebe86]
      /home/berrange/src/virt/libvirt/tools/.libs/lt-virsh[0x431921]
      /lib64/libc.so.6(__libc_start_main+0xf5)[0x3a21e21735]
      /home/berrange/src/virt/libvirt/tools/.libs/lt-virsh[0x40a279]
      
      2012-05-09 14:18:45.136+0000: 13314: debug : virRegisterDriver:775 : driver=0x7f8918d02760 name=Test
      /home/berrange/src/virt/libvirt/src/.libs/libvirt.so.0(virRegisterDriver+0x6b)[0x7f89188ec717]
      /home/berrange/src/virt/libvirt/src/.libs/libvirt.so.0(+0x11b3ad)[0x7f891891e3ad]
      /home/berrange/src/virt/libvirt/src/.libs/libvirt.so.0(virInitialize+0xf3)[0x7f89188ebea3]
      /home/berrange/src/virt/libvirt/tools/.libs/lt-virsh[0x431921]
      /lib64/libc.so.6(__libc_start_main+0xf5)[0x3a21e21735]
      /home/berrange/src/virt/libvirt/tools/.libs/lt-virsh[0x40a279]
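
To make the '1:+NAME' versus '1:NAME' syntax concrete, here is a minimal parser for a single filter item. The struct layout, the function name and the accepted priority range of 1 to 4 are assumptions made for this sketch. It is not the parser in src/util/logging.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical representation of one parsed filter item. */
struct log_filter {
    int priority;    /* numeric level before the ':' */
    int stacktrace;  /* 1 if the name was prefixed with '+' */
    char name[64];   /* match string after the optional '+' */
};

/* Parse one "PRIORITY:NAME" or "PRIORITY:+NAME" item. Returns 0 on
 * success, -1 on malformed input (in which case *f is left untouched).
 * The 1..4 priority range is an assumption of this sketch. */
int parse_log_filter(const char *s, struct log_filter *f)
{
    assert(s != NULL && f != NULL);
    char *end;
    long prio = strtol(s, &end, 10);
    int trace = 0;

    if (end == s || *end != ':' || prio < 1 || prio > 4)
        return -1;
    end++;
    if (*end == '+') {
        trace = 1;
        end++;
    }
    if (*end == '\0' || strlen(end) >= sizeof(f->name))
        return -1;

    f->priority = (int)prio;
    f->stacktrace = trace;
    strcpy(f->name, end);
    return 0;
}
```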
      
      * docs/logging.html.in: Document new syntax
      * configure.ac: Check for execinfo.h
      * src/util/logging.c, src/util/logging.h: Add support for
        stack traces
      * tests/testutils.c: Adapt to API change
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
      54856395
  6. 15 May, 2012 9 commits
    • D
      Move user libvirtd socket out of abstract namespace · 905be03d
      Daniel P. Berrange committed
      The current unprivileged user libvirtd sockets are in the abstract
      namespace. This has a number of problems:
      
       - You can't connect to them remotely using the nc/ssh tunnel
       - This is not portable for OS-X, BSD & probably others
       - Parent directory permissions don't apply
      905be03d
    • G
      openvz: determine kb/pages only once · 80fd8367
      Guido Günther committed
      to save some syscalls (as suggested by Eric Blake)
      80fd8367
    • O
      nodeinfo: Get the correct CPU number on AMD Magny Cours platform · 10d9038b
      Osier Yang committed
      "Instead of developing one CPU with 12 cores, the Magny Cours is
      actually two 6 core “Bulldozer” CPUs combined in to one package"
      
      I.e., each package has two NUMA nodes, and the two NUMA nodes share
      the same core ID set (0-6), which means parsing the core count
      from sysfs doesn't work in this case.
      
      And the wrong CPU number could cause three problems for libvirt:
      
      1) performance loss
      
        A domain without "cpuset" or "placement='auto'" (to drive numad)
      specified will be only pinned to part of the CPUs.
      
      2) domain can't be started
      
        If a domain uses numad, and the advisory nodeset returned from
      numad contains a node that exceeds the range of the wrong total CPU
      number, the domain will fail to start, as the bitmask passed to
      sched_setaffinity could be entirely zero.
      
      3) wrong CPU number affects lots of other things.
      
        E.g. for command "virsh vcpuinfo", "virsh vcpupin", it will always
      output with the truncated CPU list.
      
      For more details:
      
      https://www.redhat.com/archives/libvir-list/2012-May/msg00607.html
      
      This patch is to fix the problem by parsing /proc/cpuinfo to get
      the value of field "cpu cores", and use it as nodeinfo->cores if
      it's greater than the cores number from sysfs.
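
The approach described above can be sketched as two small functions: one that extracts the "cpu cores" field from /proc/cpuinfo-style text, and one that prefers that value only when it exceeds the sysfs-derived count. Both function names are hypothetical and the code is an illustration, not the patch itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Scan /proc/cpuinfo-style text for the first "cpu cores : N" line and
 * return N, or 0 if the field is absent or malformed. */
unsigned int cpuinfo_cores(const char *text)
{
    assert(text != NULL);
    const char *line = text;

    while (*line != '\0') {
        unsigned int n;

        if (strncmp(line, "cpu cores", 9) == 0 &&
            sscanf(line + 9, " : %u", &n) == 1)
            return n;

        const char *nl = strchr(line, '\n');
        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return 0;
}

/* Use the cpuinfo value only when it is greater than the sysfs count. */
unsigned int pick_cores(unsigned int sysfs_cores, unsigned int cpuinfo_value)
{
    return cpuinfo_value > sysfs_cores ? cpuinfo_value : sysfs_cores;
}
```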
      10d9038b
    • O
      qemu: Set memory policy using cgroup if placement is auto · be9f6ecb
      Osier Yang committed
      Like for 'static' placement, when the memory policy mode is
      'strict', set the memory policy by writing the advisory nodeset
      returned from numad to cgroup file cpuset.mems.
      be9f6ecb
    • O
      qemu: Use the CPU index in capabilities to map NUMA node to cpu list. · d1bdeca8
      Osier Yang committed
      On some NUMA platforms, the CPU indexes in each NUMA node are
      consecutive, while on other platforms they can be non-consecutive,
      e.g.
      
      % numactl --hardware
      available: 4 nodes (0-3)
      node 0 cpus: 0 4 8 12 16 20 24 28
      node 0 size: 131058 MB
      node 0 free: 86531 MB
      node 1 cpus: 1 5 9 13 17 21 25 29
      node 1 size: 131072 MB
      node 1 free: 127070 MB
      node 2 cpus: 2 6 10 14 18 22 26 30
      node 2 size: 131072 MB
      node 2 free: 127758 MB
      node 3 cpus: 3 7 11 15 19 23 27 31
      node 3 size: 131072 MB
      node 3 free: 127226 MB
      node distances:
      node   0   1   2   3
        0:  10  20  20  20
        1:  20  10  20  20
        2:  20  20  10  20
        3:  20  20  20  10
      
      This patch is to fix the problem by using the CPU index in
      caps->host.numaCell[i]->cpus[i] to set the bitmask instead of
      assuming the CPU index of the NUMA nodes are always sequential.
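
A simplified sketch of the idea: build the CPU bitmask from each selected node's explicit CPU list. The numa_cell struct, MAX_CPUS and nodeset_to_cpumask are hypothetical names for this example and only loosely mirror the capabilities data mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CPUS 64

/* Hypothetical, simplified view of one NUMA cell in the capabilities. */
struct numa_cell {
    const unsigned int *cpus;  /* CPU indexes that belong to this node */
    size_t ncpus;
};

/* Build a bitmask (one bit per CPU, 8 CPUs per byte) for the nodes listed
 * in `nodes`, using each cell's explicit CPU list rather than assuming
 * that node i owns a contiguous block of CPU indexes. */
void nodeset_to_cpumask(const struct numa_cell *cells, size_t ncells,
                        const size_t *nodes, size_t nnodes,
                        unsigned char mask[MAX_CPUS / 8])
{
    assert(cells != NULL && mask != NULL && (nodes != NULL || nnodes == 0));
    memset(mask, 0, MAX_CPUS / 8);

    for (size_t i = 0; i < nnodes; i++) {
        if (nodes[i] >= ncells)
            continue; /* ignore out-of-range node ids */

        const struct numa_cell *c = &cells[nodes[i]];
        for (size_t j = 0; j < c->ncpus; j++) {
            unsigned int cpu = c->cpus[j];

            if (cpu < MAX_CPUS)
                mask[cpu / 8] |= (unsigned char)(1u << (cpu % 8));
        }
    }
}
```

With the numactl layout shown above, selecting node 1 sets the bits for CPUs 1, 5, 9, 13 and so on, instead of a contiguous range.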
      d1bdeca8
    • L
      Assign spapr-vio bus address to ibmvscsi controller · bb725ac1
      Li Zhang committed
      For pseries guests, the default controller model is
      ibmvscsi, and this controller only works
      on a spapr-vio address.
      
      This patch assigns the spapr-vio address type to the
      ibmvscsi controller and corrects the vscsi test case.
      Signed-off-by: Li Zhang <zhlcindy@linux.vnet.ibm.com>
      bb725ac1
    • D
      sanlock: fix locking for readonly devices · b8012ce9
      David Weber committed
      Add ignore param for readonly and shared disk in sanlock
      b8012ce9
    • E
      nodeinfo: avoid probing host filesystem during test · 2b366b46
      Eric Blake committed
      We had previously weakened our nodeinfotest in order to ignore parsed
      node values, because the parse function was mistakenly relying on
      host files.  A better fix is to avoid using the numactl library, but
      to instead parse the same files that numactl would read, all while
      allowing the files to be relative to our choice of directory.
      
      * src/nodeinfo.c (CPU_SYS_PATH, NODE_SYS_PATH): Replace with...
      (SYSFS_SYSTEM_PATH): ...parent directory.
      (linuxNodeInfoCPUPopulate): Check NUMA nodes from requested
      directory (by inlining numactl code).
      (nodeGetCPUmap, nodeGetMemoryStats): Adjust macro use.
      * tests/nodeinfotest.c (linuxTestCompareFiles, linuxTestNodeInfo):
      Update test to match.
      2b366b46
    • E
      nodeinfo: drop static variable · 88f12a36
      Eric Blake committed
      We were wasting time to malloc a copy of a constant string, then
      copy it into static storage, for every call to nodeGetInfo.  At
      least we were lucky that it was a constant source, and thus not
      subject to even worse issues with one thread clobbering the static
      storage while another was using it.  This gets rid of the waste,
      by passing the string through the stack instead, as well as renaming
      internal functions to better match our conventions.
      
      * src/nodeinfo.c (sysfs_path): Delete.
      (get_cpu_value, count_thread_siblings, parse_socket): Add
      parameter, and rename...
      (virNodeGetCpuValue, virNodeCountThreadSiblings)
      (virNodeParseSocket): ... into a common namespace.
      (cpu_online, parse_core): Inline into callers.
      (linuxNodeInfoCPUPopulate): Update caller.
      (nodeGetInfo): Drop a useless malloc.
      88f12a36