1. 15 Aug, 2010 1 commit
  2. 13 Aug, 2010 1 commit
  3. 10 Aug, 2010 2 commits
  4. 02 Aug, 2010 2 commits
    • L
      Add iptables rule to fixup DHCP response checksum. · fd5b15ff
      Laine Stump committed
      This patch attempts to take advantage of a newly added netfilter
      module to correct for a problem with some guest DHCP client
      implementations when used in conjunction with a DHCP server run on the
      host systems with packet checksum offloading enabled.
      
      The problem is that, when the guest uses a RAW socket to read the DHCP
      response packets, the checksum hasn't yet been fixed by the IP stack,
      so it is incorrect.
      
      The fix implemented here is to add a rule to the POSTROUTING chain of
      the mangle table in iptables that fixes up the checksum for packets on
      the virtual network's bridge that are destined for the bootpc port
      (i.e. "dhcpc", port 68) on the guest.
      
      Only very new versions of iptables will have this support (it will be
      in the next upstream release), so a failure to add this rule only
      results in a warning message. The iptables patch is here:
      
        http://patchwork.ozlabs.org/patch/58525/
      
      A corresponding kernel module patch is also required (the backend of
      the iptables patch) and that will be in the next release of the
      kernel.
      fd5b15ff
    • C
      Fix the ACS checking in the PCI code. · 86b043ad
      Chris Lalancette committed
      When trying to assign a PCI device to a guest, we have
      to check that all bridges upstream of that device support
      ACS.  That means that we have to find the parent bridge of
      the current device, check for ACS, then find the parent bridge
      of that device, check for ACS, etc.  As it currently stands,
      the code to do this iterates through all PCI devices on the
      system, looking for a device that has a range of busses that
      included the current device's bus.
      
      That check is not restrictive enough, though.  Depending on
      how we iterated through the list of PCI devices, we could first
      find the *topmost* bridge in the system; since it necessarily had
      a range of busses including the current device's bus, we
      would only ever check the topmost bridge, and not check
      any of the intermediate bridges.
      
      Note that this also caused a fairly serious bug in the
      secondary bus reset code, where we could erroneously
      find and reset the topmost bus instead of the inner bus.
      
      This patch changes pciGetParentDevice() so that it first
      checks if a bridge device's secondary bus exactly matches
      the bus of the device we are looking for.  If it does, we've
      found the correct parent bridge and we are done.  If it does not,
      then we check to see if this bridge device's busses *include* the
      bus of the device we care about.  If so, we mark this bridge device
      as best, and go on.  If we later find another bridge device whose
      busses include this device, but is more restrictive, then we
      free up the previous best and mark the new one as best.  This
      algorithm ensures that in the normal case we find the direct
      parent, but in the case that the parent bridge secondary bus
      is not exactly the same as the device, we still find the
      correct bridge.
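
The selection described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `struct bridge` and `find_parent_bridge` are hypothetical names, and "more restrictive" is assumed here to mean a higher secondary bus number (a bridge closer to the device).

```c
#include <stddef.h>

/* Hypothetical, simplified view of a PCI bridge: the bus directly behind
 * it (secondary) and the highest bus number reachable through it
 * (subordinate). This is not libvirt's real data structure. */
struct bridge {
    unsigned int secondary;
    unsigned int subordinate;
};

/* Return the index of the bridge that is the closest parent of a device
 * on `bus`, or -1 if none of the bridges covers that bus. */
int find_parent_bridge(const struct bridge *bridges, size_t n,
                       unsigned int bus)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        const struct bridge *b = &bridges[i];

        /* Exact match on the secondary bus: this is the direct parent. */
        if (b->secondary == bus)
            return (int)i;

        /* Otherwise the bridge only qualifies if its bus range includes
         * the device's bus. */
        if (bus < b->secondary || bus > b->subordinate)
            continue;

        /* Keep the most restrictive candidate seen so far (assumption:
         * a higher secondary bus means the bridge is closer to the device). */
        if (best < 0 || b->secondary > bridges[best].secondary)
            best = (int)i;
    }
    return best;
}
```

With bridges covering buses 1-10, 2-5 and 3-3, a device on bus 4 resolves to the 2-5 bridge rather than the topmost one, which is the case the old code got wrong.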
      
      This patch was tested by me on a 4-port NIC with a
      bridge without ACS (where assignment failed), a 4-port
      NIC with a bridge with ACS (where assignment succeeded),
      and a 2-port NIC with no bridges (where assignment
      succeeded).
      Signed-off-by: Chris Lalancette <clalance@redhat.com>
      86b043ad
  5. 30 Jul, 2010 1 commit
    • C
      Fix DMI uuid parsing. · 435fa6d7
      Chris Lalancette committed
      valgrind was complaining that virUUIDParse was depending on
      an uninitialized value.  Indeed it was; virSetHostUUIDStr()
      didn't initialize the dmiuuid buffer to 0's, meaning that
      anything after the string read from /sys was uninitialized.
      Clear out the dmiuuid buffer before use, and make sure to
      always leave a \0 at the end.
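
A minimal sketch of that pattern, assuming a hypothetical helper `read_uuid_string` (the real fix lives in virSetHostUUIDStr() and may differ in detail): zero the buffer first, then read at most one byte less than its size.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Read a UUID string from `path` into `buf` (capacity `buflen`).
 * Returns 0 on success, -1 on failure. Hypothetical helper name. */
int read_uuid_string(const char *path, char *buf, size_t buflen)
{
    if (buflen == 0)
        return -1;

    /* Zero the whole buffer so nothing past the data is uninitialized. */
    memset(buf, 0, buflen);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Read at most buflen - 1 bytes so the last byte stays '\0'. */
    ssize_t n = read(fd, buf, buflen - 1);
    close(fd);
    return n < 0 ? -1 : 0;
}
```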
      Signed-off-by: Chris Lalancette <clalance@redhat.com>
      435fa6d7
  6. 29 Jul, 2010 1 commit
    • C
      Fix a potential race in pciInitDevice. · 56b40823
      Chris Lalancette committed
      If detecting the FLR flag of a pci device fails, then we
      could run into the situation of trying to close a file
      descriptor twice, once in pciInitDevice() and once in pciFreeDevice().
      Fix that by removing the pciCloseConfig() in pciInitDevice() and
      just letting pciFreeDevice() handle it.
      
      Thanks to Chris Wright for pointing out this problem.
      
      While we are at it, fix an error check.  While it would actually
      work as-is (since success returns 0), it is still clearer to
      check for < 0 (as the rest of the code does).
      Signed-off-by: Chris Lalancette <clalance@redhat.com>
      56b40823
  7. 28 Jul, 2010 1 commit
    • G
      fix handling of PORT_PROFILE_RESPONSE_INPROGRESS netlink message · e4fb6a3c
      Gerhard Stenzel committed
      During function test of the 802.1Qbg implementation in lldpad we came
      across a small problem in the handling of the netlink message
      corresponding to PORT_PROFILE_RESPONSE_INPROGRESS. This should not
      result in returning the default rc=1.
      
      - src/util/macvtap.c: fix getPortProfileStatus() to return 0 in that
        case and also fix an indentation problem
      e4fb6a3c
  8. 27 Jul, 2010 2 commits
    • C
      Force FLR on for buggy SR-IOV devices. · 71e92a15
      Chris Lalancette committed
      Some buggy PCI devices actually support FLR, but
      forget to advertise that fact in their PCI config space.
      However, Virtual Functions on SR-IOV devices are
      *required* to support FLR by the spec, so force has_flr
      on if this is a virtual function.
      Signed-off-by: Chris Lalancette <clalance@redhat.com>
      71e92a15
    • C
      pciResetDevice: use inactive devices to determine safe reset · 46bcdb96
      Chris Wright committed
      When doing a PCI secondary bus reset, we must be sure that there are no
      active devices on the same bus segment.  The active device tracking is
      designed to track only host devices that are actively in use by guests.
      This ignores host devices that are actively in use by the host, so the
      current logic can reset devices that the host itself is using.
      
      Switch this logic around and allow sbus reset when we are assigning all
      devices behind a bridge to the same guest at guest startup or as a result
      of a single attach-device command.
      
      * src/util/pci.h: change signature of pciResetDevice to add an
        inactive devices list
      * src/qemu/qemu_driver.c src/xen/xen_driver.c: use (or not) the new
        functionality of pciResetDevice() depending on the place of use
      * src/util/pci.c: implement the interface and logic changes
      46bcdb96
  9. 23 Jul, 2010 2 commits
    • C
      pciSharesBusWithActive fails to find multiple devices on bus · f4828ca3
      Chris Wright committed
      The first conditional is always true which means the iterator will
      never find another device on the same bus.
      
          if (dev->domain != check->domain ||
              dev->bus != check->bus ||
        ----> (check->slot == check->slot &&
               check->function == check->function)) <-----
      
      The goal of that check is to verify that the device is either:
      
        in a different pci domain
        on a different bus
        is the same identical device
      
      This means libvirt may issue a secondary bus reset when there are
      devices on that bus that are actively in use by the host or another guest.
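
For comparison, a sketch of the check with the slot and function of `dev` compared against those of `check`, which is presumably what the test was meant to do. `struct pci_addr` and `skip_device` are hypothetical names used only for illustration; the field names follow the snippet above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a PCI device address. */
struct pci_addr {
    unsigned int domain, bus, slot, function;
};

/* True when `check` should be skipped while scanning for other devices on
 * dev's bus: different domain, different bus, or the very same device. */
bool skip_device(const struct pci_addr *dev, const struct pci_addr *check)
{
    return dev->domain != check->domain ||
           dev->bus != check->bus ||
           (dev->slot == check->slot &&
            dev->function == check->function);
}
```

With this form, a sibling device in another slot on the same bus is no longer skipped, so the iterator can find it.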
      
      * src/util/pci.c: fix a bogus test in pciSharesBusWithActive()
      f4828ca3
    • D
      Set a stable & high MAC addr for guest TAP devices on host · 6ea90b84
      Daniel P. Berrange committed
      A Linux software bridge will assume the MAC address of the enslaved
      interface with the numerically lowest MAC addr. When the bridge
      changes MAC address there is a period of network blackout, so a
      change should be avoided. The kernel gives TAP devices a completely
      random MAC address. Occasionally the random TAP device MAC is lower
      than that of the physical interface (eth0, eth1, etc.) that is enslaved,
      causing the bridge to change its MAC.
      
      This change sets an explicit MAC address for all TAP devices created
      using the configured MAC from the XML, but with the high byte set
      to 0xFE. This should ensure TAP device MACs are higher than any
      physical interface MAC.
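
The derivation of the TAP MAC can be sketched like this. `make_tap_mac` and `MAC_LEN` are hypothetical names, not the functions touched by the patch.

```c
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Copy the guest's configured MAC into tap_mac and force the first (high)
 * byte to 0xFE, leaving the remaining five bytes unchanged. */
void make_tap_mac(const uint8_t guest_mac[MAC_LEN], uint8_t tap_mac[MAC_LEN])
{
    memcpy(tap_mac, guest_mac, MAC_LEN);
    tap_mac[0] = 0xFE;
}
```

For a guest MAC of 52:54:00:12:34:56 this yields fe:54:00:12:34:56 for the TAP device.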
      
      * src/qemu/qemu_conf.c, src/uml/uml_conf.c: Pass in a MAC addr
        for the TAP device with high byte set to 0xFE
      * src/util/bridge.c, src/util/bridge.h: Set a MAC when creating
        the TAP device to override random MAC
      6ea90b84
  10. 22 Jul, 2010 2 commits
    • L
      Change virDirCreate to return -errno on failure. · 3e0f05fc
      Laine Stump committed
      virDirCreate also previously returned 0 on success and errno on
      failure. This makes it fit the recommended convention of returning 0
      on success, -errno (ie a negative number) on failure.
      3e0f05fc
    • L
      Change virFileOperation to return -errno (ie < 0) on error. · 2ad04f78
      Laine Stump committed
      virFileOperation previously returned 0 on success, or the value of
      errno on failure. Although there are other functions in libvirt that
      use this convention, the preferred (and more common) convention is to
      return 0 on success and -errno (or simply -1 in some cases) on
      failure. This way the check for failure is always (ret < 0).
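
A small illustration of the convention, using a hypothetical wrapper `make_dir` around mkdir() rather than virFileOperation itself:

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create a directory; return 0 on success, -errno on failure. */
int make_dir(const char *path, mode_t mode)
{
    if (mkdir(path, mode) < 0)
        return -errno;
    return 0;
}
```

A caller then tests `ret < 0` and, if it wants a message, passes `-ret` to strerror().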
      
      * src/util/util.c - change virFileOperation and virFileOperationNoFork to
                          return -errno on failure.
      
      * src/storage/storage_backend.c, src/qemu/qemu_driver.c
        - change the hook functions passed to virFileOperation to return
          -errno on failure.
      2ad04f78
  11. 20 Jul, 2010 1 commit
    • D
      Require format to be passed into virStorageFileGetMetadata · bf80fc68
      Daniel P. Berrange committed
      Require the disk image format to be passed into virStorageFileGetMetadata.
      If this is set to VIR_STORAGE_FILE_AUTO, then the format will be
      resolved using probing. This makes it easier to control when
      probing will be used.
      
      * src/qemu/qemu_driver.c, src/qemu/qemu_security_dac.c,
        src/security/security_selinux.c, src/security/virt-aa-helper.c:
        Set VIR_STORAGE_FILE_AUTO when calling virStorageFileGetMetadata.
      * src/storage/storage_backend_fs.c: Probe for disk format before
        calling virStorageFileGetMetadata.
      * src/util/storage_file.h, src/util/storage_file.c: Remove format
        from virStorageFileMeta struct & require it to be passed into
        method.
      bf80fc68
  12. 19 Jul, 2010 4 commits
    • D
      Refactor virStorageFileGetMetadataFromFD to separate functionality · c70cb0f4
      Daniel P. Berrange committed
      The virStorageFileGetMetadataFromFD did two jobs in one. First
      it probed for storage type, then it extracted metadata for the
      type. It is desirable to be able to separate these jobs, allowing
      probing without querying metadata, and querying metadata without
      probing.
      
      To prepare for this, split out probing code into a new pair of
      methods
      
        virStorageFileProbeFormatFromFD
        virStorageFileProbeFormat
      
      * src/util/storage_file.c, src/util/storage_file.h,
        src/libvirt_private.syms: Introduce virStorageFileProbeFormat
        and virStorageFileProbeFormatFromFD
      c70cb0f4
    • D
      Remove 'type' field from FileTypeInfo struct · 779b6ea7
      Daniel P. Berrange committed
      Instead of including a field in FileTypeInfo struct for the
      disk format, rely on the array index matching the format.
      Use verify() to assert the correct number of elements in the
      array.
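
A sketch of the idea, with C11 `_Static_assert` standing in for gnulib's verify(). The enum, struct and table below are made up for illustration and are much smaller than the real FileTypeInfo table.

```c
#include <stddef.h>

/* Hypothetical format enum and table, for illustration only. */
enum disk_format { FMT_RAW, FMT_QCOW2, FMT_VMDK, FMT_LAST };

struct format_info {
    const char *magic;   /* leading bytes identifying the format, or NULL */
};

/* Entry i describes format i, so no separate `type` field is needed. */
static const struct format_info format_table[] = {
    [FMT_RAW]   = { NULL },
    [FMT_QCOW2] = { "QFI\xfb" },
    [FMT_VMDK]  = { "KDMV" },
};

/* Compile-time check that the table has one entry per format. */
_Static_assert(sizeof(format_table) / sizeof(format_table[0]) == FMT_LAST,
               "format_table must have one entry per disk_format");

/* Look up a format's entry by using the enum value as the array index. */
const struct format_info *lookup_format(enum disk_format fmt)
{
    return fmt < FMT_LAST ? &format_table[fmt] : NULL;
}
```

If someone adds a format to the enum without extending the table, the build fails instead of the lookup silently reading past the array.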
      
      * src/util/storage_file.c: remove type field from FileTypeInfo
      779b6ea7
    • D
      Extract the backing store format as well as name, if available · a93402d4
      Daniel P. Berrange committed
      When QEMU opens a backing store for a QCow2 file, it will
      normally auto-probe for the format of the backing store,
      rather than assuming it has the same format as the referencing
      file. There is a QCow2 extension that allows an explicit format
      for the backing store to be embedded in the referencing file.
      This closes the auto-probing security hole in QEMU.
      
      This backing store format can be useful for libvirt users
      of virStorageFileGetMetadata, so extract this data and report
      it.
      
      QEMU does not require disk image backing store files to be in
      the same format as the referencing file. It will auto-probe the disk
      format for the backing store when opening it. If the backing
      store was intended to be a raw file this could be a security
      hole, because a guest may have written data into its disk that
      then makes the backing store look like a qcow2 file. If it can
      trick QEMU into thinking the raw file is a qcow2 file, it can
      access arbitrary files on the host by adding further backing
      store links.
      
      To address this, callers of virStorageFileGetMetadata need to be
      told of the backing store format. If no format is declared,
      they can make a decision whether to allow format probing or
      not.
      a93402d4
    • D
      CVE-2010-2242 Apply a source port mapping to virtual network masquerading · c5678530
      Daniel P. Berrange committed
      IPtables will seek to preserve the source port unchanged when
      doing masquerading, if possible. NFS has a pseudo-security
      option where it checks for the source port <= 1023 before
      allowing a mount request. If an admin has used this to make the
      host OS trusted for mounts, the default iptables behaviour will
      potentially allow NAT'd guests access too. This needs to be
      stopped.
      
      With this change, the iptables -t nat -L -n -v rules for the
      default network will be
      
      Chain POSTROUTING (policy ACCEPT 95 packets, 9163 bytes)
       pkts bytes target     prot opt in     out     source               destination
         14   840 MASQUERADE  tcp  --  *      *       192.168.122.0/24    !192.168.122.0/24    masq ports: 1024-65535
         75  5752 MASQUERADE  udp  --  *      *       192.168.122.0/24    !192.168.122.0/24    masq ports: 1024-65535
          0     0 MASQUERADE  all  --  *      *       192.168.122.0/24    !192.168.122.0/24
      
      * src/network/bridge_driver.c: Add masquerade rules for TCP
        and UDP protocols
      * src/util/iptables.c, src/util/iptables.c: Add source port
        mappings for TCP & UDP protocols when masquerading.
      c5678530
  13. 02 Jul, 2010 1 commit
    • C
      util: virExec: Dispatch all errors raised after fork · e190754e
      Cole Robinson committed
      Any error message raised after the process has forked needs
      to be followed by virDispatchError, otherwise we have no chance of
      ever seeing it. This was selectively done for hook functions in the past,
      but really applies to all post-fork errors.
      e190754e
  14. 30 Jun, 2010 2 commits
  15. 29 Jun, 2010 2 commits
  16. 28 Jun, 2010 1 commit
    • L
      Enhance virStorageFileIsSharedFS · fb457c5c
      Laine Stump committed
      virStorageFileIsSharedFS would previously only work if the entire path
      in question was stat'able by the uid of the libvirtd process. This
      patch changes it to crawl backwards up the path retrying the statfs
      call until it gets to a partial path that *can* be stat'ed.
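
The backwards crawl can be sketched as below. This is an illustration under assumptions: `statfs_nearest` is a hypothetical name, it uses the Linux-specific statfs() call, and it retries on any error, whereas the real function may be more selective about which errors trigger a retry.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/vfs.h>   /* Linux-specific statfs() */

/* Fill `sb` with filesystem info for `path`, or for the nearest ancestor
 * directory on which statfs() succeeds. Returns 0 on success, -errno on
 * failure. */
int statfs_nearest(const char *path, struct statfs *sb)
{
    size_t len = strlen(path) + 1;
    char *copy = malloc(len);
    int ret;

    if (!copy)
        return -ENOMEM;
    memcpy(copy, path, len);

    for (;;) {
        if (statfs(copy, sb) == 0) {
            ret = 0;
            break;
        }
        ret = -errno;

        /* Strip the last path component and retry on the parent. */
        char *slash = strrchr(copy, '/');
        if (!slash)
            break;                 /* relative path, nothing left to strip */
        if (slash == copy) {
            if (copy[1] == '\0')
                break;             /* already tried "/" */
            copy[1] = '\0';        /* fall back to the root directory */
        } else {
            *slash = '\0';
        }
    }

    free(copy);
    return ret;
}
```

The caller can then inspect `sb->f_type` to decide whether the filesystem is a shared one.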
      
      This is necessary to use the function to learn the fstype for files
      stored as a different user (and readable only by that user) on a
      root-squashed remote filesystem.
      fb457c5c
  17. 25 Jun, 2010 1 commit
    • D
      Set labelling for character devices in security drivers · 2bad82f7
      Daniel P. Berrange committed
      When configuring serial, parallel, console or channel devices
      with a file, dev or pipe backend type, it is necessary to label
      the file path in the security drivers. For char devices of type
      file, it is necessary to pre-create (touch) the file if it does
      not already exist since QEMU won't be allowed to do so itself.
      dev/pipe configs already require the admin to pre-create before
      starting the guest.
      
      * src/qemu/qemu_security_dac.c: set file ownership for character
        devices
      * src/security/security_selinux.c: Set file labeling for character
        devices
      * src/qemu/qemu_driver.c: Add character devices to cgroup ACL
      2bad82f7
  18. 24 Jun, 2010 2 commits
    • R
      cgroup: Enable memory.use_hierarchy of cgroup for domain · 4a4eb13e
      Ryota Ozaki committed
      Through conversation with Kumar L Srikanth-B22348, I found
      that the function of getting memory usage (e.g., virsh dominfo)
      doesn't work for lxc with ns subsystem of cgroup enabled.
      
      This is because of features of ns and memory subsystems.
      Ns creates child cgroup on every process fork and as a result
      processes in a container are not assigned in a cgroup for
      domain (e.g., libvirt/lxc/test1/). For example, libvirt_lxc
      and init (or whatever is specified in the XML) are assigned into
      libvirt/lxc/test1/8839/ and libvirt/lxc/test1/8839/8849/,
      respectively. On the other hand, memory subsystem accounts
      memory usage within a group of processes by default, i.e.,
      it does not take any child (and descendant) groups into
      account. With the two features, virsh dominfo which just
      checks memory usage of a cgroup for domain always returns
      zero because the cgroup has no process.
      
      Setting memory.use_hierarchy on a group makes it account for
      (and limit) the memory usage of all of its descendant groups.
      By setting it on the cgroup for a domain, we can get proper memory
      usage of lxc with ns subsystem enabled. (To be exact, the
      setting is required only when memory and ns subsystems are
      enabled at the same time, e.g., mount -t cgroup none /cgroup.)
      4a4eb13e
    • R
      cgroup: Change virCgroupRemove to remove all descendant groups at first · 842b51ff
      Ryota Ozaki committed
      As with normal directories, a cgroup cannot be removed if it
      contains sub groups. This patch changes virCgroupRemove to remove
      all descendant groups (subdirectories) of a target group before
      removing the target group.
      
      This handling is required when we run lxc with the ns subsystem of cgroup.
      The ns subsystem automatically creates child cgroups on every process
      fork, but unfortunately the groups are not removed on process exit,
      so we have to remove them ourselves.
      
      With this patch, such child (and descendant) groups are surely removed
      at lxc shutdown, i.e., lxcVmCleanup which calls virCgroupRemove.
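
The depth-first removal can be sketched as follows. `remove_group_tree` is a hypothetical name, and the sketch works on plain directories; it assumes, as on a cgroup mount, that the only entries needing explicit removal are subdirectories.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Remove the directory `path` after first removing all of its
 * subdirectories, depth first. Returns 0 on success, -errno on failure. */
int remove_group_tree(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *ent;
    int ret = 0;

    if (!dir)
        return -errno;

    while (ret == 0 && (ent = readdir(dir)) != NULL) {
        char child[PATH_MAX];
        struct stat st;

        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        if (snprintf(child, sizeof(child), "%s/%s", path, ent->d_name)
            >= (int)sizeof(child)) {
            ret = -ENAMETOOLONG;
            break;
        }
        /* Only descend into subdirectories (child groups); other entries
         * are left alone. */
        if (lstat(child, &st) == 0 && S_ISDIR(st.st_mode))
            ret = remove_group_tree(child);
    }
    closedir(dir);

    /* All descendants are gone (or an error occurred); remove the group. */
    if (ret == 0 && rmdir(path) < 0)
        ret = -errno;
    return ret;
}
```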
      842b51ff
  19. 23 Jun, 2010 1 commit
    • D
      Improve some error messages about unsupported APIs/URIs · 9b0244ae
      Daniel P. Berrange committed
      If there is no driver for a URI we report
      
        "no hypervisor driver available"
      
      This is bad because not all virt drivers are hypervisors (ie container
      based virt).
      
      If there is no driver support for an API we report
      
        "this function is not supported by the hypervisor"
      
      This is bad for the same reason, and additionally because it is
      also used for the network, interface & storage drivers.
      
      * src/util/virterror.c: Improve error messages
      9b0244ae
  20. 18 Jun, 2010 2 commits
  21. 17 Jun, 2010 1 commit
    • S
      macvtap: work-around for 2.6.32 and older kernels · 045a5722
      Stefan Berger committed
      This patch works around a recent extension of the netlink driver I had made use of when building the netlink messages. Unfortunately older kernels don't accept IFLA_IFNAME + name of interface as a replacement for the interface's index, so this patch now gets the interface index ifindex if it's not provided (ifindex <= 0).
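
A rough sketch of that fallback. `resolve_ifindex` is a hypothetical helper, and if_nametoindex() is used here only as a convenient stand-in; the patch itself may obtain the index differently.

```c
#include <errno.h>
#include <net/if.h>

/* Return the interface index to use: the caller's value when it is valid
 * (> 0), otherwise one looked up from the interface name. Returns a
 * negative errno value if the lookup fails. */
int resolve_ifindex(const char *ifname, int ifindex)
{
    if (ifindex > 0)
        return ifindex;

    unsigned int idx = if_nametoindex(ifname);
    if (idx == 0)
        return errno ? -errno : -ENODEV;
    return (int)idx;
}
```

The resolved index can then be placed in the netlink message directly, which older kernels accept, instead of sending IFLA_IFNAME.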
      045a5722
  22. 11 Jun, 2010 1 commit
  23. 10 Jun, 2010 1 commit
    • E
      build: avoid pthreads-win32 on mingw · 6e5a04f0
      Eric Blake committed
      * src/util/threads.c (includes) [WIN32]: On mingw, favor native
      threading over pthreads-win32 library.
      * src/util/thread.h [WIN32] Likewise.
      Suggested by Daniel P. Berrange.
      6e5a04f0
  24. 08 Jun, 2010 1 commit
    • D
      Enable probing of VPC disk format type · f4365c73
      Daniel P. Berrange committed
      A look at the QEMU source revealed the missing bits of info about
      the VPC file format, so we can enable this now
      
      * src/util/storage_file.c: Enable VPC format, providing version
        and disk size offset fields
      f4365c73
  25. 03 Jun, 2010 1 commit
    • S
      add 802.1Qbh and 802.1Qbg handling · ca3b22bb
      Stefan Berger committed
      This patch adds support for configuring 802.1Qbg and 802.1Qbh
      switches. The 802.1Qbh part has been successfully tested with real
      hardware. The 802.1Qbg part has only been tested with a (dummy)
      server that 'behaves' similarly to how we expect lldpad to 'behave'.
      
      The following changes were made during the development of this patch:
      
       - Merging Scott's v13-pre1 patch
       - Fixing endptr related bug while using virStrToLong_ui() pointed out
         by Jim Meyering
       - Addressing Jim Meyering's comments to v11
       - requiring mac address to the vpDisassociateProfileId() function to
         pass it further to the 802.1Qbg disassociate part (802.1Qbh untouched)
       - determining pid of lldpad daemon by reading it from /var/run/libvirt.pid
         (hardcoded, as it is also hardcoded in the lldpad sources)
       - merging netlink send code for kernel target and user space target
         (lldpad) using one function nlComm() to send the messages
       - adding a select() after the sending and before the reading of the
         netlink response in case lldpad doesn't respond and so we don't hang
       - when reading the port status, in case of 802.1Qbg, no status may be
         received while things are 'in progress' and only at the end a status
         will be there.
       - when reading the port status, use the given instanceId and vf to pick
         the right IFLA_VF_PORT among those nested under IFLA_VF_PORTS.
       - never sending nor parsing IFLA_PORT_SELF type of messages in the
         802.1Qbg case
       - iterating over the elements in a IFLA_VF_PORTS to pick the right
         IFLA_VF_PORT by either IFLA_PORT_PROFILE and given profileId
         (802.1Qbh) or IFLA_PORT_INSTANCE_UUID and given instanceId (802.1Qbg)
         and reading the current status in IFLA_PORT_RESPONSE.
       - recycling a previous patch that adds functionality to interface.c to
         - get the vlan identifier on an interface
         - get the flags of an interface and some convenience function to
           check whether an interface is 'up' or not (not currently used here)
       - adding function to determine the root physical interface of an
         interface. For example if a macvtap is linked to eth0.100, it will
         find eth0. Also adding a function that finds the vlan on the 'way to
         the root physical interface'
       - conveying the root physical interface name and index in case of 802.1Qbg
       - conveying mac address of macvlan device and vlan identifier in
         IFLA_VFINFO_LIST[ IFLA_VF_INFO[ IFLA_VF_MAC(mac), IFLA_VF_VLAN(vlan) ] ]
         to (future) lldpad via netlink
        - To enable build with --without-macvtap rename the
          [dis|]associatePortProfileId functions, prepend 'vp' before their
          name and make them non-static functions.
        - Renaming variable multicast to nltarget_kernel and inverting
          the logic
        - Addressing Jim Meyering's comments; this also touches existing
          code for example for correcting indentation of break statements or
          simplification of switch statements.
        - Renamed occurrences of virVirtualPortProfileDef to virVirtualPortProfileParams
        - 802.1Qbg part prepared for sending a RTM_SETLINK and getting
          processing status back plus a subsequent RTM_GETLINK to
          get IFLA_PORT_RESPONSE.
          Note: This interface for 802.1Qbg may still change
        - [David Allan] move getPhysfn inside IFLA_VF_PORT_MAX to avoid a
          compiler warning when latest if_link.h isn't available
        - move from Stefan's 802.1Qb{g|h} XML v8 to v9
        - move hostuuid and vf index calcs to inside doPortProfileOp8021Qbh
        - remove debug fprintfs
        - use virGetHostUUID (thanks Stefan!)
        - fix compile issue when latest if_link.h isn't available
        - change poll timeout to 10s, at 1/8 intervals
           - if polling times out, log msg and return -ETIMEDOUT
        - Add Stefan's code for getPortProfileStatus
        - Poll for up to 2 secs for port-profile status, at 1/8 sec intervals:
           - if status indicates error, abort openMacvtapTap
           - if status indicates success, exit polling
           - if status is "in-progress" after 2 secs of polling, exit
             polling loop silently, without error
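
The polling described in the list above can be sketched like this. All names are hypothetical, the status values are placeholders for whatever the port-profile response attribute reports, and the timeout is a parameter because the list mentions both a 2 s and a 10 s limit. This sketch returns -ETIMEDOUT on timeout, following the later of the two entries.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical status codes, for illustration only. */
enum port_status { PORT_SUCCESS, PORT_INPROGRESS, PORT_ERROR };

typedef enum port_status (*status_fn)(void *opaque);

/* Poll `query` every 1/8 s for up to `max_secs` seconds.
 * Returns 0 on success, -EIO if an error status is reported, and
 * -ETIMEDOUT if the status is still "in progress" at the end. */
int poll_port_status(status_fn query, void *opaque, unsigned int max_secs)
{
    const struct timespec interval = { 0, 125 * 1000 * 1000 };  /* 125 ms */
    unsigned int tries = max_secs * 8;

    for (unsigned int i = 0; i < tries; i++) {
        switch (query(opaque)) {
        case PORT_SUCCESS:
            return 0;
        case PORT_ERROR:
            return -EIO;
        case PORT_INPROGRESS:
            break;
        }
        nanosleep(&interval, NULL);
    }
    return -ETIMEDOUT;
}

/* Example query for demonstration: reports "in progress" until the
 * counter pointed to by `opaque` reaches zero, then success. */
enum port_status countdown_status(void *opaque)
{
    int *remaining = opaque;
    return (*remaining)-- > 0 ? PORT_INPROGRESS : PORT_SUCCESS;
}
```

Passing the query as a callback keeps the timing logic separate from the netlink code that actually reads the status.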
      
      My patch finishes out the 802.1Qbh parts, which Stefan had mostly complete.
      I've tested using the recent kernel updates for VF_PORT netlink msgs and
      enic for Cisco's 10G Ethernet NIC.  I tested many VMs, each with several
      direct interfaces, each configured with a port-profile per the XML.  VM-to-VM,
      and VM-to-external work as expected.  VM-to-VM on same host (using same NIC)
      works same as VM-to-VM where VMs are on diff hosts.  I'm able to change
      settings on the port-profile while the VM is running to change the virtual
      port behaviour.  For example, adjusting a QoS setting like rate limit.  All
      VMs with interfaces using that port-profile immediately see the effect of the
      change to the port-profile.
      
      I don't have a SR-IOV device to test so source dev is a non-SR-IOV device,
      but most of the code paths include support for specifying the source dev and
      VF index.  We'll need to complete this by discovering the PF given the VF
      linkdev.  Once we have the PF, we'll also have the VF index.  All this
      information is available from sysfs.
      ca3b22bb
  26. 02 Jun, 2010 2 commits
  27. 28 May, 2010 1 commit