1. 01 Nov, 2016 26 commits
  2. 31 Oct, 2016 14 commits
    • Y
      vfio: Add support for mmapping sub-page MMIO BARs · 95251725
      Yongji Xie authored
      Kernel commit 05f0c03fbac1 ("vfio-pci: Allow to mmap
      sub-page MMIO BARs if the mmio page is exclusive") allows VFIO
      to mmap sub-page BARs. This is the corresponding QEMU patch.
      With both patches applied, sub-page BARs can be passed through
      to the guest, which can improve I/O performance for some devices.
      
      In this patch, we expand the MemoryRegions of these sub-page
      MMIO BARs to PAGE_SIZE in vfio_pci_write_config(), so that
      the BARs can be passed to the KVM ioctl KVM_SET_USER_MEMORY_REGION
      with a valid size. The original size is restored when the guest
      changes the base address of a sub-page BAR so that it is no longer
      page aligned. We also set the priority of these BARs' memory
      regions to zero in case they overlap with other BARs that share
      the same page in the guest.
      Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
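The size handling described in this commit can be sketched as follows. This is a simplified Python model, not the QEMU code: `PAGE_SIZE` is assumed to be 4 KiB, and `is_page_aligned` and `effective_bar_size` are hypothetical helper names introduced only for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB host pages


def is_page_aligned(addr, page_size=PAGE_SIZE):
    """Return True when addr sits on a page boundary."""
    return addr & (page_size - 1) == 0


def effective_bar_size(bar_size, guest_base, page_size=PAGE_SIZE):
    """Size to use for the BAR's memory region (hypothetical helper).

    A sub-page BAR is expanded to a full page only while the guest has
    programmed a page-aligned base address; otherwise the original size
    is kept.
    """
    if bar_size >= page_size:
        return bar_size          # not a sub-page BAR, nothing to expand
    if is_page_aligned(guest_base, page_size):
        return page_size         # expand so the region has a mappable size
    return bar_size              # guest moved the BAR off a page boundary
```

For example, a 256-byte BAR placed at `0xfe000000` would be treated as a 4096-byte region, while the same BAR moved to `0xfe000100` falls back to 256 bytes.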
    • I
      vfio/pci: fix out-of-sync BAR information on reset · a52a4c47
      Ido Yariv authored
      When a PCI device is reset, pci_do_device_reset resets all BAR addresses
      in the relevant PCIDevice's config buffer.
      
      The VFIO configuration space stays untouched, so the guest OS may choose
      to skip restoring the BAR addresses as they would seem intact. The PCI
      device may be left non-operational.
      One example of such a scenario is when the guest exits S3.
      
      Fix this by resetting the BAR addresses in the VFIO configuration space
      as well.
      Signed-off-by: Ido Yariv <ido@wizery.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
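To illustrate what "resetting the BAR addresses" in a configuration-space copy could look like, here is a minimal Python sketch. It is not the QEMU implementation: `reset_bar_addresses` is a hypothetical name, and the code assumes a standard type 0 PCI header with six 32-bit BAR slots starting at offset 0x10, where the low bits of each BAR hold type flags rather than address bits.

```python
import struct

PCI_BASE_ADDRESS_0 = 0x10  # offset of the first BAR in a type 0 header
PCI_NUM_BARS = 6


def reset_bar_addresses(config):
    """Clear the address bits of every BAR in a config-space bytearray.

    The type flags in the low bits are preserved so the guest still sees
    what kind of BAR it is. Hypothetical helper for illustration only.
    """
    i = 0
    while i < PCI_NUM_BARS:
        off = PCI_BASE_ADDRESS_0 + 4 * i
        (val,) = struct.unpack_from("<I", config, off)
        if val & 0x1:
            flags = val & 0x3            # I/O BAR: bits 1:0 are flags
            is_64bit = False
        else:
            flags = val & 0xF            # memory BAR: bits 3:0 are flags
            is_64bit = (val & 0x6) == 0x4
        struct.pack_into("<I", config, off, flags)
        if is_64bit and i + 1 < PCI_NUM_BARS:
            # The next slot holds the upper 32 address bits of this BAR.
            struct.pack_into("<I", config, off + 4, 0)
            i += 1
        i += 1
```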
    • A
      vfio: Handle zero-length sparse mmap ranges · 24acf72b
      Alex Williamson authored
      As reported in the link below, a user has a PCI device with a 4KB BAR
      which contains the MSI-X table.  This seems to hit a corner case in
      the kernel where the region reports being mmap capable, but the sparse
      mmap information reports a zero sized range.  It's not entirely clear
      that the kernel is incorrect in doing this, but regardless, we need
      to handle it.  To do this, fill our mmap array only with non-zero
      sized sparse mmap entries and add an error return from the function
      so we can tell the difference between nr_mmaps being zero based on
      sparse mmap info vs lack of sparse mmap info.
      
      NB, this doesn't actually change the behavior of the device, it only
      removes the scary "Failed to mmap ... Performance may be slow" error
      message.  We cannot currently create an mmap over the MSI-X table.
      
      Link: http://lists.nongnu.org/archive/html/qemu-discuss/2016-10/msg00009.html
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
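The filtering idea in this commit can be modelled in a few lines of Python. This is a sketch under assumptions, not the C code: `collect_sparse_mmaps` is a hypothetical name, sparse areas are represented as `(offset, size)` tuples, and `None` stands in for the error return that signals "no sparse mmap info at all".

```python
def collect_sparse_mmaps(areas):
    """Keep only non-zero-sized sparse mmap areas.

    areas: list of (offset, size) tuples, or None when the region
    carries no sparse mmap information (hypothetical representation).

    Returns None when there is no sparse info, so the caller can tell
    that case apart from "sparse info present, but nothing mappable",
    which yields an empty list.
    """
    if areas is None:
        return None
    return [(offset, size) for offset, size in areas if size > 0]
```

With this shape, a caller would fall back to mapping the whole region only when the result is `None`, and would skip mmap quietly when the result is an empty list.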
    • A
      memory: Don't use memcpy for ram_device regions · 4a2e242b
      Alex Williamson authored
      With a vfio assigned device we lay down a base MemoryRegion registered
      as an IO region, giving us read & write accessors.  If the region
      supports mmap, we lay down a higher priority sub-region MemoryRegion
      on top of the base layer initialized as a RAM device pointer to the
      mmap.  Finally, if we have any quirks for the device (i.e., address
      ranges that need additional virtualization support), we put another IO
      sub-region on top of the mmap MemoryRegion.  When this is flattened,
      we now potentially have sub-page mmap MemoryRegions exposed which
      cannot be directly mapped through KVM.
      
      This is as expected, but a subtle detail of this is that we end up
      with two different access mechanisms through QEMU.  If we disable the
      mmap MemoryRegion, we make use of the IO MemoryRegion and service
      accesses using pread and pwrite to the vfio device file descriptor.
      If the mmap MemoryRegion is enabled and results in one of these
      sub-page gaps, QEMU handles the access as RAM, using memcpy to the
      mmap.  Using either pread/pwrite or the mmap directly should be
      correct, but using memcpy causes us problems.  I expect that not only
      does memcpy not necessarily honor the original width and alignment in
      performing a copy, but it potentially also uses processor instructions
      not intended for MMIO spaces.  It turns out that this has been a
      problem for Realtek NIC assignment, which has such a quirk that
      creates a sub-page mmap MemoryRegion access.
      
      To resolve this, we disable memory_access_is_direct() for ram_device
      regions since QEMU assumes that it can use memcpy for those regions.
      Instead we access through MemoryRegionOps, which replaces the memcpy
      with simple de-references of standard sizes to the host memory.
      
      With this patch we attempt to provide unrestricted access to the RAM
      device, allowing byte through qword access as well as unaligned
      access.  The assumption here is that accesses initiated by the VM are
      driven by a device specific driver, which knows the device
      capabilities.  If unaligned accesses are not supported by the device,
      we don't want them to work in a VM by performing multiple aligned
      accesses to compose the unaligned access.  A down-side of this
      philosophy is that the xp command from the monitor attempts to use
      the largest available access width, unaware of the underlying
      device.  Using memcpy had this same restriction, but at least now an
      operator can dump individual registers, even if blocks of device
      memory may result in access widths beyond the capabilities of a
      given device (RTL NICs only support up to dword).
      Reported-by: Thorsten Kohfeldt <thorsten.kohfeldt@gmx.de>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
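The difference between a bulk copy and accesses "of standard sizes" can be modelled as a dispatch on the access width. The Python sketch below is only an analogy for the approach described in this commit: `ram_device_read` and `ram_device_write` are hypothetical names, a `bytearray` stands in for the mmap, little-endian byte order is assumed, and Python itself gives no guarantee about the machine instructions used.

```python
import struct

# One fixed-width format per supported access size (byte through qword).
_FORMATS = {1: "<B", 2: "<H", 4: "<I", 8: "<Q"}


def ram_device_read(buf, addr, size):
    """Read exactly `size` bytes at `addr` as one value (hypothetical)."""
    return struct.unpack_from(_FORMATS[size], buf, addr)[0]


def ram_device_write(buf, addr, value, size):
    """Write `value` as exactly `size` bytes at `addr` (hypothetical)."""
    struct.pack_into(_FORMATS[size], buf, addr, value)
```

The point of the model is that the width requested by the guest is the width used for the access; sizes other than 1, 2, 4 or 8 are rejected (here with a `KeyError`) instead of being split into a different sequence of accesses.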
    • A
      memory: Replace skip_dump flag with "ram_device" · 21e00fa5
      Alex Williamson authored
      Setting skip_dump on a MemoryRegion allows us to modify one specific
      code path, but the restriction we're trying to address encompasses
      more than that.  If we have a RAM MemoryRegion backed by a physical
      device, it not only restricts our ability to dump that region, but
      also affects how we should manipulate it.  Here we recognize that
      MemoryRegions do not change to sometimes allow dumps and other times
      not, so we replace setting the skip_dump flag with a new initializer
      so that we know exactly the type of region to which we're applying
      this behavior.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
    • A
      qapi: allow blockdev-add for NFS · aa2623d8
      Ashijeet Acharya authored
      Introduce a new object 'BlockdevOptionsNFS' in qapi/block-core.json to
      support blockdev-add for the NFS network protocol driver. Also add a new
      struct NFSServer to support TCP connections.
      Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
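As a rough illustration of what a blockdev-add request for NFS might look like, the Python sketch below builds a QMP command as a dictionary. Everything about its shape is an assumption to be checked against qapi/block-core.json in the QEMU version at hand: the field names (`server`, `type`, `host`, `path`), the `"inet"` transport value, and the flat argument layout are not taken from the commit message, and `nfs_blockdev_add` is a hypothetical helper.

```python
import json


def nfs_blockdev_add(node_name, host, path):
    """Build a blockdev-add command for an NFS export (assumed schema)."""
    return {
        "execute": "blockdev-add",
        "arguments": {
            "driver": "nfs",
            "node-name": node_name,
            # Assumed layout of the NFSServer struct: transport + host.
            "server": {"type": "inet", "host": host},
            "path": path,
        },
    }


if __name__ == "__main__":
    # Hypothetical host and path, printed as the JSON a QMP client would send.
    print(json.dumps(nfs_blockdev_add("nfs0", "nas.example", "/export/disk.img")))
```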
    • A
      block/nfs: Introduce runtime_opts in NFS · 94d6a7a7
      Ashijeet Acharya authored
      Make the NFS block driver use fine-grained runtime_opts.
      Set .bdrv_parse_filename() to nfs_parse_filename() and introduce two
      new functions, nfs_parse_filename() and nfs_parse_uri(), to help parse
      the URI.
      Add a new option "server", which accepts a new struct NFSServer.
      Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com>
      [ kwolf: Fixed client->path ]
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
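Splitting an NFS URI into separate options can be sketched with Python's standard URL parser. This is not a port of nfs_parse_uri(): `parse_nfs_uri` is a hypothetical function, the output keys are illustrative, and query parameters are passed through untouched because the commit message does not list which ones the driver accepts.

```python
from urllib.parse import urlsplit, parse_qsl


def parse_nfs_uri(uri):
    """Split nfs://host/path?key=value into separate options (sketch)."""
    parts = urlsplit(uri)
    if parts.scheme != "nfs":
        raise ValueError("URI scheme must be 'nfs'")
    if not parts.hostname:
        raise ValueError("URI is missing a server host")
    if not parts.path:
        raise ValueError("URI is missing an export path")
    return {
        "server": {"host": parts.hostname},
        "path": parts.path,
        # Query parameters are kept as strings; interpreting them is
        # left to the caller in this sketch.
        "query": dict(parse_qsl(parts.query)),
    }
```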
    • E
      block: Mention replication in BlockdevDriver enum docs · 68875e9f
      Eric Blake authored
      Missed in commit 82ac5543.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • T
    • T
      raw_bsd: add offset and size options · 2fdc7045
      Tomáš Golembiovský authored
      Add two new options, 'offset' and 'size'. This makes it possible to use
      only part of a file as a device, e.g. to limit access to a single
      partition in a disk image or to use a disk inside a tar archive
      (like OVA).
      
      When 'size' is specified we do our best to honour it.
      Signed-off-by: Tomáš Golembiovský <tgolembi@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
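The effect of 'offset' and 'size' can be pictured as a window onto the underlying file. The Python sketch below shows the arithmetic only; `map_request` is a hypothetical helper, and the strict bounds checks are an assumption rather than the driver's documented behaviour (the commit only says it does its best to honour 'size').

```python
def map_request(req_offset, req_len, file_size, offset=0, size=None):
    """Translate a device-relative request into a file offset (sketch).

    offset: where the exposed device starts inside the file
    size:   length of the exposed device; defaults to the rest of the file
    """
    if offset < 0 or offset > file_size:
        raise ValueError("offset lies outside the file")
    window = file_size - offset if size is None else size
    if window < 0 or offset + window > file_size:
        raise ValueError("offset + size exceeds the file")
    if req_offset < 0 or req_len < 0 or req_offset + req_len > window:
        raise ValueError("request falls outside the exposed window")
    return offset + req_offset
```

For instance, with `offset=1024` and `size=4096` on a 10240-byte file, a read at device offset 0 lands at file offset 1024, and any request reaching past device offset 4096 is rejected.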
    • A
      qemu-iotests: Test the 'base-node' parameter of 'block-stream' · 7eb13c9d
      Alberto Garcia authored
      The block-stream command has traditionally used the 'base' parameter
      to indicate the image to copy the data from. This test checks that the
      'base-node' parameter can also be used for the same purpose.
      Signed-off-by: Alberto Garcia <berto@igalia.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • A
      block: Add 'base-node' parameter to the 'block-stream' command · 312fe09c
      Alberto Garcia authored
      The way to specify the node from which to copy data in the
      block-stream operation is by using the 'base' parameter. This
      parameter however takes a file name, not a node name.
      
      Since we want to be able to perform this operation using only node
      names, this patch adds a new 'base-node' parameter.
      Signed-off-by: Alberto Garcia <berto@igalia.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
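The two ways of naming the base image can be contrasted with a small Python helper that builds the QMP command. `block_stream_cmd` and the node and file names in the usage note are hypothetical; the sketch also assumes that 'base' and 'base-node' may not be given together, which should be confirmed against the QAPI schema.

```python
def block_stream_cmd(device, base=None, base_node=None):
    """Build a block-stream QMP command as a dict (illustrative helper)."""
    if base is not None and base_node is not None:
        # Assumption: the two parameters are mutually exclusive.
        raise ValueError("'base' and 'base-node' cannot both be given")
    args = {"device": device}
    if base is not None:
        args["base"] = base            # identifies the base by file name
    if base_node is not None:
        args["base-node"] = base_node  # identifies the base by node name
    return {"execute": "block-stream", "arguments": args}
```

A call such as `block_stream_cmd("drive0", base_node="node-base")` would then reference the base purely by node name, whereas `block_stream_cmd("drive0", base="base.qcow2")` uses the older file-name form.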
    • A
      qemu-iotests: Test streaming to a Quorum child · 48361afb
      Alberto Garcia authored
      Quorum children are special in the sense that they're not directly
      attached to a block backend but they're not used as backing images
      either. However the intermediate block streaming code supports
      streaming to them. This is a test case for that scenario.
      Signed-off-by: Alberto Garcia <berto@igalia.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • A
      qemu-iotests: Add iotests.supports_quorum() · b0f90495
      Alberto Garcia authored
      There are many tests that need Quorum support in order to run. At the
      moment each test implements its own check to see if Quorum is
      enabled. This patch centralizes all those checks in a new function
      called iotests.supports_quorum().
      Signed-off-by: Alberto Garcia <berto@igalia.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
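A centralized capability check of this kind could look roughly like the standalone Python sketch below. It is an approximation written for illustration, not the code in the iotests module: it assumes that the help output of `qemu-img` lists the supported formats and that the string "quorum" appears there when the driver is built in, and `help_mentions_quorum` is a hypothetical helper split out so the text check can be exercised without running a binary.

```python
import subprocess


def help_mentions_quorum(help_text):
    """Return True if the given help output lists the quorum driver."""
    return "quorum" in help_text


def supports_quorum(qemu_img="qemu-img"):
    """Best-effort check for Quorum support (standalone approximation)."""
    try:
        result = subprocess.run(
            [qemu_img, "--help"],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
    except OSError:
        # Binary missing or not executable: treat as "not supported".
        return False
    return help_mentions_quorum(result.stdout)
```

A test would then call `supports_quorum()` once at start-up and skip itself when the result is false, instead of carrying its own copy of the check.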