1. 27 Oct, 2016 1 commit
    • vfio/pci: Fix integer overflows, bitmask check · 05692d70
      Vlad Tsyrklevich authored
      The VFIO_DEVICE_SET_IRQS ioctl did not sufficiently sanitize
      user-supplied integers, potentially allowing memory corruption. This
      patch adds appropriate integer overflow checks, checks the range bounds
      for VFIO_IRQ_SET_DATA_NONE, and also verifies that only a single
      element in the VFIO_IRQ_SET_DATA_TYPE_MASK bitmask is set.
      VFIO_IRQ_SET_ACTION_TYPE_MASK is already correctly checked later in
      vfio_pci_set_irqs_ioctl().
      
      Furthermore, a kzalloc is changed to a kcalloc because the use of a
      kzalloc with an integer multiplication allowed an integer overflow
      condition to be reached without this patch. kcalloc checks for overflow
      and should prevent a similar occurrence.
      Signed-off-by: Vlad Tsyrklevich <vlad@tsyrklevich.net>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
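The three checks this commit message describes (overflow-safe size computation, range bounds, and a single-bit test on a bitmask) can be sketched in userspace C. This is a minimal illustration, not the kernel patch: `checked_array_size`, `range_ok`, and `exactly_one_bit` are hypothetical helper names, and the first one only mimics the kind of overflow check that kcalloc performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compute count * elem_size, reporting overflow
 * instead of silently wrapping (similar in spirit to kcalloc). */
bool checked_array_size(size_t count, size_t elem_size, size_t *out)
{
    assert(out != NULL);
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return false;               /* the multiplication would wrap */
    *out = count * elem_size;
    return true;
}

/* Hypothetical helper: true if [start, start + count) lies within
 * [0, max), written so that start + count is never computed. */
bool range_ok(uint32_t start, uint32_t count, uint32_t max)
{
    return start < max && count <= max - start;
}

/* Hypothetical helper: true only if exactly one bit of mask is set
 * in flags. */
bool exactly_one_bit(uint32_t flags, uint32_t mask)
{
    uint32_t v = flags & mask;
    return v != 0 && (v & (v - 1)) == 0;
}
```

Writing the range check as `count <= max - start` avoids the wrap-around that a naive `start + count > max` comparison would be exposed to.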
  2. 30 Sep, 2016 1 commit
  3. 27 Sep, 2016 2 commits
    • vfio-pci: Disable INTx after MSI/X teardown · c93a97ee
      Alex Williamson authored
      The MSI/X shutdown path can gratuitously enable INTx, which is not
      something we want to happen if we're dealing with a device whose
      INTx is broken.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio-pci: Virtualize PCIe & AF FLR · ddf9dc0e
      Alex Williamson authored
      We use a BAR restore trick to try to detect when a user has performed
      a device reset, possibly through FLR or other backdoors, to put things
      back into a working state.  This is important for backdoor resets, but
      we can actually just virtualize the "front door" resets provided via
      PCIe and AF FLR.  Set these bits as virtualized + writable, allowing
      the default write to set them in vconfig, then we can simply check the
      bit, perform an FLR of our own, and clear the bit.  We don't actually
      have the granularity in PCI to specify the type of reset we want to
      do, but generally devices don't implement both PCIe and AF FLR and
      we'll favor these over other types of reset, so we should generally
      line up.  We do test whether the device provides the requested FLR type
      to stay consistent with hardware capabilities though.
      
      This seems to fix several instances of devices getting into bad states
      with userspace drivers, like dpdk, running inside a VM.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: Greg Rose <grose@lightfleet.com>
  4. 30 Aug, 2016 1 commit
  5. 09 Aug, 2016 1 commit
    • vfio/pci: Fix NULL pointer oops in error interrupt setup handling · c8952a70
      Alex Williamson authored
      There are multiple cases in vfio_pci_set_ctx_trigger_single() where
      we assume we can safely read from our data pointer without actually
      checking whether the user has passed any data via the count field.
      VFIO_IRQ_SET_DATA_NONE in particular is entirely broken since we
      attempt to pull an int32_t file descriptor out before even checking
      the data type.  The other data types assume the data pointer contains
      one element of their type as well.
      
      In part this is good news because we were previously restricted from
      doing much sanitization of parameters because it was missed in the
      past and we didn't want to break existing users.  Clearly DATA_NONE
      is completely broken, so it must not have any users and we can fix
      it up completely.  For DATA_BOOL and DATA_EVENTFD, we'll just
      protect ourselves, returning an error when count is zero since we
      previously would have oopsed.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Reported-by: Chris Thompson <the_cartographer@hotmail.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
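The rule this message arrives at (never read the data pointer unless the data type and count say there is something to read) might look like the sketch below. The enum and `check_trigger_args` are hypothetical stand-ins for illustration. They do not reproduce the VFIO_IRQ_SET_DATA_* flag values or the real handler's behavior beyond the validation step.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the three VFIO data types. */
enum irq_data_type { DATA_NONE, DATA_BOOL, DATA_EVENTFD };

/* Returns 0 if the handler may proceed, -EINVAL if it would have to
 * read a payload that the caller did not supply. */
int check_trigger_args(enum irq_data_type type, uint32_t count,
                       const void *data)
{
    switch (type) {
    case DATA_NONE:
        /* No payload by definition: data must never be read. */
        return 0;
    case DATA_BOOL:
    case DATA_EVENTFD:
        /* These types expect one element; reject an empty request
         * instead of dereferencing the pointer. */
        if (count == 0 || data == NULL)
            return -EINVAL;
        return 0;
    }
    return -EINVAL;
}
```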
  6. 09 Jul, 2016 1 commit
    • vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive · 05f0c03f
      Yongji Xie authored
      The current vfio-pci implementation does not allow mmap of
      sub-page (size < PAGE_SIZE) MMIO BARs, because the MMIO page
      backing such a BAR may be shared with other BARs. This causes
      performance problems when passing through a PCI device with BARs
      of this kind: the guest cannot access them directly, so every
      access falls back to MMIO emulation in the host.
      
      However, not every sub-page BAR shares its page with other BARs.
      We should allow mmap of sub-page MMIO BARs when we can make sure
      they do not share a page with other BARs.
      
      This patch adds support for this case. We also try to add a
      dummy resource to reserve the remainder of the page, into which
      a hot-added device's BAR might otherwise be assigned. It is not
      necessary to handle the case where the BAR is not page aligned,
      because we cannot expect the BAR to be assigned to the same
      location within a page in the guest when we pass it through, and
      it is hard to access such a BAR from userspace because there is
      no way to get its location within a page.
      Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
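The eligibility rule described in this message can be expressed as a small predicate. This is a sketch under assumptions: a 4 KiB page size, a hypothetical function name `subpage_bar_mmappable`, and a boolean `remainder_reserved` standing in for whether the dummy resource covering the rest of the page could be claimed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* Decide whether a BAR may be mmapped.
 * remainder_reserved models a successful reservation of the unused
 * part of the page, so that no other BAR can land there. */
bool subpage_bar_mmappable(uint64_t bar_start, uint64_t bar_len,
                           bool remainder_reserved)
{
    if (bar_len == 0)
        return false;                    /* unimplemented BAR */
    if (bar_len >= SKETCH_PAGE_SIZE)
        return true;                     /* not a sub-page BAR */
    if (bar_start % SKETCH_PAGE_SIZE != 0)
        return false;                    /* unaligned: not handled */
    return remainder_reserved;           /* page must be exclusive */
}
```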
  7. 01 Jun, 2016 1 commit
  8. 30 May, 2016 1 commit
  9. 20 May, 2016 1 commit
    • vfio_pci: Test for extended capabilities if config space > 256 bytes · f7055280
      Alexey Kardashevskiy authored
      The PCI Express spec says that reading 4 bytes at offset 100h should
      return zero if there is no extended capability, so VFIO reads this
      dword to know if there are extended capabilities.
      
      However, it is not always possible to access the extended space, so
      generic PCI code in pci_cfg_space_size_ext() checks if
      pci_read_config_dword() can read beyond 100h and, if the check fails,
      it sets the config space size to 100h.
      
      VFIO does its own extended capabilities check by reading at offset
      100h, which may produce 0xffffffff.  VFIO treats this value as the
      presence of extended config space and calls vfio_ecap_init(), which
      fails to parse capabilities (as expected).  Right before the exit,
      however, it writes zero at offset 100h, which is beyond the buffer
      allocated for vdev->vconfig (256 bytes) and leads to random memory
      corruption.
      
      This makes VFIO only check for the extended capabilities if
      the discovered config size is more than 256 bytes.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
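The guard this fix introduces can be sketched as a predicate that consults the discovered config space size before trusting the dword at offset 100h. `has_ext_caps` and its parameters are hypothetical names for illustration, and the function only models the decision, not the config space reads themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CFG_SPACE_SIZE 256u   /* size of conventional config space */

/* cfg_size:      config space size discovered by the generic PCI code
 * dword_at_100h: value read at offset 100h (only meaningful when the
 *                extended space is actually accessible) */
bool has_ext_caps(uint32_t cfg_size, uint32_t dword_at_100h)
{
    /* Without extended space, a read at 100h may return 0xffffffff,
     * so the value must not be interpreted at all. */
    if (cfg_size <= CFG_SPACE_SIZE)
        return false;
    /* With extended space, zero means "no extended capabilities". */
    return dword_at_100h != 0;
}
```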
  10. 29 Apr, 2016 2 commits
    • vfio/pci: Add test for BAR restore · dc928109
      Alex Williamson authored
      If a device is reset without the memory or i/o bits enabled in the
      command register we may not detect it, potentially leaving the device
      without valid BAR programming.  Add an additional test to check the
      BARs on each write to the command register.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio/pci: Hide broken INTx support from user · 45074405
      Alex Williamson authored
      INTx masking has two components.  The first is that we need the ability
      to prevent the device from continuing to assert INTx.  This is
      provided via the DisINTx bit in the command register and is the only
      thing we can really probe for when testing if INTx masking is
      supported.  The second component is that the device needs to indicate
      if INTx is asserted via the interrupt status bit in the device status
      register.  With these two features we can generically determine if one
      of the devices we own is asserting INTx, signal the user, and mask the
      interrupt while the user services the device.
      
      Generally if one or both of these components is broken we resort to
      APIC level interrupt masking, which requires an exclusive interrupt
      since we have no way to determine the source of the interrupt in a
      shared configuration.  This often makes it difficult or impossible to
      configure the system for userspace use of the device, for an interrupt
      mode that the user may not need.
      
      One possible configuration of broken INTx masking is that the DisINTx
      support is fully functional, but the interrupt status bit never
      signals interrupt assertion.  In this case we do have the ability to
      prevent the device from asserting INTx, but lack the ability to
      identify the interrupt source.  For this case we can simply pretend
      that the device lacks INTx support entirely, keeping DisINTx set on
      the physical device, virtualizing this bit for the user, and
      virtualizing the interrupt pin register to indicate no INTx support.
      We already support virtualization of the DisINTx bit and already
      virtualize the interrupt pin for platforms without INTx support.  By
      tying these components together, setting DisINTx on open and reset,
      and identifying devices broken in this particular way, we can provide
      support for them w/o the handicap of APIC level INTx masking.
      
      Intel i40e (XL710/X710) 10/20/40GbE NICs have been identified as being
      broken in this specific way.  We leave the vfio-pci.nointxmask option
      as a mechanism to bypass this support, enabling INTx on the device
      with all the requirements of APIC level masking.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Cc: John Ronciak <john.ronciak@intel.com>
      Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
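The decision described in this message can be summarized as a small selection function. This is only a sketch: the enum and `pick_intx_mode` are hypothetical, and it takes the state of both components as known inputs, whereas the message notes that only DisINTx can really be probed and that devices broken in this particular way have to be identified individually.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three outcomes discussed above. */
enum intx_mode {
    INTX_DEVICE_MASK,     /* mask via DisINTx, detect via status bit */
    INTX_HIDDEN,          /* pretend the device has no INTx at all */
    INTX_APIC_EXCLUSIVE   /* APIC level masking, exclusive interrupt */
};

/* nointxmask models the vfio-pci.nointxmask module option. */
enum intx_mode pick_intx_mode(bool disintx_works, bool status_works,
                              bool nointxmask)
{
    if (nointxmask)
        return INTX_APIC_EXCLUSIVE;   /* option bypasses the rest */
    if (disintx_works && status_works)
        return INTX_DEVICE_MASK;
    if (disintx_works)
        return INTX_HIDDEN;           /* can silence, cannot identify */
    return INTX_APIC_EXCLUSIVE;
}
```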
  11. 28 Feb, 2016 1 commit
  12. 26 Feb, 2016 1 commit
  13. 23 Feb, 2016 7 commits
  14. 22 Dec, 2015 1 commit
    • vfio: Include No-IOMMU mode · 03a76b60
      Alex Williamson authored
      There is really no way to safely give a user full access to a DMA
      capable device without an IOMMU to protect the host system.  There is
      also no way to provide DMA translation, for use cases such as device
      assignment to virtual machines.  However, there are still those users
      that want userspace drivers even under those conditions.  The UIO
      driver exists for this use case, but does not provide the degree of
      device access and programming that VFIO has.  In an effort to avoid
      code duplication, this introduces a No-IOMMU mode for VFIO.
      
      This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling
      the "enable_unsafe_noiommu_mode" option on the vfio driver.  This
      should make it very clear that this mode is not safe.  Additionally,
      CAP_SYS_RAWIO privileges are necessary to work with groups and
      containers using this mode.  Groups making use of this support are
      named /dev/vfio/noiommu-$GROUP and can only make use of the special
      VFIO_NOIOMMU_IOMMU for the container.  Use of this mode, specifically
      binding a device without a native IOMMU group to a VFIO bus driver,
      will taint the kernel and should therefore not be considered
      supported.  This patch includes no-iommu support for the vfio-pci bus
      driver only.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
  15. 04 Dec, 2015 1 commit
  16. 20 Nov, 2015 1 commit
  17. 09 Nov, 2015 1 commit
    • vfio/pci: make an array larger · 222e684c
      Dan Carpenter authored
      Smatch complains about a possible out of bounds error:
      
      	drivers/vfio/pci/vfio_pci_config.c:1241 vfio_cap_init()
      	error: buffer overflow 'pci_cap_length' 20 <= 20
      
      The problem is that pci_cap_length[] was defined as large enough to
      hold "PCI_CAP_ID_AF + 1" elements.  The code in vfio_cap_init() assumes
      it has PCI_CAP_ID_MAX + 1 elements.  Originally, PCI_CAP_ID_AF and
      PCI_CAP_ID_MAX were the same but then we introduced PCI_CAP_ID_EA in
      commit f80b0ba9 ("PCI: Add Enhanced Allocation register entries")
      so now the array is too small.
      
      Let's fix this by making the array size PCI_CAP_ID_MAX + 1.  And let's
      make a similar change to pci_ext_cap_length[] for consistency.  Also
      both these arrays can be made const.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
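The sizing problem in this message can be reproduced in miniature. In the sketch below the `CAP_ID_*` constants mirror the relationship the message describes (AF = 0x13, EA = 0x14, MAX equal to EA), while `cap_length`, `cap_table_entries`, `cap_len_lookup`, and the single table entry are illustrative placeholders rather than the driver's real table.

```c
#include <assert.h>
#include <stddef.h>

/* Capability IDs as described above: MAX moved past AF once EA was
 * added. */
enum { CAP_ID_AF = 0x13, CAP_ID_EA = 0x14, CAP_ID_MAX = CAP_ID_EA };

/* Sized by CAP_ID_MAX + 1 so every valid ID is a valid index; sizing
 * by CAP_ID_AF + 1 would leave CAP_ID_EA out of bounds.  The entry
 * value is a placeholder. */
const unsigned char cap_length[CAP_ID_MAX + 1] = {
    [CAP_ID_AF] = 6,
};

size_t cap_table_entries(void)
{
    return sizeof(cap_length) / sizeof(cap_length[0]);
}

/* Bounds-checked lookup: -1 for IDs beyond the table. */
int cap_len_lookup(unsigned id)
{
    if (id > CAP_ID_MAX)
        return -1;
    return cap_length[id];
}
```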
  18. 05 Nov, 2015 1 commit
    • vfio: Include No-IOMMU mode · 033291ec
      Alex Williamson authored
      There is really no way to safely give a user full access to a DMA
      capable device without an IOMMU to protect the host system.  There is
      also no way to provide DMA translation, for use cases such as device
      assignment to virtual machines.  However, there are still those users
      that want userspace drivers even under those conditions.  The UIO
      driver exists for this use case, but does not provide the degree of
      device access and programming that VFIO has.  In an effort to avoid
      code duplication, this introduces a No-IOMMU mode for VFIO.
      
      This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling
      the "enable_unsafe_noiommu_mode" option on the vfio driver.  This
      should make it very clear that this mode is not safe.  Additionally,
      CAP_SYS_RAWIO privileges are necessary to work with groups and
      containers using this mode.  Groups making use of this support are
      named /dev/vfio/noiommu-$GROUP and can only make use of the special
      VFIO_NOIOMMU_IOMMU for the container.  Use of this mode, specifically
      binding a device without a native IOMMU group to a VFIO bus driver,
      will taint the kernel and should therefore not be considered
      supported.  This patch includes no-iommu support for the vfio-pci bus
      driver only.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
  19. 28 Oct, 2015 1 commit
    • vfio/pci: Use kernel VPD access functions · 4e1a6355
      Alex Williamson authored
      The PCI VPD capability operates on a set of window registers in PCI
      config space.  Writing to the address register triggers either a read
      or write, depending on the setting of the PCI_VPD_ADDR_F bit within
      the address register.  The data register provides either the source
      for writes or the target for reads.
      
      This model is susceptible to being broken by concurrent access, for
      which the kernel has adopted a set of access functions to serialize
      these registers.  Additionally, commits like 932c435c ("PCI: Add
      dev_flags bit to access VPD through function 0") and 7aa6ca4d
      ("PCI: Add VPD function 0 quirk for Intel Ethernet devices") indicate
      that VPD registers can be shared between functions on multifunction
      devices creating dependencies between otherwise independent devices.
      
      Fortunately it's quite easy to emulate the VPD registers, simply
      storing copies of the address and data registers in memory and
      triggering a VPD read or write on writes to the address register.
      This allows vfio users to avoid seeing spurious register changes from
      accesses on other devices and enables the use of shared quirks in the
      host kernel.  We can theoretically still race with access through
      sysfs, but the window of opportunity is much smaller.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Mark Rustad <mark.d.rustad@intel.com>
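The emulation scheme in this message (shadow copies of the address and data registers, with a transfer triggered by each write to the address register) can be modeled in a few lines. This is a userspace sketch under assumptions: `struct vpd_emul`, `vpd_addr_write`, and the in-memory `backing` array (standing in for the kernel's serialized VPD access) are hypothetical. The flag value 0x8000 corresponds to the F bit of the 16-bit VPD address register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VPD_ADDR_F    0x8000u   /* F bit of the VPD address register */
#define VPD_ADDR_MASK 0x7fffu

/* Hypothetical emulation state. */
struct vpd_emul {
    uint16_t addr;         /* shadow of the address register */
    uint32_t data;         /* shadow of the data register */
    uint8_t  backing[64];  /* stand-in for the device's VPD storage */
};

/* Emulate a write to the address register. Returns 0 on success,
 * -1 for an unaligned or out-of-range offset. */
int vpd_addr_write(struct vpd_emul *v, uint16_t val)
{
    uint16_t off = (uint16_t)(val & VPD_ADDR_MASK);

    assert(v != NULL);
    if ((off & 3) != 0 || off > sizeof(v->backing) - 4)
        return -1;

    if (val & VPD_ADDR_F) {
        /* F set by the writer: store the data register, then clear
         * F to signal that the write completed. */
        memcpy(&v->backing[off], &v->data, 4);
        v->addr = off;
    } else {
        /* F clear: load into the data register, then set F to
         * signal that the data is ready. */
        memcpy(&v->data, &v->backing[off], 4);
        v->addr = (uint16_t)(off | VPD_ADDR_F);
    }
    return 0;
}
```

Because the user only ever sees the shadow registers, accesses made by other functions or by the host do not show up as spurious changes in this model.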
  20. 01 Oct, 2015 1 commit
  21. 10 Jun, 2015 1 commit
    • vfio/pci: Fix racy vfio_device_get_from_dev() call · 20f30017
      Alex Williamson authored
      Testing the driver for a PCI device is racy; the device can be all
      but complete in the release path and still report the driver as ours.
      Therefore we can't trust drvdata to be valid.  This race can sometimes
      be seen when one port of a multifunction device is being unbound from
      the vfio-pci driver while another function is being released by the
      user and attempting a bus reset.  The device in the remove path is
      found as a dependent device for the bus reset of the release path
      device, the driver is still set to vfio-pci, but the drvdata has
      already been cleared, resulting in a null pointer dereference.
      
      To resolve this, fix vfio_device_get_from_dev() to not take the
      dev_get_drvdata() shortcut and instead traverse through the
      iommu_group, vfio_group, vfio_device path to get a reference we
      can trust.  Once we have that reference, we know the device isn't
      in transition and we can test to make sure the driver is still what
      we expect, so that we don't interfere with devices we don't own.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
  22. 02 May, 2015 1 commit
  23. 08 Apr, 2015 6 commits
  24. 17 Mar, 2015 4 commits