1. 08 May, 2019 1 commit
    • vfio/pci: use correct format characters · 4d043d3d
      Louis Taylor committed
      [ Upstream commit 426b046b748d1f47e096e05bdcc6fb4172791307 ]
      
      When compiling with -Wformat, clang emits the following warnings:
      
      drivers/vfio/pci/vfio_pci.c:1601:5: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                      ^~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1601:13: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                              ^~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1601:21: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                                      ^~~~~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1601:32: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                                                 ^~~~~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1605:5: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                      ^~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1605:13: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                              ^~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1605:21: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                                      ^~~~~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1605:32: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                                                 ^~~~~~~~~

      The types of these arguments are unconditionally defined, so this patch
      updates the format characters to the correct ones for unsigned ints.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/378
      Signed-off-by: Louis Taylor <louis@kragniz.eu>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
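The mismatch behind these warnings can be reproduced outside the kernel. The following is a minimal user-space sketch, not the kernel code: `format_ids` and the layout of its format string are hypothetical, and the only point it illustrates is that arguments of type `unsigned int` pair with `%x`, whereas an `h`-length specifier such as `%hx` expects `unsigned short`.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: formats four PCI IDs that are held in
 * unsigned int variables, as in the warnings quoted above.
 */
int format_ids(char *buf, size_t len,
               unsigned int vendor, unsigned int device,
               unsigned int subvendor, unsigned int subdevice)
{
    /*
     * "%04x" matches unsigned int. Writing "%04hx" here instead would
     * declare the arguments as unsigned short and trigger -Wformat.
     */
    return snprintf(buf, len, "%04x:%04x[%04x:%04x]",
                    vendor, device, subvendor, subdevice);
}
```

With `%04x` a value wider than 16 bits is printed in full rather than being reinterpreted as a short.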
  2. 07 Aug, 2018 2 commits
  3. 20 Jul, 2018 2 commits
  4. 19 Jul, 2018 1 commit
  5. 19 Jun, 2018 1 commit
  6. 27 Mar, 2018 3 commits
  7. 22 Mar, 2018 1 commit
  8. 21 Dec, 2017 3 commits
    • vfio-pci: Allow mapping MSIX BAR · a32295c6
      Alexey Kardashevskiy committed
      By default VFIO disables mapping of MSIX BAR to the userspace as
      the userspace may program it in a way allowing spurious interrupts;
      instead the userspace uses the VFIO_DEVICE_SET_IRQS ioctl.
      In order to eliminate guessing from the userspace about what is
      mmapable, VFIO also advertises a sparse list of regions allowed to mmap.
      
      This works fine as long as the system page size equals the MSIX
      alignment requirement which is 4KB. However with a bigger page size
      the existing code prohibits mapping non-MSIX parts of a page with MSIX
      structures so these parts have to be emulated via slow reads/writes on
      a VFIO device fd. If these emulated bits are accessed often, this has
      serious impact on performance.
      
      This allows mmap of the entire BAR containing MSIX vector table.
      
      This removes the sparse capability for PCI devices as it becomes useless.
      
      As the userspace needs to know for sure whether mmapping of the MSIX
      vector containing data can succeed, this adds a new capability -
      VFIO_REGION_INFO_CAP_MSIX_MAPPABLE - which explicitly tells the userspace
      that the entire BAR can be mmapped.
      
      This does not touch the MSIX mangling in the BAR read/write handlers as
      we are doing this just to enable direct access to non MSIX registers.

      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [aw - fixup whitespace, trim function name]
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
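The page-size effect described in this commit can be illustrated with a little arithmetic. This is a sketch under assumptions, not the driver's code: `struct range` and `msix_excluded_span` are hypothetical names, and the function only models the older rule that every page touching the MSI-X table is kept out of mmap.

```c
#include <stdint.h>

/* Half-open byte range [start, end) within a BAR. */
struct range {
    uint64_t start;
    uint64_t end;
};

/*
 * Hypothetical model of the older rule: the span excluded from mmap is
 * the MSI-X table rounded out to whole pages.
 * page_size must be a power of two.
 */
struct range msix_excluded_span(uint64_t table_off, uint64_t table_size,
                                uint64_t page_size)
{
    struct range r;

    r.start = table_off & ~(page_size - 1);
    r.end = (table_off + table_size + page_size - 1) & ~(page_size - 1);
    return r;
}
```

For a 256-byte table at offset 0x2000, a 4KB page size excludes only 0x2000 to 0x3000, while a 64KB page size excludes 0x0 to 0x10000, so every other register in that first 64KB would have to go through the slow read/write path.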
    • vfio: Simplify capability helper · dda01f78
      Alex Williamson committed
      The vfio_info_add_capability() helper requires the caller to pass a
      capability ID, which it then uses to fill in header fields, assuming
      hard coded versions.  This makes for an awkward and rigid interface.
      The only thing we want this helper to do is allocate sufficient
      space in the caps buffer and chain this capability into the list.
      Reduce it to that simple task.

      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
      Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio-pci: Mask INTx if a device is not capable of enabling it · 2170dd04
      Alexey Kardashevskiy committed
      At the moment VFIO rightfully assumes that INTx is supported if
      the interrupt pin is not set to zero in the device config space.
      However if that is not the case (the pin is not zero but pdev->irq is),
      vfio_intx_enable() fails.
      
      In order to prevent the userspace from trying to enable INTx when we know
      that it cannot work, let's mask the PCI_INTERRUPT_PIN register.

      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
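The decision described in this commit reduces to a small predicate. The sketch below is an illustration only: `virt_interrupt_pin` is a hypothetical name, and its two parameters stand in for the PCI_INTERRUPT_PIN value read from config space and for `pdev->irq`.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the interrupt pin value to present to userspace.
 * If the device reports a pin but no IRQ was assigned (irq == 0), INTx
 * cannot be enabled, so the pin is reported as 0 (no INTx).
 */
uint8_t virt_interrupt_pin(uint8_t hw_pin, unsigned int irq)
{
    if (hw_pin != 0 && irq == 0)
        return 0;
    return hw_pin;
}
```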
  9. 03 Oct, 2017 2 commits
    • vfio/pci: Virtualize Maximum Read Request Size · cf0d53ba
      Alex Williamson committed
      MRRS defines the maximum read request size a device is allowed to
      make.  Drivers will often increase this to allow more data transfer
      with a single request.  Completions to this request are bound by the
      MPS setting for the bus.  Aside from device quirks (none known), it
      doesn't seem to make sense to set an MRRS value less than MPS, yet
      this is a likely scenario given that user drivers do not have a
      system-wide view of the PCI topology.  Virtualize MRRS such that the
      user can set MRRS >= MPS, but use MPS as the floor value that we'll
      write to hardware.

      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
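The floor rule described in this commit can be sketched in a few lines. This is an illustration under assumptions, not the driver's code: the function names are hypothetical, and the sketch relies only on the PCIe encoding in which both MPS and MRRS are stored as a code where the size in bytes is 128 shifted left by that code.

```c
/* Size in bytes for a PCIe MPS or MRRS code (0 means 128 bytes). */
unsigned int pcie_size_bytes(unsigned int code)
{
    return 128u << code;
}

/*
 * Hypothetical helper: the MRRS code to write to hardware. The value
 * requested by the user is honored unless it is below MPS, in which
 * case MPS is used as the floor.
 */
unsigned int mrrs_for_hardware(unsigned int user_mrrs_code,
                               unsigned int mps_code)
{
    return user_mrrs_code < mps_code ? mps_code : user_mrrs_code;
}
```

The user still reads back the value it wrote from the virtualized register; only the value reaching the device is clamped.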
    • vfio/pci: Virtualize Maximum Payload Size · 52318497
      Alex Williamson committed
      With virtual PCI-Express chipsets, we now see userspace/guest drivers
      trying to match the physical MPS setting to a virtual downstream port.
      Of course a lone physical device surrounded by virtual interconnects
      cannot make a correct decision for a proper MPS setting.  Instead,
      let's virtualize the MPS control register so that writes through to
      hardware are disallowed.  Userspace drivers like QEMU assume they can
      write anything to the device and we'll filter out anything dangerous.
      Since mismatched MPS can lead to AER and other faults, let's add it
      to the kernel side rather than relying on userspace virtualization to
      handle it.

      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
  10. 28 Jul, 2017 1 commit
  11. 27 Jul, 2017 1 commit
  12. 13 Jun, 2017 1 commit
  13. 04 Jan, 2017 1 commit
  14. 30 Dec, 2016 1 commit
  15. 13 Dec, 2016 1 commit
  16. 19 Nov, 2016 1 commit
  17. 17 Nov, 2016 2 commits
  18. 27 Oct, 2016 1 commit
    • vfio/pci: Fix integer overflows, bitmask check · 05692d70
      Vlad Tsyrklevich committed
      The VFIO_DEVICE_SET_IRQS ioctl did not sufficiently sanitize
      user-supplied integers, potentially allowing memory corruption. This
      patch adds appropriate integer overflow checks, checks the range bounds
      for VFIO_IRQ_SET_DATA_NONE, and also verifies that only a single element
      in the VFIO_IRQ_SET_DATA_TYPE_MASK bitmask is set.
      VFIO_IRQ_SET_ACTION_TYPE_MASK is already correctly checked later in
      vfio_pci_set_irqs_ioctl().
      
      Furthermore, a kzalloc is changed to a kcalloc because the use of a
      kzalloc with an integer multiplication allowed an integer overflow
      condition to be reached without this patch. kcalloc checks for overflow
      and should prevent a similar occurrence.

      Signed-off-by: Vlad Tsyrklevich <vlad@tsyrklevich.net>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
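The three kinds of check mentioned in this commit can be sketched in user space. This is a simplified illustration, not the patch itself: all function names are hypothetical, and `calloc` stands in for `kcalloc`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Overflow-checked zeroed array allocation, in the spirit of kcalloc:
 * refuse when n * size would wrap instead of allocating a short buffer.
 */
void *checked_zalloc_array(size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;
    return calloc(n, size);
}

/*
 * Range check written so that start + count is never computed and
 * therefore cannot wrap around.
 */
int irq_range_ok(uint32_t start, uint32_t count, uint32_t max)
{
    return start < max && count <= max - start;
}

/* True when exactly one bit of a masked flags field is set. */
int exactly_one_bit(uint32_t bits)
{
    return bits != 0 && (bits & (bits - 1)) == 0;
}
```

A naive `start + count > max` test passes for `start = 1, count = UINT32_MAX` because the sum wraps to 0; the subtraction form rejects it.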
  19. 30 Sep, 2016 1 commit
  20. 27 Sep, 2016 2 commits
    • vfio-pci: Disable INTx after MSI/X teardown · c93a97ee
      Alex Williamson committed
      The MSI/X shutdown path can gratuitously enable INTx, which is not
      something we want to happen if we're dealing with a broken INTx device.

      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio-pci: Virtualize PCIe & AF FLR · ddf9dc0e
      Alex Williamson committed
      We use a BAR restore trick to try to detect when a user has performed
      a device reset, possibly through FLR or other backdoors, to put things
      back into a working state.  This is important for backdoor resets, but
      we can actually just virtualize the "front door" resets provided via
      PCIe and AF FLR.  Set these bits as virtualized + writable, allowing
      the default write to set them in vconfig, then we can simply check the
      bit, perform an FLR of our own, and clear the bit.  We don't actually
      have the granularity in PCI to specify the type of reset we want to
      do, but generally devices don't implement both PCIe and AF FLR and
      we'll favor these over other types of reset, so we should generally
      line up.  We do test whether the device provides the requested FLR type
      to stay consistent with hardware capabilities though.
      
      This seems to fix several instances of devices getting into bad states
      with userspace drivers, like dpdk, running inside a VM.

      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: Greg Rose <grose@lightfleet.com>
  21. 30 Aug, 2016 1 commit
  22. 09 Aug, 2016 1 commit
    • vfio/pci: Fix NULL pointer oops in error interrupt setup handling · c8952a70
      Alex Williamson committed
      There are multiple cases in vfio_pci_set_ctx_trigger_single() where
      we assume we can safely read from our data pointer without actually
      checking whether the user has passed any data via the count field.
      VFIO_IRQ_SET_DATA_NONE in particular is entirely broken since we
      attempt to pull an int32_t file descriptor out before even checking
      the data type.  The other data types assume the data pointer contains
      one element of their type as well.
      
      In part this is good news because we were previously restricted from
      doing much sanitization of parameters because it was missed in the
      past and we didn't want to break existing users.  Clearly DATA_NONE
      is completely broken, so it must not have any users and we can fix
      it up completely.  For DATA_BOOL and DATA_EVENTFD, we'll just
      protect ourselves, returning error when count is zero since we
      previously would have oopsed.

      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Reported-by: Chris Thompson <the_cartographer@hotmail.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
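The protective check described for DATA_EVENTFD can be sketched as follows. This is a user-space illustration with hypothetical names: the `DATA_*` constants and `read_single_eventfd` are stand-ins and are not the identifiers used in the driver.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the data-type flags of the ioctl. */
enum {
    DATA_NONE    = 1 << 0,
    DATA_BOOL    = 1 << 1,
    DATA_EVENTFD = 1 << 2
};

/*
 * Extract a single eventfd from the user's data buffer. The data type
 * and the element count are validated before the pointer is read, so a
 * request with count == 0 returns an error instead of dereferencing a
 * buffer that was never supplied.
 */
int read_single_eventfd(uint32_t flags, uint32_t count,
                        const void *data, int32_t *fd_out)
{
    if (!(flags & DATA_EVENTFD))
        return -EINVAL;
    if (count == 0 || data == NULL)
        return -EINVAL;

    *fd_out = *(const int32_t *)data;
    return 0;
}
```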
  23. 09 Jul, 2016 1 commit
    • vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive · 05f0c03f
      Yongji Xie committed
      The current vfio-pci implementation disallows mmap of sub-page
      (size < PAGE_SIZE) MMIO BARs because the MMIO page holding such a
      BAR may be shared with other BARs. This causes performance issues
      when we pass through a PCI device with this kind of BAR: the guest
      cannot access the BAR directly, so its MMIO accesses are emulated
      in the host.
      
      However, not all sub-page BARs share a page with other BARs.
      We should allow mmap of sub-page MMIO BARs that we can guarantee
      do not share a page with other BARs.
      
      This patch adds support for this case. We also try to add a
      dummy resource to reserve the remainder of the page, which a
      hot-added device's BAR might otherwise be assigned into. It is not
      necessary to handle the case where the BAR is not page aligned,
      because we cannot expect the BAR to be assigned to the same
      location within a page in the guest when we pass it through, and
      it is hard to access such a BAR from userspace because we have
      no way to get the BAR's location within a page.

      Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
  24. 01 Jun, 2016 1 commit
  25. 30 May, 2016 1 commit
  26. 20 May, 2016 1 commit
    • vfio_pci: Test for extended capabilities if config space > 256 bytes · f7055280
      Alexey Kardashevskiy committed
      PCI-Express spec says that reading 4 bytes at offset 100h should return
      zero if there is no extended capability so VFIO reads this dword to
      know if there are extended capabilities.
      
      However it is not always possible to access the extended space so
      generic PCI code in pci_cfg_space_size_ext() checks if
      pci_read_config_dword() can read beyond 100h and if the check fails,
      it sets the config space size to 100h.
      
      VFIO does its own extended capabilities check by reading at offset 100h
      which may produce 0xffffffff which VFIO treats as the extended config
      space presence and calls vfio_ecap_init() which fails to parse
      capabilities (which is expected) but right before the exit, it writes
      zero at offset 100h which is beyond the buffer allocated for
      vdev->vconfig (which is 256 bytes) which leads to random memory
      corruption.
      
      This makes VFIO only check for the extended capabilities if
      the discovered config size is more than 256 bytes.

      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
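The failure mode and the guard described in this commit can be modeled in user space. This is a sketch with hypothetical names, not the driver's code: `should_probe_ecaps` models the new size test, and `vconfig_write_dword` shows the kind of bounds check whose absence let a write at offset 100h land past a 256-byte buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CFG_SPACE_SIZE 256 /* conventional PCI config space, in bytes */

/* Only look for extended capabilities when extended space exists. */
int should_probe_ecaps(size_t cfg_size)
{
    return cfg_size > CFG_SPACE_SIZE;
}

/*
 * Hypothetical bounded write into a shadow config buffer. Returns 0 on
 * success and -1 if the dword would not fit inside the buffer.
 */
int vconfig_write_dword(uint8_t *vconfig, size_t vconfig_len,
                        size_t off, uint32_t val)
{
    if (off > vconfig_len || vconfig_len - off < sizeof(val))
        return -1;
    memcpy(vconfig + off, &val, sizeof(val));
    return 0;
}
```

With a 256-byte buffer, a write at offset 0x100 is rejected, while a write at 0xfc, the last dword, succeeds.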
  27. 29 Apr, 2016 2 commits
    • vfio/pci: Add test for BAR restore · dc928109
      Alex Williamson committed
      If a device is reset without the memory or i/o bits enabled in the
      command register we may not detect it, potentially leaving the device
      without valid BAR programming.  Add an additional test to check the
      BARs on each write to the command register.

      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio/pci: Hide broken INTx support from user · 45074405
      Alex Williamson committed
      INTx masking has two components, the first is that we need the ability
      to prevent the device from continuing to assert INTx.  This is
      provided via the DisINTx bit in the command register and is the only
      thing we can really probe for when testing if INTx masking is
      supported.  The second component is that the device needs to indicate
      if INTx is asserted via the interrupt status bit in the device status
      register.  With these two features we can generically determine if one
      of the devices we own is asserting INTx, signal the user, and mask the
      interrupt while the user services the device.
      
      Generally if one or both of these components is broken we resort to
      APIC level interrupt masking, which requires an exclusive interrupt
      since we have no way to determine the source of the interrupt in a
      shared configuration.  This often makes it difficult or impossible to
      configure the system for userspace use of the device, for an interrupt
      mode that the user may not need.
      
      One possible configuration of broken INTx masking is that the DisINTx
      support is fully functional, but the interrupt status bit never
      signals interrupt assertion.  In this case we do have the ability to
      prevent the device from asserting INTx, but lack the ability to
      identify the interrupt source.  For this case we can simply pretend
      that the device lacks INTx support entirely, keeping DisINTx set on
      the physical device, virtualizing this bit for the user, and
      virtualizing the interrupt pin register to indicate no INTx support.
      We already support virtualization of the DisINTx bit and already
      virtualize the interrupt pin for platforms without INTx support.  By
      tying these components together, setting DisINTx on open and reset,
      and identifying devices broken in this particular way, we can provide
      support for them w/o the handicap of APIC level INTx masking.
      
      Intel i40e (XL710/X710) 10/20/40GbE NICs have been identified as being
      broken in this specific way.  We leave the vfio-pci.nointxmask option
      as a mechanism to bypass this support, enabling INTx on the device
      with all the requirements of APIC level masking.

      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Cc: John Ronciak <john.ronciak@intel.com>
      Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
  28. 28 Feb, 2016 1 commit
  29. 26 Feb, 2016 1 commit
  30. 23 Feb, 2016 1 commit