1. 27 May 2020, 1 commit
    • vfio/pci: fix memory leaks of eventfd ctx · 1518ac27
      Authored by Qian Cai
      Finishing a qemu-kvm run (-device vfio-pci,host=0001:01:00.0)
      triggers a few memory leaks after a while, because
      vfio_pci_set_ctx_trigger_single() calls eventfd_ctx_fdget() without
      a matching eventfd_ctx_put() later.  Fix it by calling
      eventfd_ctx_put() on those eventfd contexts in vfio_pci_release()
      before vfio_device_release().
      
      unreferenced object 0xebff008981cc2b00 (size 128):
        comm "qemu-kvm", pid 4043, jiffies 4294994816 (age 9796.310s)
        hex dump (first 32 bytes):
          01 00 00 00 6b 6b 6b 6b 00 00 00 00 ad 4e ad de  ....kkkk.....N..
          ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
        backtrace:
          [<00000000917e8f8d>] slab_post_alloc_hook+0x74/0x9c
          [<00000000df0f2aa2>] kmem_cache_alloc_trace+0x2b4/0x3d4
          [<000000005fcec025>] do_eventfd+0x54/0x1ac
          [<0000000082791a69>] __arm64_sys_eventfd2+0x34/0x44
          [<00000000b819758c>] do_el0_svc+0x128/0x1dc
          [<00000000b244e810>] el0_sync_handler+0xd0/0x268
          [<00000000d495ef94>] el0_sync+0x164/0x180
      unreferenced object 0x29ff008981cc4180 (size 128):
        comm "qemu-kvm", pid 4043, jiffies 4294994818 (age 9796.290s)
        hex dump (first 32 bytes):
          01 00 00 00 6b 6b 6b 6b 00 00 00 00 ad 4e ad de  ....kkkk.....N..
          ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff  ....kkkk........
        backtrace:
          [<00000000917e8f8d>] slab_post_alloc_hook+0x74/0x9c
          [<00000000df0f2aa2>] kmem_cache_alloc_trace+0x2b4/0x3d4
          [<000000005fcec025>] do_eventfd+0x54/0x1ac
          [<0000000082791a69>] __arm64_sys_eventfd2+0x34/0x44
          [<00000000b819758c>] do_el0_svc+0x128/0x1dc
          [<00000000b244e810>] el0_sync_handler+0xd0/0x268
          [<00000000d495ef94>] el0_sync+0x164/0x180
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
  2. 18 May 2020, 2 commits
    • vfio-pci: Invalidate mmaps and block MMIO access on disabled memory · abafbc55
      Authored by Alex Williamson
      Accessing the disabled memory space of a PCI device would typically
      result in a master abort response on conventional PCI, or an
      unsupported request on PCI express.  The user would generally see
      these as a -1 response for the read return data and the write would be
      silently discarded, possibly with an uncorrected, non-fatal AER error
      triggered on the host.  Some systems however take it upon themselves
      to bring down the entire system when they see something that might
      indicate a loss of data, such as this discarded write to a disabled
      memory space.
      
      To avoid this, we want to try to block the user from accessing memory
      spaces while they're disabled.  We start with a semaphore around the
      memory enable bit, where writers modify the memory enable state and
      must be serialized, while readers make use of the memory region and
      can access in parallel.  Writers include both direct manipulation via
      the command register, as well as any reset path where the internal
      mechanics of the reset may both explicitly and implicitly disable
      memory access, and manipulation of the MSI-X configuration, where the
      MSI-X vector table resides in MMIO space of the device.  Readers
      include the read and write file ops to access the vfio device fd
      offsets as well as memory mapped access.  In the latter case, we make
      use of our new vma list support to zap, or invalidate, those memory
      mappings in order to force them to be faulted back in on access.
      
      Our semaphore usage will stall user access to MMIO spaces across
      internal operations like reset, but the user might experience new
      behavior when trying to access the MMIO space while disabled via the
      PCI command register.  Access via read or write while disabled will
      return -EIO and access via memory maps will result in a SIGBUS.  This
      is expected to be compatible with known use cases and potentially
      provides better error handling capabilities than present in the
      hardware, while avoiding the more readily accessible and severe
      platform error responses that might otherwise occur.
      
      Fixes: CVE-2020-12888
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio-pci: Fault mmaps to enable vma tracking · 11c4cd07
      Authored by Alex Williamson
      Rather than calling remap_pfn_range() when a region is mmap'd, setup
      a vm_ops handler to support dynamic faulting of the range on access.
      This allows us to manage a list of vmas actively mapping the area that
      we can later use to invalidate those mappings.  The open callback
      invalidates the vma range so that all tracking is inserted in the
      fault handler and removed in the close handler.
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
  3. 24 Mar 2020, 6 commits
    • vfio/pci: Cleanup .probe() exit paths · b66574a3
      Authored by Alex Williamson
      The cleanup is getting a tad long.
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Reviewed-by: Kevin Tian <kevin.tian@intel.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio/pci: Remove dev_fmt definition · 959e1b75
      Authored by Alex Williamson
      It currently results in messages like:
      
       "vfio-pci 0000:03:00.0: vfio_pci: ..."
      
      Which is quite a bit redundant.
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Reviewed-by: Kevin Tian <kevin.tian@intel.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio/pci: Add sriov_configure support · 137e5531
      Authored by Alex Williamson
      With the VF Token interface we can now expect that a vfio userspace
      driver must be in collaboration with the PF driver, an unwitting
      userspace driver will not be able to get past the GET_DEVICE_FD step
      in accessing the device.  We can now move on to actually allowing
      SR-IOV to be enabled by vfio-pci on the PF.  Support for this is not
      enabled by default in this commit, but it does provide a module option
      for this to be enabled (enable_sriov=1).  Enabling VFs is rather
      straightforward, except we don't want to risk that a VF might get
      autoprobed and bound to other drivers, so a bus notifier is used to
      "capture" VFs to vfio-pci using the driver_override support.  We
      assume any later action to bind the device to other drivers is
      condoned by the system admin and allow it with a log warning.
      
      vfio-pci will disable SR-IOV on a PF before releasing the device,
      allowing a VF driver to be assured other drivers cannot take over the
      PF and that any other userspace driver must know the shared VF token.
      This support also does not provide a mechanism for the PF userspace
      driver itself to manipulate SR-IOV through the vfio API.  With this
      patch SR-IOV can only be enabled via the host sysfs interface and the
      PF driver user cannot create or remove VFs.
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Reviewed-by: Kevin Tian <kevin.tian@intel.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
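With enable_sriov=1 set, VF creation goes through the standard PCI sysfs interface. A sketch of the host-side flow; the device address and VF count are examples, and paths assume a typical sysfs layout:

```shell
# Load vfio-pci with SR-IOV support enabled (module option from this commit).
modprobe vfio-pci enable_sriov=1

# Bind the PF to vfio-pci (0000:03:00.0 is an example address).
echo vfio-pci > /sys/bus/pci/devices/0000:03:00.0/driver_override
echo 0000:03:00.0 > /sys/bus/pci/drivers_probe

# Create 4 VFs; the bus notifier "captures" them for vfio-pci via
# driver_override so they are not autoprobed by other host drivers.
echo 4 > /sys/bus/pci/devices/0000:03:00.0/sriov_numvfs

# Tear down; vfio-pci also disables SR-IOV when the PF is released.
echo 0 > /sys/bus/pci/devices/0000:03:00.0/sriov_numvfs
```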
    • vfio: Introduce VFIO_DEVICE_FEATURE ioctl and first user · 43eeeecc
      Authored by Alex Williamson
      The VFIO_DEVICE_FEATURE ioctl is meant to be a general purpose, device
      agnostic ioctl for setting, retrieving, and probing device features.
      This implementation provides a 16-bit field for specifying a feature
      index, where the data portion of the ioctl is determined by the
      semantics for the given feature.  Additional flag bits indicate the
      direction and nature of the operation; SET indicates user data is
      provided into the device feature, GET indicates the device feature is
      written out into user data.  The PROBE flag augments determining
      whether the given feature is supported, and if provided, whether the
      given operation on the feature is supported.
      
      The first user of this ioctl is for setting the vfio-pci VF token,
      where the user provides a shared secret key (UUID) on a SR-IOV PF
      device, which users must provide when opening associated VF devices.
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Reviewed-by: Kevin Tian <kevin.tian@intel.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio/pci: Introduce VF token · cc20d799
      Authored by Alex Williamson
      If we enable SR-IOV on a vfio-pci owned PF, the resulting VFs are not
      fully isolated from the PF.  The PF can always cause a denial of service
      to the VF, even if by simply resetting itself.  The degree to which a PF
      can access the data passed through a VF or interfere with its operation
      is dependent on a given SR-IOV implementation.  Therefore we want to
      avoid a scenario where an existing vfio-pci based userspace driver might
      assume the PF driver is trusted, for example assigning a PF to one VM
      and VF to another with some expectation of isolation.  IOMMU grouping
      could be a solution to this, but imposes an unnecessarily strong
      relationship between PF and VF drivers if they need to operate with the
      same IOMMU context.  Instead we introduce a "VF token", which is
      essentially just a shared secret between PF and VF drivers, implemented
      as a UUID.
      
      The VF token can be set by a vfio-pci based PF driver and must be known
      by the vfio-pci based VF driver in order to gain access to the device.
      This allows the degree to which this VF token is considered secret to be
      determined by the applications and environment.  For example a VM might
      generate a random UUID known only internally to the hypervisor while a
      userspace networking appliance might use a shared, or even well-known,
      UUID among the application drivers.
      
      To incorporate this VF token, the VFIO_GROUP_GET_DEVICE_FD interface is
      extended to accept key=value pairs in addition to the device name.  This
      allows us to most easily deny user access to the device without risk
      that existing userspace drivers assume region offsets, IRQs, and other
      device features, leading to more elaborate error paths.  The format
      of these options is expected to take the form:
      
      "$DEVICE_NAME $OPTION1=$VALUE1 $OPTION2=$VALUE2"
      
      Where the device name is always provided first for compatibility and
      additional options are specified in a space-separated list.  The
      relationship between, and requirements for, the additional options
      will be vfio bus driver dependent; however, an unknown or unused
      option within this schema should return an error.  This allows for
      future use of unknown options as well as a positive indication to
      the user that an option is used.
      
      An example VF token option would take this form:
      
      "0000:03:00.0 vf_token=2ab74924-c335-45f4-9b16-8569e5b08258"
      
      When accessing a VF where the PF is making use of vfio-pci, the user
      MUST provide the current vf_token.  When accessing a PF, the user MUST
      provide the current vf_token IF there are active VF users or MAY provide
      a vf_token in order to set the current VF token when no VF users are
      active.  The former requirement assures VF users that an unassociated
      driver cannot usurp the PF device.  These semantics also imply that a
      VF token MUST be set by a PF driver before VF drivers can access their
      device; the default token is random, and mechanisms to read the token are
      not provided in order to protect the VF token of previous users.  Use of
      the vf_token option outside of these cases will return an error, as
      discussed above.
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Reviewed-by: Kevin Tian <kevin.tian@intel.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio/pci: Implement match ops · 467c084f
      Authored by Alex Williamson
      This currently serves the same purpose as the default implementation
      but will be expanded for additional functionality.
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Reviewed-by: Kevin Tian <kevin.tian@intel.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
  4. 14 Oct 2019, 1 commit
  5. 23 Aug 2019, 1 commit
  6. 19 Jun 2019, 1 commit
  7. 23 Apr 2019, 1 commit
  8. 04 Apr 2019, 1 commit
    • vfio/pci: use correct format characters · 426b046b
      Authored by Louis Taylor
      When compiling with -Wformat, clang emits the following warnings:
      
      drivers/vfio/pci/vfio_pci.c:1601:5: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                      ^~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1601:13: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                              ^~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1601:21: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                                      ^~~~~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1601:32: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                                                 ^~~~~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1605:5: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                      ^~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1605:13: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                              ^~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1605:21: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                                      ^~~~~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1605:32: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                                                 ^~~~~~~~~
      The types of these arguments are unconditionally defined, so this
      patch updates the format characters to the correct ones for
      unsigned ints.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/378
      Signed-off-by: Louis Taylor <louis@kragniz.eu>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
  9. 19 Feb 2019, 2 commits
    • vfio_pci: Enable memory accesses before calling pci_map_rom · 0cfd027b
      Authored by Eric Auger
      pci_map_rom()/pci_get_rom_size() perform memory accesses in the ROM.
      In case Memory Space accesses were disabled, readw() is likely
      to trigger a synchronous external abort on some platforms.
      
      In case memory accesses were disabled, re-enable them before the
      call and disable them again just after.
      
      Fixes: 89e1f7d4 ("vfio: Add PCI device driver")
      Signed-off-by: Eric Auger <eric.auger@redhat.com>
      Suggested-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio/pci: Restore device state on PM transition · 51ef3a00
      Authored by Alex Williamson
      PCI core handles save and restore of device state around reset, but
      when using pci_set_power_state() we can unintentionally trigger a soft
      reset of the device, where PCI core only restores the BAR state.  If
      we're using vfio-pci's idle D3 support to try to put devices into low
      power when unused, this might trigger a reset when the device is woken
      for use.  Also power state management by the user, or within a guest,
      can put the device into D3 power state with potentially limited
      ability to restore the device if it should undergo a reset.  The PCI
      spec does not define the extent of a soft reset and many devices
      reporting soft reset on D3->D0 transition do not undergo a PCI config
      space reset.  It's therefore assumed safe to unconditionally restore
      the remainder of the state if the device indicates soft reset
      support, even on a user initiated wakeup.
      
      Implement a wrapper in vfio-pci to tag devices reporting PM reset
      support, save their state on transitions into D3 and restore on
      transitions back to D0.
      Reported-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
  10. 21 Dec 2018, 3 commits
    • vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver · 7f928917
      Authored by Alexey Kardashevskiy
      POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not
      pluggable PCIe devices but still have PCIe links which are used
      for config space and MMIO. In addition to that the GPUs have 6 NVLinks
      which are connected to other GPUs and the POWER9 CPU. POWER9 chips
      have a special unit on a die called an NPU which is an NVLink2 host bus
      adapter with p2p connections to 2 to 3 GPUs, 3 or 2 NVLinks to each.
      These systems also support ATS (address translation services) which is
      a part of the NVLink2 protocol.  Such GPUs also share their on-board
      RAM (16GB or 32GB) with the system via the same NVLink2, so a CPU
      has cache-coherent access to the GPU RAM.
      
      This exports GPU RAM to userspace as a new VFIO device region and
      preregisters the new memory as device memory, as it might be used
      for DMA.  PFNs are inserted from the fault handler because the GPU
      memory is not onlined until the vendor driver is loaded and has
      trained the NVLinks; doing this earlier causes low-level errors
      which are fenced in the firmware, so while they do not hurt the
      host system they are still better avoided.  For the same reason
      GPU RAM is not mapped into the host kernel (as would usually be
      done for emulated access).
      
      This exports an ATSD (Address Translation Shootdown) register of the
      NPU, which allows TLB invalidations inside the GPU for an operating
      system.  The register conveniently occupies a single 64k page and is
      also presented to userspace as a new VFIO device region.  One NPU has
      8 ATSD registers; each of them can be used for TLB invalidation in a
      GPU linked to this NPU.  This allocates one ATSD register per NVLink
      bridge, allowing up to 6 registers to be passed.  Due to a host
      firmware bug (only recently fixed), just 1 ATSD register per NPU was
      actually advertised to the host system, so that lone register is
      passed via the first NVLink bridge device in the group, which is
      still enough as QEMU collects them all back and presents them to the
      guest via vPHB to mimic the emulated NPU PHB on the host.
      
      In order to provide the userspace with the information about GPU-to-NVLink
      connections, this exports an additional capability called "tgt"
      (which is an abbreviated host system bus address). The "tgt" property
      tells the GPU its own system address and allows the guest driver to
      conglomerate the routing information so each GPU knows how to get directly
      to the other GPUs.
      
      For ATS to work, the nest MMU (an NVIDIA block in a P9 CPU) needs to
      know LPID (a logical partition ID or a KVM guest hardware ID in other
      words) and PID (a memory context ID of a userspace process, not to be
      confused with a Linux pid).  This assigns a GPU to an LPID in the
      NPU, which is why this adds a KVM listener on the IOMMU group.  A
      PID comes via NVLink from a GPU, and the NPU uses a PID wildcard to
      pass it through.
      
      This requires coherent memory and ATSD to be available on the host,
      as the GPU vendor only supports configurations with both features
      enabled; other configurations are known not to work.  Because of
      this, and because of the way the features are advertised to the host
      system (a device tree with very platform-specific properties), this
      requires the POWERNV platform to be enabled.
      
      The V100 GPUs do not advertise any of these capabilities via config
      space, and there is more than one device ID, so this relies on the
      platform to tell whether these GPUs have special abilities such as
      NVLinks.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • vfio_pci: Allow regions to add own capabilities · c2c0f1cd
      Authored by Alexey Kardashevskiy
      VFIO regions already support region capabilities with a limited set of
      fields. However the subdriver might have to report to the userspace
      additional bits.
      
      This adds an add_capability() hook to vfio_pci_regops.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • vfio_pci: Allow mapping extra regions · a15b1883
      Authored by Alexey Kardashevskiy
      So far we have only allowed mapping of MMIO BARs to userspace.
      However there are GPUs with on-board coherent RAM accessible via
      side channels which we also want to map to userspace.  The first
      client for this is the NVIDIA V100 GPU with NVLink2 direct links to
      a POWER9 NPU-enabled CPU; such GPUs have 16GB of RAM coherently
      mapped into the system address space, which we are going to export
      as an extra PCI region.
      
      We already support extra PCI regions; this adds support for mapping
      them to userspace.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  11. 13 Dec 2018, 1 commit
  12. 26 Sep 2018, 1 commit
    • vfio/pci: Mask buggy SR-IOV VF INTx support · db04264f
      Authored by Alex Williamson
      The SR-IOV spec requires that VFs must report zero for the INTx pin
      register as VFs are precluded from INTx support.  It's much easier for
      the host kernel to understand whether a device is a VF and therefore
      whether a non-zero pin register value is bogus than it is to do the
      same in userspace.  Override the INTx count for such devices and
      virtualize the pin register to provide a consistent view of the device
      to the user.
      
      As this is clearly a spec violation, warn about it to support hardware
      validation, but also provide a known whitelist as it doesn't do much
      good to continue complaining if the hardware vendor doesn't plan to
      fix it.
      
      Known devices with this issue: 8086:270c
      Tested-by: Gage Eads <gage.eads@intel.com>
      Reviewed-by: Ashok Raj <ashok.raj@intel.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
  13. 07 Aug 2018, 2 commits
  14. 20 Jul 2018, 2 commits
  15. 19 Jul 2018, 1 commit
  16. 27 Mar 2018, 1 commit
    • vfio/pci: Add ioeventfd support · 30656177
      Authored by Alex Williamson
      The ioeventfd here is actually irqfd handling of an ioeventfd such as
      supported in KVM.  A user is able to pre-program a device write to
      occur when the eventfd triggers.  This is yet another instance of
      eventfd-irqfd triggering between KVM and vfio.  The impetus for this
      is high frequency writes to pages which are virtualized in QEMU.
      Enabling this near-direct write path for selected registers within
      the virtualized page can improve performance and reduce overhead.
      Specifically this is initially targeted at NVIDIA graphics cards where
      the driver issues a write to an MMIO register within a virtualized
      region in order to allow the MSI interrupt to re-trigger.
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
  17. 22 Mar 2018, 1 commit
  18. 21 Dec 2017, 3 commits
    • vfio-pci: Allow mapping MSIX BAR · a32295c6
      Authored by Alexey Kardashevskiy
      By default VFIO disables mapping of MSIX BAR to the userspace as
      the userspace may program it in a way allowing spurious interrupts;
      instead the userspace uses the VFIO_DEVICE_SET_IRQS ioctl.
      In order to eliminate guessing from the userspace about what is
      mmapable, VFIO also advertises a sparse list of regions allowed to mmap.
      
      This works fine as long as the system page size equals the MSI-X
      alignment requirement, which is 4KB.  However with a bigger page size
      the existing code prohibits mapping non-MSIX parts of a page with MSIX
      structures so these parts have to be emulated via slow reads/writes on
      a VFIO device fd. If these emulated bits are accessed often, this has
      serious impact on performance.
      
      This allows mmap of the entire BAR containing MSIX vector table.
      
      This removes the sparse capability for PCI devices as it becomes useless.
      
      As userspace needs to know for sure whether mmapping the BAR
      containing the MSI-X vector table can succeed, this adds a new
      capability - VFIO_REGION_INFO_CAP_MSIX_MAPPABLE - which explicitly
      tells userspace that the entire BAR can be mmapped.
      
      This does not touch the MSIX mangling in the BAR read/write handlers as
      we are doing this just to enable direct access to non MSIX registers.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [aw - fixup whitespace, trim function name]
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio: Simplify capability helper · dda01f78
      Authored by Alex Williamson
      The vfio_info_add_capability() helper requires the caller to pass a
      capability ID, which it then uses to fill in header fields, assuming
      hard coded versions.  This makes for an awkward and rigid interface.
      The only thing we want this helper to do is allocate sufficient
      space in the caps buffer and chain this capability into the list.
      Reduce it to that simple task.
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
      Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio-pci: Mask INTx if a device is not capable of enabling it · 2170dd04
      Authored by Alexey Kardashevskiy
      At the moment VFIO rightfully assumes that INTx is supported if
      the interrupt pin is not set to zero in the device config space.
      However if that is not the case (the pin is not zero but pdev->irq is),
      vfio_intx_enable() fails.
      
      In order to prevent the userspace from trying to enable INTx when we know
      that it cannot work, let's mask the PCI_INTERRUPT_PIN register.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
  19. 27 Jul 2017, 1 commit
  20. 13 Jun 2017, 1 commit
  21. 04 Jan 2017, 1 commit
  22. 17 Nov 2016, 2 commits
  23. 27 Oct 2016, 1 commit
    • vfio/pci: Fix integer overflows, bitmask check · 05692d70
      Authored by Vlad Tsyrklevich
      The VFIO_DEVICE_SET_IRQS ioctl did not sufficiently sanitize
      user-supplied integers, potentially allowing memory corruption. This
      patch adds appropriate integer overflow checks, checks the range bounds
      for VFIO_IRQ_SET_DATA_NONE, and also verifies that only a single
      element in the VFIO_IRQ_SET_DATA_TYPE_MASK bitmask is set.
      VFIO_IRQ_SET_ACTION_TYPE_MASK is already correctly checked later in
      vfio_pci_set_irqs_ioctl().
      
      Furthermore, a kzalloc is changed to a kcalloc because the use of a
      kzalloc with an integer multiplication allowed an integer overflow
      condition to be reached without this patch. kcalloc checks for overflow
      and should prevent a similar occurrence.
      Signed-off-by: Vlad Tsyrklevich <vlad@tsyrklevich.net>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
  24. 09 Jul 2016, 1 commit
    • vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive · 05f0c03f
      Authored by Yongji Xie
      The current vfio-pci implementation disallows mmap of sub-page
      (size < PAGE_SIZE) MMIO BARs because the page containing such a
      BAR may be shared with other BARs.  This causes performance
      problems when passing through a PCI device with this kind of BAR:
      the guest cannot map the BAR directly, so its MMIO accesses must
      be emulated in the host.
      
      However, not all sub-page BARs share a page with other BARs.
      We should allow mmap of those sub-page MMIO BARs which we can
      make sure will not share a page with anything else.
      
      This patch adds support for that case, and adds a dummy resource
      to reserve the remainder of the page, which a hot-added device's
      BAR might otherwise be assigned into.  It is not necessary to
      handle the case where the BAR is not page aligned: we cannot
      expect the BAR to be assigned the same location within a page in
      the guest, and userspace has no way to learn the BAR's location
      within the page anyway.
      Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
  25. 29 Apr 2016, 1 commit
    • vfio/pci: Hide broken INTx support from user · 45074405
      Authored by Alex Williamson
      INTx masking has two components, the first is that we need the ability
      to prevent the device from continuing to assert INTx.  This is
      provided via the DisINTx bit in the command register and is the only
      thing we can really probe for when testing if INTx masking is
      supported.  The second component is that the device needs to indicate
      if INTx is asserted via the interrupt status bit in the device status
      register.  With these two features we can generically determine if one
      of the devices we own is asserting INTx, signal the user, and mask the
      interrupt while the user services the device.
      
      Generally if one or both of these components is broken we resort to
      APIC level interrupt masking, which requires an exclusive interrupt
      since we have no way to determine the source of the interrupt in a
      shared configuration.  This often makes it difficult or impossible to
      configure the system for userspace use of the device, for an interrupt
      mode that the user may not need.
      
      One possible configuration of broken INTx masking is that the DisINTx
      support is fully functional, but the interrupt status bit never
      signals interrupt assertion.  In this case we do have the ability to
      prevent the device from asserting INTx, but lack the ability to
      identify the interrupt source.  For this case we can simply pretend
      that the device lacks INTx support entirely, keeping DisINTx set on
      the physical device, virtualizing this bit for the user, and
      virtualizing the interrupt pin register to indicate no INTx support.
      We already support virtualization of the DisINTx bit and already
      virtualize the interrupt pin for platforms without INTx support.  By
      tying these components together, setting DisINTx on open and reset,
      and identifying devices broken in this particular way, we can provide
      support for them w/o the handicap of APIC level INTx masking.
      
      Intel i40e (XL710/X710) 10/20/40GbE NICs have been identified as being
      broken in this specific way.  We leave the vfio-pci.nointxmask option
      as a mechanism to bypass this support, enabling INTx on the device
      with all the requirements of APIC level masking.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Cc: John Ronciak <john.ronciak@intel.com>
      Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
  26. 28 Feb 2016, 1 commit