1. 23 Apr, 2019 (1 commit)
  2. 04 Apr, 2019 (1 commit)
    • vfio/pci: use correct format characters · 426b046b
      Committed by Louis Taylor
      When compiling with -Wformat, clang emits the following warnings:
      
      drivers/vfio/pci/vfio_pci.c:1601:5: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                      ^~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1601:13: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                              ^~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1601:21: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                                      ^~~~~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1601:32: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                                                 ^~~~~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1605:5: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                      ^~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1605:13: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                              ^~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1605:21: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                                      ^~~~~~~~~
      
      drivers/vfio/pci/vfio_pci.c:1605:32: warning: format specifies type
            'unsigned short' but the argument has type 'unsigned int' [-Wformat]
                                      vendor, device, subvendor, subdevice,
                                                                 ^~~~~~~~~
      The types of these arguments are unconditionally defined, so this patch
      updates the format characters to the correct ones for unsigned ints.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/378
      Signed-off-by: Louis Taylor <louis@kragniz.eu>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      426b046b
  3. 19 Feb, 2019 (2 commits)
    • vfio_pci: Enable memory accesses before calling pci_map_rom · 0cfd027b
      Committed by Eric Auger
      pci_map_rom()/pci_get_rom_size() perform memory accesses in the ROM.
      If Memory Space accesses are disabled, readw() is likely
      to trigger a synchronous external abort on some platforms.
      
      In case memory accesses were disabled, re-enable them before the
      call and disable them again just after.
      
      Fixes: 89e1f7d4 ("vfio: Add PCI device driver")
      Signed-off-by: Eric Auger <eric.auger@redhat.com>
      Suggested-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      0cfd027b
    • vfio/pci: Restore device state on PM transition · 51ef3a00
      Committed by Alex Williamson
      PCI core handles save and restore of device state around reset, but
      when using pci_set_power_state() we can unintentionally trigger a soft
      reset of the device, where PCI core only restores the BAR state.  If
      we're using vfio-pci's idle D3 support to try to put devices into low
      power when unused, this might trigger a reset when the device is woken
      for use.  Also power state management by the user, or within a guest,
      can put the device into D3 power state with potentially limited
      ability to restore the device if it should undergo a reset.  The PCI
      spec does not define the extent of a soft reset and many devices
      reporting soft reset on D3->D0 transition do not undergo a PCI config
      space reset.  It's therefore assumed safe to unconditionally restore
      the remainder of the state if the device indicates soft reset
      support, even on a user initiated wakeup.
      
      Implement a wrapper in vfio-pci to tag devices reporting PM reset
      support, save their state on transitions into D3 and restore on
      transitions back to D0.
      Reported-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      51ef3a00
  4. 21 Dec, 2018 (3 commits)
    • vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver · 7f928917
      Committed by Alexey Kardashevskiy
      POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not
      pluggable PCIe devices but still have PCIe links which are used
      for config space and MMIO. In addition, the GPUs have 6 NVLinks
      which are connected to other GPUs and the POWER9 CPU. POWER9 chips
      have a special unit on the die called an NPU, which is an NVLink2 host bus
      adapter with p2p connections to 2 or 3 GPUs, with 3 or 2 NVLinks to each.
      These systems also support ATS (address translation services), which is
      part of the NVLink2 protocol. Such GPUs also share their on-board RAM
      (16GB or 32GB) with the system via the same NVLink2, so a CPU has
      cache-coherent access to the GPU RAM.
      
      This exports GPU RAM to userspace as a new VFIO device region and
      preregisters the new memory as device memory, as it might be used for DMA.
      Pfns are inserted from the fault handler because the GPU memory is not
      onlined until the vendor driver is loaded and has trained the NVLinks;
      doing this earlier causes low-level errors which we fence in the firmware,
      so it does not hurt the host system but is still better avoided. For the
      same reason this does not map GPU RAM into the host kernel (the usual
      approach for emulated access otherwise).
      
      This exports an ATSD (Address Translation Shootdown) register of the NPU
      which allows TLB invalidations inside the GPU on behalf of an operating
      system. The register conveniently occupies a single 64k page. It is also
      presented to userspace as a new VFIO device region. One NPU has 8 ATSD
      registers, each of which can be used for TLB invalidation in a GPU linked
      to that NPU. This allocates one ATSD register per NVLink bridge, allowing
      up to 6 registers to be passed through. Due to a host firmware bug (only
      recently fixed), only 1 ATSD register per NPU was actually advertised to
      the host system, so this passes that lone register via the first NVLink
      bridge device in the group, which is still enough as QEMU collects them
      all back and presents them to the guest via vPHB to mimic the emulated
      NPU PHB on the host.
      
      In order to provide the userspace with the information about GPU-to-NVLink
      connections, this exports an additional capability called "tgt"
      (which is an abbreviated host system bus address). The "tgt" property
      tells the GPU its own system address and allows the guest driver to
      conglomerate the routing information so each GPU knows how to get directly
      to the other GPUs.
      
      For ATS to work, the nest MMU (an NVIDIA block in a P9 CPU) needs to
      know LPID (a logical partition ID or a KVM guest hardware ID in other
      words) and PID (a memory context ID of a userspace process, not to be
      confused with a linux pid). This assigns a GPU to an LPID in the NPU,
      which is why this adds a listener for KVM on an IOMMU group. A PID comes
      via NVLink from a GPU and the NPU uses a PID wildcard to pass it through.
      
      This requires coherent memory and ATSD to be available on the host, as
      the GPU vendor only supports configurations with both features enabled;
      other configurations are known not to work. Because of this, and because
      of the way the features are advertised to the host system
      (a device tree with very platform-specific properties),
      this requires the POWERNV platform to be enabled.
      
      The V100 GPUs do not advertise any of these capabilities via config
      space, and there is more than one device ID, so this relies on
      the platform to tell whether these GPUs have special abilities such as
      NVLinks.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      7f928917
    • vfio_pci: Allow regions to add own capabilities · c2c0f1cd
      Committed by Alexey Kardashevskiy
      VFIO regions already support region capabilities with a limited set of
      fields. However, a subdriver might have to report additional bits
      to userspace.
      
      This adds an add_capability() hook to vfio_pci_regops.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c2c0f1cd
    • vfio_pci: Allow mapping extra regions · a15b1883
      Committed by Alexey Kardashevskiy
      So far we have only allowed mapping of MMIO BARs to userspace. However,
      there are GPUs with on-board coherent RAM accessible via side
      channels which we also want to map to userspace. The first client
      for this is the NVIDIA V100 GPU with NVLink2 direct links to a POWER9
      NPU-enabled CPU; such GPUs have 16GB of RAM which is coherently mapped
      into the system address space, and we are going to export it as an extra
      PCI region.
      
      We already support extra PCI regions, and this adds support for mapping
      them to userspace.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      a15b1883
  5. 13 Dec, 2018 (1 commit)
  6. 26 Sep, 2018 (1 commit)
    • vfio/pci: Mask buggy SR-IOV VF INTx support · db04264f
      Committed by Alex Williamson
      The SR-IOV spec requires that VFs must report zero for the INTx pin
      register as VFs are precluded from INTx support.  It's much easier for
      the host kernel to understand whether a device is a VF and therefore
      whether a non-zero pin register value is bogus than it is to do the
      same in userspace.  Override the INTx count for such devices and
      virtualize the pin register to provide a consistent view of the device
      to the user.
      
      As this is clearly a spec violation, warn about it to support hardware
      validation, but also provide a known whitelist as it doesn't do much
      good to continue complaining if the hardware vendor doesn't plan to
      fix it.
      
      Known devices with this issue: 8086:270c
      Tested-by: Gage Eads <gage.eads@intel.com>
      Reviewed-by: Ashok Raj <ashok.raj@intel.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      db04264f
  7. 07 Aug, 2018 (2 commits)
  8. 20 Jul, 2018 (2 commits)
  9. 19 Jul, 2018 (1 commit)
  10. 27 Mar, 2018 (1 commit)
    • vfio/pci: Add ioeventfd support · 30656177
      Committed by Alex Williamson
      The ioeventfd here is actually irqfd handling of an ioeventfd such as
      supported in KVM.  A user is able to pre-program a device write to
      occur when the eventfd triggers.  This is yet another instance of
      eventfd-irqfd triggering between KVM and vfio.  The impetus for this
      is high frequency writes to pages which are virtualized in QEMU.
      Enabling this near-direct write path for selected registers within
      the virtualized page can improve performance and reduce overhead.
      Specifically this is initially targeted at NVIDIA graphics cards where
      the driver issues a write to an MMIO register within a virtualized
      region in order to allow the MSI interrupt to re-trigger.
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      30656177
  11. 22 Mar, 2018 (1 commit)
  12. 21 Dec, 2017 (3 commits)
    • vfio-pci: Allow mapping MSIX BAR · a32295c6
      Committed by Alexey Kardashevskiy
      By default VFIO disables mapping of MSIX BAR to the userspace as
      the userspace may program it in a way allowing spurious interrupts;
      instead the userspace uses the VFIO_DEVICE_SET_IRQS ioctl.
      In order to eliminate guessing from the userspace about what is
      mmapable, VFIO also advertises a sparse list of regions allowed to mmap.
      
      This works fine as long as the system page size equals the MSIX
      alignment requirement, which is 4KB. However, with a bigger page size
      the existing code prohibits mapping the non-MSIX parts of a page with
      MSIX structures, so these parts have to be emulated via slow reads/writes
      on a VFIO device fd. If these emulated bits are accessed often, this has
      a serious impact on performance.
      
      This allows mmap of the entire BAR containing MSIX vector table.
      
      This removes the sparse capability for PCI devices as it becomes useless.
      
      As userspace needs to know for sure whether mmapping of the BAR
      containing the MSIX vector table can succeed, this adds a new capability -
      VFIO_REGION_INFO_CAP_MSIX_MAPPABLE - which explicitly tells userspace
      that the entire BAR can be mmapped.
      
      This does not touch the MSIX mangling in the BAR read/write handlers as
      we are doing this just to enable direct access to non MSIX registers.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [aw - fixup whitespace, trim function name]
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      a32295c6
    • vfio: Simplify capability helper · dda01f78
      Committed by Alex Williamson
      The vfio_info_add_capability() helper requires the caller to pass a
      capability ID, which it then uses to fill in header fields, assuming
      hard coded versions.  This makes for an awkward and rigid interface.
      The only thing we want this helper to do is allocate sufficient
      space in the caps buffer and chain this capability into the list.
      Reduce it to that simple task.
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
      Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      dda01f78
    • vfio-pci: Mask INTx if a device is not capable of enabling it · 2170dd04
      Committed by Alexey Kardashevskiy
      At the moment VFIO rightfully assumes that INTx is supported if
      the interrupt pin is not set to zero in the device config space.
      However if that is not the case (the pin is not zero but pdev->irq is),
      vfio_intx_enable() fails.
      
      In order to prevent the userspace from trying to enable INTx when we know
      that it cannot work, let's mask the PCI_INTERRUPT_PIN register.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      2170dd04
  13. 27 Jul, 2017 (1 commit)
  14. 13 Jun, 2017 (1 commit)
  15. 04 Jan, 2017 (1 commit)
  16. 17 Nov, 2016 (2 commits)
  17. 27 Oct, 2016 (1 commit)
    • vfio/pci: Fix integer overflows, bitmask check · 05692d70
      Committed by Vlad Tsyrklevich
      The VFIO_DEVICE_SET_IRQS ioctl did not sufficiently sanitize
      user-supplied integers, potentially allowing memory corruption. This
      patch adds appropriate integer overflow checks, checks the range bounds
      for VFIO_IRQ_SET_DATA_NONE, and also verifies that only a single
      element in the VFIO_IRQ_SET_DATA_TYPE_MASK bitmask is set.
      VFIO_IRQ_SET_ACTION_TYPE_MASK is already correctly checked later in
      vfio_pci_set_irqs_ioctl().
      
      Furthermore, a kzalloc is changed to a kcalloc because the use of a
      kzalloc with an integer multiplication allowed an integer overflow
      condition to be reached without this patch. kcalloc checks for overflow
      and should prevent a similar occurrence.
      Signed-off-by: Vlad Tsyrklevich <vlad@tsyrklevich.net>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      05692d70
  18. 09 Jul, 2016 (1 commit)
    • vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive · 05f0c03f
      Committed by Yongji Xie
      The current vfio-pci implementation disallows mmapping of
      sub-page (size < PAGE_SIZE) MMIO BARs because the mmio
      page of such a BAR may be shared with other BARs. This causes
      performance issues when we pass through a PCI device with
      this kind of BAR: the guest is not able to handle the mmio
      accesses directly, which leads to mmio emulation in the host.
      
      However, not all sub-page BARs share a page with other BARs.
      We should allow mmapping of the sub-page MMIO BARs which we can
      make sure will not share a page with other BARs.
      
      This patch adds support for this case, and tries to add a
      dummy resource to reserve the remainder of the page, into which
      a hot-added device's BAR might otherwise be assigned. It is not
      necessary to handle the case where the BAR is not page aligned,
      because we cannot expect the BAR to be assigned to the same
      location within a page in the guest when we pass it through, and
      it is hard to access such a BAR from userspace because we have
      no way to get the BAR's location within a page.
      Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      05f0c03f
  19. 29 Apr, 2016 (1 commit)
    • vfio/pci: Hide broken INTx support from user · 45074405
      Committed by Alex Williamson
      INTx masking has two components, the first is that we need the ability
      to prevent the device from continuing to assert INTx.  This is
      provided via the DisINTx bit in the command register and is the only
      thing we can really probe for when testing if INTx masking is
      supported.  The second component is that the device needs to indicate
      if INTx is asserted via the interrupt status bit in the device status
      register.  With these two features we can generically determine if one
      of the devices we own is asserting INTx, signal the user, and mask the
      interrupt while the user services the device.
      
      Generally if one or both of these components is broken we resort to
      APIC level interrupt masking, which requires an exclusive interrupt
      since we have no way to determine the source of the interrupt in a
      shared configuration.  This often makes it difficult or impossible to
      configure the system for userspace use of the device, for an interrupt
      mode that the user may not need.
      
      One possible configuration of broken INTx masking is that the DisINTx
      support is fully functional, but the interrupt status bit never
      signals interrupt assertion.  In this case we do have the ability to
      prevent the device from asserting INTx, but lack the ability to
      identify the interrupt source.  For this case we can simply pretend
      that the device lacks INTx support entirely, keeping DisINTx set on
      the physical device, virtualizing this bit for the user, and
      virtualizing the interrupt pin register to indicate no INTx support.
      We already support virtualization of the DisINTx bit and already
      virtualize the interrupt pin for platforms without INTx support.  By
      tying these components together, setting DisINTx on open and reset,
      and identifying devices broken in this particular way, we can provide
      support for them without the handicap of APIC level INTx masking.
      
      Intel i40e (XL710/X710) 10/20/40GbE NICs have been identified as being
      broken in this specific way.  We leave the vfio-pci.nointxmask option
      as a mechanism to bypass this support, enabling INTx on the device
      with all the requirements of APIC level masking.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Cc: John Ronciak <john.ronciak@intel.com>
      Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
      45074405
  20. 28 Feb, 2016 (1 commit)
  21. 26 Feb, 2016 (1 commit)
  22. 23 Feb, 2016 (5 commits)
  23. 22 Dec, 2015 (1 commit)
    • vfio: Include No-IOMMU mode · 03a76b60
      Committed by Alex Williamson
      There is really no way to safely give a user full access to a DMA
      capable device without an IOMMU to protect the host system.  There is
      also no way to provide DMA translation, for use cases such as device
      assignment to virtual machines.  However, there are still those users
      that want userspace drivers even under those conditions.  The UIO
      driver exists for this use case, but does not provide the degree of
      device access and programming that VFIO has.  In an effort to avoid
      code duplication, this introduces a No-IOMMU mode for VFIO.
      
      This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling
      the "enable_unsafe_noiommu_mode" option on the vfio driver.  This
      should make it very clear that this mode is not safe.  Additionally,
      CAP_SYS_RAWIO privileges are necessary to work with groups and
      containers using this mode.  Groups making use of this support are
      named /dev/vfio/noiommu-$GROUP and can only make use of the special
      VFIO_NOIOMMU_IOMMU for the container.  Use of this mode, specifically
      binding a device without a native IOMMU group to a VFIO bus driver
      will taint the kernel and should therefore not be considered
      supported.  This patch includes no-iommu support for the vfio-pci bus
      driver only.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      03a76b60
  24. 04 Dec, 2015 (1 commit)
  25. 20 Nov, 2015 (1 commit)
  26. 05 Nov, 2015 (1 commit)
    • vfio: Include No-IOMMU mode · 033291ec
      Committed by Alex Williamson
      There is really no way to safely give a user full access to a DMA
      capable device without an IOMMU to protect the host system.  There is
      also no way to provide DMA translation, for use cases such as device
      assignment to virtual machines.  However, there are still those users
      that want userspace drivers even under those conditions.  The UIO
      driver exists for this use case, but does not provide the degree of
      device access and programming that VFIO has.  In an effort to avoid
      code duplication, this introduces a No-IOMMU mode for VFIO.
      
      This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling
      the "enable_unsafe_noiommu_mode" option on the vfio driver.  This
      should make it very clear that this mode is not safe.  Additionally,
      CAP_SYS_RAWIO privileges are necessary to work with groups and
      containers using this mode.  Groups making use of this support are
      named /dev/vfio/noiommu-$GROUP and can only make use of the special
      VFIO_NOIOMMU_IOMMU for the container.  Use of this mode, specifically
      binding a device without a native IOMMU group to a VFIO bus driver
      will taint the kernel and should therefore not be considered
      supported.  This patch includes no-iommu support for the vfio-pci bus
      driver only.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      033291ec
  27. 10 Jun, 2015 (1 commit)
    • vfio/pci: Fix racy vfio_device_get_from_dev() call · 20f30017
      Committed by Alex Williamson
      Testing the driver for a PCI device is racy, it can be all but
      complete in the release path and still report the driver as ours.
      Therefore we can't trust drvdata to be valid.  This race can sometimes
      be seen when one port of a multifunction device is being unbound from
      the vfio-pci driver while another function is being released by the
      user and attempting a bus reset.  The device in the remove path is
      found as a dependent device for the bus reset of the release path
      device, the driver is still set to vfio-pci, but the drvdata has
      already been cleared, resulting in a null pointer dereference.
      
      To resolve this, fix vfio_device_get_from_dev() to not take the
      dev_get_drvdata() shortcut and instead traverse through the
      iommu_group, vfio_group, vfio_device path to get a reference we
      can trust.  Once we have that reference, we know the device isn't
      in transition and we can test to make sure the driver is still what
      we expect, so that we don't interfere with devices we don't own.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      20f30017
  28. 02 May, 2015 (1 commit)