- 21 12月, 2017 1 次提交
-
-
由 Alexey Kardashevskiy 提交于
At the moment VFIO rightfully assumes that INTx is supported if the interrupt pin is not set to zero in the device config space. However if that is not the case (the pin is not zero but pdev->irq is), vfio_intx_enable() fails. In order to prevent the userspace from trying to enable INTx when we know that it cannot work, let's mask the PCI_INTERRUPT_PIN register. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 03 10月, 2017 2 次提交
-
-
由 Alex Williamson 提交于
MRRS defines the maximum read request size a device is allowed to make. Drivers will often increase this to allow more data transfer with a single request. Completions to this request are bound by the MPS setting for the bus. Aside from device quirks (none known), it doesn't seem to make sense to set an MRRS value less than MPS, yet this is a likely scenario given that user drivers do not have a system-wide view of the PCI topology. Virtualize MRRS such that the user can set MRRS >= MPS, but use MPS as the floor value that we'll write to hardware. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
由 Alex Williamson 提交于
With virtual PCI-Express chipsets, we now see userspace/guest drivers trying to match the physical MPS setting to a virtual downstream port. Of course a lone physical device surrounded by virtual interconnects cannot make a correct decision for a proper MPS setting. Instead, let's virtualize the MPS control register so that writes through to hardware are disallowed. Userspace drivers like QEMU assume they can write anything to the device and we'll filter out anything dangerous. Since mismatched MPS can lead to AER and other faults, let's add it to the kernel side rather than relying on userspace virtualization to handle it. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Reviewed-by: NEric Auger <eric.auger@redhat.com>
-
- 28 7月, 2017 1 次提交
-
-
由 Alex Williamson 提交于
Root complex integrated endpoints do not have a link and therefore may use a smaller PCIe capability in config space than we expect when building our config map. Add a case for these to avoid reporting an erroneous overlap. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 27 7月, 2017 1 次提交
-
-
由 Alex Williamson 提交于
Device lock bites again; if a device .remove() callback races a user calling ioctl(VFIO_GROUP_GET_DEVICE_FD), the unbind request will hold the device lock, but the user ioctl may have already taken a vfio_device reference. In the case of a PCI device, the initial open will attempt to reset the device, which again attempts to get the device lock, resulting in deadlock. Use the trylock PCI reset interface and return error on the open path if reset fails due to lock contention. Link: https://lkml.org/lkml/2017/7/25/381Reported-by: NWen Congyang <wencongyang2@huawei.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 13 6月, 2017 1 次提交
-
-
由 Alex Williamson 提交于
XXV710 has the same broken INTx behavior as the rest of the X/XL710 series, the interrupt status register is not wired to report pending INTx interrupts, thus we never associate the interrupt to the device. Extend the device IDs to include these so that we hide that the device supports INTx at all to the user. Reported-by: NStefan Assmann <sassmann@redhat.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Acked-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
-
- 04 1月, 2017 1 次提交
-
-
由 Arvind Yadav 提交于
Here, pci_iomap can fail, handle this case release selected pci regions and return -ENOMEM. Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 30 12月, 2016 1 次提交
-
-
由 Arnd Bergmann 提交于
Using ancient compilers (gcc-4.5 or older) on ARM, we get a link failure with the vfio-pci driver: ERROR: "__aeabi_lcmp" [drivers/vfio/pci/vfio-pci.ko] undefined! The reason is that the compiler tries to do a comparison of a 64-bit range. This changes it to convert to a 32-bit number explicitly first, as newer compilers do for themselves. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 13 12月, 2016 1 次提交
-
-
由 Wang Sheng-Hui 提交于
Move PCI configuration space size macros (PCI_CFG_SPACE_SIZE and PCI_CFG_SPACE_EXP_SIZE) from drivers/pci/pci.h to include/uapi/linux/pci_regs.h so they can be used by more drivers and eliminate duplicate definitions. [bhelgaas: Expand comment to include PCI-X details] Signed-off-by: NWang Sheng-Hui <shhuiw@foxmail.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
-
- 19 11月, 2016 1 次提交
-
-
由 Cao jin 提交于
As of commit d97ffe23 ("PCI: Fix return value from pci_user_{read,write}_config_*()") it's unnecessary to call pcibios_err_to_errno() to fixup the return value from these functions. pcibios_err_to_errno() already does simple passthrough of -errno values, therefore no functional change is expected. [aw: changelog] Signed-off-by: NCao jin <caoj.fnst@cn.fujitsu.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 17 11月, 2016 2 次提交
-
-
由 Kirti Wankhede 提交于
Updated vfio_pci.c file to use vfio_set_irqs_validate_and_prepare() Signed-off-by: NKirti Wankhede <kwankhede@nvidia.com> Signed-off-by: NNeo Jia <cjia@nvidia.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
由 Kirti Wankhede 提交于
Update msix_sparse_mmap_cap() to use vfio_info_add_capability() Update region type capability to use vfio_info_add_capability() Signed-off-by: NKirti Wankhede <kwankhede@nvidia.com> Signed-off-by: NNeo Jia <cjia@nvidia.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 27 10月, 2016 1 次提交
-
-
由 Vlad Tsyrklevich 提交于
The VFIO_DEVICE_SET_IRQS ioctl did not sufficiently sanitize user-supplied integers, potentially allowing memory corruption. This patch adds appropriate integer overflow checks, checks the range bounds for VFIO_IRQ_SET_DATA_NONE, and also verifies that only single element in the VFIO_IRQ_SET_DATA_TYPE_MASK bitmask is set. VFIO_IRQ_SET_ACTION_TYPE_MASK is already correctly checked later in vfio_pci_set_irqs_ioctl(). Furthermore, a kzalloc is changed to a kcalloc because the use of a kzalloc with an integer multiplication allowed an integer overflow condition to be reached without this patch. kcalloc checks for overflow and should prevent a similar occurrence. Signed-off-by: NVlad Tsyrklevich <vlad@tsyrklevich.net> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 30 9月, 2016 1 次提交
-
-
由 Christoph Hellwig 提交于
Simplify the interrupt setup by using the new PCI layer helpers. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 27 9月, 2016 2 次提交
-
-
由 Alex Williamson 提交于
The MSI/X shutdown path can gratuitously enable INTx, which is not something we want to happen if we're dealing with broken INTx device. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
由 Alex Williamson 提交于
We use a BAR restore trick to try to detect when a user has performed a device reset, possibly through FLR or other backdoors, to put things back into a working state. This is important for backdoor resets, but we can actually just virtualize the "front door" resets provided via PCIe and AF FLR. Set these bits as virtualized + writable, allowing the default write to set them in vconfig, then we can simply check the bit, perform an FLR of our own, and clear the bit. We don't actually have the granularity in PCI to specify the type of reset we want to do, but generally devices don't implement both PCIe and AF FLR and we'll favor these over other types of reset, so we should generally lineup. We do test whether the device provides the requested FLR type to stay consistent with hardware capabilities though. This seems to fix several instance of devices getting into bad states with userspace drivers, like dpdk, running inside a VM. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Reviewed-by: NGreg Rose <grose@lightfleet.com>
-
- 30 8月, 2016 1 次提交
-
-
由 Wei Jiangang 提交于
Signed-off-by: NWei Jiangang <weijg.fnst@cn.fujitsu.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 09 8月, 2016 1 次提交
-
-
由 Alex Williamson 提交于
There are multiple cases in vfio_pci_set_ctx_trigger_single() where we assume we can safely read from our data pointer without actually checking whether the user has passed any data via the count field. VFIO_IRQ_SET_DATA_NONE in particular is entirely broken since we attempt to pull an int32_t file descriptor out before even checking the data type. The other data types assume the data pointer contains one element of their type as well. In part this is good news because we were previously restricted from doing much sanitization of parameters because it was missed in the past and we didn't want to break existing users. Clearly DATA_NONE is completely broken, so it must not have any users and we can fix it up completely. For DATA_BOOL and DATA_EVENTFD, we'll just protect ourselves, returning error when count is zero since we previously would have oopsed. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Reported-by: NChris Thompson <the_cartographer@hotmail.com> Cc: stable@vger.kernel.org Reviewed-by: NEric Auger <eric.auger@redhat.com>
-
- 09 7月, 2016 1 次提交
-
-
由 Yongji Xie 提交于
Current vfio-pci implementation disallows to mmap sub-page(size < PAGE_SIZE) MMIO BARs because these BARs' mmio page may be shared with other BARs. This will cause some performance issues when we passthrough a PCI device with this kind of BARs. Guest will be not able to handle the mmio accesses to the BARs which leads to mmio emulations in host. However, not all sub-page BARs will share page with other BARs. We should allow to mmap the sub-page MMIO BARs which we can make sure will not share page with other BARs. This patch adds support for this case. And we try to add a dummy resource to reserve the remainder of the page which hot-add device's BAR might be assigned into. But it's not necessary to handle the case when the BAR is not page aligned. Because we can't expect the BAR will be assigned into the same location in a page in guest when we passthrough the BAR. And it's hard to access this BAR in userspace because we have no way to get the BAR's location in a page. Signed-off-by: NYongji Xie <xyjxie@linux.vnet.ibm.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 01 6月, 2016 1 次提交
-
-
由 Alex Williamson 提交于
The size of the VPD area is not necessarily 4-byte aligned, so a pci_vpd_read() might return less than 4 bytes. Zero our buffer and accept anything other than an error. Intel X710 NICs exercise this. Fixes: 4e1a6355 ("vfio/pci: Use kernel VPD access functions") Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 30 5月, 2016 1 次提交
-
-
由 Alex Williamson 提交于
Both the INTx and MSI/X disable paths do an eventfd_ctx_put() for the trigger eventfd before calling vfio_virqfd_disable() any potential mask and unmask eventfds. This opens a use-after-free race where an inopportune irqfd can reference the freed signalling eventfd. Reorder to avoid this possibility. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 20 5月, 2016 1 次提交
-
-
由 Alexey Kardashevskiy 提交于
PCI-Express spec says that reading 4 bytes at offset 100h should return zero if there is no extended capability so VFIO reads this dword to know if there are extended capabilities. However it is not always possible to access the extended space so generic PCI code in pci_cfg_space_size_ext() checks if pci_read_config_dword() can read beyond 100h and if the check fails, it sets the config space size to 100h. VFIO does its own extended capabilities check by reading at offset 100h which may produce 0xffffffff which VFIO treats as the extended config space presense and calls vfio_ecap_init() which fails to parse capabilities (which is expected) but right before the exit, it writes zero at offset 100h which is beyond the buffer allocated for vdev->vconfig (which is 256 bytes) which leads to random memory corruption. This makes VFIO only check for the extended capabilities if the discovered config size is more than 256 bytes. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 29 4月, 2016 2 次提交
-
-
由 Alex Williamson 提交于
If a device is reset without the memory or i/o bits enabled in the command register we may not detect it, potentially leaving the device without valid BAR programming. Add an additional test to check the BARs on each write to the command register. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
由 Alex Williamson 提交于
INTx masking has two components, the first is that we need the ability to prevent the device from continuing to assert INTx. This is provided via the DisINTx bit in the command register and is the only thing we can really probe for when testing if INTx masking is supported. The second component is that the device needs to indicate if INTx is asserted via the interrupt status bit in the device status register. With these two features we can generically determine if one of the devices we own is asserting INTx, signal the user, and mask the interrupt while the user services the device. Generally if one or both of these components is broken we resort to APIC level interrupt masking, which requires an exclusive interrupt since we have no way to determine the source of the interrupt in a shared configuration. This often makes it difficult or impossible to configure the system for userspace use of the device, for an interrupt mode that the user may not need. One possible configuration of broken INTx masking is that the DisINTx support is fully functional, but the interrupt status bit never signals interrupt assertion. In this case we do have the ability to prevent the device from asserting INTx, but lack the ability to identify the interrupt source. For this case we can simply pretend that the device lacks INTx support entirely, keeping DisINTx set on the physical device, virtualizing this bit for the user, and virtualizing the interrupt pin register to indicate no INTx support. We already support virtualization of the DisINTx bit and already virtualize the interrupt pin for platforms without INTx support. By tying these components together, setting DisINTx on open and reset, and identifying devices broken in this particular way, we can provide support for them w/o the handicap of APIC level INTx masking. Intel i40e (XL710/X710) 10/20/40GbE NICs have been identified as being broken in this specific way. We leave the vfio-pci.nointxmask option as a mechanism to bypass this support, enabling INTx on the device with all the requirements of APIC level masking. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Cc: John Ronciak <john.ronciak@intel.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
-
- 28 2月, 2016 1 次提交
-
-
由 Michael S. Tsirkin 提交于
Calling return copy_to_user(...) in an ioctl will not do the right thing if there's a pagefault: copy_to_user returns the number of bytes not copied in this case. Fix up vfio to do return copy_to_user(...)) ? -EFAULT : 0; everywhere. Cc: stable@vger.kernel.org Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 26 2月, 2016 1 次提交
-
-
由 Dan Carpenter 提交于
The copy_to_user() function returns the number of bytes that were not copied but we want to return -EFAULT on error here. Fixes: 188ad9d6 ('vfio/pci: Include sparse mmap capability for MSI-X table regions') Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 23 2月, 2016 7 次提交
-
-
由 Alex Williamson 提交于
Integrated graphics may have their ROM shadowed at 0xc0000 rather than implement a PCI option ROM. Make this ROM appear to the user using the ROM BAR. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
由 Alex Williamson 提交于
Provide read-only access to PCI config space of the PCI host bridge and LPC bridge through device specific regions. This may be used to configure a VM with matching register contents to satisfy driver requirements. Providing this through the vfio file descriptor removes an additional userspace requirement for access through pci-sysfs and removes the CAP_SYS_ADMIN requirement that doesn't appear to apply to the specific devices we're accessing. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
由 Alex Williamson 提交于
This is the first consumer of vfio device specific resource support, providing read-only access to the OpRegion for Intel graphics devices. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
由 Alex Williamson 提交于
Typically config space for a device is mapped out into capability specific handlers and unassigned space. The latter allows direct read/write access to config space. Sometimes we know about registers living in this void space and would like an easy way to virtualize them, similar to how BAR registers are managed. To do this, create one more pseudo (fake) PCI capability to be handled as purely virtual space. Reads and writes are serviced entirely from virtual config space. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
由 Alex Williamson 提交于
Add support for additional regions with indexes started after the already defined fixed regions. Device specific code can register these regions with the new vfio_pci_register_dev_region() function. The ops structure per region currently only includes read/write access and a release function, allowing automatic cleanup when the device is closed. mmap support is only missing here because it's not needed by the first user queued for this support. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
由 Alex Williamson 提交于
vfio-pci has never allowed the user to directly mmap the MSI-X vector table, but we've always relied on implicit knowledge of the user that they cannot do this. Now that we have capability chains that we can expose in the region info ioctl and a sparse mmap capability that represents the sub-areas within the region that can be mmap'd, we can make the mmap constraints more explicit. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
由 Alex Williamson 提交于
Signed versus unsigned comparisons are implicitly cast to unsigned, which result in a couple possible overflows. For instance (start + count) might overflow and wrap, getting through our validation test. Also when unwinding setup, -1 being compared as unsigned doesn't produce the intended stop condition. Fix both of these and also fix vfio_msi_set_vector_signal() to validate parameters before using the vector index, though none of the callers should pass bad indexes anymore. Reported-by: NEric Auger <eric.auger@linaro.org> Reviewed-by: NEric Auger <eric.auger@linaro.org> Tested-by: NEric Auger <eric.auger@linaro.org> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 22 12月, 2015 1 次提交
-
-
由 Alex Williamson 提交于
There is really no way to safely give a user full access to a DMA capable device without an IOMMU to protect the host system. There is also no way to provide DMA translation, for use cases such as device assignment to virtual machines. However, there are still those users that want userspace drivers even under those conditions. The UIO driver exists for this use case, but does not provide the degree of device access and programming that VFIO has. In an effort to avoid code duplication, this introduces a No-IOMMU mode for VFIO. This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling the "enable_unsafe_noiommu_mode" option on the vfio driver. This should make it very clear that this mode is not safe. Additionally, CAP_SYS_RAWIO privileges are necessary to work with groups and containers using this mode. Groups making use of this support are named /dev/vfio/noiommu-$GROUP and can only make use of the special VFIO_NOIOMMU_IOMMU for the container. Use of this mode, specifically binding a device without a native IOMMU group to a VFIO bus driver will taint the kernel and should therefore not be considered supported. This patch includes no-iommu support for the vfio-pci bus driver only. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 04 12月, 2015 1 次提交
-
-
由 Alex Williamson 提交于
Revert commit 033291ec ("vfio: Include No-IOMMU mode") due to lack of a user. This was originally intended to fill a need for the DPDK driver, but uptake has been slow so rather than support an unproven kernel interface revert it and revisit when userspace catches up. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 20 11月, 2015 1 次提交
-
-
由 Julia Lawall 提交于
This pci_error_handlers structure is never modified, like all the other pci_error_handlers structures, so declare it as const. Done with the help of Coccinelle. Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 09 11月, 2015 1 次提交
-
-
由 Dan Carpenter 提交于
Smatch complains about a possible out of bounds error: drivers/vfio/pci/vfio_pci_config.c:1241 vfio_cap_init() error: buffer overflow 'pci_cap_length' 20 <= 20 The problem is that pci_cap_length[] was defined as large enough to hold "PCI_CAP_ID_AF + 1" elements. The code in vfio_cap_init() assumes it has PCI_CAP_ID_MAX + 1 elements. Originally, PCI_CAP_ID_AF and PCI_CAP_ID_MAX were the same but then we introduced PCI_CAP_ID_EA in commit f80b0ba9 ("PCI: Add Enhanced Allocation register entries") so now the array is too small. Let's fix this by making the array size PCI_CAP_ID_MAX + 1. And let's make a similar change to pci_ext_cap_length[] for consistency. Also both these arrays can be made const. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 05 11月, 2015 1 次提交
-
-
由 Alex Williamson 提交于
There is really no way to safely give a user full access to a DMA capable device without an IOMMU to protect the host system. There is also no way to provide DMA translation, for use cases such as device assignment to virtual machines. However, there are still those users that want userspace drivers even under those conditions. The UIO driver exists for this use case, but does not provide the degree of device access and programming that VFIO has. In an effort to avoid code duplication, this introduces a No-IOMMU mode for VFIO. This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling the "enable_unsafe_noiommu_mode" option on the vfio driver. This should make it very clear that this mode is not safe. Additionally, CAP_SYS_RAWIO privileges are necessary to work with groups and containers using this mode. Groups making use of this support are named /dev/vfio/noiommu-$GROUP and can only make use of the special VFIO_NOIOMMU_IOMMU for the container. Use of this mode, specifically binding a device without a native IOMMU group to a VFIO bus driver will taint the kernel and should therefore not be considered supported. This patch includes no-iommu support for the vfio-pci bus driver only. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 28 10月, 2015 1 次提交
-
-
由 Alex Williamson 提交于
The PCI VPD capability operates on a set of window registers in PCI config space. Writing to the address register triggers either a read or write, depending on the setting of the PCI_VPD_ADDR_F bit within the address register. The data register provides either the source for writes or the target for reads. This model is susceptible to being broken by concurrent access, for which the kernel has adopted a set of access functions to serialize these registers. Additionally, commits like 932c435c ("PCI: Add dev_flags bit to access VPD through function 0") and 7aa6ca4d ("PCI: Add VPD function 0 quirk for Intel Ethernet devices") indicate that VPD registers can be shared between functions on multifunction devices creating dependencies between otherwise independent devices. Fortunately it's quite easy to emulate the VPD registers, simply storing copies of the address and data registers in memory and triggering a VPD read or write on writes to the address register. This allows vfio users to avoid seeing spurious register changes from accesses on other devices and enables the use of shared quirks in the host kernel. We can theoretically still race with access through sysfs, but the window of opportunity is much smaller. Signed-off-by: NAlex Williamson <alex.williamson@redhat.com> Acked-by: NMark Rustad <mark.d.rustad@intel.com>
-
- 01 10月, 2015 1 次提交
-
-
由 Feng Wu 提交于
This patch adds the registration/unregistration of an irq_bypass_producer for MSI/MSIx on vfio pci devices. Acked-by: NAlex Williamson <alex.williamson@redhat.com> Signed-off-by: NFeng Wu <feng.wu@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-