- 13 10月, 2020 1 次提交
-
-
由 Matthew Rosato 提交于
Define a new configuration entry VFIO_PCI_ZDEV for VFIO/PCI. When this s390-only feature is configured we add capabilities to the VFIO_DEVICE_GET_INFO ioctl that describe features of the associated zPCI device and its underlying hardware. This patch is based on work previously done by Pierre Morel. Signed-off-by: NMatthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 24 8月, 2020 1 次提交
-
-
由 Gustavo A. R. Silva 提交于
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-throughSigned-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>
-
- 28 7月, 2020 4 次提交
-
-
由 Giovanni Cabiddu 提交于
The current generation of Intel® QuickAssist Technology devices are not designed to run in an untrusted environment because of the following issues reported in the document "Intel® QuickAssist Technology (Intel® QAT) Software for Linux" (document number 336211-014): QATE-39220 - GEN - Intel® QAT API submissions with bad addresses that trigger DMA to invalid or unmapped addresses can cause a platform hang QATE-7495 - GEN - An incorrectly formatted request to Intel® QAT can hang the entire Intel® QAT Endpoint The document is downloadable from https://01.org/intel-quickassist-technology at the following link: https://01.org/sites/default/files/downloads/336211-014-qatforlinux-releasenotes-hwv1.7_0.pdf This patch adds the following QAT devices to the denylist: DH895XCC, C3XXX and C62X. Signed-off-by: NGiovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: NFiona Trahe <fiona.trahe@intel.com> Reviewed-by: NAndy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
由 Giovanni Cabiddu 提交于
Add denylist of devices that by default are not probed by vfio-pci. Devices in this list may be susceptible to untrusted application, even if the IOMMU is enabled. To be accessed via vfio-pci, the user has to explicitly disable the denylist. The denylist can be disabled via the module parameter disable_denylist. Signed-off-by: NGiovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Reviewed-by: NFiona Trahe <fiona.trahe@intel.com> Reviewed-by: NAndy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
由 Alex Williamson 提交于
No need to release and immediately re-acquire igate while clearing out the eventfd ctxs. Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
由 Alex Williamson 提交于
Intel document 333717-008, "Intel® Ethernet Controller X550 Specification Update", version 2.7, dated June 2020, includes errata #22, added in version 2.1, May 2016, indicating X550 NICs suffer from the same implementation deficiency as the 700-series NICs: "The Interrupt Status bit in the Status register of the PCIe configuration space is not implemented and is not set as described in the PCIe specification." Without the interrupt status bit, vfio-pci cannot determine when these devices signal INTx. They are therefore added to the nointx quirk. Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 17 7月, 2020 1 次提交
-
-
由 Zeng Tao 提交于
The vfio_pci_release call will free and clear the error and request eventfd ctx while these ctx could be in use at the same time in the function like vfio_pci_request, and it's expected to protect them under the vdev->igate mutex, which is missing in vfio_pci_release. This issue is introduced since commit 1518ac27 ("vfio/pci: fix memory leaks of eventfd ctx"),and since commit 5c5866c5 ("vfio/pci: Clear error and request eventfd ctx after releasing"), it's very easily to trigger the kernel panic like this: [ 9513.904346] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 [ 9513.913091] Mem abort info: [ 9513.915871] ESR = 0x96000006 [ 9513.918912] EC = 0x25: DABT (current EL), IL = 32 bits [ 9513.924198] SET = 0, FnV = 0 [ 9513.927238] EA = 0, S1PTW = 0 [ 9513.930364] Data abort info: [ 9513.933231] ISV = 0, ISS = 0x00000006 [ 9513.937048] CM = 0, WnR = 0 [ 9513.940003] user pgtable: 4k pages, 48-bit VAs, pgdp=0000007ec7d12000 [ 9513.946414] [0000000000000008] pgd=0000007ec7d13003, p4d=0000007ec7d13003, pud=0000007ec728c003, pmd=0000000000000000 [ 9513.956975] Internal error: Oops: 96000006 [#1] PREEMPT SMP [ 9513.962521] Modules linked in: vfio_pci vfio_virqfd vfio_iommu_type1 vfio hclge hns3 hnae3 [last unloaded: vfio_pci] [ 9513.972998] CPU: 4 PID: 1327 Comm: bash Tainted: G W 5.8.0-rc4+ #3 [ 9513.980443] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B270.01 05/08/2020 [ 9513.989274] pstate: 80400089 (Nzcv daIf +PAN -UAO BTYPE=--) [ 9513.994827] pc : _raw_spin_lock_irqsave+0x48/0x88 [ 9513.999515] lr : eventfd_signal+0x6c/0x1b0 [ 9514.003591] sp : ffff800038a0b960 [ 9514.006889] x29: ffff800038a0b960 x28: ffff007ef7f4da10 [ 9514.012175] x27: ffff207eefbbfc80 x26: ffffbb7903457000 [ 9514.017462] x25: ffffbb7912191000 x24: ffff007ef7f4d400 [ 9514.022747] x23: ffff20be6e0e4c00 x22: 0000000000000008 [ 9514.028033] x21: 0000000000000000 x20: 0000000000000000 [ 9514.033321] x19: 0000000000000008 x18: 0000000000000000 [ 9514.038606] x17: 0000000000000000 x16: ffffbb7910029328 [ 9514.043893] x15: 0000000000000000 x14: 0000000000000001 [ 9514.049179] x13: 0000000000000000 x12: 0000000000000002 [ 9514.054466] x11: 0000000000000000 x10: 0000000000000a00 [ 9514.059752] x9 : ffff800038a0b840 x8 : ffff007ef7f4de60 [ 9514.065038] x7 : ffff007fffc96690 x6 : fffffe01faffb748 [ 9514.070324] x5 : 0000000000000000 x4 : 0000000000000000 [ 9514.075609] x3 : 0000000000000000 x2 : 0000000000000001 [ 9514.080895] x1 : ffff007ef7f4d400 x0 : 0000000000000000 [ 9514.086181] Call trace: [ 9514.088618] _raw_spin_lock_irqsave+0x48/0x88 [ 9514.092954] eventfd_signal+0x6c/0x1b0 [ 9514.096691] vfio_pci_request+0x84/0xd0 [vfio_pci] [ 9514.101464] vfio_del_group_dev+0x150/0x290 [vfio] [ 9514.106234] vfio_pci_remove+0x30/0x128 [vfio_pci] [ 9514.111007] pci_device_remove+0x48/0x108 [ 9514.115001] device_release_driver_internal+0x100/0x1b8 [ 9514.120200] device_release_driver+0x28/0x38 [ 9514.124452] pci_stop_bus_device+0x68/0xa8 [ 9514.128528] pci_stop_and_remove_bus_device+0x20/0x38 [ 9514.133557] pci_iov_remove_virtfn+0xb4/0x128 [ 9514.137893] sriov_disable+0x3c/0x108 [ 9514.141538] pci_disable_sriov+0x28/0x38 [ 9514.145445] hns3_pci_sriov_configure+0x48/0xb8 [hns3] [ 9514.150558] sriov_numvfs_store+0x110/0x198 [ 9514.154724] dev_attr_store+0x44/0x60 [ 9514.158373] sysfs_kf_write+0x5c/0x78 [ 9514.162018] kernfs_fop_write+0x104/0x210 [ 9514.166010] __vfs_write+0x48/0x90 [ 9514.169395] vfs_write+0xbc/0x1c0 [ 9514.172694] ksys_write+0x74/0x100 [ 9514.176079] __arm64_sys_write+0x24/0x30 [ 9514.179987] el0_svc_common.constprop.4+0x110/0x200 [ 9514.184842] do_el0_svc+0x34/0x98 [ 9514.188144] el0_svc+0x14/0x40 [ 9514.191185] el0_sync_handler+0xb0/0x2d0 [ 9514.195088] el0_sync+0x140/0x180 [ 9514.198389] Code: b9001020 d2800000 52800022 f9800271 (885ffe61) [ 9514.204455] ---[ end trace 648de00c8406465f ]--- [ 9514.212308] note: bash[1327] exited with preempt_count 1 Cc: Qian Cai <cai@lca.pw> Cc: Alex Williamson <alex.williamson@redhat.com> Fixes: 1518ac27 ("vfio/pci: fix memory leaks of eventfd ctx") Signed-off-by: NZeng Tao <prime.zeng@hisilicon.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 18 6月, 2020 1 次提交
-
-
由 Alex Williamson 提交于
The next use of the device will generate an underflow from the stale reference. Cc: Qian Cai <cai@lca.pw> Fixes: 1518ac27 ("vfio/pci: fix memory leaks of eventfd ctx") Reported-by: NDaniel Wagner <dwagner@suse.de> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Tested-by: NDaniel Wagner <dwagner@suse.de> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 10 6月, 2020 2 次提交
-
-
由 Michel Lespinasse 提交于
Convert comments that reference mmap_sem to reference mmap_lock instead. [akpm@linux-foundation.org: fix up linux-next leftovers] [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil] [akpm@linux-foundation.org: more linux-next fixups, per Michel] Signed-off-by: NMichel Lespinasse <walken@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NVlastimil Babka <vbabka@suse.cz> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michel Lespinasse 提交于
Convert the last few remaining mmap_sem rwsem calls to use the new mmap locking API. These were missed by coccinelle for some reason (I think coccinelle does not support some of the preprocessor constructs in these files ?) [akpm@linux-foundation.org: convert linux-next leftovers] [akpm@linux-foundation.org: more linux-next leftovers] [akpm@linux-foundation.org: more linux-next leftovers] Signed-off-by: NMichel Lespinasse <walken@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: NLaurent Dufour <ldufour@linux.ibm.com> Reviewed-by: NVlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-6-walken@google.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 5月, 2020 1 次提交
-
-
由 Qian Cai 提交于
Finished a qemu-kvm (-device vfio-pci,host=0001:01:00.0) triggers a few memory leaks after a while because vfio_pci_set_ctx_trigger_single() calls eventfd_ctx_fdget() without the matching eventfd_ctx_put() later. Fix it by calling eventfd_ctx_put() for those memory in vfio_pci_release() before vfio_device_release(). unreferenced object 0xebff008981cc2b00 (size 128): comm "qemu-kvm", pid 4043, jiffies 4294994816 (age 9796.310s) hex dump (first 32 bytes): 01 00 00 00 6b 6b 6b 6b 00 00 00 00 ad 4e ad de ....kkkk.....N.. ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff ....kkkk........ backtrace: [<00000000917e8f8d>] slab_post_alloc_hook+0x74/0x9c [<00000000df0f2aa2>] kmem_cache_alloc_trace+0x2b4/0x3d4 [<000000005fcec025>] do_eventfd+0x54/0x1ac [<0000000082791a69>] __arm64_sys_eventfd2+0x34/0x44 [<00000000b819758c>] do_el0_svc+0x128/0x1dc [<00000000b244e810>] el0_sync_handler+0xd0/0x268 [<00000000d495ef94>] el0_sync+0x164/0x180 unreferenced object 0x29ff008981cc4180 (size 128): comm "qemu-kvm", pid 4043, jiffies 4294994818 (age 9796.290s) hex dump (first 32 bytes): 01 00 00 00 6b 6b 6b 6b 00 00 00 00 ad 4e ad de ....kkkk.....N.. ff ff ff ff 6b 6b 6b 6b ff ff ff ff ff ff ff ff ....kkkk........ backtrace: [<00000000917e8f8d>] slab_post_alloc_hook+0x74/0x9c [<00000000df0f2aa2>] kmem_cache_alloc_trace+0x2b4/0x3d4 [<000000005fcec025>] do_eventfd+0x54/0x1ac [<0000000082791a69>] __arm64_sys_eventfd2+0x34/0x44 [<00000000b819758c>] do_el0_svc+0x128/0x1dc [<00000000b244e810>] el0_sync_handler+0xd0/0x268 [<00000000d495ef94>] el0_sync+0x164/0x180 Signed-off-by: NQian Cai <cai@lca.pw> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 18 5月, 2020 2 次提交
-
-
由 Alex Williamson 提交于
Accessing the disabled memory space of a PCI device would typically result in a master abort response on conventional PCI, or an unsupported request on PCI express. The user would generally see these as a -1 response for the read return data and the write would be silently discarded, possibly with an uncorrected, non-fatal AER error triggered on the host. Some systems however take it upon themselves to bring down the entire system when they see something that might indicate a loss of data, such as this discarded write to a disabled memory space. To avoid this, we want to try to block the user from accessing memory spaces while they're disabled. We start with a semaphore around the memory enable bit, where writers modify the memory enable state and must be serialized, while readers make use of the memory region and can access in parallel. Writers include both direct manipulation via the command register, as well as any reset path where the internal mechanics of the reset may both explicitly and implicitly disable memory access, and manipulation of the MSI-X configuration, where the MSI-X vector table resides in MMIO space of the device. Readers include the read and write file ops to access the vfio device fd offsets as well as memory mapped access. In the latter case, we make use of our new vma list support to zap, or invalidate, those memory mappings in order to force them to be faulted back in on access. Our semaphore usage will stall user access to MMIO spaces across internal operations like reset, but the user might experience new behavior when trying to access the MMIO space while disabled via the PCI command register. Access via read or write while disabled will return -EIO and access via memory maps will result in a SIGBUS. This is expected to be compatible with known use cases and potentially provides better error handling capabilities than present in the hardware, while avoiding the more readily accessible and severe platform error responses that might otherwise occur. Fixes: CVE-2020-12888 Reviewed-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
由 Alex Williamson 提交于
Rather than calling remap_pfn_range() when a region is mmap'd, setup a vm_ops handler to support dynamic faulting of the range on access. This allows us to manage a list of vmas actively mapping the area that we can later use to invalidate those mappings. The open callback invalidates the vma range so that all tracking is inserted in the fault handler and removed in the close handler. Reviewed-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 24 3月, 2020 6 次提交
-
-
由 Alex Williamson 提交于
The cleanup is getting a tad long. Reviewed-by: NCornelia Huck <cohuck@redhat.com> Reviewed-by: NKevin Tian <kevin.tian@intel.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
由 Alex Williamson 提交于
It currently results in messages like: "vfio-pci 0000:03:00.0: vfio_pci: ..." Which is quite a bit redundant. Reviewed-by: NCornelia Huck <cohuck@redhat.com> Reviewed-by: NKevin Tian <kevin.tian@intel.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
由 Alex Williamson 提交于
With the VF Token interface we can now expect that a vfio userspace driver must be in collaboration with the PF driver, an unwitting userspace driver will not be able to get past the GET_DEVICE_FD step in accessing the device. We can now move on to actually allowing SR-IOV to be enabled by vfio-pci on the PF. Support for this is not enabled by default in this commit, but it does provide a module option for this to be enabled (enable_sriov=1). Enabling VFs is rather straightforward, except we don't want to risk that a VF might get autoprobed and bound to other drivers, so a bus notifier is used to "capture" VFs to vfio-pci using the driver_override support. We assume any later action to bind the device to other drivers is condoned by the system admin and allow it with a log warning. vfio-pci will disable SR-IOV on a PF before releasing the device, allowing a VF driver to be assured other drivers cannot take over the PF and that any other userspace driver must know the shared VF token. This support also does not provide a mechanism for the PF userspace driver itself to manipulate SR-IOV through the vfio API. With this patch SR-IOV can only be enabled via the host sysfs interface and the PF driver user cannot create or remove VFs. Reviewed-by: NCornelia Huck <cohuck@redhat.com> Reviewed-by: NKevin Tian <kevin.tian@intel.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
由 Alex Williamson 提交于
The VFIO_DEVICE_FEATURE ioctl is meant to be a general purpose, device agnostic ioctl for setting, retrieving, and probing device features. This implementation provides a 16-bit field for specifying a feature index, where the data porition of the ioctl is determined by the semantics for the given feature. Additional flag bits indicate the direction and nature of the operation; SET indicates user data is provided into the device feature, GET indicates the device feature is written out into user data. The PROBE flag augments determining whether the given feature is supported, and if provided, whether the given operation on the feature is supported. The first user of this ioctl is for setting the vfio-pci VF token, where the user provides a shared secret key (UUID) on a SR-IOV PF device, which users must provide when opening associated VF devices. Reviewed-by: NCornelia Huck <cohuck@redhat.com> Reviewed-by: NKevin Tian <kevin.tian@intel.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
由 Alex Williamson 提交于
If we enable SR-IOV on a vfio-pci owned PF, the resulting VFs are not fully isolated from the PF. The PF can always cause a denial of service to the VF, even if by simply resetting itself. The degree to which a PF can access the data passed through a VF or interfere with its operation is dependent on a given SR-IOV implementation. Therefore we want to avoid a scenario where an existing vfio-pci based userspace driver might assume the PF driver is trusted, for example assigning a PF to one VM and VF to another with some expectation of isolation. IOMMU grouping could be a solution to this, but imposes an unnecessarily strong relationship between PF and VF drivers if they need to operate with the same IOMMU context. Instead we introduce a "VF token", which is essentially just a shared secret between PF and VF drivers, implemented as a UUID. The VF token can be set by a vfio-pci based PF driver and must be known by the vfio-pci based VF driver in order to gain access to the device. This allows the degree to which this VF token is considered secret to be determined by the applications and environment. For example a VM might generate a random UUID known only internally to the hypervisor while a userspace networking appliance might use a shared, or even well know, UUID among the application drivers. To incorporate this VF token, the VFIO_GROUP_GET_DEVICE_FD interface is extended to accept key=value pairs in addition to the device name. This allows us to most easily deny user access to the device without risk that existing userspace drivers assume region offsets, IRQs, and other device features, leading to more elaborate error paths. The format of these options are expected to take the form: "$DEVICE_NAME $OPTION1=$VALUE1 $OPTION2=$VALUE2" Where the device name is always provided first for compatibility and additional options are specified in a space separated list. The relation between and requirements for the additional options will be vfio bus driver dependent, however unknown or unused option within this schema should return error. This allow for future use of unknown options as well as a positive indication to the user that an option is used. An example VF token option would take this form: "0000:03:00.0 vf_token=2ab74924-c335-45f4-9b16-8569e5b08258" When accessing a VF where the PF is making use of vfio-pci, the user MUST provide the current vf_token. When accessing a PF, the user MUST provide the current vf_token IF there are active VF users or MAY provide a vf_token in order to set the current VF token when no VF users are active. The former requirement assures VF users that an unassociated driver cannot usurp the PF device. These semantics also imply that a VF token MUST be set by a PF driver before VF drivers can access their device, the default token is random and mechanisms to read the token are not provided in order to protect the VF token of previous users. Use of the vf_token option outside of these cases will return an error, as discussed above. Reviewed-by: NCornelia Huck <cohuck@redhat.com> Reviewed-by: NKevin Tian <kevin.tian@intel.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
由 Alex Williamson 提交于
This currently serves the same purpose as the default implementation but will be expanded for additional functionality. Reviewed-by: NCornelia Huck <cohuck@redhat.com> Reviewed-by: NKevin Tian <kevin.tian@intel.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 14 10月, 2019 1 次提交
-
-
由 Denis Efremov 提交于
Code that iterates over all standard PCI BARs typically uses PCI_STD_RESOURCE_END. However, that requires the unusual test "i <= PCI_STD_RESOURCE_END" rather than something the typical "i < PCI_STD_NUM_BARS". Add a definition for PCI_STD_NUM_BARS and change loops to use the more idiomatic C style to help avoid fencepost errors. Link: https://lore.kernel.org/r/20190927234026.23342-1-efremov@linux.com Link: https://lore.kernel.org/r/20190927234308.23935-1-efremov@linux.com Link: https://lore.kernel.org/r/20190916204158.6889-3-efremov@linux.comSigned-off-by: NDenis Efremov <efremov@linux.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Acked-by: Sebastian Ott <sebott@linux.ibm.com> # arch/s390/ Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> # video/fbdev/ Acked-by: Gustavo Pimentel <gustavo.pimentel@synopsys.com> # pci/controller/dwc/ Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com> # scsi/pm8001/ Acked-by: Martin K. Petersen <martin.petersen@oracle.com> # scsi/pm8001/ Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # memstick/
-
- 23 8月, 2019 1 次提交
-
-
由 hexin 提交于
vfio_pci_enable() saves the device's initial configuration information with the intent that it is restored in vfio_pci_disable(). However, the commit referenced in Fixes: below replaced the call to __pci_reset_function_locked(), which is not wrapped in a state save and restore, with pci_try_reset_function(), which overwrites the restored device state with the current state before applying it to the device. Reinstate use of __pci_reset_function_locked() to return to the desired behavior. Fixes: 890ed578 ("vfio-pci: Use pci "try" reset interface") Signed-off-by: Nhexin <hexin15@baidu.com> Signed-off-by: NLiu Qi <liuqi16@baidu.com> Signed-off-by: NZhang Yu <zhangyu31@baidu.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 19 6月, 2019 1 次提交
-
-
由 Thomas Gleixner 提交于
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NEnrico Weigelt <info@metux.net> Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org> Reviewed-by: NAllison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 23 4月, 2019 1 次提交
-
-
由 Bjorn Helgaas 提交于
Use dev_printk() when possible to make messages consistent with other device-related messages. Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Acked-by: NEric Auger <eric.auger@redhat.com> Reviewed-by: NEric Auger <eric.auger@redhat.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 04 4月, 2019 1 次提交
-
-
由 Louis Taylor 提交于
When compiling with -Wformat, clang emits the following warnings: drivers/vfio/pci/vfio_pci.c:1601:5: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~ drivers/vfio/pci/vfio_pci.c:1601:13: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~ drivers/vfio/pci/vfio_pci.c:1601:21: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~~~~ drivers/vfio/pci/vfio_pci.c:1601:32: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~~~~ drivers/vfio/pci/vfio_pci.c:1605:5: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~ drivers/vfio/pci/vfio_pci.c:1605:13: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~ drivers/vfio/pci/vfio_pci.c:1605:21: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~~~~ drivers/vfio/pci/vfio_pci.c:1605:32: warning: format specifies type 'unsigned short' but the argument has type 'unsigned int' [-Wformat] vendor, device, subvendor, subdevice, ^~~~~~~~~ The types of these arguments are unconditionally defined, so this patch updates the format character to the correct ones for unsigned ints. Link: https://github.com/ClangBuiltLinux/linux/issues/378Signed-off-by: NLouis Taylor <louis@kragniz.eu> Reviewed-by: NNick Desaulniers <ndesaulniers@google.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 19 2月, 2019 2 次提交
-
-
由 Eric Auger 提交于
pci_map_rom/pci_get_rom_size() performs memory access in the ROM. In case the Memory Space accesses were disabled, readw() is likely to trigger a synchronous external abort on some platforms. In case memory accesses were disabled, re-enable them before the call and disable them back again just after. Fixes: 89e1f7d4 ("vfio: Add PCI device driver") Signed-off-by: NEric Auger <eric.auger@redhat.com> Suggested-by: NAlex Williamson <alex.williamson@redhat.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
由 Alex Williamson 提交于
PCI core handles save and restore of device state around reset, but when using pci_set_power_state() we can unintentionally trigger a soft reset of the device, where PCI core only restores the BAR state. If we're using vfio-pci's idle D3 support to try to put devices into low power when unused, this might trigger a reset when the device is woken for use. Also power state management by the user, or within a guest, can put the device into D3 power state with potentially limited ability to restore the device if it should undergo a reset. The PCI spec does not define the extent of a soft reset and many devices reporting soft reset on D3->D0 transition do not undergo a PCI config space reset. It's therefore assumed safe to unconditionally restore the remainder of the state if the device indicates soft reset support, even on a user initiated wakeup. Implement a wrapper in vfio-pci to tag devices reporting PM reset support, save their state on transitions into D3 and restore on transitions back to D0. Reported-by: NAlexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 21 12月, 2018 3 次提交
-
-
由 Alexey Kardashevskiy 提交于
POWER9 Witherspoon machines come with 4 or 6 V100 GPUs which are not pluggable PCIe devices but still have PCIe links which are used for config space and MMIO. In addition to that the GPUs have 6 NVLinks which are connected to other GPUs and the POWER9 CPU. POWER9 chips have a special unit on a die called an NPU which is an NVLink2 host bus adapter with p2p connections to 2 to 3 GPUs, 3 or 2 NVLinks to each. These systems also support ATS (address translation services) which is a part of the NVLink2 protocol. Such GPUs also share on-board RAM (16GB or 32GB) to the system via the same NVLink2 so a CPU has cache-coherent access to a GPU RAM. This exports GPU RAM to the userspace as a new VFIO device region. This preregisters the new memory as device memory as it might be used for DMA. This inserts pfns from the fault handler as the GPU memory is not onlined until the vendor driver is loaded and trained the NVLinks so doing this earlier causes low level errors which we fence in the firmware so it does not hurt the host system but still better be avoided; for the same reason this does not map GPU RAM into the host kernel (usual thing for emulated access otherwise). This exports an ATSD (Address Translation Shootdown) register of NPU which allows TLB invalidations inside GPU for an operating system. The register conveniently occupies a single 64k page. It is also presented to the userspace as a new VFIO device region. One NPU has 8 ATSD registers, each of them can be used for TLB invalidation in a GPU linked to this NPU. This allocates one ATSD register per an NVLink bridge allowing passing up to 6 registers. Due to the host firmware bug (just recently fixed), only 1 ATSD register per NPU was actually advertised to the host system so this passes that alone register via the first NVLink bridge device in the group which is still enough as QEMU collects them all back and presents to the guest via vPHB to mimic the emulated NPU PHB on the host. In order to provide the userspace with the information about GPU-to-NVLink connections, this exports an additional capability called "tgt" (which is an abbreviated host system bus address). The "tgt" property tells the GPU its own system address and allows the guest driver to conglomerate the routing information so each GPU knows how to get directly to the other GPUs. For ATS to work, the nest MMU (an NVIDIA block in a P9 CPU) needs to know LPID (a logical partition ID or a KVM guest hardware ID in other words) and PID (a memory context ID of a userspace process, not to be confused with a linux pid). This assigns a GPU to LPID in the NPU and this is why this adds a listener for KVM on an IOMMU group. A PID comes via NVLink from a GPU and NPU uses a PID wildcard to pass it through. This requires coherent memory and ATSD to be available on the host as the GPU vendor only supports configurations with both features enabled and other configurations are known not to work. Because of this and because of the ways the features are advertised to the host system (which is a device tree with very platform specific properties), this requires enabled POWERNV platform. The V100 GPUs do not advertise any of these capabilities via the config space and there are more than just one device ID so this relies on the platform to tell whether these GPUs have special abilities such as NVLinks. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Acked-by: NAlex Williamson <alex.williamson@redhat.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Alexey Kardashevskiy 提交于
VFIO regions already support region capabilities with a limited set of fields. However the subdriver might have to report to the userspace additional bits. This adds an add_capability() hook to vfio_pci_regops. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Acked-by: NAlex Williamson <alex.williamson@redhat.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Alexey Kardashevskiy 提交于
So far we only allowed mapping of MMIO BARs to the userspace. However there are GPUs with on-board coherent RAM accessible via side channels which we also want to map to the userspace. The first client for this is NVIDIA V100 GPU with NVLink2 direct links to a POWER9 NPU-enabled CPU; such GPUs have 16GB RAM which is coherently mapped to the system address space, we are going to export these as an extra PCI region. We already support extra PCI regions and this adds support for mapping them to the userspace. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Acked-by: NAlex Williamson <alex.williamson@redhat.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 13 12月, 2018 1 次提交
-
-
由 Alex Williamson 提交于
In commit 61d79256 ("vfio-pci: Use mutex around open, release, and remove") a mutex was added to freeze the refcnt for a device so that we can handle errors and perform bus resets on final close. However, bus resets can be rather slow and a global mutex here is undesirable. Evaluating the potential locking granularity, a per-device mutex provides the best resolution but with multiple devices on a bus all released concurrently, they'll race to acquire each other's mutex, likely resulting in no reset at all if we use trylock. We therefore lock at the granularity of the bus/slot reset as we're only attempting a single reset for this group of devices anyway. This allows much greater scaling as we're bounded in the number of devices protected by a single reflck object. Reported-by: NChristian Ehrhardt <christian.ehrhardt@canonical.com> Tested-by: NChristian Ehrhardt <christian.ehrhardt@canonical.com> Reviewed-by: NEric Auger <eric.auger@redhat.com> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 26 9月, 2018 1 次提交
-
-
由 Alex Williamson 提交于
The SR-IOV spec requires that VFs must report zero for the INTx pin register as VFs are precluded from INTx support. It's much easier for the host kernel to understand whether a device is a VF and therefore whether a non-zero pin register value is bogus than it is to do the same in userspace. Override the INTx count for such devices and virtualize the pin register to provide a consistent view of the device to the user. As this is clearly a spec violation, warn about it to support hardware validation, but also provide a known whitelist as it doesn't do much good to continue complaining if the hardware vendor doesn't plan to fix it. Known devices with this issue: 8086:270c Tested-by: NGage Eads <gage.eads@intel.com> Reviewed-by: NAshok Raj <ashok.raj@intel.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 07 8月, 2018 2 次提交
-
-
由 Alex Williamson 提交于
We expect to receive PFs with SR-IOV disabled, however some host drivers leave SR-IOV enabled at unbind. This puts us in a state where we can potentially assign both the PF and the VF, leading to both functionality as well as security concerns due to lack of managing the SR-IOV state as well as vendor dependent isolation from the PF to VF. If we were to attempt to actively disable SR-IOV on driver probe, we risk VF bound drivers blocking, potentially risking live lock scenarios. Therefore simply refuse to bind to PFs with SR-IOV enabled with a warning message indicating the issue. Users can resolve this by re-binding to the host driver and disabling SR-IOV before attempting to use the device with vfio-pci. Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Reviewed-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
由 Gustavo A. R. Silva 提交于
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 20 7月, 2018 2 次提交
-
-
由 Sinan Kaya 提交于
Now that the old implementation of pci_reset_bus() is gone, replace pci_try_reset_bus() with pci_reset_bus(). Compared to the old implementation, new code will fail immmediately with -EAGAIN if object lock cannot be obtained. Signed-off-by: NSinan Kaya <okaya@codeaurora.org> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
-
由 Sinan Kaya 提交于
Drivers are expected to call pci_try_reset_slot() or pci_try_reset_bus() by querying if a system supports hotplug or not. A survey showed that most drivers don't do this and we are leaking hotplug capability to the user. Hide pci_try_slot_reset() from drivers and embed into pci_try_bus_reset(). Change pci_try_reset_bus() parameter from struct pci_bus to struct pci_dev. Signed-off-by: NSinan Kaya <okaya@codeaurora.org> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
-
- 19 7月, 2018 1 次提交
-
-
由 Gustavo A. R. Silva 提交于
info.index can be indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: drivers/vfio/pci/vfio_pci.c:734 vfio_pci_ioctl() warn: potential spectre issue 'vdev->region' Fix this by sanitizing info.index before indirectly using it to index vdev->region Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Cc: stable@vger.kernel.org Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 27 3月, 2018 1 次提交
-
-
由 Alex Williamson 提交于
The ioeventfd here is actually irqfd handling of an ioeventfd such as supported in KVM. A user is able to pre-program a device write to occur when the eventfd triggers. This is yet another instance of eventfd-irqfd triggering between KVM and vfio. The impetus for this is high frequency writes to pages which are virtualized in QEMU. Enabling this near-direct write path for selected registers within the virtualized page can improve performance and reduce overhead. Specifically this is initially targeted at NVIDIA graphics cards where the driver issues a write to an MMIO register within a virtualized region in order to allow the MSI interrupt to re-trigger. Reviewed-by: NPeter Xu <peterx@redhat.com> Reviewed-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 22 3月, 2018 1 次提交
-
-
由 Alex Williamson 提交于
This reverts commit 2170dd04 The intent of commit 2170dd04 ("vfio-pci: Mask INTx if a device is not capabable of enabling it") was to disallow the user from seeing that the device supports INTx if the platform is incapable of enabling it. The detection of this case however incorrectly includes devices which natively do not support INTx, such as SR-IOV VFs, and further discussions reveal gaps even for the target use case. Reported-by: NArjun Vynipadath <arjun@chelsio.com> Fixes: 2170dd04 ("vfio-pci: Mask INTx if a device is not capabable of enabling it") Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 21 12月, 2017 2 次提交
-
-
由 Alexey Kardashevskiy 提交于
By default VFIO disables mapping of MSIX BAR to the userspace as the userspace may program it in a way allowing spurious interrupts; instead the userspace uses the VFIO_DEVICE_SET_IRQS ioctl. In order to eliminate guessing from the userspace about what is mmapable, VFIO also advertises a sparse list of regions allowed to mmap. This works fine as long as the system page size equals to the MSIX alignment requirement which is 4KB. However with a bigger page size the existing code prohibits mapping non-MSIX parts of a page with MSIX structures so these parts have to be emulated via slow reads/writes on a VFIO device fd. If these emulated bits are accessed often, this has serious impact on performance. This allows mmap of the entire BAR containing MSIX vector table. This removes the sparse capability for PCI devices as it becomes useless. As the userspace needs to know for sure whether mmapping of the MSIX vector containing data can succeed, this adds a new capability - VFIO_REGION_INFO_CAP_MSIX_MAPPABLE - which explicitly tells the userspace that the entire BAR can be mmapped. This does not touch the MSIX mangling in the BAR read/write handlers as we are doing this just to enable direct access to non MSIX registers. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> [aw - fixup whitespace, trim function name] Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
由 Alex Williamson 提交于
The vfio_info_add_capability() helper requires the caller to pass a capability ID, which it then uses to fill in header fields, assuming hard coded versions. This makes for an awkward and rigid interface. The only thing we want this helper to do is allocate sufficient space in the caps buffer and chain this capability into the list. Reduce it to that simple task. Reviewed-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Acked-by: NZhenyu Wang <zhenyuw@linux.intel.com> Reviewed-by: NKirti Wankhede <kwankhede@nvidia.com> Reviewed-by: NPeter Xu <peterx@redhat.com> Reviewed-by: NEric Auger <eric.auger@redhat.com> Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-