- 24 May 2022, 2 commits
-
-
By Jason Gunthorpe

When the iommu series adding driver_managed_dma was rebased it missed that new VFIO drivers had been added and did not update them too. Without this, vfio will claim the groups are not viable. Add driver_managed_dma to mlx5 and hisi.

Fixes: 70693f47 ("vfio: Set DMA ownership for VFIO devices")
Reported-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/0-v1-f9dfa642fab0+2b3-vfio_managed_dma_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
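A minimal sketch of what the one-line fix amounts to, assuming a hypothetical variant driver (the real patch touches the mlx5 and hisi pci_driver definitions; all demo_* names here are illustrative):

#include <linux/module.h>
#include <linux/pci.h>

/* Illustrative ID table and stubs; only .driver_managed_dma matters here. */
static const struct pci_device_id demo_vfio_pci_table[] = { {} };

static int demo_vfio_pci_probe(struct pci_dev *pdev,
			       const struct pci_device_id *id)
{
	return -ENODEV;		/* stub */
}

static void demo_vfio_pci_remove(struct pci_dev *pdev)
{
}

static struct pci_driver demo_vfio_pci_driver = {
	.name			= "demo-vfio-pci",
	.id_table		= demo_vfio_pci_table,
	.probe			= demo_vfio_pci_probe,
	.remove			= demo_vfio_pci_remove,
	/* Tell the driver core this driver claims DMA ownership itself (via
	 * VFIO), so the IOMMU layer does not treat the group as unviable. */
	.driver_managed_dma	= true,
};
module_pci_driver(demo_vfio_pci_driver);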
-
By Jason Gunthorpe

Since asserting DMA ownership now causes the group to have its DMA blocked, the iommu layer requires a working iommu. This means the dma_owner APIs cannot be used on the fake groups that VFIO creates. Test for this and avoid calling them. Otherwise asserting DMA ownership will fail for VFIO mdev devices, as a BLOCKING iommu_domain cannot be allocated due to the NULL iommu ops.

Fixes: 0286300e ("iommu: iommu_group_claim_dma_owner() must always assign a domain")
Reported-by: Eric Farman <farman@linux.ibm.com>
Tested-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/0-v1-9cfc47edbcd4+13546-vfio_dma_owner_fix_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
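A hedged sketch of the guard this describes. The iommu_group_claim_dma_owner() prototype is the real one from <linux/iommu.h>; the demo_* struct only mirrors the private vfio_group members this logic touches:

#include <linux/iommu.h>
#include <linux/fs.h>

/* Illustrative mirror of the relevant (vfio.c-private) vfio_group members. */
enum demo_group_type { DEMO_IOMMU, DEMO_EMULATED_IOMMU, DEMO_NO_IOMMU };

struct demo_group {
	enum demo_group_type type;
	struct iommu_group *iommu_group;
};

/*
 * Only a group backed by a real IOMMU asserts DMA ownership; the fake
 * groups vfio creates for mdev/no-iommu have NULL iommu ops, so a
 * BLOCKING domain cannot be allocated and the claim must be skipped.
 */
static int demo_claim_dma_ownership(struct demo_group *group, struct file *f)
{
	if (group->type != DEMO_IOMMU)
		return 0;

	return iommu_group_claim_dma_owner(group->iommu_group, f);
}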
-
- 19 May 2022, 3 commits
-
-
By Abhishek Sahu

Currently, there is very limited power management support in the upstream vfio_pci_core based drivers. If there are no users of the device, then the PCI device is moved into the D3hot state by writing directly into the PCI PM registers. This D3hot state helps save power, but zero power consumption is only possible in the D3cold state. D3cold is not reachable with native PCI PM; it requires interaction with platform firmware, which is system-specific. To reach low power states (including D3cold), the runtime PM framework can be used; it internally interacts with the PCI and platform firmware and puts the device into the lowest possible D-state.

This patch registers vfio_pci_core based drivers with the runtime PM framework.

1. The PCI core framework takes care of most of the runtime PM related work. To enable runtime PM, the PCI driver needs to decrement the usage count and provide at least a 'struct dev_pm_ops'. The runtime suspend/resume callbacks are optional and are needed only if extra handling is required. Since there are now multiple vfio_pci_core based drivers, instead of assigning 'struct dev_pm_ops' in each individual parent driver, vfio_pci_core itself assigns it. There are other drivers where 'struct dev_pm_ops' is assigned inside a core layer (for example, wlcore_probe() and some sound drivers).

2. This patch provides the stub implementation of 'struct dev_pm_ops'. A subsequent patch will provide the runtime suspend/resume callbacks. All the config state saving and PCI power management related work is done by the PCI core framework itself inside its runtime suspend/resume callbacks (pci_pm_runtime_suspend() and pci_pm_runtime_resume()). (A hedged sketch of this wiring follows this entry.)

3. Inside pci_reset_bus(), all the devices in the dev_set need to be runtime resumed. vfio_pci_dev_set_pm_runtime_get() takes care of the runtime resume and its error handling.

4. Inside vfio_pci_core_disable(), the device usage count always needs to be decremented, since it was incremented in vfio_pci_core_enable().

5. Since the runtime PM framework provides the same functionality, writing directly into the PCI PM config register can be replaced with the runtime PM routines. The use of runtime PM also provides additional power savings.

On systems which do not support D3cold, with the existing implementation:

// PCI device
# cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
D3hot
// upstream bridge
# cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
D0

With runtime PM:

// PCI device
# cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
D3hot
// upstream bridge
# cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
D3hot

So, with runtime PM, the upstream bridge or root port will also go into a lower power state, which is not possible with the existing implementation.

On systems which support D3cold, with the existing implementation:

// PCI device
# cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
D3hot
// upstream bridge
# cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
D0

With runtime PM:

// PCI device
# cat /sys/bus/pci/devices/0000\:01\:00.0/power_state
D3cold
// upstream bridge
# cat /sys/bus/pci/devices/0000\:00\:01.0/power_state
D3cold

So, with runtime PM, both the PCI device and the upstream bridge will go into the D3cold state.

6. If the 'disable_idle_d3' module parameter is set, runtime PM will still be enabled, but in this case the usage count should not be decremented.

7. The vfio_pci_dev_set_try_reset() return value is unused now, so its return type can be changed to void.

8. Use the runtime PM APIs in vfio_pci_core_sriov_configure(). The device can be in a low power state either through runtime power management (when there is no user) or through a PCI_PM_CTRL register write by the user. In both cases, the PF should be moved to the D0 state. To prevent any runtime usage mismatch, pci_num_vf() is called explicitly during disable.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Link: https://lore.kernel.org/r/20220518111612.16985-5-abhsahu@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
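A hedged sketch of the wiring described in points 1 and 2, using the standard dev_pm_ops / pm_runtime_*() APIs; all demo_* names are illustrative and the real patch differs in detail:

#include <linux/device.h>
#include <linux/pm.h>
#include <linux/pm_runtime.h>

/* Stub callbacks: the PCI core's pci_pm_runtime_suspend()/resume() do the
 * config-space save/restore and D-state selection on our behalf. */
static int demo_vfio_pci_runtime_suspend(struct device *dev)
{
	return 0;
}

static int demo_vfio_pci_runtime_resume(struct device *dev)
{
	return 0;
}

static const struct dev_pm_ops demo_vfio_pci_pm_ops = {
	SET_RUNTIME_PM_OPS(demo_vfio_pci_runtime_suspend,
			   demo_vfio_pci_runtime_resume, NULL)
};

/* Called once at register time: dropping the usage count hands the idle
 * device over to runtime PM so it (and its upstream bridge) can reach the
 * lowest D-state the platform supports. */
static void demo_vfio_pci_enable_runtime_pm(struct device *dev)
{
	pm_runtime_put(dev);
}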
-
By Abhishek Sahu

If any PME event is generated by the PCI device, it will mostly be handled in the host by the root port PME code. For example, in the case of PCIe, the PME event is sent to the root port and then the PME interrupt is generated. This is handled in drivers/pci/pcie/pme.c on the host side, where pci_check_pme_status() is called and the PME_Status and PME_En bits are cleared. So the guest OS using the vfio-pci device never learns about the PME event. To handle these PME events inside guests, we would need a framework to forward them to the virtual machine monitor. Until then, we can virtualize the PME related register bits and initialize them to zero, so the vfio-pci device user assumes the device is not capable of asserting the PME# signal from any power state.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Link: https://lore.kernel.org/r/20220518111612.16985-4-abhsahu@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
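A hedged sketch of the virtualization described above, assuming a vfio-pci style virtual config buffer; the helper name is illustrative and only the standard PCI PM register masks from <linux/pci_regs.h> are relied on:

#include <linux/types.h>
#include <linux/pci_regs.h>
#include <asm/byteorder.h>

/*
 * Clear the PME_Support bits in the virtualized PMC register so the device
 * user sees a function that cannot assert PME# from any power state; the
 * PME_Status/PME_En bits in PCI_PM_CTRL are likewise kept at zero.
 */
static void demo_virtualize_no_pme(u8 *vconfig, int pm_cap_offset)
{
	__le16 *vpmc = (__le16 *)&vconfig[pm_cap_offset + PCI_PM_PMC];
	u16 pmc = le16_to_cpu(*vpmc);

	pmc &= ~PCI_PM_CAP_PME_MASK;
	*vpmc = cpu_to_le16(pmc);
}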
-
By Abhishek Sahu

According to [PCIe v5 9.6.2] for PF Device Power Management States: "The PF's power management state (D-state) has global impact on its associated VFs. If a VF does not implement the Power Management Capability, then it behaves as if it is in an equivalent power state of its associated PF. If a VF implements the Power Management Capability, the Device behavior is undefined if the PF is placed in a lower power state than the VF. Software should avoid this situation by placing all VFs in lower power state before lowering their associated PF's power state."

From the vfio driver side, the user can enable SR-IOV when the PF is in the D3hot state. If a VF does not implement the Power Management Capability, then the VF will actually be in D3hot state and VF BAR access will fail. If the VF implements the Power Management Capability, then the VF will assume that its current power state is D0 while the PF is in D3hot, and in this case the behavior is undefined.

To support PF power management, we need to create a power management dependency between the PF and its VFs. Runtime power management support may help with this, since power management dependencies can be expressed through device links. But until we have such support in place, we can disallow the PF from entering a low power state while it has VFs enabled.

There can be a case where the user first enables the VFs and then disables them. If there is no user of the PF, then it could be put into D3hot state again. But with this patch, the PF will still be in D0 state after disabling the VFs, since detecting this case inside vfio_pci_core_sriov_configure() requires access to struct vfio_device::open_count along with its locks. The subsequent patches related to runtime PM will handle this case, since runtime PM maintains its own usage count.

Also, vfio_pci_core_sriov_configure() can be called at any time (with and without a vfio pci device user), so the power state change and SR-IOV enablement need to be protected with the required locks.

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Link: https://lore.kernel.org/r/20220518111612.16985-3-abhsahu@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
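A hedged sketch of the constraint; pci_num_vf() is the real PCI core query, the helper name is illustrative and the exact placement of the check in the real patch may differ:

#include <linux/pci.h>

/*
 * Refuse to lower the PF's power state while any VFs are enabled: VFs
 * without a PM capability follow the PF into D3hot (BAR access breaks),
 * and VFs with one report D0 while the PF is lower, which is undefined
 * behavior per PCIe r5 9.6.2.
 */
static int demo_pf_may_lower_power(struct pci_dev *pdev, pci_power_t state)
{
	if (state > PCI_D0 && pci_num_vf(pdev))
		return -EBUSY;

	return 0;
}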
-
- 18 May 2022, 8 commits
-
-
By Abhishek Sahu

According to [PCIe v5 5.3.1.4.1] for the D3hot state: "Configuration and Message requests are the only TLPs accepted by a Function in the D3Hot state. All other received Requests must be handled as Unsupported Requests, and all received Completions may optionally be handled as Unexpected Completions."

Currently, if the vfio PCI device has been put into D3hot state and the user makes a non-config read/write request, the request is forwarded to the host and this access may cause issues on a few systems. This patch leverages the memory-disable support added in commit abafbc55 ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory") to generate a page fault on mmap access and return an error for direct read/write. If the device is in D3hot state, an error is returned for MMIO access. IO access generally does not make the system unresponsive, so IO access can still happen in D3hot state; the default value is returned in this case without bringing down the whole system.

Also, the power related structure fields need to be protected, so the same 'memory_lock' is used to protect these fields as well. This protection is mainly needed when the user changes the PCI power state by writing into the PCI_PM_CTRL register. The vfio_lock_and_set_power_state() wrapper function takes the required locks and then invokes vfio_pci_set_power_state().

Signed-off-by: Abhishek Sahu <abhsahu@nvidia.com>
Link: https://lore.kernel.org/r/20220518111612.16985-2-abhsahu@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
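A hedged sketch of the gating, assuming vfio_pci_core field names (memory_lock, pdev) and an illustrative helper; the real check also involves the virtualized command register and may differ in detail:

#include <linux/pci.h>
#include <linux/vfio_pci_core.h>

/*
 * MMIO is only usable when the device is in D0 *and* memory decoding is
 * enabled; below D0 only config (and, pragmatically, IO) accesses are
 * let through, everything else faults or returns an error.
 */
static bool demo_mmio_accessible(struct vfio_pci_core_device *vdev)
{
	lockdep_assert_held(&vdev->memory_lock);

	return vdev->pdev->current_state == PCI_D0 &&
	       __vfio_pci_memory_enabled(vdev);
}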
-
By Jason Gunthorpe

Now that everything is fully locked there is no need for container_users to remain an atomic; change it to an unsigned int. Use 'if (group->container)' as the test to determine whether the container is present, instead of using container_users.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/6-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
By Jason Gunthorpe

Once userspace opens a group FD it is prevented from opening another instance of that same group FD until all the prior group FDs and users of the container are done. The first is done trivially by checking group->opened during group FD open. However, things get a little weird if userspace creates a device FD and then closes the group FD. The group FD still cannot be re-opened, but this time it is because group->container is still set and container_users is elevated by the device FD.

Due to this mismatched lifecycle we have vfio_group_try_dissolve_container(), which tries to auto-free a container after the group FD is closed but the device FD remains open. Instead, have the device FD hold onto a reference to the single group FD. This directly prevents vfio_group_fops_release() from being called while any device FD exists and makes the lifecycle model more understandable. vfio_group_try_dissolve_container() is removed, as the only place a container is auto-deleted is during vfio_group_fops_release(). At this point container_users is either 1 or 0, since all device FDs must be closed.

Change group->opened to group->opened_file, which points to the single struct file * that is open for the group. If group->opened_file is NULL then group->container == NULL. If all device FDs have closed then the group's notifier list must be empty.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/5-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
By Jason Gunthorpe

This is necessary to avoid various user triggerable races, for instance racing SET_CONTAINER/UNSET_CONTAINER:

  ioctl(VFIO_GROUP_SET_CONTAINER)
                                   ioctl(VFIO_GROUP_UNSET_CONTAINER)
                                   vfio_group_unset_container
                                      int users = atomic_cmpxchg(
                                              &group->container_users, 1, 0);
                                      // users == 1 container_users == 0
                                      __vfio_group_unset_container(group);
                                         container = group->container;
  vfio_group_set_container()
    if (!atomic_read(&group->container_users))
      down_write(&container->group_lock);
      group->container = container;
      up_write(&container->group_lock);
                                         down_write(&container->group_lock);
                                         group->container = NULL;
                                         up_write(&container->group_lock);
                                         vfio_container_put(container);
      /* woops we lost/leaked the new container */

This can then go on to a NULL pointer deref, since container == 0 and container_users == 1.

Wrap all touches of container, except those on a performance path with a known open device, with the group_rwsem.

The only user of vfio_group_add_container_user() holds the user count for a simple operation; change it to just hold the group_lock over the operation and delete vfio_group_add_container_user(). Containers now only gain a user when a device FD is opened.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/4-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
By Jason Gunthorpe

The split follows the pairing with the destroy functions:

 - vfio_group_get_device_fd() destroyed by close()
 - vfio_device_open() destroyed by vfio_device_fops_release()
 - vfio_device_assign_container() destroyed by vfio_group_try_dissolve_container()

The next patch will put a lock around vfio_device_assign_container().

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/3-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
By Jason Gunthorpe

This is not a performance path; just use the group_rwsem to protect the value.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/2-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
By Jason Gunthorpe

Without locking, userspace can trigger a UAF by racing KVM_DEV_VFIO_GROUP_DEL with VFIO_GROUP_GET_DEVICE_FD:

  CPU1                              CPU2
  ioctl(KVM_DEV_VFIO_GROUP_DEL)
                                    ioctl(VFIO_GROUP_GET_DEVICE_FD)
                                      vfio_group_get_device_fd
                                        open_device()
                                          intel_vgpu_open_device()
                                            vfio_register_notifier()
                                              vfio_register_group_notifier()
                                                blocking_notifier_call_chain(
                                                    &group->notifier,
                                                    VFIO_GROUP_NOTIFY_SET_KVM,
                                                    group->kvm);
  set_kvm()
    group->kvm = NULL
  close()
    kfree(kvm)
                                                intel_vgpu_group_notifier()
                                                  vdev->kvm = data
                                    [..]
                                                  kvm_get_kvm(vgpu->kvm); // UAF!

Add a simple rwsem in the group to protect the kvm while the notifier is using it.

Note this doesn't fix the race internal to i915 where userspace can trigger two VFIO_GROUP_NOTIFY_SET_KVM's before we reach a consumer of vgpu->kvm and trigger this same UAF; it just makes the notifier self-consistent.

Fixes: ccd46dba ("vfio: support notifier chain in vfio_group")
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Link: https://lore.kernel.org/r/1-v2-d035a1842d81+1bf-vfio_group_locking_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
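A hedged sketch of the fix. The demo_* struct only mirrors the relevant (vfio.c-private) vfio_group members; group_rwsem is the rwsem this patch introduces, and VFIO_GROUP_NOTIFY_SET_KVM is the real notifier action from <linux/vfio.h>:

#include <linux/rwsem.h>
#include <linux/notifier.h>
#include <linux/vfio.h>

struct kvm;

/* Illustrative mirror of the relevant vfio_group members. */
struct demo_group {
	struct rw_semaphore group_rwsem;
	struct kvm *kvm;
	struct blocking_notifier_head notifier;
};

/* Writer side: KVM association changes take the rwsem exclusively. */
static void demo_group_set_kvm(struct demo_group *group, struct kvm *kvm)
{
	down_write(&group->group_rwsem);
	group->kvm = kvm;
	up_write(&group->group_rwsem);
}

/* Reader side: a newly registered notifier replays SET_KVM under the same
 * rwsem, so the pointer cannot be freed out from under the chain. */
static void demo_group_replay_set_kvm(struct demo_group *group)
{
	down_read(&group->group_rwsem);
	if (group->kvm)
		blocking_notifier_call_chain(&group->notifier,
					     VFIO_GROUP_NOTIFY_SET_KVM,
					     group->kvm);
	up_read(&group->group_rwsem);
}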
-
By Wan Jiabing

Fix the following coccicheck warning:

./virt/kvm/vfio.c:258:1-7: preceding lock on line 236

If kvm_vfio_file_iommu_group() fails, the code would goto err_fdput with mutex_lock acquired and then return ret, which could cause a potential deadlock. Move mutex_unlock below the err_fdput tag to fix it.

Fixes: d55d9e7a ("kvm/vfio: Store the struct file in the kvm_vfio_group")
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20220517023441.4258-1-wanjiabing@vivo.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
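A minimal sketch of the corrected control flow, with hypothetical demo_* names standing in for the kvm_vfio_group_add() internals (the lookup stub stands in for the kvm_vfio_file_iommu_group() call that can fail):

#include <linux/mutex.h>
#include <linux/file.h>
#include <linux/iommu.h>

/* Hypothetical stand-in for the lookup that can fail. */
static struct iommu_group *demo_lookup_iommu_group(struct file *filp)
{
	return NULL;	/* always fails in this sketch */
}

static int demo_group_add(struct mutex *lock, struct fd f)
{
	int ret = 0;

	mutex_lock(lock);

	if (!demo_lookup_iommu_group(f.file)) {
		ret = -EINVAL;
		goto err_fdput;
	}

	/* success-path work under the lock */

err_fdput:
	mutex_unlock(lock);	/* now below the label: no error path leaks the lock */
	fdput(f);
	return ret;
}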
-
- 17 May 2022, 1 commit
-
-
By Thomas Huth

There is no macro called _IORW, so use _IOWR in the comment instead.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/r/20220516101202.88373-1-thuth@redhat.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
- 14 May 2022, 10 commits
-
-
By Jason Gunthorpe

VFIO PCI does a security check as part of hot reset to prove that the user has permission to manipulate all the devices that will be impacted by the reset. Use a new API, vfio_file_has_dev(), to perform this security check against the struct file directly and remove the vfio_group from VFIO PCI. Since VFIO PCI was the last user of vfio_group_get_external_user() and vfio_group_put_external_user(), remove them as well.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/8-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
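A hedged usage sketch of the new interface; the vfio_file_has_dev() prototype is the one this series exports, while the surrounding loop and names are only illustrative of the hot-reset check:

#include <linux/vfio.h>
#include <linux/fs.h>

/*
 * The user must supply a group file covering every device affected by the
 * bus/slot reset; check a given affected vfio_device against the set of
 * files handed in through the ioctl.
 */
static int demo_validate_reset_ownership(struct file **group_files, int nfiles,
					 struct vfio_device *affected)
{
	int i;

	for (i = 0; i < nfiles; i++)
		if (vfio_file_has_dev(group_files[i], affected))
			return 0;

	return -EINVAL;	/* user does not own this affected device */
}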
-
By Jason Gunthorpe

None of the VFIO APIs take in the vfio_group anymore, so we can remove it completely. This has a subtle side effect on the enforced coherency tracking: vfio_group_get_external_user() was holding on to the container_users, which would prevent the iommu_domain, and thus the enforced coherency value, from changing while the group is registered with kvm.

It changes the security proof slightly into 'user must hold a group FD that has a device that cannot enforce DMA coherence'. As opening the group FD, not attaching the container, is the privileged operation, this doesn't change the security properties much. On the flip side it paves the way to changing the iommu_domain/container attached to a group at runtime, which is something that will be required to support nested translation.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/7-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
By Jason Gunthorpe

Just change the argument from struct vfio_group to struct file *.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/6-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
By Jason Gunthorpe

Instead of a general extension check, change the function into a limited test of whether the iommu_domain has enforced coherency, which is the only thing kvm needs to query. Make the new op self-contained by properly refcounting the container before touching it.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/5-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
By Jason Gunthorpe

vfio_group_fops_open() ensures there is only ever one struct file open for any struct vfio_group at any time:

	/* Do we need multiple instances of the group open?  Seems not. */
	opened = atomic_cmpxchg(&group->opened, 0, 1);
	if (opened) {
		vfio_group_put(group);
		return -EBUSY;

Therefore the struct file * can be used directly to search the list of VFIO groups that KVM keeps, instead of using the vfio_external_group_match_file() callback to try to figure out whether the passed-in FD matches the list or not. Delete vfio_external_group_match_file().

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
By Jason Gunthorpe

The only caller wants to get a pointer to the struct iommu_group associated with the VFIO group file. Instead of returning the group ID and then searching sysfs for that string to get the struct iommu_group, just directly return the iommu_group pointer already held by the vfio_group struct. It already has a safe lifetime due to the struct file kref: the vfio_group, and thus the iommu_group, cannot be destroyed while the group file is open.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
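A hedged usage sketch from a caller's point of view; the vfio_file_iommu_group() prototype is the one this series introduces, the helper name is illustrative, and no extra refcount is needed while the file references are held:

#include <linux/vfio.h>
#include <linux/iommu.h>
#include <linux/fs.h>

/* Check whether two group files refer to the same iommu_group; NULL means
 * the file is not a VFIO group file at all. */
static bool demo_same_iommu_group(struct file *a, struct file *b)
{
	struct iommu_group *ga = vfio_file_iommu_group(a);
	struct iommu_group *gb = vfio_file_iommu_group(b);

	return ga && ga == gb;
}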
-
By Jason Gunthorpe

Following patches will change the APIs to use the struct file as the handle instead of the vfio_group, so hang on to a reference to it with the same duration as the vfio_group.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
By Jason Gunthorpe

To make it easier to read and change in following patches.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
By Jason Gunthorpe

Now that the iommu core takes care of isolation, there is no race between driver attach and container unset. Once iommu_group_release_dma_owner() returns, the device can immediately be re-used. Remove this mechanism.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/0-v1-a1e8791d795b+6b-vfio_container_q_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
By Alex Williamson

Merge IOMMU dependencies for vfio.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
- 13 May 2022, 1 commit
-
-
Once the group enters 'owned' mode it can never be assigned back to the default_domain or to a NULL domain. It must always be actively assigned to a current domain. If the caller hasn't provided a domain, then the core must provide an explicit DMA blocking domain that has no DMA map.

Lazily create a group-global blocking DMA domain when iommu_group_claim_dma_owner() is first called and immediately assign the group to it. This ensures that DMA is immediately fully isolated on all IOMMU drivers. If the user attaches/detaches while owned, then detach will set the group back to the blocking domain.

Slightly reorganize the call chains so that __iommu_group_set_core_domain() is the function that removes any caller configured domain and sets the domain back to a core owned domain with an appropriate lifetime. __iommu_group_set_domain() is the worker function that can change the domain assigned to a group to any target domain, including NULL.

Add comments clarifying how the NULL vs detach_dev vs default_domain cases work, based on Robin's remarks.

This fixes an oops with VFIO and SMMUv3 because VFIO will call iommu_detach_group() and then immediately iommu_domain_free(), but SMMUv3 has no way to know that the domain it is holding a pointer to has been freed. Now iommu_detach_group() will assign the blocking domain and SMMUv3 will no longer hold a stale domain reference.

Fixes: 1ea2a07a ("iommu: Add DMA ownership management interfaces")
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Tested-by: Baolu Lu <baolu.lu@linux.intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Co-developed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
--
Just minor polishing as discussed

v3:
 - Change names to __iommu_group_set_domain() / __iommu_group_set_core_domain()
 - Clarify comments
 - Call __iommu_group_set_domain() directly in iommu_group_release_dma_owner() since we know it is always selecting the default_domain
 - Remove redundant detach_dev ops check in __iommu_detach_device and make the added WARN_ON fail instead
 - Check for blocking_domain in __iommu_attach_group() so VFIO can actually attach a new group
 - Update comments and spelling
 - Fix missed change to new_domain in iommu_group_do_detach_device()
v2: https://lore.kernel.org/r/0-v2-f62259511ac0+6-iommu_dma_block_jgg@nvidia.com
v1: https://lore.kernel.org/r/0-v1-6e9d2d0a759d+11b-iommu_dma_block_jgg@nvidia.com

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/0-v3-db7f0785022b+149-iommu_dma_block_jgg@nvidia.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
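A hedged consumer-side sketch of the guarantee this change provides, using only the public ownership and attach APIs from <linux/iommu.h>; the helper name and ordering are illustrative:

#include <linux/iommu.h>

/*
 * Between claim and release the group is never left on the default_domain
 * or on a freed caller domain: any gap is covered by the core's blocking
 * domain, so DMA stays cut off and the driver never holds a stale domain.
 */
static int demo_use_owned_group(struct iommu_group *group,
				struct iommu_domain *my_domain, void *owner)
{
	int ret;

	ret = iommu_group_claim_dma_owner(group, owner);  /* group -> blocking domain */
	if (ret)
		return ret;

	ret = iommu_attach_group(my_domain, group);       /* blocking -> my_domain */
	if (!ret) {
		/* ... use my_domain for DMA mappings ... */

		iommu_detach_group(my_domain, group);      /* my_domain -> blocking */
		iommu_domain_free(my_domain);              /* safe: core no longer references it */
	}

	iommu_group_release_dma_owner(group);              /* blocking -> default_domain */
	return ret;
}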
-
- 12 May 2022, 12 commits
-
-
By Jason Gunthorpe

The last user of this function is in PCI callbacks that want to convert their struct pci_dev to a vfio_device. Instead of searching, use the vfio_device available trivially through the drvdata.

When a callback in the device_driver is called, the caller must hold the device_lock() on dev. The purpose of the device_lock is to prevent remove() from being called (see __device_release_driver), and to allow the driver to safely interact with its drvdata without races. The PCI core correctly follows this and holds the device_lock() when calling error_detected (see report_error_detected) and sriov_configure (see sriov_numvfs_store).

Further, since the drvdata holds a positive refcount on the vfio_device, any access of the drvdata, under the device_lock(), from a driver callback needs no further protection or refcounting. Thus the remark in the vfio_device_get_from_dev() comment does not apply here; VFIO PCI drivers all call vfio_unregister_group_dev() from their remove callbacks under the device_lock() and cannot race with the remaining callers.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v4-c841817a0349+8f-vfio_get_from_dev_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
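A hedged sketch of the pattern; device_lock_assert() and pci_get_drvdata() are the standard helpers, while the callback body and name are illustrative:

#include <linux/device.h>
#include <linux/pci.h>
#include <linux/vfio_pci_core.h>

/*
 * PCI driver callbacks such as sriov_configure() and error_detected() run
 * with device_lock() held, so the drvdata set at probe time can be used
 * directly, with no extra search or refcount.
 */
static int demo_sriov_configure(struct pci_dev *pdev, int nr_virtfn)
{
	struct vfio_pci_core_device *vdev;

	device_lock_assert(&pdev->dev);
	vdev = pci_get_drvdata(pdev);

	return vdev ? 0 : -ENODEV;	/* illustrative use of vdev */
}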
-
By Jason Gunthorpe

Having a consistent pointer in the drvdata will allow the next patch to make use of the drvdata from some of the core code helpers. Use a WARN_ON inside vfio_pci_core_register_device() to detect drivers that miss this.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v4-c841817a0349+8f-vfio_get_from_dev_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
By Jason Gunthorpe

When the open_device() op is called, container_users is incremented and held incremented until close_device(). Thus, as long as drivers call these functions within their open_device()/close_device() region, they do not need to worry about the container_users. These functions can all only be called between open_device() and close_device():

 vfio_pin_pages()
 vfio_unpin_pages()
 vfio_dma_rw()
 vfio_register_notifier()
 vfio_unregister_notifier()

Eliminate the calls to vfio_group_add_container_user() and add vfio_assert_device_open() to detect driver mis-use. This causes the close_device() op to check device->open_count, so always leave it elevated while calling the op.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/7-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
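A hedged sketch of the helper mentioned above, assuming the obvious implementation over vfio_device::open_count (the demo_ name is illustrative; the real function may differ):

#include <linux/vfio.h>
#include <linux/bug.h>
#include <linux/compiler.h>

/*
 * Drivers may only call the vfio access helpers between open_device() and
 * close_device(), i.e. while open_count is elevated; warn loudly if a
 * driver gets this wrong.
 */
static bool demo_assert_device_open(struct vfio_device *device)
{
	return !WARN_ON(!READ_ONCE(device->open_count));
}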
-
By Jason Gunthorpe

Now that callers have been updated to use the vfio_device APIs, the driver-facing group interface is no longer used; delete it:

 - vfio_group_get_external_user_from_dev()
 - vfio_group_pin_pages()
 - vfio_group_unpin_pages()
 - vfio_group_iommu_domain()

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/6-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
By Jason Gunthorpe

Use the existing vfio_device versions of vfio_(un)pin_pages(). There is no reason to use a group interface here; kvmgt has easy access to a vfio_device. Delete kvmgt_vdev::vfio_group since these calls were the last users.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Zhi Wang <zhi.a.wang@intel.com>
Link: https://lore.kernel.org/r/5-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
By Jason Gunthorpe

Every caller has a readily available vfio_device pointer, so use that instead of passing in a generic struct device. Change vfio_dma_rw() to take in the struct vfio_device, and move the container users that would have been held by vfio_group_get_external_user_from_dev() into vfio_dma_rw() directly, like vfio_pin/unpin_pages().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
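A hedged usage sketch of the device-based call; the vfio_dma_rw() signature is the one this patch converts to, and the surrounding helper is illustrative:

#include <linux/types.h>
#include <linux/vfio.h>

/*
 * Read guest memory through the container's IOMMU mappings; valid only
 * between open_device() and close_device(), where the container user
 * count is already held.
 */
static int demo_read_guest(struct vfio_device *vdev, dma_addr_t iova,
			   void *buf, size_t len)
{
	return vfio_dma_rw(vdev, iova, buf, len, false /* read */);
}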
-
By Jason Gunthorpe

Every caller has a readily available vfio_device pointer, so use that instead of passing in a generic struct device. The struct vfio_device already contains the group we need, so this avoids complexity, extra refcounting, and a confusing lifecycle model.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
By Jason Gunthorpe

The next patch wants the vfio_device instead. There is no reason to store a pointer here since we can container_of back to the vfio_device.

Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
By Jason Gunthorpe

All callers have a struct vfio_device trivially available, so pass it in directly and avoid calling the expensive vfio_group_get_from_dev().

Acked-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
By Robin Murphy

IOMMU groups have been mandatory for some time now, so a device without one is necessarily a device without any usable IOMMU; therefore the iommu_present() check is redundant (or at best unhelpful).

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/537103bbd7246574f37f2c88704d7824a3a889f2.1649160714.git.robin.murphy@arm.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
By Alex Williamson

Merge GVT-g dependencies for vfio.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
-
By Alex Williamson

Merge tag 'mlx5-lm-parallel' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into v5.19/vfio/next

Improve mlx5 live migration driver

From Yishai:

This series improves the mlx5 live migration driver in a few aspects, as described below.

Refactor to enable running migration commands in parallel over the PF command interface. To achieve that, mlx5_core now exposes an API to let the VF be notified before the PF command interface goes down/up (e.g. PF reload upon health recovery).

With the above functionality in place, mlx5 vfio no longer needs to take the global PF lock when using the command interface, but can rely on this mechanism to stay in sync with the PF. This enables parallel VF migration over the PF command interface from the kernel driver's point of view.

In addition, move to the PF async command mode for the SAVE state command. This enables returning to user space earlier upon successfully issuing the command and improves latency by letting things run in parallel.

Alex, as this series touches mlx5_core we may need to send this as a pull request to VFIO to avoid conflicts before acceptance.

Link: https://lore.kernel.org/all/20220510090206.90374-1-yishaih@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
-
- 11 May 2022, 3 commits
-
-
By Yishai Hadas

Use the PF asynchronous command mode for the SAVE state command. This enables returning to user space earlier upon successfully issuing the command and improves latency by letting things run in parallel.

Link: https://lore.kernel.org/r/20220510090206.90374-5-yishaih@nvidia.com
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
-
By Yishai Hadas

Refactor to enable different VFs to run their commands over the PF command interface in parallel without blocking each other. This is done by not using the global PF lock that was used before, and relying instead on the VF attach/detach mechanism to sync.

Link: https://lore.kernel.org/r/20220510090206.90374-4-yishaih@nvidia.com
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
-
By Yishai Hadas

Manage the VF attach/detach callback from the PF. This lets the driver enable parallel VF migration, as will be introduced in the next patch. As part of this, reorganize the "VF is migratable" code into a separate function and rename it to set_migratable() to match its functionality.

Link: https://lore.kernel.org/r/20220510090206.90374-3-yishaih@nvidia.com
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
-