Commits · 7403e6d8263937dea206dd201fed1ceed190ca18 · openeuler / Kernel

03 Mar, 2022 · 3 commits

vfio: Extend the device migration protocol with RUNNING_P2P · 8cb3d83b

Committed by Jason Gunthorpe on Feb 24, 2022

The RUNNING_P2P state is designed to support multiple devices in the same
VM that are doing P2P transactions between themselves. When in RUNNING_P2P
the device must be able to accept incoming P2P transactions but should not
generate outgoing P2P transactions.

As an optional extension to the mandatory states it is defined as
in between STOP and RUNNING:
   STOP -> RUNNING_P2P -> RUNNING -> RUNNING_P2P -> STOP

For drivers that are unable to support RUNNING_P2P the core code
silently merges RUNNING_P2P and RUNNING together. Unless driver support
is present, the new state cannot be used in SET_STATE.
Drivers that support this will be required to implement 4 FSM arcs
beyond the basic FSM. 2 of the basic FSM arcs become combination
transitions.

Compared to the v1 clarification, NDMA is redefined into FSM states and is
described in terms of the desired P2P quiescent behavior, noting that
halting all DMA is an acceptable implementation.
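
The ordering STOP -> RUNNING_P2P -> RUNNING and the silent merge for drivers without P2P support can be sketched as a next-hop function. This is only an illustration under stated assumptions: the enum, the function name and the -1 error value are hypothetical and are not the kernel's identifiers.

```c
#include <stdbool.h>

/* Hypothetical state numbering, ordered along the chain described above. */
enum sketch_state { SK_STOP = 0, SK_RUNNING_P2P = 1, SK_RUNNING = 2 };

/*
 * Return the next state to enter when moving from cur towards target.
 * Returns -1 when the target may not be requested (assumed error value).
 */
static int sketch_next_hop(int cur, int target, bool p2p_supported)
{
	int step, next;

	/* Without driver support the new state cannot be requested. */
	if (target == SK_RUNNING_P2P && !p2p_supported)
		return -1;
	if (cur == target)
		return cur;

	step = target > cur ? 1 : -1;
	next = cur + step;

	/* Unsupported RUNNING_P2P is skipped, i.e. merged with RUNNING. */
	if (next == SK_RUNNING_P2P && !p2p_supported)
		next += step;
	return next;
}
```

With P2P support a STOP to RUNNING request passes through RUNNING_P2P first; without it the same request goes straight to RUNNING.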

Link: https://lore.kernel.org/all/20220224142024.147653-11-yishaih@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

vfio: Define device migration protocol v2 · 115dcec6

Committed by Jason Gunthorpe on Feb 24, 2022

Replace the existing region based migration protocol with an ioctl based
protocol. The two protocols have the same general semantic behaviors, but
the way the data is transported is changed.

This is the STOP_COPY portion of the new protocol; it defines the 5 states
for basic stop and copy migration and the protocol to move the migration
data in/out of the kernel.

Compared to the clarification of the v1 protocol Alex proposed:

https://lore.kernel.org/r/163909282574.728533.7460416142511440919.stgit@omen

This has a few deliberate functional differences:

- ERROR arcs allow the device function to remain unchanged.

- The protocol is not required to return to the original state on
transition failure. Instead userspace can execute an unwind back to
the original state, reset, or do something else without needing kernel
support. This simplifies the kernel design and should userspace choose
a policy like always reset, avoids doing useless work in the kernel
on error handling paths.

- PRE_COPY is made optional; userspace must discover it before using it.
This reflects the fact that the majority of drivers we are aware of
right now will not implement PRE_COPY.

- Segmentation is not part of the data stream protocol; the receiver
does not have to reproduce the framing boundaries.

The hybrid FSM for the device_state is described as a Mealy machine by
documenting each of the arcs the driver is required to implement. Defining
the remaining set of old/new device_state transitions as 'combination
transitions' which are naturally defined as taking multiple FSM arcs along
the shortest path within the FSM's digraph allows a complete matrix of
transitions.

A new VFIO_DEVICE_FEATURE of VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE is
defined to replace writing to the device_state field in the region. This
allows returning a brand new FD whenever the requested transition opens
a data transfer session.

The VFIO core code implements the new feature and provides a helper
function to the driver. Using the helper the driver only has to
implement 6 of the FSM arcs and the other combination transitions are
elaborated consistently from those arcs.
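
How combination transitions fall out of a small set of driver arcs can be sketched with a breadth-first search that returns the first hop on a shortest path. The message only says there are 6 basic arcs; the arc table below is an assumption (every arc is taken to start or end at STOP), and the enum and function names are hypothetical. The authoritative arc list is in the UAPI header.

```c
/* Hypothetical state names for the basic stop-and-copy FSM. */
enum sketch_mig_state {
	SK_MIG_STOP,
	SK_MIG_RUNNING,
	SK_MIG_STOP_COPY,
	SK_MIG_RESUMING,
	SK_MIG_NR_STATES
};

/* arc[a][b] != 0: the driver implements a -> b directly (assumed set). */
static const unsigned char sketch_arc[SK_MIG_NR_STATES][SK_MIG_NR_STATES] = {
	[SK_MIG_STOP]      = { [SK_MIG_RUNNING] = 1, [SK_MIG_STOP_COPY] = 1,
			       [SK_MIG_RESUMING] = 1 },
	[SK_MIG_RUNNING]   = { [SK_MIG_STOP] = 1 },
	[SK_MIG_STOP_COPY] = { [SK_MIG_STOP] = 1 },
	[SK_MIG_RESUMING]  = { [SK_MIG_STOP] = 1 },
};

/*
 * Return the first state to enter on a shortest path from cur to target,
 * cur itself when already there, or -1 when target is unreachable.
 */
static int sketch_mig_next_state(int cur, int target)
{
	int prev[SK_MIG_NR_STATES];
	int queue[SK_MIG_NR_STATES];
	int head = 0, tail = 0, i;

	if (cur == target)
		return cur;
	for (i = 0; i < SK_MIG_NR_STATES; i++)
		prev[i] = -1;
	prev[cur] = cur;
	queue[tail++] = cur;

	while (head < tail) {
		int s = queue[head++];

		for (i = 0; i < SK_MIG_NR_STATES; i++) {
			int hop;

			if (!sketch_arc[s][i] || prev[i] != -1)
				continue;
			prev[i] = s;
			if (i != target) {
				queue[tail++] = i;
				continue;
			}
			/* Walk back to the state entered right after cur. */
			for (hop = i; prev[hop] != cur; hop = prev[hop])
				;
			return hop;
		}
	}
	return -1;
}
```

Calling the function repeatedly until it returns the target walks a combination transition one driver arc at a time, for example RUNNING -> STOP -> STOP_COPY.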

A new VFIO_DEVICE_FEATURE of VFIO_DEVICE_FEATURE_MIGRATION is defined to
report the capability for migration and indicate which set of states and
arcs are supported by the device. The FSM provides a lot of flexibility to
make backwards compatible extensions but the VFIO_DEVICE_FEATURE also
allows for future breaking extensions for scenarios that cannot support
even the basic STOP_COPY requirements.

The VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE with the GET option (i.e.
VFIO_DEVICE_FEATURE_GET) can be used to read the current migration state
of the VFIO device.

Data transfer sessions are now carried over a file descriptor, instead of
the region. The FD functions for the lifetime of the data transfer
session. read() and write() transfer the data with normal Linux stream FD
semantics. This design allows future expansion to support poll(),
io_uring, and other performance optimizations.
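
Because the session FD follows normal stream semantics, userspace can drain it with an ordinary read() loop. The helper below is a minimal sketch using only POSIX calls; its name is hypothetical, and obtaining the FD from the state-change request is deliberately left out.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read from fd until end of stream or until buf is full.
 * Returns the number of bytes stored, or -1 on a read error.
 */
static ssize_t sketch_drain_stream_fd(int fd, char *buf, size_t cap)
{
	size_t total = 0;

	while (total < cap) {
		ssize_t n = read(fd, buf + total, cap - total);

		if (n == 0)		/* end of the data stream */
			break;
		if (n < 0) {
			if (errno == EINTR)	/* interrupted, retry */
				continue;
			return -1;
		}
		total += (size_t)n;
	}
	return (ssize_t)total;
}
```

Short reads are handled by looping, which is what makes any framing on the sender side irrelevant to the receiver.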

The complicated mmap mode for data transfer is discarded as current qemu
doesn't take meaningful advantage of it, and the new qemu implementation
avoids substantially all the performance penalty of using a read() on the
region.

Link: https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

vfio: Have the core code decode the VFIO_DEVICE_FEATURE ioctl · 445ad495

Committed by Jason Gunthorpe on Feb 24, 2022

Invoke a new device op 'device_feature' to handle just the data array
portion of the command. This lifts the ioctl validation to the core code
and makes it simpler for either the core code, or layered drivers, to
implement their own feature values.

Provide vfio_check_feature() to consolidate checking the flags/etc against
what the driver supports.
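
The kind of consolidated check described above might look roughly like the following userspace sketch. The flag constants, the function name and the return convention (1 to proceed, 0 for a successful probe, -EINVAL on rejection) are assumptions made for illustration and are not copied from the kernel source.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for the GET/SET/PROBE options. */
#define SK_FEAT_GET	(1u << 0)
#define SK_FEAT_SET	(1u << 1)
#define SK_FEAT_PROBE	(1u << 2)

/*
 * Validate a feature request against what the handler supports.
 * supported_ops is a mask of SK_FEAT_GET and/or SK_FEAT_SET; minsz is the
 * smallest payload the handler can work with.
 */
static int sketch_check_feature(unsigned int flags, size_t argsz,
				unsigned int supported_ops, size_t minsz)
{
	/* Reject any requested operation the handler does not implement. */
	if ((flags & (SK_FEAT_GET | SK_FEAT_SET)) & ~supported_ops)
		return -EINVAL;
	/* A probe only asks whether the operation exists. */
	if (flags & SK_FEAT_PROBE)
		return 0;
	/* Without a probe, one of GET or SET must be requested. */
	if (!(flags & (SK_FEAT_GET | SK_FEAT_SET)))
		return -EINVAL;
	/* The payload must be large enough for the handler to use. */
	if (argsz < minsz)
		return -EINVAL;
	return 1;
}
```

Keeping this in one helper means each feature handler only states its supported operations and minimum size, rather than repeating the flag logic.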

Link: https://lore.kernel.org/all/20220224142024.147653-9-yishaih@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

01 Dec, 2021 · 1 commit

vfio: remove all kernel-doc notation · 3b9a2d57

Committed by Randy Dunlap on Nov 10, 2021

vfio.c abuses (misuses) "/**", which indicates the beginning of
kernel-doc notation in the kernel tree. This causes a bunch of
kernel-doc complaints about this source file, so quieten all of
them by changing all "/**" to "/*".
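
For context, the difference between the two comment openers can be illustrated as follows. The function is a made-up example and has nothing to do with vfio.c; only the comment layout matters here.

```c
/*
 * Section banner: byte helpers.
 * A plain comment such as this one opens with a single asterisk, so the
 * kernel-doc tooling ignores it.
 */

/**
 * sketch_clamp_to_byte() - Limit a value to the range of an unsigned byte.
 * @value: Input value, possibly outside 0..255.
 *
 * Return: @value clamped to the inclusive range 0..255.
 */
static int sketch_clamp_to_byte(int value)
{
	if (value < 0)
		return 0;
	if (value > 255)
		return 255;
	return value;
}
```

A comment opened with two asterisks is expected to name the function, give a short description and describe every parameter, which is what the warnings quoted in this message complain about.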

vfio.c:236: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
  * IOMMU driver registration
vfio.c:236: warning: missing initial short description on line:
  * IOMMU driver registration
vfio.c:295: warning: expecting prototype for Container objects(). Prototype was for vfio_container_get() instead
vfio.c:317: warning: expecting prototype for Group objects(). Prototype was for __vfio_group_get_from_iommu() instead
vfio.c:496: warning: Function parameter or member 'device' not described in 'vfio_device_put'
vfio.c:496: warning: expecting prototype for Device objects(). Prototype was for vfio_device_put() instead
vfio.c:599: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
  * Async device support
vfio.c:599: warning: missing initial short description on line:
  * Async device support
vfio.c:693: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
  * VFIO driver API
vfio.c:693: warning: missing initial short description on line:
  * VFIO driver API
vfio.c:835: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
  * Get a reference to the vfio_device for a device.  Even if the
vfio.c:835: warning: missing initial short description on line:
  * Get a reference to the vfio_device for a device.  Even if the
vfio.c:969: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
  * VFIO base fd, /dev/vfio/vfio
vfio.c:969: warning: missing initial short description on line:
  * VFIO base fd, /dev/vfio/vfio
vfio.c:1187: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
  * VFIO Group fd, /dev/vfio/$GROUP
vfio.c:1187: warning: missing initial short description on line:
  * VFIO Group fd, /dev/vfio/$GROUP
vfio.c:1540: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
  * VFIO Device fd
vfio.c:1540: warning: missing initial short description on line:
  * VFIO Device fd
vfio.c:1615: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
  * External user API, exported by symbols to be linked dynamically.
vfio.c:1615: warning: missing initial short description on line:
  * External user API, exported by symbols to be linked dynamically.
vfio.c:1663: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
  * External user API, exported by symbols to be linked dynamically.
vfio.c:1663: warning: missing initial short description on line:
  * External user API, exported by symbols to be linked dynamically.
vfio.c:1742: warning: Function parameter or member 'caps' not described in 'vfio_info_cap_add'
vfio.c:1742: warning: Function parameter or member 'size' not described in 'vfio_info_cap_add'
vfio.c:1742: warning: Function parameter or member 'id' not described in 'vfio_info_cap_add'
vfio.c:1742: warning: Function parameter or member 'version' not described in 'vfio_info_cap_add'
vfio.c:1742: warning: expecting prototype for Sub(). Prototype was for vfio_info_cap_add() instead
vfio.c:2276: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
  * Module/class support
vfio.c:2276: warning: missing initial short description on line:
  * Module/class support

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: kvm@vger.kernel.org
Link: https://lore.kernel.org/r/38a9cb92-a473-40bf-b8f9-85cc5cfc2da4@infradead.org
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

16 Oct, 2021 · 5 commits

vfio: Use cdev_device_add() instead of device_create() · 9cef7391

Committed by Jason Gunthorpe on Oct 15, 2021

Modernize how vfio is creating the group char dev and sysfs presence.

These days drivers with state should use cdev_device_add() and
cdev_device_del() to manage the cdev and sysfs lifetime.

This API requires the driver to put the struct device and struct cdev
inside its state struct (vfio_group), and then use the usual
device_initialize()/cdev_device_add()/cdev_device_del() sequence.

Split the code to make this possible:

 - vfio_group_alloc()/vfio_group_release() are paired functions to
   alloc/free the vfio_group. release is done under the struct device
   kref.

 - vfio_create_group()/vfio_group_put() are pairs that manage the
   sysfs/cdev lifetime. Once the use count reaches zero the vfio group's
   userspace presence is destroyed.

 - The IDR is replaced with an IDA. container_of(inode->i_cdev)
   is used to get back to the vfio_group during fops open. The IDA
   assigns unique minor numbers.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/5-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Use a refcount_t instead of a kref in the vfio_group · 2b678aa2

Committed by Jason Gunthorpe on Oct 15, 2021

The next patch adds a struct device to the struct vfio_group, and it is
confusing/bad practice to have two krefs in the same struct. This kref is
controlling the period when the vfio_group is registered in sysfs, and
visible in the internal lookup. Switch it to a refcount_t instead.

The refcount_dec_and_mutex_lock() is still required because we need
atomicity of the list searches and sysfs presence.

Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Don't leak a group reference if the group already exists · 325a31c9

Committed by Jason Gunthorpe on Oct 15, 2021

If vfio_create_group() searches the group list and returns an already
existing group it does not put back the iommu_group reference that the
caller passed in.

Change the semantic of vfio_create_group() to not move the reference in
from the caller, but instead obtain a new reference inside and leave the
caller's reference alone. The two callers must now call iommu_group_put().

This is an unlikely race as the only caller that could hit it has already
searched the group list before attempting to create the group.

Fixes: cba3345c ("vfio: VFIO core")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Do not open code the group list search in vfio_create_group() · 1ceabade

Committed by Jason Gunthorpe on Oct 15, 2021

Split vfio_group_get_from_iommu() into __vfio_group_get_from_iommu() so
that vfio_create_group() can call it to consolidate this duplicated code.

Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/2-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Delete vfio_get/put_group from vfio_iommu_group_notifier() · 63b150fd

Committed by Jason Gunthorpe on Oct 15, 2021

iommu_group_register_notifier()/iommu_group_unregister_notifier() are
built using a blocking_notifier_chain which integrates a rwsem. The
notifier function cannot be running outside its registration.

When considering how the notifier function interacts with create/destroy
of the group there are two fringe cases, the notifier starts before
list_add(&vfio.group_list) and the notifier runs after the kref
becomes 0.

Prior to vfio_create_group() unlocking and returning we have
   container_users == 0
   device_list == empty
And this cannot change until the mutex is unlocked.

After the kref goes to zero we must also have
   container_users == 0
   device_list == empty

Both are required because they are balanced operations and a 0 kref means
some caller became unbalanced. Add the missing assertion that
container_users must be zero as well.

These two facts are important because when checking each operation we see:

- IOMMU_GROUP_NOTIFY_ADD_DEVICE
   Empty device_list avoids the WARN_ON in vfio_group_nb_add_dev()
   0 container_users ends the call
- IOMMU_GROUP_NOTIFY_BOUND_DRIVER
   0 container_users ends the call

Finally, we have IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER, which only deletes
items from the unbound list. During creation this list is empty, during
kref == 0 nothing can read this list, and it will be freed soon.

Since the vfio_group_release() doesn't hold the appropriate lock to
manipulate the unbound_list and could race with the notifier, move the
cleanup to directly before the kfree.

This allows deleting all of the deferred group put code.

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/1-v3-2fdfe4ca2cc6+18c-vfio_group_cdev_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

01 Oct, 2021 · 10 commits

vfio: clean up the check for mediated device in vfio_iommu_type1 · c3c0fa9d

Committed by Christoph Hellwig on Sep 24, 2021

Pass the group flags to ->attach_group and remove the messy check for
the bus type.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-12-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: move the vfio_iommu_driver_ops interface out of <linux/vfio.h> · 8cc02d22

Committed by Christoph Hellwig on Sep 24, 2021

Create a new private drivers/vfio/vfio.h header for the interface between
the VFIO core and the iommu drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-10-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: remove unused method from vfio_iommu_driver_ops · 67462037

Committed by Christoph Hellwig on Sep 24, 2021

The read, write and mmap methods are never implemented, so remove them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-9-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: simplify iommu group allocation for mediated devices · c68ea0d0

Committed by Christoph Hellwig on Sep 24, 2021

Reuse the logic in vfio_noiommu_group_alloc to allocate a fake
single-device iommu group for mediated devices by factoring out a common
function, and replacing the noiommu boolean field in struct vfio_group
with an enum to distinguish the three different kinds of groups.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-8-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: remove the iommudata hack for noiommu groups · c04ac340

Committed by Christoph Hellwig on Sep 24, 2021

Just pass a noiommu argument to vfio_create_group and set up the
->noiommu flag directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-7-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: refactor noiommu group creation · 3af91771

Committed by Christoph Hellwig on Sep 24, 2021

Split the actual noiommu group creation from vfio_iommu_group_get into a
new helper, and open code the rest of vfio_iommu_group_get in its only
caller. This creates an entirely separate and clear code path for the
noiommu group creation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-6-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: factor out a vfio_group_find_or_alloc helper · 1362591f

Committed by Christoph Hellwig on Sep 24, 2021

Factor out a helper to find or allocate the vfio_group to reduce the
spaghetti code in vfio_register_group_dev a little.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-5-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: remove the iommudata check in vfio_noiommu_attach_group · c5b4ba97

Committed by Christoph Hellwig on Sep 24, 2021

vfio_noiommu_attach_group has two callers:

 1) __vfio_container_attach_groups is called by vfio_ioctl_set_iommu,
    which just called vfio_iommu_driver_allowed
 2) vfio_group_set_container already checks ->noiommu on the
    vfio_group, which is propagated from the iommudata in
    vfio_create_group

so this check is entirely superfluous and can be removed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-4-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: factor out a vfio_iommu_driver_allowed helper · b0062160

Committed by Christoph Hellwig on Sep 24, 2021

Factor out a little helper to make the checks for the noiommu driver less
ugly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-3-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Move vfio_iommu_group_get() to vfio_register_group_dev() · 38a68934

Committed by Jason Gunthorpe on Sep 24, 2021

We don't need to hold a reference to the group in the driver as well as
obtain a reference to the same group as the first thing
vfio_register_group_dev() does.

Since the drivers never use the group move this all into the core code.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20210924155705.4258-2-hch@lst.de
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

11 Aug, 2021 · 3 commits

vfio: Remove struct vfio_device_ops open/release · eb24c100

Committed by Jason Gunthorpe on Aug 05, 2021

Nothing uses this anymore, delete it.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/r/14-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Provide better generic support for open/release vfio_device_ops · 2fd585f4

Committed by Jason Gunthorpe on Aug 05, 2021

Currently the driver ops have an open/release pair that is called once
each time a device FD is opened or closed. Add an additional set of
open/close_device() ops which are called when the device FD is opened for
the first time and closed for the last time.
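
The first-open/last-close semantic can be sketched with a plain counter. The struct and function names are hypothetical, and the locking that the real code needs around the counter is omitted for brevity.

```c
/* Hypothetical device with a per-device count of open FDs. */
struct sketch_device {
	unsigned int open_count;
	int hw_enabled;
};

/* Runs on every FD open; device setup happens only on the first one. */
static void sketch_fd_open(struct sketch_device *d)
{
	if (d->open_count == 0)
		d->hw_enabled = 1;	/* where open_device() would run */
	d->open_count++;
}

/* Runs on every FD close; device teardown happens only on the last one. */
static void sketch_fd_close(struct sketch_device *d)
{
	d->open_count--;
	if (d->open_count == 0)
		d->hw_enabled = 0;	/* where close_device() would run */
}
```

Putting the counter in the core removes the need for each driver to open code the same bookkeeping.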

An analysis shows that all of the drivers require this semantic. Some are
open coding it as part of their reflck implementation, and some are just
buggy and miss it completely.

To retain the current semantics PCI and FSL depend on, introduce the idea
of a "device set" which is a grouping of vfio_device's that share the same
lock around opening.

The device set is established by providing a 'set_id' pointer. All
vfio_device's that provide the same pointer will be joined to the same
singleton memory and lock across the whole set. This effectively replaces
the oddly named reflck.

After conversion the set_id will be sourced from:
 - A struct device from a fsl_mc_device (fsl)
 - A struct pci_slot (pci)
 - A struct pci_bus (pci)
 - The struct vfio_device (everything)

The design ensures that the above pointers are live as long as the
vfio_device is registered, so they form reliable unique keys to group
vfio_devices into sets.
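
Grouping devices by a shared pointer key can be sketched as a small lookup table. This is an illustration only: it uses a fixed array instead of an xarray, has no locking, and all names are hypothetical.

```c
#include <stddef.h>

#define SKETCH_MAX_SETS 8

/* One entry per distinct set_id; device_count == 0 marks a free slot. */
struct sketch_dev_set {
	const void *set_id;
	unsigned int device_count;
};

static struct sketch_dev_set sketch_sets[SKETCH_MAX_SETS];

/* Join the set keyed by set_id, creating it on first use. */
static struct sketch_dev_set *sketch_dev_set_join(const void *set_id)
{
	struct sketch_dev_set *free_slot = NULL;
	size_t i;

	for (i = 0; i < SKETCH_MAX_SETS; i++) {
		struct sketch_dev_set *s = &sketch_sets[i];

		if (s->device_count && s->set_id == set_id) {
			s->device_count++;
			return s;
		}
		if (!s->device_count && !free_slot)
			free_slot = s;
	}
	if (!free_slot)
		return NULL;	/* table full */
	free_slot->set_id = set_id;
	free_slot->device_count = 1;
	return free_slot;
}

/* Leave a set; the slot becomes reusable once the last device is gone. */
static void sketch_dev_set_leave(struct sketch_dev_set *set)
{
	set->device_count--;
}
```

Two devices that pass the same pointer, for example the same bus object, end up sharing one entry, while a device that passes its own address gets a set of its own.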

This implementation uses xarray instead of searching through the driver
core structures, which simplifies the somewhat tricky locking in this
area.

Following patches convert all the drivers.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/4-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Introduce a vfio_uninit_group_dev() API call · ae03c377

Committed by Max Gurtovoy on Aug 05, 2021

This pairs with vfio_init_group_dev() and allows undoing any state that is
stored in the vfio_device unrelated to registration. Add appropriately
placed calls to all the drivers.

The following patch will use this to add pre-registration state for the
device set.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/3-v4-9ea22c5e6afb+1adf-vfio_reflck_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

16 Jun, 2021 · 1 commit

vfio: centralize module refcount in subsystem layer · 9dcf01d9

Committed by Max Gurtovoy on May 18, 2021

Remove code duplication and move module refcounting to the subsystem
module.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Link: https://lore.kernel.org/r/20210518192133.59195-2-mgurtovoy@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

07 Apr, 2021 · 6 commits

vfio: Remove device_data from the vfio bus driver API · 1e04ec14

Committed by Jason Gunthorpe on Mar 30, 2021

There are no longer any users, so it can go away. Everything is using
container_of now.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <14-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Make vfio_device_ops pass a 'struct vfio_device *' instead of 'void *' · 6df62c5b

Committed by Jason Gunthorpe on Mar 30, 2021

This is the standard kernel pattern: the ops associated with a struct take
the struct pointer as an argument for type safety. The expected design is
to use container_of to cleanly go from the subsystem level type to the
driver level type without having any type erasure in a void *.

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <12-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/mdev: Use vfio_init/register/unregister_group_dev · 1ae1b20f

Committed by Jason Gunthorpe on Mar 30, 2021

mdev gets little benefit because it doesn't actually do anything, however
it is the last user, so move the vfio_init/register/unregister_group_dev()
code here for now.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <10-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Split creation of a vfio_device into init and register ops · 0bfc6a4e

Committed by Jason Gunthorpe on Mar 30, 2021

This makes the struct vfio_device part of the public interface so it
can be used with container_of and so forth, as is typical for a Linux
subsystem.

This is the first step to bring some type-safety to the vfio interface by
allowing the replacement of 'void *' and 'struct device *' inputs with a
simple and clear 'struct vfio_device *'

For now the self-allocating vfio_add_group_dev() interface is kept so each
user can be updated as a separate patch.

The expected usage pattern is

  driver core probe() function:
     my_device = kzalloc(sizeof(*my_device), GFP_KERNEL);
     vfio_init_group_dev(&my_device->vdev, dev, ops, my_device);
     /* other driver specific prep */
     vfio_register_group_dev(&my_device->vdev);
     dev_set_drvdata(dev, my_device);

  driver core remove() function:
     my_device = dev_get_drvdata(dev);
     vfio_unregister_group_dev(&my_device->vdev);
     /* other driver specific tear down */
     kfree(my_device);

Allowing the driver to be able to use the drvdata and vfio_device to go
to/from its own data.
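
Going from the embedded core struct back to the driver's own data is the usual container_of step. The following userspace sketch uses stand-in types; struct sketch_core_device and struct sketch_my_device are hypothetical and only mirror the shape of a driver struct that embeds a vfio_device as a member named vdev.

```c
#include <stddef.h>

/* Userspace equivalent of the kernel macro, built on offsetof(). */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for the subsystem level type (struct vfio_device). */
struct sketch_core_device {
	int id;
};

/* Stand-in for a driver level type embedding the core type. */
struct sketch_my_device {
	int driver_private;
	struct sketch_core_device vdev;
};

/* What an op would do first: recover the driver struct from the core one. */
static struct sketch_my_device *
sketch_to_my_device(struct sketch_core_device *vdev)
{
	return sketch_container_of(vdev, struct sketch_my_device, vdev);
}
```

Because the core struct is embedded rather than pointed to, no void * cookie is needed and the compiler can check the types on both sides.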

The pattern also makes it clear that vfio_register_group_dev() must be
last in the sequence, as once it is called the core code can immediately
start calling ops. The init/register gap is provided to allow for the
driver to do setup before ops can be called and thus avoid races.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Liu Yi L <yi.l.liu@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <3-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Simplify the lifetime logic for vfio_device · 5e42c999

Committed by Jason Gunthorpe on Mar 30, 2021

The vfio_device is using a 'sleep until all refs go to zero' pattern for
its lifetime, but it is indirectly coded by repeatedly scanning the group
list waiting for the device to be removed on its own.

Switch this around to be a direct representation, use a refcount to count
the number of places that are blocking destruction and sleep directly on a
completion until that counter goes to zero. kfree the device after other
accesses have been excluded in vfio_del_group_dev(). This is a fairly
common Linux idiom.

Due to this we can now remove kref_put_mutex(), which is very rarely used
in the kernel. Here it is being used to prevent a zero ref device from
being seen in the group list. Instead allow the zero ref device to
continue to exist in the device_list and use refcount_inc_not_zero() to
exclude it once refs go to zero.
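
The idea of leaving a zero-ref object in the list and refusing new references to it can be sketched with C11 atomics. This is a userspace approximation of what refcount_inc_not_zero() provides, not the kernel implementation; the function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only if the count is still non-zero.
 * Returns false when the object is already on its way out.
 */
static bool sketch_ref_get_not_zero(atomic_uint *refs)
{
	unsigned int old = atomic_load(refs);

	while (old != 0) {
		/* On failure old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(refs, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when this was the last one. */
static bool sketch_ref_put(atomic_uint *refs)
{
	return atomic_fetch_sub(refs, 1) == 1;
}
```

A list walker that finds an entry simply skips it when the get fails, and the remover can sleep until the final put signals completion.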

This patch is organized so the next patch will be able to alter the API to
allow drivers to provide the kfree.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <2-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Remove extra put/gets around vfio_device->group · e572bfb2

Committed by Jason Gunthorpe on Mar 30, 2021

The vfio_device->group value has a get obtained during
vfio_add_group_dev() which gets moved from the stack to vfio_device->group
in vfio_group_create_device().

The reference remains until we reach the end of vfio_del_group_dev() when
it is put back.

Thus anything that already has a kref on the vfio_device is guaranteed a
valid group pointer. Remove all the extra reference traffic.

It is tricky to see, but the get at the start of vfio_del_group_dev() is
actually pairing with the put hidden inside vfio_device_put() a few lines
below.

A later patch merges vfio_group_create_device() into vfio_add_group_dev()
which makes the ownership and error flow on the create side easier to
follow.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <1-v3-225de1400dfc+4e074-vfio1_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

02 Feb, 2021 · 1 commit

vfio: iommu driver notify callback · ec5e3294

由 Steve Sistare 提交于 1月 29, 2021

Define a vfio_iommu_driver_ops notify callback, for sending events to
the driver.  Drivers are not required to provide the callback, and
may ignore any events.  The handling of events is driver specific.

Define the CONTAINER_CLOSE event, called when the container's file
descriptor is closed.  This event signifies that no further state changes
will occur via container ioctls.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

ec5e3294
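
The optional-callback pattern described in commit ec5e3294 can be sketched in plain C. This is an illustration only, not the actual vfio_iommu_driver_ops layout: every `demo_` name below is hypothetical, and the event enum merely mirrors the CONTAINER_CLOSE event named in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event id, modeled on the CONTAINER_CLOSE event in the text. */
enum demo_event {
	DEMO_CONTAINER_CLOSE,
};

/* Hypothetical ops table; notify is optional and may be left NULL. */
struct demo_iommu_ops {
	void (*notify)(void *iommu_data, enum demo_event event);
};

struct demo_driver {
	const struct demo_iommu_ops *ops;
	void *iommu_data;
};

/* Deliver an event; returns 1 if a callback ran, 0 if the driver has none. */
static int demo_notify(struct demo_driver *drv, enum demo_event event)
{
	assert(drv && drv->ops);
	if (!drv->ops->notify)
		return 0;
	drv->ops->notify(drv->iommu_data, event);
	return 1;
}

/* Example handler: records that the container was closed. */
static void demo_on_event(void *iommu_data, enum demo_event event)
{
	int *closed = iommu_data;

	if (event == DEMO_CONTAINER_CLOSE)
		*closed = 1;
}
```

A driver that leaves `notify` as NULL is simply skipped, which matches the statement that drivers are not required to provide the callback.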

11 Dec, 2020 1 commit

vfio/type1: Add vfio_group_iommu_domain() · bdfae1c9

Committed by Lu Baolu on Dec 09, 2020

Add the API for getting the domain from a vfio group. This could be used
by the physical device drivers which rely on the vfio/mdev framework for
mediated device user level access. A typical use case looks like this:

	int pasid;	/* signed, so the pasid < 0 error check below can fire */
	struct vfio_group *vfio_group;
	struct iommu_domain *iommu_domain;
	struct device *dev = mdev_dev(mdev);
	struct device *iommu_device = mdev_get_iommu_device(dev);

	if (!iommu_device ||
	    !iommu_dev_feature_enabled(iommu_device, IOMMU_DEV_FEAT_AUX))
		return -EINVAL;

	vfio_group = vfio_group_get_external_user_from_dev(dev);
	if (IS_ERR_OR_NULL(vfio_group))
		return -EFAULT;

	iommu_domain = vfio_group_iommu_domain(vfio_group);
	if (IS_ERR_OR_NULL(iommu_domain)) {
		vfio_group_put_external_user(vfio_group);
		return -EFAULT;
	}

	pasid = iommu_aux_get_pasid(iommu_domain, iommu_device);
	if (pasid < 0) {
		vfio_group_put_external_user(vfio_group);
		return -EFAULT;
	}

	/* Program device context with pasid value. */
	...

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

bdfae1c9

23 Sep, 2020 1 commit

vfio: fix a missed vfio group put in vfio_pin_pages · 28b13024

Committed by Yan Zhao on Sep 16, 2020

When an error occurs after a successful get, the vfio group must be put.

Fixes: 95fc87b4 ("vfio: Selective dirty page tracking if IOMMU backed device pins pages")
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

28b13024

22 Sep, 2020 1 commit

vfio: add a singleton check for vfio_group_pin_pages · 7ef32e52

Committed by Yan Zhao on Sep 16, 2020

Page pinning is used both to translate and pin device mappings for DMA
purposes, and to tell the IOMMU backend to limit the dirty page scope
to those pages that have been pinned, in the case of an IOMMU backed
device.

To support this, the vfio_pin_pages() interface limits itself to only
singleton groups such that the IOMMU backend can consider dirty page
scope only at the group level.  Implement the same requirement for the
vfio_group_pin_pages() interface.

Fixes: 95fc87b4 ("vfio: Selective dirty page tracking if IOMMU backed device pins pages")
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

7ef32e52

28 Jul, 2020 1 commit

vfio: Cleanup allowed driver naming · 26afdd98

Committed by Alex Williamson on Jul 27, 2020

No functional change, avoid non-inclusive naming schemes.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

26afdd98

29 May, 2020 1 commit

vfio: Selective dirty page tracking if IOMMU backed device pins pages · 95fc87b4

Committed by Kirti Wankhede on May 29, 2020

Added a check such that only singleton IOMMU groups can pin pages.
From the point when the vendor driver pins any pages, the IOMMU group's
dirty page scope is considered to be limited to pinned pages.

To avoid walking the list often, added the flag pinned_page_dirty_scope,
which indicates whether the dirty page scope of all vfio_groups for each
vfio_domain in the domain_list is limited to pinned pages. This flag is
updated on the first pinned pages request for that IOMMU group and on
attaching/detaching a group.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

95fc87b4
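
The cached-flag optimization described in commit 95fc87b4 can be sketched roughly as follows. This is a simplified user-space model, not the vfio_iommu_type1 code: it assumes a flat array of groups instead of per-domain lists, ignores locking, and every `demo_`/`DEMO_` name is hypothetical. The recompute points follow the commit message: first pin request for a group, attach, and detach.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_GROUPS 8

struct demo_group {
	bool attached;
	bool pinned_scope;	/* dirty scope limited to pinned pages */
};

struct demo_iommu {
	struct demo_group groups[DEMO_MAX_GROUPS];
	bool pinned_page_dirty_scope;	/* cached: true if all attached groups are limited */
};

/* Walk all groups once and refresh the cached flag. */
static void demo_update_scope(struct demo_iommu *iommu)
{
	iommu->pinned_page_dirty_scope = true;	/* vacuously true with no groups */
	for (size_t i = 0; i < DEMO_MAX_GROUPS; i++) {
		const struct demo_group *g = &iommu->groups[i];

		if (g->attached && !g->pinned_scope) {
			iommu->pinned_page_dirty_scope = false;
			return;
		}
	}
}

static void demo_attach(struct demo_iommu *iommu, size_t idx)
{
	assert(idx < DEMO_MAX_GROUPS);
	iommu->groups[idx].attached = true;
	iommu->groups[idx].pinned_scope = false;
	demo_update_scope(iommu);
}

static void demo_detach(struct demo_iommu *iommu, size_t idx)
{
	assert(idx < DEMO_MAX_GROUPS);
	iommu->groups[idx].attached = false;
	demo_update_scope(iommu);
}

/* Only the first pin request for a group triggers a walk. */
static void demo_pin(struct demo_iommu *iommu, size_t idx)
{
	assert(idx < DEMO_MAX_GROUPS && iommu->groups[idx].attached);
	if (!iommu->groups[idx].pinned_scope) {
		iommu->groups[idx].pinned_scope = true;
		demo_update_scope(iommu);
	}
}
```

Readers of the flag never walk the list; the walk happens only at the three update points, which are rare compared with dirty-page queries.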

24 Mar, 2020 4 commits

vfio: Include optional device match in vfio_device_ops callbacks · 5f3874c2

Committed by Alex Williamson on Mar 24, 2020

Allow bus drivers to provide their own callback to match a device to
the user provided string.

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

5f3874c2
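
An optional match callback of the kind commit 5f3874c2 describes might be dispatched as in the sketch below. The fallback to an exact name comparison when no callback is supplied is an assumption of this sketch, and all `demo_` names (including the short-PCI-address matcher) are hypothetical, not taken from vfio_device_ops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ops table with an optional match hook. */
struct demo_dev_ops {
	/* Optional: return true if buf names this device. */
	bool (*match)(const char *name, const char *buf);
};

struct demo_dev {
	const char *name;
	const struct demo_dev_ops *ops;
};

/* Use the driver's matcher if present, else compare names exactly. */
static bool demo_dev_matches(const struct demo_dev *dev, const char *buf)
{
	assert(dev && dev->name && buf);
	if (dev->ops && dev->ops->match)
		return dev->ops->match(dev->name, buf);
	return strcmp(dev->name, buf) == 0;
}

/* Example matcher: also accept the name with a leading "0000:" omitted. */
static bool demo_match_short_pci(const char *name, const char *buf)
{
	if (strcmp(name, buf) == 0)
		return true;
	return strncmp(name, "0000:", 5) == 0 && strcmp(name + 5, buf) == 0;
}
```

Keeping the hook optional means existing drivers need no changes, while a bus driver with its own naming rules can override only the comparison.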

vfio: avoid inefficient operations on VFIO group in vfio_pin/unpin_pages · 40280cf7

Committed by Yan Zhao on Mar 24, 2020

vfio_group_pin_pages() and vfio_group_unpin_pages() are introduced to
avoid the inefficient search/check/ref/deref operations on the VFIO
group that every call into vfio_pin_pages() and vfio_unpin_pages()
incurs.

The VFIO group is taken as an argument directly. Callers consolidate the
search/check/ref/deref operations on the VFIO group by calling
vfio_group_get_external_user()/vfio_group_get_external_user_from_dev()
beforehand, and vfio_group_put_external_user() afterwards.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

40280cf7

vfio: introduce vfio_dma_rw to read/write a range of IOVAs · 8d46c0cc

Committed by Yan Zhao on Mar 24, 2020

vfio_dma_rw will read/write a range of user space memory pointed to by
IOVA into/from a kernel buffer without enforcing pinning the user space
memory.

TODO: mark the IOVAs to user space memory dirty if they are written in
vfio_dma_rw().

Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

8d46c0cc
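
The read/write behavior described in commit 8d46c0cc (copying between an IOVA range and a caller's buffer) can be modeled in user space. This sketch assumes a single contiguous mapping and a plain memcpy(); the real function has to resolve IOVAs through the container's IOMMU backend, which is not shown. `struct demo_mapping` and `demo_dma_rw()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical single IOVA range backed by ordinary memory. */
struct demo_mapping {
	uint64_t iova;		/* start of the IOVA range */
	size_t size;		/* length of the range in bytes */
	unsigned char *vaddr;	/* backing memory for the range */
};

/*
 * Copy len bytes between data and the memory behind [iova, iova + len).
 * write == true copies from data into the mapping, false copies out.
 * Returns 0 on success or -EINVAL if the range is not fully mapped.
 */
static int demo_dma_rw(const struct demo_mapping *m, uint64_t iova,
		       void *data, size_t len, bool write)
{
	uint64_t off;

	assert(m && m->vaddr && data);
	if (iova < m->iova)
		return -EINVAL;
	off = iova - m->iova;
	if (off > m->size || len > m->size - off)
		return -EINVAL;

	if (write)
		memcpy(m->vaddr + off, data, len);
	else
		memcpy(data, m->vaddr + off, len);
	return 0;
}
```

The bounds check is written as `len > m->size - off` rather than `off + len > m->size` so that a large `len` cannot overflow the comparison.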

vfio: allow external user to get vfio group from device · c0560f51

Committed by Yan Zhao on Mar 24, 2020

An external user calls vfio_group_get_external_user_from_dev() with a
device pointer to get the VFIO group associated with this device.
The VFIO group is checked to be viable and to have an IOMMU set. Then
the container user counter is increased and a VFIO group reference is
held to prevent the VFIO group from disposal before the external user
exits.

When the external user finishes using the VFIO group, it calls
vfio_group_put_external_user() to drop the VFIO group reference and
decrease the container user counter.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

c0560f51
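
The get/put pairing described in commit c0560f51 can be sketched as two counters guarded by the viability and IOMMU checks. This is a single-threaded illustration under stated assumptions, not the kernel code: `struct demo_group` and both functions are hypothetical, and plain ints stand in for the kernel's reference counting and locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the state the commit message mentions. */
struct demo_group {
	bool viable;
	bool has_iommu;
	int container_users;	/* users relying on the container staying set */
	int refs;		/* references keeping the group alive */
};

/* Returns the group with both counters raised, or NULL if a check fails. */
static struct demo_group *demo_get_external_user(struct demo_group *g)
{
	assert(g);
	if (!g->viable || !g->has_iommu)
		return NULL;
	g->refs++;
	g->container_users++;
	return g;
}

/* Undo a successful demo_get_external_user(). */
static void demo_put_external_user(struct demo_group *g)
{
	assert(g && g->container_users > 0 && g->refs > 0);
	g->container_users--;
	g->refs--;
}
```

Every successful get must be balanced by exactly one put, including on error paths, which is the class of bug fixed by commit 28b13024 elsewhere in this log.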

23 Oct, 2019 1 commit

compat_ioctl: move drivers to compat_ptr_ioctl · 407e9ef7

Committed by Arnd Bergmann on Sep 11, 2018

Each of these drivers has a copy of the same trivial helper function to
convert the pointer argument and then call the native ioctl handler.

We now have a generic implementation of that, so use it.

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
407e9ef7
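
The "convert the pointer argument, then call the native ioctl handler" helper that commit 407e9ef7 consolidates can be modeled in user space as below. This is only a sketch of the shape of such a helper: all `demo_` names are hypothetical, the pointer conversion is reduced to a widening cast, and -ENOTTY is used as the "no handler" result here as an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct demo_file;

/* Hypothetical ops table holding the driver's native ioctl handler. */
struct demo_file_ops {
	long (*native_ioctl)(struct demo_file *f, unsigned int cmd, uintptr_t arg);
};

struct demo_file {
	const struct demo_file_ops *ops;
	unsigned int last_cmd;	/* recorded by the example handler */
	uintptr_t last_arg;
};

/* Stand-in for the pointer conversion: widen a 32-bit user pointer value. */
static uintptr_t demo_compat_ptr(uint32_t uptr)
{
	return (uintptr_t)uptr;
}

/* One shared helper instead of a private copy in every driver. */
static long demo_compat_ptr_ioctl(struct demo_file *f, unsigned int cmd,
				  uint32_t arg)
{
	assert(f && f->ops);
	if (!f->ops->native_ioctl)
		return -ENOTTY;
	return f->ops->native_ioctl(f, cmd, demo_compat_ptr(arg));
}

/* Example native handler: just records what it was called with. */
static long demo_native_ioctl(struct demo_file *f, unsigned int cmd,
			      uintptr_t arg)
{
	f->last_cmd = cmd;
	f->last_arg = arg;
	return 0;
}
```

Because the helper only needs the file's ops table, it works for any driver whose ioctl argument is always a pointer, which is why the per-driver copies could be dropped.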

openeuler / Kernel synced successfully nearly 3 years ago