1. 20 Feb, 2019 6 commits
    • RDMA/device: Provide APIs from the core code to help unregistration · d0899892
      Jason Gunthorpe committed
      These APIs are intended to support drivers that exist outside the usual
      driver core probe()/remove() callbacks. Normally the driver core will
      prevent remove() from running concurrently with probe(). Once this safety
      is lost, drivers need more support to get the locking and lifetimes right.
      
      ib_unregister_driver() is intended to be used during module_exit of a
      driver using these APIs. It unregisters all the associated ib_devices.
      
      ib_unregister_device_and_put() is to be used by a driver-specific removal
      function (i.e. removal by name, removal from a netdev notifier, removal
      from netlink).
      
      ib_unregister_queued() is to be used from netdev notifier chains where
      RTNL is held.
      
      The locking is tricky here, since once things become async it is possible
      to race unregistration with registration. This is largely solved by
      relying on the registration refcount: unregistration will only ever work
      on something that has a positive registration refcount, and then an
      unregistration mutex serializes all competing unregistrations of the same
      device.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
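The refcount-plus-mutex rule described in this commit message can be illustrated with a small userspace sketch. Everything here is hypothetical (`struct demo_device`, `demo_try_get()`, `demo_unregister()` and so on are made-up names, C11 atomics and a pthread mutex stand in for the kernel primitives), and it does not model waiting for outstanding users to finish.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device; not the kernel's struct ib_device. */
struct demo_device {
	atomic_int reg_refcount;      /* positive only while registered */
	pthread_mutex_t unreg_mutex;  /* serializes competing unregistrations */
	bool unregistered;            /* protected by unreg_mutex */
};

static void demo_init(struct demo_device *dev)
{
	atomic_init(&dev->reg_refcount, 0);
	pthread_mutex_init(&dev->unreg_mutex, NULL);
	dev->unregistered = false;
}

static void demo_register(struct demo_device *dev)
{
	/* The registration itself holds one reference. */
	atomic_store(&dev->reg_refcount, 1);
}

/* Take a reference only if the count is still positive. */
static bool demo_try_get(struct demo_device *dev)
{
	int old = atomic_load(&dev->reg_refcount);

	while (old > 0) {
		/* On failure 'old' is refreshed and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&dev->reg_refcount, &old,
						 old + 1))
			return true;
	}
	return false;
}

static void demo_put(struct demo_device *dev)
{
	int prev = atomic_fetch_sub(&dev->reg_refcount, 1);

	assert(prev > 0);
	(void)prev;
}

/* Returns true only for the caller that actually did the teardown. */
static bool demo_unregister(struct demo_device *dev)
{
	bool did_work = false;

	if (!demo_try_get(dev))
		return false; /* not registered: nothing to work on */

	pthread_mutex_lock(&dev->unreg_mutex);
	if (!dev->unregistered) {
		dev->unregistered = true;
		demo_put(dev); /* drop the registration's own reference */
		did_work = true;
	}
	pthread_mutex_unlock(&dev->unreg_mutex);

	demo_put(dev); /* drop the reference taken above */
	return did_work;
}
```

In this sketch a second or racing `demo_unregister()` call either fails to get a reference or finds `unregistered` already set under the mutex, so the teardown step runs at most once.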
    • RDMA/device: Add ib_device_get_by_netdev() · 324e227e
      Jason Gunthorpe committed
      Several drivers need to find the ib_device from a given netdev. rxe needs
      this at speed in an unsleepable context, so choose to implement the
      translation using an RCU-safe hash table.
      
      The hash table can have a many-to-one mapping. This is intended to support
      some future case where multiple IB drivers (i.e. iWARP and RoCE) connect
      to the same netdevs. driver_ids will need to be different to support this.
      
      In the process this makes the struct ib_device and ib_port_data RCU safe
      by deferring their kfrees.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/device: Add ib_device_set_netdev() as an alternative to get_netdev · c2261dd7
      Jason Gunthorpe committed
      The associated netdev should not actually be very dynamic, so for most
      drivers there is no reason for a callback like this. Provide an API to
      inform the core code about the net dev affiliation and use a core
      maintained data structure instead.
      
      This allows the core code to be more aware of the ndev relationship which
      will allow some new APIs based around this.
      
      This also uses locking that makes some kind of sense. Many drivers had a
      confusing RCU lock, or were missing locking altogether, which isn't right.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/cache: Move the cache per-port data into the main ib_port_data · 8faea9fd
      Jason Gunthorpe committed
      As in the other cases, there is no real reason to have another array just
      for the cache. This larger conversion gets its own patch.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/device: Consolidate ib_device per_port data into one place · 8ceb1357
      Jason Gunthorpe committed
      There is no reason to have three allocations of per-port data. Combine
      them together and make the lifetime for all the per-port data match the
      struct ib_device.
      
      Following patches will require more port-specific data, and now there is a
      good place to put it.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
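As a rough illustration of the idea in this commit message, the sketch below replaces several parallel per-port arrays with one array of per-port structs that is allocated and freed together with the device. All names and fields (`struct demo_port_data`, `demo_alloc_device()`, `pkey_table_len`, ...) are hypothetical, and the 1-based port numbering is an assumption made for the example; this is not the layout the patch actually uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-port record: one struct instead of three arrays. */
struct demo_port_data {
	int pkey_table_len;   /* illustrative field */
	int immutable_flags;  /* illustrative field */
	int cache_valid;      /* illustrative field */
};

struct demo_device {
	unsigned int phys_port_cnt;
	struct demo_port_data *port_data; /* lives exactly as long as the device */
};

static struct demo_device *demo_alloc_device(unsigned int nports)
{
	struct demo_device *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	/* One allocation covers every kind of per-port data. */
	dev->port_data = calloc(nports, sizeof(*dev->port_data));
	if (!dev->port_data) {
		free(dev);
		return NULL;
	}
	dev->phys_port_cnt = nports;
	return dev;
}

static void demo_free_device(struct demo_device *dev)
{
	if (!dev)
		return;
	free(dev->port_data); /* per-port data is released with the device */
	free(dev);
}

/* Ports are numbered from 1 in this sketch; returns NULL when out of range. */
static struct demo_port_data *demo_port(struct demo_device *dev,
					unsigned int port)
{
	if (port < 1 || port > dev->phys_port_cnt)
		return NULL;
	return &dev->port_data[port - 1];
}
```

Adding a new per-port field then means adding one struct member rather than another array with its own allocation and teardown path.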
    • RDMA: Add and use rdma_for_each_port · ea1075ed
      Jason Gunthorpe committed
      We have many loops iterating over all of the end port numbers on a struct
      ib_device. Simplify them with a for_each helper.
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
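A for_each helper of the kind this commit describes can be sketched in plain C as a macro that wraps the usual first-to-last port loop. The names below (`struct demo_device`, `demo_for_each_port`, `first_port`, `last_port`) are hypothetical stand-ins chosen for the example, not the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical device with an inclusive range of port numbers. */
struct demo_device {
	unsigned int first_port;
	unsigned int last_port;
};

/* Iterate 'iter' over every port number of 'dev', inclusive on both ends. */
#define demo_for_each_port(dev, iter) \
	for ((iter) = (dev)->first_port; (iter) <= (dev)->last_port; (iter)++)

/* Example user: count the ports by walking them with the helper. */
static unsigned int demo_count_ports(const struct demo_device *dev)
{
	unsigned int port;
	unsigned int n = 0;

	demo_for_each_port(dev, port)
		n++;
	return n;
}

/* Example user: sum the port numbers, to show the iterator's values. */
static unsigned int demo_sum_ports(const struct demo_device *dev)
{
	unsigned int port;
	unsigned int sum = 0;

	demo_for_each_port(dev, port)
		sum += port;
	return sum;
}
```

Keeping the bounds inside the macro means callers cannot get the start value or the inclusive end condition wrong, which is the usual motivation for this kind of helper.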
  2. 19 Feb, 2019 1 commit
  3. 16 Feb, 2019 1 commit
  4. 09 Feb, 2019 6 commits
    • RDMA/devices: Re-organize device.c locking · 921eab11
      Jason Gunthorpe committed
      The locking here started out with a single lock that covered everything
      and then has lately veered into crazy town.
      
      The fundamental problem is that several places need to iterate over a
      linked list, but also need to drop their locks to avoid deadlock during
      client callbacks.
      
      xarray's restartable iteration offers a simple solution to the
      problem. Once all the lists are xarrays we can drop locks in the places
      that need that and rely on xarray to provide consistency and locking for
      the data structure.
      
      The resulting simplification is that each of the three lists has a
      dedicated rwsem that must be held when working with the list it
      covers. One data structure is no longer covered by multiple locks.
      
      The sleeping semaphore is selected because the read side generally needs
      to be held over something sleeping, and using RCU reader locking in those
      cases is overkill.
      
      In the process this simplifies the entire registration/unregistration flow
      to be the expected list of setups and the reversed list of matching
      teardowns, and the registration lock 'refcount' can now be revised to be
      released after the ULPs are removed, providing a very sane semantic for
      this feature.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/devices: Use xarray to store the client_data · 0df91bb6
      Jason Gunthorpe committed
      Now that we have a small ID for each client we can use xarray instead of
      linearly searching linked lists for client data. This will give much
      faster and more scalable client data lookup, and will let us revise the
      locking scheme.
      
      Since xarray can store 'going_down' using a mark, just entirely eliminate
      the struct ib_client_data and directly store the client_data value in the
      xarray. However, this does require a special iterator, as we must still
      iterate over any NULL client_data values.
      
      Also eliminate the client_data_lock in favour of internal xarray locking.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/devices: Use xarray to store the clients · e59178d8
      Jason Gunthorpe committed
      This gives each client a unique ID and will let us move client_data to use
      xarray, and revise the locking scheme.
      
      Clients have to be added/removed in strict FIFO/LIFO order as they
      interdepend. To support this, the client_ids are assigned to increase in
      FIFO order. The existing linked list is kept to support reverse iteration
      until xarray can get a reverse iteration API.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
    • RDMA/device: Get rid of reg_state · 652432f3
      Jason Gunthorpe committed
      This really has no purpose anymore; the refcount can be used to tell if
      the device is still registered. Keeping it around just invites misuse.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
    • RDMA: Handle PD allocations by IB/core · 21a428a0
      Leon Romanovsky committed
      Handling PD allocations in IB/core allows us to simplify drivers and the
      error flows in their .alloc_pd() paths. The changes in .alloc_pd() go hand
      in hand with the relevant update in .dealloc_pd().
      
      We will use this opportunity to convert .dealloc_pd() so that it cannot
      fail, as was suggested a long time ago. Failures are not happening, as we
      have never seen a WARN_ON print.
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/core: Share driver structure size with core · 30471d4b
      Leon Romanovsky committed
      Add new macros to be used in drivers while registering the ops structure
      and in IB/core while calling allocation routines, so drivers won't need to
      perform kzalloc/kfree in their paths.
      
      The change in the allocation stage allows us to initialize common fields
      prior to calling into drivers (e.g. restrack).
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
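The pattern described in this commit message, where the driver publishes the size of its object in the ops structure and the core performs the allocation, can be sketched in userspace C as follows. The names (`struct demo_pd`, `struct demo_drv_pd`, `struct demo_ops`, `demo_core_alloc_pd()`) and the value stored in `core_id` are hypothetical and only illustrate the flow; they are not the macros or structures added by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Core-owned object (hypothetical). */
struct demo_pd {
	int core_id; /* a common field the core fills in before the driver runs */
};

/* Driver object that embeds the core object as its first member. */
struct demo_drv_pd {
	struct demo_pd base;
	int hw_handle;
};

/* The cast in the driver callback relies on 'base' being at offset 0. */
_Static_assert(offsetof(struct demo_drv_pd, base) == 0,
	       "core object must be the first member");

struct demo_ops {
	size_t size_pd;                      /* driver tells the core how much to allocate */
	int (*alloc_pd)(struct demo_pd *pd); /* driver only initializes, never allocates */
};

/* Core side: allocate using the driver's size, set common fields, call the driver. */
static struct demo_pd *demo_core_alloc_pd(const struct demo_ops *ops)
{
	struct demo_pd *pd = calloc(1, ops->size_pd);

	if (!pd)
		return NULL;
	pd->core_id = 42; /* arbitrary value for the example */
	if (ops->alloc_pd(pd)) {
		free(pd); /* the error path lives in the core, not in each driver */
		return NULL;
	}
	return pd;
}

/* Driver side: the memory already exists and common fields are valid. */
static int demo_drv_alloc_pd(struct demo_pd *pd)
{
	struct demo_drv_pd *dpd = (struct demo_drv_pd *)pd;

	dpd->hw_handle = pd->core_id + 1;
	return 0;
}

static const struct demo_ops demo_drv_ops = {
	.size_pd = sizeof(struct demo_drv_pd),
	.alloc_pd = demo_drv_alloc_pd,
};
```

Because the core owns both the allocation and the free, a failing driver callback needs no cleanup of its own, which is the simplification of driver error flows that the message refers to.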
  5. 05 Feb, 2019 3 commits
  6. 31 Jan, 2019 4 commits
  7. 22 Jan, 2019 1 commit
  8. 15 Jan, 2019 2 commits
  9. 11 Jan, 2019 1 commit
  10. 09 Jan, 2019 1 commit
  11. 20 Dec, 2018 2 commits
  12. 19 Dec, 2018 1 commit
    • IB/uverbs: Add support to advise_mr · ad8a4496
      Moni Shoua committed
      Add new ioctl method for the MR object - ADVISE_MR.
      
      This command can be used by users to give advice or directions to the
      kernel about an address range that belongs to memory regions.
      
      A new ib_device callback, advise_mr(), is introduced here to support the
      new command. This command takes the following arguments:
      
      - pd:      The protection domain to which all memory regions belong
      - advice:  The type of the advice
                 * IB_UVERBS_ADVISE_MR_ADVICE_PREFETCH - Pre-fetch a range of
                   an on-demand paging MR
                 * IB_UVERBS_ADVISE_MR_ADVICE_PREFETCH_WRITE - Pre-fetch a range
                   of an on-demand paging MR with write intention
      - flags:   The properties of the advice
                 * IB_UVERBS_ADVISE_MR_FLAG_FLUSH - Operation must end before
                   return to the caller
      - sg_list: The list of memory ranges
      - num_sge: The number of memory ranges in the list
      - attrs:   More attributes to be parsed by the provider
      Signed-off-by: Moni Shoua <monis@mellanox.com>
      Reviewed-by: Guy Levi <guyle@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  13. 12 Dec, 2018 5 commits
  14. 04 Dec, 2018 1 commit
  15. 27 Nov, 2018 2 commits
  16. 23 Nov, 2018 2 commits
    • RDMA/core: Sync unregistration with netlink commands · 01b67117
      Parav Pandit committed
      When the rdma device is getting removed, get resource info can race with
      device removal, as below:
      
            CPU-0                                  CPU-1
          --------                               --------
          rdma_nl_rcv_msg()
             nldev_res_get_cq_dumpit()
                mutex_lock(device_lock);
                get device reference
                mutex_unlock(device_lock);        [..]
                                                  ib_unregister_device()
                                                  /* Valid reference to
                                                   * device->dev exists.
                                                   */
                                                   ib_dealloc_device()
      
                [..]
                provider->fill_res_entry();
      
      Even though the device object is not freed, fill_res_entry() can get
      called on a device which doesn't have a driver anymore. The kernel core
      device reference count is not sufficient, as it only keeps the structure
      valid and doesn't guarantee the driver is still loaded.
      
      A similar race can occur between device renaming and device removal, where
      device_rename() tries to rename an unregistered device. This is fine for
      devices of a class that is not net namespace aware, but it is incorrect
      for the net namespace aware class coming in a subsequent series. If a
      class is net namespace aware, then the call trace in [1] below is observed
      in the above situation.
      
      Therefore, to avoid the race, keep a reference count and let device
      unregistration wait until all netlink users drop the reference.
      
      [1] Call trace:
      kernfs: ns required in 'infiniband' for 'mlx5_0'
      WARNING: CPU: 18 PID: 44270 at fs/kernfs/dir.c:842 kernfs_find_ns+0x104/0x120
      libahci i2c_core mlxfw libata dca [last unloaded: devlink]
      RIP: 0010:kernfs_find_ns+0x104/0x120
      Call Trace:
      kernfs_find_and_get_ns+0x2e/0x50
      sysfs_rename_link_ns+0x40/0xb0
      device_rename+0xb2/0xf0
      ib_device_rename+0xb3/0x100 [ib_core]
      nldev_set_doit+0x165/0x190 [ib_core]
      rdma_nl_rcv_msg+0x249/0x250 [ib_core]
      ? netlink_deliver_tap+0x8f/0x3e0
      rdma_nl_rcv+0xd6/0x120 [ib_core]
      netlink_unicast+0x17c/0x230
      netlink_sendmsg+0x2f0/0x3e0
      sock_sendmsg+0x30/0x40
      __sys_sendto+0xdc/0x160
      
      Fixes: da5c8507 ("RDMA/nldev: add driver-specific resource tracking")
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA/uverbs: Use a linear list to describe the compiled-in uapi · 0cbf432d
      Jason Gunthorpe committed
      The 'tree' data structure is very hard to build at compile time, and this
      makes it very limited. The new radix tree based compiler can handle a more
      complex input language that does not require the compiler to perfectly
      group everything into a neat tree structure.
      
      Instead, use a simple list to describe the input, where the list elements
      can be of various different 'opcodes' instructing the radix compiler what
      to do. Start out with opcodes chaining to other definition lists and
      chaining to the existing 'tree' definition.
      
      Replace the very top level of the 'object tree' with this list type and
      get rid of struct uverbs_object_tree_def and DECLARE_UVERBS_OBJECT_TREE.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
  17. 22 Nov, 2018 1 commit