1. 21 Sep, 2018 1 commit
    • RDMA/ucontext: Add a core API for mmaping driver IO memory · 5f9794dc
      Jason Gunthorpe committed
      To support disassociation and PCI hot unplug, we have to track all the
      VMAs that refer to the device IO memory. When disassociation occurs the
      VMAs have to be revised to point to the zero page, not the IO memory, to
      allow the physical HW to be unplugged.
      
      The three drivers supporting this implemented three different versions
      of this algorithm, all leaving something to be desired. This new common
      implementation has a few differences from the driver versions:
      
      - Track all VMAs, including splitting/truncating/etc. Tie the lifetime of
        the private data allocation to the lifetime of the vma. This avoids any
        tricks with setting vm_ops which Linus didn't like. (see link)
      - Support multiple mms, and support properly tracking mmaps triggered by
        processes other than the one first opening the uverbs fd. This makes
        fork behavior of disassociation enabled drivers the same as fork support
        in normal drivers.
      - Don't use crazy get_task stuff.
      - Simplify the approach to racing between vm_ops close and
        disassociation, fixing the related bugs most of the driver
        implementations had. Since we are in core code the tracking list can be
        placed in struct ib_uverbs_ufile, which has a lifetime strictly longer
        than any VMAs created by mmap on the uverbs FD.
      
      Link: https://www.spinics.net/lists/stable/msg248747.html
      Link: https://lkml.kernel.org/r/CA+55aFxJTV_g46AQPoPXen-UPiqR1HGMZictt7VpC-SMFbm3Cw@mail.gmail.com
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
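The tracking scheme described in this commit can be pictured with a small userspace sketch. It is not the kernel implementation: `demo_vma`, `demo_priv`, `demo_ufile` and the functions below are hypothetical stand-ins, and the real code revises page table entries rather than flipping a flag. The sketch assumes only what the message states: each mapping owns a tracking entry whose lifetime matches the mapping, the list lives in the longer-lived file object, and disassociation walks that list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_priv; /* per-mapping tracking data, defined below */

/* Hypothetical stand-in for a VMA created by mmap on the uverbs FD. */
struct demo_vma {
	bool points_at_io;      /* true while the mapping refers to IO memory */
	struct demo_priv *priv; /* tracking entry owned by this mapping */
};

struct demo_priv {
	struct demo_vma *vma;
	struct demo_priv *next;
};

/* Hypothetical stand-in for the file object that outlives every mapping. */
struct demo_ufile {
	struct demo_priv *head;
	size_t count;
};

/* Called when a mapping is created: allocate and link its tracking entry. */
static bool demo_track(struct demo_ufile *f, struct demo_vma *vma)
{
	struct demo_priv *p = malloc(sizeof(*p));

	if (!p)
		return false;
	p->vma = vma;
	p->next = f->head;
	f->head = p;
	f->count++;
	vma->priv = p;
	vma->points_at_io = true;
	return true;
}

/* Called when a mapping goes away: its tracking entry dies with it. */
static void demo_untrack(struct demo_ufile *f, struct demo_vma *vma)
{
	for (struct demo_priv **pp = &f->head; *pp; pp = &(*pp)->next) {
		if (*pp == vma->priv) {
			struct demo_priv *p = *pp;

			*pp = p->next;
			free(p);
			vma->priv = NULL;
			f->count--;
			return;
		}
	}
}

/* Disassociation: every mapping still tracked stops referring to IO memory. */
static void demo_disassociate(struct demo_ufile *f)
{
	for (struct demo_priv *p = f->head; p; p = p->next)
		p->vma->points_at_io = false;
}
```

A caller would invoke `demo_track()` from its mmap path, `demo_untrack()` from its close path, and `demo_disassociate()` once when the device is being removed.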
  2. 18 Sep, 2018 1 commit
    • IB/rxe: Revise the ib_wr_opcode enum · 9a59739b
      Jason Gunthorpe committed
      This enum has become part of the uABI, as both RXE and the
      ib_uverbs_post_send() command expect userspace to supply values from this
      enum. So it should be properly placed in include/uapi/rdma.
      
      In userspace this enum is called 'enum ibv_wr_opcode' as part of
      libibverbs.h. That enum defines different values for IB_WR_LOCAL_INV,
      IB_WR_SEND_WITH_INV, and IB_WR_LSO. These were introduced (incorrectly, it
      turns out) into libibverbs in 2015.
      
      The kernel has changed its mind on the numbering for several of the IB_WC
      values over the years, but has remained stable on IB_WR_LOCAL_INV and
      below.
      
      Based on this we can conclude that there is no real user space user of the
      values beyond IB_WR_ATOMIC_FETCH_AND_ADD, as they have never worked via
      rdma-core. This is confirmed by inspection: only rxe uses the kernel enum
      and implements the latter operations. rxe has clearly never worked with
      these attributes from userspace. Other drivers that support these opcodes
      implement the functionality without calling out to the kernel.
      
      To make IB_WR_SEND_WITH_INV and related work for RXE in userspace we
      choose to renumber the IB_WR enum in the kernel to match the uABI that
      userspace has been using since before Soft RoCE was merged. This is an
      overall simpler configuration for the whole software stack, and obviously
      can't break anything existing.
      
      Reported-by: Seth Howell <seth.howell@intel.com>
      Tested-by: Seth Howell <seth.howell@intel.com>
      Fixes: 8700e3e7 ("Soft RoCE driver")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
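One way to picture the renumbering is an internal enum defined directly in terms of the userspace-visible one, so the two cannot drift apart. The sketch below is illustrative only: every `demo_` name is hypothetical, only three opcodes are shown, and the numeric values are placeholders chosen for the example rather than a statement of the actual uABI numbering.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical uAPI-style enum: explicit values that userspace relies on.
 * The numbers are placeholders for this sketch. */
enum demo_uapi_wr_opcode {
	DEMO_UAPI_WR_RDMA_WRITE = 0,
	DEMO_UAPI_WR_SEND = 2,
	DEMO_UAPI_WR_SEND_WITH_INV = 9,
};

/* Internal enum expressed in terms of the uAPI values, so a value
 * supplied by userspace can be used without translation. */
enum demo_wr_opcode {
	DEMO_WR_RDMA_WRITE = DEMO_UAPI_WR_RDMA_WRITE,
	DEMO_WR_SEND = DEMO_UAPI_WR_SEND,
	DEMO_WR_SEND_WITH_INV = DEMO_UAPI_WR_SEND_WITH_INV,
};

/* Validate an opcode supplied by userspace.
 * Returns 0 and stores the internal opcode on success, -1 otherwise. */
static int demo_opcode_from_user(unsigned int user_opcode,
				 enum demo_wr_opcode *out)
{
	assert(out != NULL);

	switch (user_opcode) {
	case DEMO_UAPI_WR_RDMA_WRITE:
	case DEMO_UAPI_WR_SEND:
	case DEMO_UAPI_WR_SEND_WITH_INV:
		*out = (enum demo_wr_opcode)user_opcode;
		return 0;
	default:
		return -1; /* opcode not known to this sketch */
	}
}
```

Because the internal names alias the userspace values, no lookup table is needed between the two layers; only validation remains.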
  3. 11 Sep, 2018 2 commits
  4. 07 Sep, 2018 1 commit
  5. 06 Sep, 2018 3 commits
  6. 11 Aug, 2018 2 commits
  7. 03 Aug, 2018 1 commit
    • RDMA/netdev: Use priv_destructor for netdev cleanup · 9f49a5b5
      Jason Gunthorpe committed
      Now that the unregister_netdev flow for IPoIB no longer relies on external
      code, we can introduce the use of priv_destructor and
      needs_free_netdev.
      
      The rdma_netdev flow is switched to use the netdev common priv_destructor
      instead of the special free_rdma_netdev, and the IPoIB ULP is adjusted:
       - priv_destructor needs to switch to point to the ULP's destructor
         which will then call the rdma_ndev's in the right order
       - We need to be careful around the error unwind of register_netdev
         as it sometimes calls priv_destructor on failure
       - ULPs need to use ndo_init/uninit to ensure proper ordering
         of failures around register_netdev
      
      Switching to priv_destructor is a necessary pre-requisite to using
      the rtnl new_link mechanism.
      
      The VNIC user for rdma_netdev should also be revised, but that is left for
      another patch.
      
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Denis Drozdov <denisd@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
  8. 02 Aug, 2018 2 commits
    • IB/uverbs: Do not pass struct ib_device to the ioctl methods · e83f0ecd
      Jason Gunthorpe committed
      This does the same as the patch before, except for ioctl. The rules are
      the same, but for the ioctl methods the core code handles setting up the
      uobject.
      
      - Retrieve the ib_dev from the uobject->context->device. This is
        safe under ioctl as the core has already done rdma_alloc_begin_uobject
        and so CREATE calls are entirely protected by the rwsem.
      - Retrieve the ib_dev from uobject->object
      - Call ib_uverbs_get_ucontext()
      
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • IB/uverbs: Consolidate uobject destruction · 87ad80ab
      Jason Gunthorpe committed
      There are several flows that can destroy a uobject and each one is
      minimized and sprinkled throughout the code base, making it difficult to
      understand and very hard to modify the destroy path.
      
      Consolidate all of these into uverbs_destroy_uobject() and call it in all
      cases where a uobject has to be destroyed.
      
      This makes one change to the lifecycle, during any abort (eg when
      alloc_commit is not called) we always call out to alloc_abort, even if
      remove_commit needs to be called to delete a HW object.
      
      This also renames RDMA_REMOVE_DURING_CLEANUP to RDMA_REMOVE_ABORT to
      clarify its actual usage and revises some of the comments to reflect what
      the life cycle is for the type implementation.
      
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  9. 31 Jul, 2018 2 commits
  10. 26 Jul, 2018 1 commit
    • IB/uverbs: Rework the locking for cleaning up the ucontext · e951747a
      Jason Gunthorpe committed
      The locking here has always been a bit crazy and spread out; after some
      careful analysis we can simplify things.
      
      Create a single function uverbs_destroy_ufile_hw() that internally handles
      all locking. This pulls together pieces of this process that were
      sprinkled all over the place into one place, and covers them with one
      lock.
      
      This eliminates several duplicate/confusing locks and makes the control
      flow in ib_uverbs_close() and ib_uverbs_free_hw_resources() extremely
      simple.
      
      Unfortunately we have to keep an extra mutex, ucontext_lock.  This lock is
      logically part of the rwsem and provides the 'down write, fail if write
      locked, wait if read locked' semantic we require.
      
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  11. 25 Jul, 2018 3 commits
  12. 24 Jul, 2018 1 commit
  13. 11 Jul, 2018 3 commits
  14. 10 Jul, 2018 2 commits
  15. 05 Jul, 2018 1 commit
  16. 30 Jun, 2018 1 commit
    • IB: Improve uverbs_cleanup_ucontext algorithm · 1c77483e
      Yishai Hadas committed
      Improve uverbs_cleanup_ucontext algorithm to work properly when the
      topology graph of the objects cannot be determined at compile time.  This
      is the case with objects created via the devx interface in mlx5.
      
      Typically uverbs objects must be created in a strict topologically sorted
      order, so that LIFO ordering will generally cause them to be freed
      properly. There are only a few cases (eg memory windows) where objects can
      point to things out of the strict LIFO order.
      
      Instead of using an explicit ordering scheme where the HW destroy is not
      allowed to fail, go over the list multiple times and allow the destroy
      function to fail. If progress halts then a final, desperate, cleanup is
      done before leaking the memory. This indicates a driver bug.
      
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
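The multi-pass idea in this commit can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the uverbs code: `demo_obj`, `demo_destroy()` and `demo_cleanup()` are hypothetical, an object is assumed to hold at most one reference on another object, and the final forced cleanup mentioned in the message is not modeled (the function just reports how many objects are left).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a uobject. */
struct demo_obj {
	struct demo_obj *parent; /* object this one references, or NULL */
	int users;               /* live objects that reference this one */
	bool destroyed;
};

/* Destroy is allowed to fail: it refuses while the object is in use. */
static bool demo_destroy(struct demo_obj *obj)
{
	if (obj->users > 0)
		return false;
	if (obj->parent)
		obj->parent->users--;
	obj->destroyed = true;
	return true;
}

/* Walk the list repeatedly until a full pass makes no progress.
 * Returns the number of objects that could not be destroyed. */
static size_t demo_cleanup(struct demo_obj **objs, size_t n)
{
	size_t remaining = 0;
	bool progress = true;

	for (size_t i = 0; i < n; i++)
		if (!objs[i]->destroyed)
			remaining++;

	while (remaining > 0 && progress) {
		progress = false;
		/* Newest first, mirroring the usual LIFO order. */
		for (size_t i = n; i-- > 0;) {
			if (objs[i]->destroyed)
				continue;
			if (demo_destroy(objs[i])) {
				remaining--;
				progress = true;
			}
		}
	}
	return remaining;
}
```

With an object that was created first but references a later one, the first pass fails on the later object, destroys the earlier one, and the second pass then succeeds. A non-zero return corresponds to the "progress halts" case the message attributes to a driver bug.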
  17. 26 Jun, 2018 3 commits
  18. 20 Jun, 2018 1 commit
  19. 19 Jun, 2018 9 commits
    • IB/core: add max_send_sge and max_recv_sge attributes · 33023fb8
      Steve Wise committed
      This patch replaces the ib_device_attr.max_sge with max_send_sge and
      max_recv_sge. It allows ulps to take advantage of devices that have very
      different send and recv sge depths.  For example cxgb4 has a max_recv_sge
      of 4, yet a max_send_sge of 16.  Splitting out these attributes allows
      much more efficient use of the SQ for cxgb4 with ulps that use the RDMA_RW
      API. Consider a large RDMA WRITE that has 16 scatter/gather entries.
      With max_sge of 4, the ulp would send 4 WRITE WRs, but with max_sge of
      16, it can be done with 1 WRITE WR.
      
      Acked-by: Sagi Grimberg <sagi@grimberg.me>
      Acked-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
      Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
      Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
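The arithmetic in the cxgb4 example is a ceiling division of the number of scatter/gather entries by the per-WR limit. The helper below is a hypothetical illustration of that calculation only; `wrs_needed()` is not a kernel function.

```c
#include <assert.h>

/* Hypothetical helper: how many work requests are needed to carry
 * `nents` scatter/gather entries when one WR holds at most `max_sge`. */
static unsigned int wrs_needed(unsigned int nents, unsigned int max_sge)
{
	assert(max_sge > 0);
	return (nents + max_sge - 1) / max_sge; /* ceiling division */
}
```

For the case in the message, `wrs_needed(16, 4)` gives 4 WRITE WRs, while `wrs_needed(16, 16)` gives 1.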
    • RDMA/core: Save kernel caller name when creating CQ using ib_create_cq() · 7350cdd0
      Bharat Potnuri committed
      A few kernel applications, such as SCST-iSER, create CQs using
      ib_create_cq(); accessing those CQ structures with the rdma restrack tool
      leads to the NULL pointer dereference below. This patch saves the caller's
      kernel module name, as ib_alloc_cq() already does.
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff8132ca70>] skip_spaces+0x30/0x30
      PGD 738bac067 PUD 8533f0067 PMD 0
      Oops: 0000 [#1] SMP
      R10: ffff88017fc03300 R11: 0000000000000246 R12: 0000000000000000
      R13: ffff88082fa5a668 R14: ffff88017475a000 R15: 0000000000000000
      FS:  00002b32726582c0(0000) GS:ffff88087fc40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 00000008491a1000 CR4: 00000000003607e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       [<ffffffffc05af69c>] ? fill_res_name_pid+0x7c/0x90 [ib_core]
       [<ffffffffc05af79f>] fill_res_cq_entry+0xef/0x170 [ib_core]
       [<ffffffffc05af4c4>] res_get_common_dumpit+0x3c4/0x480 [ib_core]
       [<ffffffffc05af5d3>] nldev_res_get_cq_dumpit+0x13/0x20 [ib_core]
       [<ffffffff815bc1e7>] netlink_dump+0x117/0x2e0
       [<ffffffff815bcb8b>] __netlink_dump_start+0x1ab/0x230
       [<ffffffffc059fead>] ibnl_rcv_msg+0x11d/0x1f0 [ib_core]
       [<ffffffffc05af5c0>] ? nldev_res_get_mr_dumpit+0x20/0x20 [ib_core]
       [<ffffffffc059fd90>] ? rdma_nl_multicast+0x30/0x30 [ib_core]
       [<ffffffff815bea49>] netlink_rcv_skb+0xa9/0xc0
       [<ffffffffc05a0018>] ibnl_rcv+0x98/0xb0 [ib_core]
       [<ffffffff815be132>] netlink_unicast+0xf2/0x1b0
       [<ffffffff815be50f>] netlink_sendmsg+0x31f/0x6a0
       [<ffffffff8156b580>] sock_sendmsg+0xb0/0xf0
       [<ffffffff816ace9e>] ? _raw_spin_unlock_bh+0x1e/0x20
       [<ffffffff8156f998>] ? release_sock+0x118/0x170
       [<ffffffff8156b731>] SYSC_sendto+0x121/0x1c0
       [<ffffffff81568340>] ? sock_alloc_file+0xa0/0x140
       [<ffffffff81221265>] ? __fd_install+0x25/0x60
       [<ffffffff8156c2ce>] SyS_sendto+0xe/0x10
       [<ffffffff816b6c2a>] system_call_fastpath+0x16/0x1b
      RIP  [<ffffffff8132ca70>] skip_spaces+0x30/0x30
      RSP <ffff88072be97760>
      CR2: 0000000000000000
      
      Cc: <stable@vger.kernel.org>
      Fixes: f66c8ba4 ("RDMA/core: Save kernel caller name when creating PD and CQ objects")
      Reviewed-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA: Hold the sgid_attr inside the struct ib_ah/qp · 1a1f460f
      Jason Gunthorpe committed
      If the AH has a GRH then hold a reference to the sgid_attr inside the
      common struct.
      
      If the QP is modified with an AV that includes a GRH then also hold a
      reference to the sgid_attr inside the common struct.
      
      This informs the cache that the sgid_index is in-use so long as the AH or
      QP using it exists.
      
      This also means that all drivers can access the sgid_attr directly from
      the ah_attr instead of querying the cache during their UD post-send paths.
      
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    • RDMA: Convert drivers to use sgid_attr instead of sgid_index · 47ec3866
      Parav Pandit committed
      The core code now ensures that all driver callbacks that receive an
      rdma_ah_attr will have a sgid_attr pointer if there is a GRH present.
      
      Drivers can use this pointer instead of calling a query function with
      sgid_index. This simplifies the drivers and also avoids races where a
      gid_index lookup may return different data if it is changed.
      
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    • IB{cm, core}: Introduce and use ah_attr copy, move, replace APIs · d97099fe
      Jason Gunthorpe committed
      Introduce AH attribute copy, move and replace APIs to be used by core and
      provider drivers.
      
      In the CM code flow, the ah attribute might be re-initialized twice while
      processing an incoming request, or initialized once from the path record
      while sending out CM requests. Therefore use the rdma_move_ah_attr API to
      handle such scenarios instead of memcpy().
      
      Provider drivers keep a copy of the ah_attr during the lifetime of the ah.
      Therefore, use rdma_replace_ah_attr(), which conditionally releases the
      reference to the old ah_attr and holds a reference to the new attribute,
      whose reference is released when the AH is freed.
      
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    • IB/core: Add a sgid_attr pointer to struct rdma_ah_attr · 8d9ec9ad
      Jason Gunthorpe committed
      The sgid_attr will ultimately replace the sgid_index in the ah_attr.
      This will allow for all layers to have a consistent view of what
      gid table entry was selected as processing runs through all stages of the
      stack.
      
      This commit introduces the pointer and ensures it is set before calling
      any driver callback that takes a struct ah_attr, allowing
      future patches to adjust both the drivers and the callers to use
      sgid_attr instead of sgid_index.
      
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    • IB: Replace ib_query_gid/ib_get_cached_gid with rdma_query_gid · 1dfce294
      Parav Pandit committed
      If the gid_attr argument is NULL then the functions behave identically to
      rdma_query_gid. ib_query_gid just calls ib_get_cached_gid, so everything
      can be consolidated to one function.
      
      Now that all callers either use rdma_query_gid() or ib_get_cached_gid(),
      ib_query_gid() API is removed.
      
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • RDMA: Use GID from the ib_gid_attr during the add_gid() callback · f4df9a7c
      Parav Pandit committed
      Now that ib_gid_attr contains the GID, make use of that in the add_gid()
      callback functions for the provider drivers to simplify the add_gid()
      implementations.
      
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • IB/core: Introduce GID entry reference counts · b150c386
      Parav Pandit committed
      In order to be able to expose pointers to the ib_gid_attrs in the GID
      table we need to make it so the value of the pointer cannot be
      changed. Thus each GID table entry gets a unique piece of kref'd memory
      that is written only during initialization and remains constant for its
      lifetime.
      
      This eventually will allow the struct ib_gid_attrs to be returned without
      copy from many of the query APIs, but it also provides a way to track when
      all users of a HW table index go away.
      
      For RoCE we no longer allow an in-use HW table index to be re-used for a
      new and different entry. When a GID table entry needs to be removed it is
      hidden from the find API, but remains as a valid HW index and all
      ib_gid_attr pointers remain valid. The HW index is not released until all
      users put the kref.
      
      Later patches will broadly replace the use of the sgid_index integer with
      the kref'd structure.
      
      Ultimately this will prevent security problems where the OS changes the
      properties of a HW GID table entry while an active user object is still
      using the entry.
      
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
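The lifetime rule described here (hide on delete, release the HW index only after the last put) can be modeled with a small userspace sketch. It is an approximation under stated assumptions: all `demo_` names are hypothetical, a plain integer stands in for the kref, an `int` stands in for the GID, and entries live inline in a fixed array rather than in separately allocated memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_TABLE_SIZE 4

struct demo_gid_entry {
	int refs;     /* stand-in for the kref; 0 means the slot is free */
	bool visible; /* false once the entry is hidden from lookups */
	int gid;      /* stand-in for the GID; fixed after initialization */
};

struct demo_gid_table {
	struct demo_gid_entry slots[DEMO_TABLE_SIZE];
};

/* Add a GID in the first free slot. Returns the index, or -1 if full. */
static int demo_add(struct demo_gid_table *t, int gid)
{
	for (int i = 0; i < DEMO_TABLE_SIZE; i++) {
		if (t->slots[i].refs == 0) {
			t->slots[i].gid = gid;
			t->slots[i].visible = true;
			t->slots[i].refs = 1; /* reference held by the table */
			return i;
		}
	}
	return -1;
}

/* Look up a visible entry and take a reference on it. */
static struct demo_gid_entry *demo_find(struct demo_gid_table *t, int gid)
{
	for (int i = 0; i < DEMO_TABLE_SIZE; i++) {
		struct demo_gid_entry *e = &t->slots[i];

		if (e->refs > 0 && e->visible && e->gid == gid) {
			e->refs++;
			return e;
		}
	}
	return NULL;
}

/* Drop one reference; the slot becomes reusable when refs reaches 0. */
static void demo_put(struct demo_gid_entry *e)
{
	assert(e->refs > 0);
	e->refs--;
}

/* Remove an entry: hide it from lookups and drop the table's reference. */
static void demo_del(struct demo_gid_table *t, int index)
{
	t->slots[index].visible = false;
	demo_put(&t->slots[index]);
}
```

After `demo_del()`, a lookup no longer finds the entry, yet an existing holder still sees the original value and the slot is skipped by `demo_add()` until that holder calls `demo_put()`.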