1. 31 Jan, 2019 1 commit
    • G
      RDMA: Add indication for in kernel API support to IB device · 6780c4fa
      Gal Pressman committed
      Drivers that do not provide kernel verbs support should not be used by ib
      kernel clients at all.
      
      If a device does not implement all mandatory verbs for kverbs usage,
      mark it as a non-kverbs provider and prevent its use by all clients
      except uverbs.
      
      The device is marked as a non-kverbs provider using the 'kverbs_provider'
      flag, which should only be set by the core code.  Each client can choose
      whether it requires kverbs using the 'no_kverbs_req' flag,
      which is currently set for uverbs only.
      
      This patch allows drivers to remove mandatory verbs stubs and simply set
      the callbacks to NULL. The IB device will be registered as a non-kverbs
      provider. Note that verbs that are required for the device registration
      process must be implemented.
      Signed-off-by: Gal Pressman <galpress@amazon.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
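The provider check described in this commit can be pictured with a small user-space sketch. Everything below is hypothetical and only mirrors the idea: the struct layouts, the `demo_` names, and the choice of which callbacks count as "mandatory" are assumptions for illustration, not the kernel's actual definitions. Only the flag names `kverbs_provider` and `no_kverbs_req` come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for a device's verbs callbacks. */
struct demo_ib_device {
    int (*post_send)(void);
    int (*post_recv)(void);
    int (*poll_cq)(void);
    bool kverbs_provider;   /* meant to be set by core code only */
};

/* Hypothetical client descriptor; a uverbs-like client sets no_kverbs_req. */
struct demo_ib_client {
    const char *name;
    bool no_kverbs_req;
};

/* Placeholder callback used to fill in "implemented" verbs. */
int demo_stub(void)
{
    return 0;
}

/*
 * Core-side step: the device counts as a kverbs provider only if every
 * callback in the (assumed) mandatory set is non-NULL.
 */
void demo_setup_device(struct demo_ib_device *dev)
{
    assert(dev != NULL);
    dev->kverbs_provider = dev->post_send != NULL &&
                           dev->post_recv != NULL &&
                           dev->poll_cq != NULL;
}

/* A client may bind if the device provides kverbs or the client does not need them. */
bool demo_client_may_use(const struct demo_ib_client *client,
                         const struct demo_ib_device *dev)
{
    return dev->kverbs_provider || client->no_kverbs_req;
}
```

In this sketch a driver that leaves `post_recv` as NULL is still usable by the uverbs-like client but is skipped for every client that needs kernel verbs.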
  2. 11 Jan, 2019 1 commit
  3. 04 Jan, 2019 1 commit
    • L
      Remove 'type' argument from access_ok() function · 96d4f267
      Linus Torvalds committed
      Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
      of the user address range verification function since we got rid of the
      old racy i386-only code to walk page tables by hand.
      
      It existed because the original 80386 would not honor the write protect
      bit when in kernel mode, so you had to do COW by hand before doing any
      user access.  But we haven't supported that in a long time, and these
      days the 'type' argument is a purely historical artifact.
      
      A discussion about extending 'user_access_begin()' to do the range
      checking resulted in this patch, because there is no way we're going to
      move the old VERIFY_xyz interface to that model.  And it's best done at
      the end of the merge window when I've done most of my merges, so let's
      just get this done once and for all.
      
      This patch was mostly done with a sed-script, with manual fix-ups for
      the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
      
      There were a couple of notable cases:
      
       - csky still had the old "verify_area()" name as an alias.
      
       - the iter_iov code had magical hardcoded knowledge of the actual
         values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
         really used it)
      
       - microblaze used the type argument for a debug printout
      
      but other than those oddities this should be a total no-op patch.
      
      I tried to fix up all architectures, did fairly extensive grepping for
      access_ok() uses, and the changes are trivial, but I may have missed
      something.  Any missed conversion should be trivially fixable, though.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
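As a rough illustration of what remains once the 'type' argument is gone, here is a user-space mock of a two-argument range check. The name `demo_access_ok` and the limit `DEMO_USER_LIMIT` are assumptions made up for this sketch; the real `access_ok()` is architecture specific and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Call-site change made by the patch, shown as comments:
 *
 *   before:  if (!access_ok(VERIFY_WRITE, buf, len)) return -EFAULT;
 *   after:   if (!access_ok(buf, len))               return -EFAULT;
 */

/* Hypothetical first address that is NOT a valid user address. */
#define DEMO_USER_LIMIT UINT64_C(0x0000800000000000)

/*
 * Mock range check: true if [addr, addr + size) lies entirely below
 * DEMO_USER_LIMIT. Written so that addr + size can never overflow.
 */
bool demo_access_ok(uint64_t addr, uint64_t size)
{
    if (size > DEMO_USER_LIMIT)
        return false;
    return addr <= DEMO_USER_LIMIT - size;
}
```

Note that the check depends only on the address and the length, which is why dropping the read/write distinction can be a no-op.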
  4. 12 Dec, 2018 1 commit
  5. 04 Dec, 2018 2 commits
  6. 27 Nov, 2018 5 commits
  7. 23 Nov, 2018 3 commits
    • J
      RDMA/uverbs: Convert the write interface to use uverbs_api · d120c3c9
      Jason Gunthorpe committed
      This organizes the write commands into objects and links them to the
      uverbs_api data structure. The command path is reworked to use uapi
      instead of its internal structures.
      
      The command mask is moved from a runtime check to a registration time
      check in the uapi.
      
      Since the write interface does not have the object ID as part of the
      command, the radix bins are converted into linear lists to support the
      lookup.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    • J
      RDMA/uverbs: Add helpers to mark uapi functions as unsupported · 6829c1c2
      Jason Gunthorpe committed
      We have many cases where parts of the uapi are not supported in a driver,
      need a certain protocol, or whatever. It is best to reflect this directly
      into the struct uverbs_api when it is built so that everything is simply
      blocked off, and future introspection can report a proper supported list.
      
      This is done by adding some additional helpers to the definition list
      language that disable objects based on a 'supported' call back, and a
      helper that disables based on a NULL struct ib_device function pointer.
      
      Disablement is global. For instance, if a driver disables an object then
      everything connected to that object is removed, including core methods.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
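The two kinds of helper mentioned here (disable via a 'supported' callback, disable when a device function pointer is NULL) can be sketched in plain C. All names below (`demo_device`, `demo_def`, `demo_build_api`, the object IDs) are hypothetical and are not the kernel's definition-list macros; the sketch only shows the build-time filtering idea and the "disablement is global" rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef int (*demo_fn)(void);

/* Hypothetical device with optional callbacks. */
struct demo_device {
    demo_fn create_cq;
    demo_fn create_srq;
};

enum { DEMO_OBJ_CQ, DEMO_OBJ_SRQ, DEMO_OBJ_COUNT };

#define DEMO_NO_FN ((size_t)-1)

/* One definition-list entry: which object it guards and how. */
struct demo_def {
    int object;
    size_t needs_fn;                                  /* offset of a required callback, or DEMO_NO_FN */
    bool (*supported)(const struct demo_device *dev); /* optional predicate, may be NULL */
};

/* Placeholder callback for "implemented" functions. */
int demo_noop(void)
{
    return 0;
}

/* Read the function pointer stored at byte offset 'off' inside the device. */
static bool demo_fn_is_set(const struct demo_device *dev, size_t off)
{
    demo_fn fn;

    memcpy(&fn, (const char *)dev + off, sizeof(fn));
    return fn != NULL;
}

/*
 * Build step: every object starts enabled; any failing entry disables
 * its object for good, no matter what other entries say.
 */
void demo_build_api(const struct demo_device *dev,
                    const struct demo_def *defs, size_t n,
                    bool enabled[DEMO_OBJ_COUNT])
{
    size_t i;

    assert(dev != NULL && defs != NULL);
    for (i = 0; i < DEMO_OBJ_COUNT; i++)
        enabled[i] = true;

    for (i = 0; i < n; i++) {
        const struct demo_def *d = &defs[i];

        if (d->needs_fn != DEMO_NO_FN && !demo_fn_is_set(dev, d->needs_fn))
            enabled[d->object] = false;
        if (d->supported != NULL && !d->supported(dev))
            enabled[d->object] = false;
    }
}
```

Because the filtering happens once while the API description is built, later lookups never need a per-call "is this supported" test, and the enabled set can be reported as is.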
    • J
      RDMA/uverbs: Use a linear list to describe the compiled-in uapi · 0cbf432d
      Jason Gunthorpe committed
      The 'tree' data structure is very hard to build at compile time, and this
      makes it very limited. The new radix tree based compiler can handle a more
      complex input language that does not require the compiler to perfectly
      group everything into a neat tree structure.
      
      Instead use a simple list to describe the input, where the list elements
      can be of various different 'opcodes' instructing the radix compiler what
      to do. Start out with opcodes chaining to other definition lists and
      chaining to the existing 'tree' definition.
      
      Replace the very top level of the 'object tree' with this list type and
      get rid of struct uverbs_object_tree_def and DECLARE_UVERBS_OBJECT_TREE.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
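A list whose elements carry opcodes, including one that chains to another list, might look like the following sketch. The opcode names, the `demo_def` layout, and `demo_compile` are invented for illustration; the real uapi definition format and the radix compiler are considerably richer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes understood by the toy "compiler". */
enum demo_opcode {
    DEMO_DEF_END,     /* terminates a list */
    DEMO_DEF_CHAIN,   /* continue into another definition list */
    DEMO_DEF_OBJECT   /* declare one object */
};

struct demo_def {
    enum demo_opcode op;
    const struct demo_def *chain;  /* used by DEMO_DEF_CHAIN */
    int object_id;                 /* used by DEMO_DEF_OBJECT */
};

/* A driver-provided list and a core list that chains to it. */
const struct demo_def demo_driver_defs[] = {
    { DEMO_DEF_OBJECT, NULL, 10 },
    { DEMO_DEF_END, NULL, 0 },
};

const struct demo_def demo_core_defs[] = {
    { DEMO_DEF_OBJECT, NULL, 1 },
    { DEMO_DEF_OBJECT, NULL, 2 },
    { DEMO_DEF_CHAIN, demo_driver_defs, 0 },
    { DEMO_DEF_END, NULL, 0 },
};

/*
 * Walk a list, following chains recursively, and append each declared
 * object ID to out[]. 'n' is the number of IDs already stored; the new
 * count is returned. IDs beyond 'cap' are silently dropped.
 */
size_t demo_compile(const struct demo_def *list, int *out, size_t cap, size_t n)
{
    assert(list != NULL && out != NULL);
    for (; list->op != DEMO_DEF_END; list++) {
        switch (list->op) {
        case DEMO_DEF_CHAIN:
            n = demo_compile(list->chain, out, cap, n);
            break;
        case DEMO_DEF_OBJECT:
            if (n < cap)
                out[n++] = list->object_id;
            break;
        default:
            break;
        }
    }
    return n;
}
```

The point of the flat form is that the input no longer has to be grouped into a tree by hand: entries can appear in any order and the compile step sorts out the structure.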
  8. 17 Oct, 2018 1 commit
  9. 27 Sep, 2018 1 commit
  10. 21 Sep, 2018 1 commit
    • J
      RDMA/ucontext: Add a core API for mmaping driver IO memory · 5f9794dc
      Jason Gunthorpe committed
      To support disassociation and PCI hot unplug, we have to track all the
      VMAs that refer to the device IO memory. When disassociation occurs the
      VMAs have to be revised to point to the zero page, not the IO memory, to
      allow the physical HW to be unplugged.
      
      The three drivers supporting this implemented three different versions
      of this algorithm, all leaving something to be desired. This new common
      implementation has a few differences from the driver versions:
      
      - Track all VMAs, including splitting/truncating/etc. Tie the lifetime of
        the private data allocation to the lifetime of the vma. This avoids any
        tricks with setting vm_ops which Linus didn't like. (see link)
      - Support multiple mms, and support properly tracking mmaps triggered by
        processes other than the one first opening the uverbs fd. This makes
        fork behavior of disassociation enabled drivers the same as fork support
        in normal drivers.
      - Don't use crazy get_task stuff.
      - Simplify the approach to racing between vm_ops close and
        disassociation, fixing the related bugs most of the driver
        implementations had. Since we are in core code the tracking list can be
        placed in struct ib_uverbs_ufile, which has a lifetime strictly longer
        than any VMAs created by mmap on the uverbs FD.
      
      Link: https://www.spinics.net/lists/stable/msg248747.html
      Link: https://lkml.kernel.org/r/CA+55aFxJTV_g46AQPoPXen-UPiqR1HGMZictt7VpC-SMFbm3Cw@mail.gmail.com
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  11. 20 Sep, 2018 2 commits
  12. 13 Sep, 2018 1 commit
    • S
      RDMA/uverbs: Atomically flush and mark closed the comp event queue · 67e38168
      Steve Wise committed
      Currently a uverbs completion event queue is flushed of events in
      ib_uverbs_comp_event_close() with the queue spinlock held and then
      released.  Yet ev_queue->is_closed is not set until later in
      uverbs_hot_unplug_completion_event_file().
      
      In between the time ib_uverbs_comp_event_close() releases the lock and
      uverbs_hot_unplug_completion_event_file() acquires the lock, a completion
      event can arrive and be inserted into the event queue by
      ib_uverbs_comp_handler().
      
      This can cause a "double add" list_add warning or crash depending on the
      kernel configuration, or a memory leak because the event is never dequeued
      since the queue is already closed down.
      
      So add setting ev_queue->is_closed = 1 to ib_uverbs_comp_event_close().
      
      Cc: stable@vger.kernel.org
      Fixes: 1e7710f3 ("IB/core: Change completion channel to use the reworked objects schema")
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
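The race and its fix can be modelled outside the kernel. In the sketch below the queue, the events, and the lock (a C11 `atomic_flag` spinlock standing in for the queue spinlock) are all hypothetical `demo_` constructs; only the idea of flushing and setting `is_closed` inside one critical section is taken from the commit message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_event {
    struct demo_event *next;
};

struct demo_ev_queue {
    atomic_flag lock;          /* toy spinlock, stands in for the queue lock */
    struct demo_event *head;
    bool is_closed;
};

static void demo_lock(struct demo_ev_queue *q)
{
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void demo_unlock(struct demo_ev_queue *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

void demo_queue_init(struct demo_ev_queue *q)
{
    assert(q != NULL);
    atomic_flag_clear(&q->lock);
    q->head = NULL;
    q->is_closed = false;
}

/* Producer side: refuses to queue anything once the queue is closed. */
bool demo_queue_add(struct demo_ev_queue *q, struct demo_event *ev)
{
    bool added = false;

    demo_lock(q);
    if (!q->is_closed) {
        ev->next = q->head;
        q->head = ev;
        added = true;
    }
    demo_unlock(q);
    return added;
}

/*
 * Close side: flush AND mark closed under the same lock hold, so no
 * producer can slip an event in between the two steps.
 * Returns the number of events flushed.
 */
size_t demo_queue_close(struct demo_ev_queue *q)
{
    size_t flushed = 0;

    demo_lock(q);
    while (q->head != NULL) {
        struct demo_event *ev = q->head;

        q->head = ev->next;
        ev->next = NULL;
        flushed++;
    }
    q->is_closed = true;
    demo_unlock(q);
    return flushed;
}
```

If `is_closed` were set in a later, separate critical section, `demo_queue_add()` could succeed after the flush and leave an event that nobody ever dequeues, which is the window the commit closes.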
  13. 06 Sep, 2018 3 commits
  14. 13 Aug, 2018 1 commit
  15. 11 Aug, 2018 2 commits
  16. 02 Aug, 2018 3 commits
    • J
      IB/uverbs: Allow all DESTROY commands to succeed after disassociate · 0f50d88a
      Jason Gunthorpe committed
      The disassociate function was broken by design because it failed all
      commands. This prevents userspace from calling destroy on a uobject after
      it has detected a device fatal error and thus reclaiming the resources in
      userspace is prevented.
      
      The fix is now straightforward: when anything other than the user
      destroys a uobject, the object remains on the IDR with a NULL context and object
      pointer. All lookup locking modes other than DESTROY will fail. When the
      user ultimately calls the destroy function it is simply dropped from the
      IDR while any related information is returned.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
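The lookup rule described here can be sketched with a fixed-size table standing in for the IDR. The types, the table size, and the `demo_` functions are hypothetical; the sketch only shows that an entry whose object pointer has been cleared is invisible to every lookup mode except DESTROY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_lookup_mode {
    DEMO_LOOKUP_READ,
    DEMO_LOOKUP_WRITE,
    DEMO_LOOKUP_DESTROY
};

struct demo_uobject {
    bool in_idr;     /* slot is occupied */
    void *object;    /* NULL once the kernel side has been torn down */
};

#define DEMO_IDR_SIZE 4

/* Toy stand-in for the per-file ID table. */
struct demo_ufile {
    struct demo_uobject idr[DEMO_IDR_SIZE];
};

/* Only DESTROY may find an entry whose object pointer is already NULL. */
struct demo_uobject *demo_lookup(struct demo_ufile *f, unsigned int id,
                                 enum demo_lookup_mode mode)
{
    assert(f != NULL);
    if (id >= DEMO_IDR_SIZE || !f->idr[id].in_idr)
        return NULL;
    if (f->idr[id].object == NULL && mode != DEMO_LOOKUP_DESTROY)
        return NULL;
    return &f->idr[id];
}

/* Disassociation: drop every object but keep the IDs reserved. */
void demo_disassociate(struct demo_ufile *f)
{
    unsigned int i;

    for (i = 0; i < DEMO_IDR_SIZE; i++)
        if (f->idr[i].in_idr)
            f->idr[i].object = NULL;
}

/* User-initiated destroy: succeeds once per ID, before or after disassociation. */
int demo_destroy(struct demo_ufile *f, unsigned int id)
{
    struct demo_uobject *uobj = demo_lookup(f, id, DEMO_LOOKUP_DESTROY);

    if (uobj == NULL)
        return -1;
    uobj->object = NULL;
    uobj->in_idr = false;
    return 0;
}
```

Keeping the ID reserved until the user destroys it is what lets userspace run its normal cleanup path after a fatal device error instead of seeing every call fail.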
    • J
      IB/uverbs: Do not block disassociate during write() · a9b66d64
      Jason Gunthorpe committed
      Now that all the callbacks are safe to run concurrently with
      disassociation this test can be eliminated. The ufile core infrastructure
      becomes entirely self contained and is not sensitive to disassociation.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • J
      IB/uverbs: Do not pass struct ib_device to the write based methods · bbd51e88
      Jason Gunthorpe committed
      This is a step to get rid of the global check for disassociation. In this
      model, the ib_dev is not proven to be valid by the core code and cannot be
      provided to the method. Instead, every method decides if it is able to
      run after disassociation and obtains the ib_dev using one of three
      different approaches:
      
      - Call srcu_dereference on the udevice's ib_dev. As before, this means
        the method cannot be called after disassociation begins.
        (eg alloc ucontext)
      - Retrieve the ib_dev from the ucontext, via ib_uverbs_get_ucontext()
      - Retrieve the ib_dev from the uobject->object after checking
        under SRCU if disassociation has started (eg uobj_get)
      
      Largely, the code is all ready for this; the main work is to provide an
      ib_dev after calling uobj_alloc(). The few other places simply use
      ib_uverbs_get_ucontext() to get the ib_dev.
      
      This flexibility will let the next patches allow destroy to operate
      after disassociation.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  17. 26 Jul, 2018 3 commits
    • J
      IB/uverbs: Fix locking around struct ib_uverbs_file ucontext · 22fa27fb
      Jason Gunthorpe committed
      We have a parallel unlocked reader and writer with ib_uverbs_get_context()
      vs everything else, and nothing guarantees this works properly.
      
      Audit and fix all of the places that access ucontext to use one of the
      following locking schemes:
      - Call ib_uverbs_get_ucontext() under SRCU and check for failure
      - Access the ucontext through a struct ib_uobject context member
        while holding a READ or WRITE lock on the uobject.
        This value cannot be NULL and has no race.
      - Hold the ucontext_lock and check for ufile->ucontext != NULL
      
      This also re-implements ib_uverbs_get_ucontext() in a way that is safe
      against concurrent ib_uverbs_get_context() and disassociation.
      
      As a side effect, every access to ucontext in the commands is via
      ib_uverbs_get_context() with an error check, or via the uobject, so there
      is no longer any need for the core code to check ucontext on every command
      call. These checks are also removed.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • J
      IB/uverbs: Rework the locking for cleaning up the ucontext · e951747a
      Jason Gunthorpe committed
      The locking here has always been a bit crazy and spread out; upon some
      careful analysis we can simplify things.
      
      Create a single function uverbs_destroy_ufile_hw() that internally handles
      all locking. This pulls together pieces of this process that were
      sprinkled all over the place into one place, and covers them with one
      lock.
      
      This eliminates several duplicate/confusing locks and makes the control
      flow in ib_uverbs_close() and ib_uverbs_free_hw_resources() extremely
      simple.
      
      Unfortunately we have to keep an extra mutex, ucontext_lock.  This lock is
      logically part of the rwsem and provides the 'down write, fail if write
      locked, wait if read locked' semantic we require.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
    • J
      IB/uverbs: Revise and clarify the rwsem and uobjects_lock · 87064277
      Jason Gunthorpe committed
      Rename 'cleanup_rwsem' to 'hw_destroy_rwsem' which is held across any call
      to the type destroy function (aka 'hw' destroy). The main purpose of this
      lock is to prevent normal add and destroy from running concurrently with
      uverbs_cleanup_ufile().
      
      Since the uobjects list is always manipulated under the 'hw_destroy_rwsem'
      we can eliminate the uobjects_lock in the cleanup function. This allows
      converting that lock to a very simple spinlock with a narrow critical
      section.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  18. 10 Jul, 2018 5 commits
  19. 05 Jul, 2018 1 commit
  20. 20 Jun, 2018 1 commit
  21. 13 Jun, 2018 1 commit