1. 02 8月, 2018 7 次提交
  2. 26 7月, 2018 10 次提交
    • J
      IB/uverbs: Fix locking around struct ib_uverbs_file ucontext · 22fa27fb
      Jason Gunthorpe 提交于
      We have a parallel unlocked reader and writer with ib_uverbs_get_context()
      vs everything else, and nothing guarantees this works properly.
      
      Audit and fix all of the places that access ucontext to use one of the
      following locking schemes:
      - Call ib_uverbs_get_ucontext() under SRCU and check for failure
      - Access the ucontext through an struct ib_uobject context member
        while holding a READ or WRITE lock on the uobject.
        This value cannot be NULL and has no race.
      - Hold the ucontext_lock and check for ufile->ucontext !NULL
      
      This also re-implements ib_uverbs_get_ucontext() in a way that is safe
      against concurrent ib_uverbs_get_context() and disassociation.
      
      As a side effect, every access to ucontext in the commands is via
      ib_uverbs_get_context() with an error check, or via the uobject, so there
      is no longer any need for the core code to check ucontext on every command
      call. These checks are also removed.
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      22fa27fb
    • J
      IB/uverbs: Move the FD uobj type struct file allocation to alloc_commit · aba94548
      Jason Gunthorpe 提交于
      Allocating the struct file during alloc_begin creates this strange
      asymmetry with IDR, where the FD has two krefs pointing at it during the
      pre-commit phase. In particular this makes the abort process for FD very
      strange and confusing.
      
      For instance abort currently calls the type's destroy_object twice, and
      the fops release once if abort is done. This is very counter intuitive. No
      fops should be called until alloc_commit succeeds, and destroy_object
      should only ever be called once.
      
      Moving the struct file allocation to the alloc_commit is now simple, as we
      already support failure of rdma_alloc_commit_uobject, with all the
      required rollback pieces.
      
      This creates an understandable symmetry with IDR and simplifies/fixes the
      abort handling for FD types.
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      aba94548
    • J
      IB/uverbs: Always propagate errors from rdma_alloc_commit_uobject() · 2c96eb7d
      Jason Gunthorpe 提交于
      The ioctl framework already does this correctly, but the write path did
      not. This is trivially fixed by simply using a standard pattern to return
      uobj_alloc_commit() as the last statement in every function.
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      2c96eb7d
    • J
      IB/uverbs: Rework the locking for cleaning up the ucontext · e951747a
      Jason Gunthorpe 提交于
      The locking here has always been a bit crazy and spread out, upon some
      careful analysis we can simplify things.
      
      Create a single function uverbs_destroy_ufile_hw() that internally handles
      all locking. This pulls together pieces of this process that were
      sprinkled all over the places into one place, and covers them with one
      lock.
      
      This eliminates several duplicate/confusing locks and makes the control
      flow in ib_uverbs_close() and ib_uverbs_free_hw_resources() extremely
      simple.
      
      Unfortunately we have to keep an extra mutex, ucontext_lock.  This lock is
      logically part of the rwsem and provides the 'down write, fail if write
      locked, wait if read locked' semantic we require.
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      e951747a
    • J
      IB/uverbs: Revise and clarify the rwsem and uobjects_lock · 87064277
      Jason Gunthorpe 提交于
      Rename 'cleanup_rwsem' to 'hw_destroy_rwsem' which is held across any call
      to the type destroy function (aka 'hw' destroy). The main purpose of this
      lock is to prevent normal add and destroy from running concurrently with
      uverbs_cleanup_ufile()
      
      Since the uobjects list is always manipulated under the 'hw_destroy_rwsem'
      we can eliminate the uobjects_lock in the cleanup function. This allows
      converting that lock to a very simple spinlock with a narrow critical
      section.
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      87064277
    • J
      IB/uverbs: Clarify and revise uverbs_close_fd · e6d5d5dd
      Jason Gunthorpe 提交于
      The locking requirements here have changed slightly now that we can rely
      on the ib_uverbs_file always existing and containing all the necessary
      locking infrastructure.
      
      That means we can get rid of the cleanup_mutex usage (this was protecting
      the check on !uboj->context).
      
      Otherwise, follow the same pattern that IDR uses for destroy, acquire
      exclusive write access, then call destroy and the undo the 'lookup'.
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      e6d5d5dd
    • J
      IB/uverbs: Revise the placement of get/puts on uobject · 5671f79b
      Jason Gunthorpe 提交于
      This wasn't wrong, but the placement of two krefs didn't make any
      sense. Follow some simple rules.
      
      - A kref is held inside uobjects_list
      - A kref is held inside the IDR
      - A kref is held inside file->private
      - A stack based kref is passed bettwen alloc_begin and
        alloc_abort/alloc_commit
      
      Any place we destroy one of the above pointers, we stick a put,
      or 'move' the kref into another pointer.
      
      The key functions have sensible semantics:
      - alloc_uobj fully initializes the common members in uobj, including
        the list
      - Get rid of the uverbs_idr_remove_uobj helper since IDR remove
        does require put, but it depends on the situation. Later
        patches will re-consolidate this differently.
      - alloc_abort always consumes the passed kref, done in the type
      - alloc_commit always consumes the passed kref, done in the type
      - rdma_remove_commit_uobject always pairs with a lookup_get
      
      After it is all done the only control flow change is to:
      - move a get from alloc_commit_fd_uobject to rdma_alloc_commit_uobject
      - add a put to remove_commit_idr_uobject
      - Consistenly use rdma_lookup_put in rdma_remove_commit_uobject at
        the right place
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      5671f79b
    • J
      IB/uverbs: Clarify the kref'ing ordering for alloc_commit · c561c288
      Jason Gunthorpe 提交于
      The alloc_commit callback makes the uobj visible to other threads,
      and it does so using a 'move' semantic of the uobj kref on the stack
      into the public storage (eg the IDR, uobject list and file_private_data)
      
      Once this is done another thread could start up and trigger deletion
      of the kref. Fortunately cleanup_rwsem happens to prevent this from
      being a bug, but that is a fantastically unclear side effect.
      
      Re-organize things so that alloc_commit is that last thing to touch
      the uobj, get rid of the sneaky implicit dependency on cleanup_rwsem,
      and add a comment reminding that uobj is no longer kref'd after
      alloc_commit.
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      c561c288
    • J
      IB/uverbs: Handle IDR and FD types without truncation · 1250c304
      Jason Gunthorpe 提交于
      Our ABI for write() uses a s32 for FDs and a u32 for IDRs, but internally
      we ended up implicitly casting these ABI values into an 'int'. For ioctl()
      we use a s64 for FDs and a u64 for IDRs, again casting to an int.
      
      The various casts to int are all missing range checks which can cause
      userspace values that should be considered invalid to be accepted.
      
      Fix this by making the generic lookup routine accept a s64, which does not
      truncate the write API's u32/s32 or the ioctl API's s64. Then push the
      detailed range checking down to the actual type implementations to be
      shared by both interfaces.
      
      Finally, change the copy of the uobj->id to sign extend into a s64, so eg,
      if we ever wish to return a negative value for a FD it is carried
      properly.
      
      This ensures that userspace values are never weirdly interpreted due to
      the various trunctations and everything that is really out of range gets
      an EINVAL.
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      1250c304
    • J
      IB/uverbs: Get rid of null_obj_type · 3df593bf
      Jason Gunthorpe 提交于
      If the method fails after calling rdma_explicit_destroy (eg if
      copy_to_user faults) then it will trigger a kernel oops:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      PGD 800000000548d067 P4D 800000000548d067 PUD 54a0067 PMD 0
      SMP PTI
      CPU: 0 PID: 359 Comm: ibv_rc_pingpong Not tainted 4.18.0-rc1+ #28
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      RIP: 0010:          (null)
      Code: Bad RIP value.
      RSP: 0018:ffffc900001a3bf0 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff88000603bd00 RCX: 0000000000000003
      RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88000603bd00
      RBP: 0000000000000001 R08: ffffc900001a3cf8 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffffc900001a3cf0
      R13: 0000000000000000 R14: ffffc900001a3cf0 R15: 0000000000000000
      FS:  00007fb00dda8700(0000) GS:ffff880007c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffffffffd6 CR3: 000000000548e004 CR4: 00000000003606b0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       ? rdma_lookup_put_uobject+0x22/0x50 [ib_uverbs]
       ? uverbs_finalize_object+0x3b/0x60 [ib_uverbs]
       ? uverbs_finalize_attrs+0x128/0x140 [ib_uverbs]
       ? ib_uverbs_cmd_verbs+0x698/0x7c0 [ib_uverbs]
       ? find_held_lock+0x2d/0x90
       ? __might_fault+0x39/0x90
       ? ib_uverbs_ioctl+0x111/0x1f0 [ib_uverbs]
       ? do_vfs_ioctl+0xa0/0x6d0
       ? trace_hardirqs_on_caller+0xed/0x180
       ? _raw_spin_unlock_irq+0x24/0x40
       ? syscall_trace_enter+0x138/0x1d0
       ? ksys_ioctl+0x35/0x60
       ? __x64_sys_ioctl+0x11/0x20
       ? do_syscall_64+0x5b/0x1c0
       ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      This is because the type was replaced with the null_type during explicit
      destroy that cannot complete the destruction.
      
      One of the side effects of replacing the type is to make the object
      handle totally unreachable - so no other command could attempt to use
      it, even though it remains on the uboject list.
      
      We can get the same end result by just fully destroying the object inside
      rdma_explicit_destroy and leaving the caller the residual kref for the
      uobj with no attached HW object, and no presence in the ubojects list.
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
      3df593bf
  3. 10 7月, 2018 5 次提交
  4. 05 7月, 2018 1 次提交
  5. 30 6月, 2018 1 次提交
    • Y
      IB: Improve uverbs_cleanup_ucontext algorithm · 1c77483e
      Yishai Hadas 提交于
      Improve uverbs_cleanup_ucontext algorithm to work properly when the
      topology graph of the objects cannot be determined at compile time.  This
      is the case with objects created via the devx interface in mlx5.
      
      Typically uverbs objects must be created in a strict topologically sorted
      order, so that LIFO ordering will generally cause them to be freed
      properly. There are only a few cases (eg memory windows) where objects can
      point to things out of the strict LIFO order.
      
      Instead of using an explicit ordering scheme where the HW destroy is not
      allowed to fail, go over the list multiple times and allow the destroy
      function to fail. If progress halts then a final, desperate, cleanup is
      done before leaking the memory. This indicates a driver bug.
      Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      1c77483e
  6. 20 6月, 2018 2 次提交
  7. 01 3月, 2018 1 次提交
  8. 16 2月, 2018 4 次提交
    • J
      IB/uverbs: Fix unbalanced unlock on error path for rdma_explicit_destroy · ec6f8401
      Jason Gunthorpe 提交于
      If remove_commit fails then the lock is left locked while the uobj still
      exists. Eventually the kernel will deadlock.
      
      lockdep detects this and says:
      
       test/4221 is leaving the kernel with locks still held!
       1 lock held by test/4221:
        #0:  (&ucontext->cleanup_rwsem){.+.+}, at: [<000000001e5c7523>] rdma_explicit_destroy+0x37/0x120 [ib_uverbs]
      
      Fixes: 4da70da2 ("IB/core: Explicitly destroy an object while keeping uobject")
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      ec6f8401
    • J
      IB/uverbs: Improve lockdep_check · 104f268d
      Jason Gunthorpe 提交于
      This is really being used as an assert that the expected usecnt
      is being held and implicitly that the usecnt is valid. Rename it to
      assert_uverbs_usecnt and tighten the checks to only accept valid
      values of usecnt (eg 0 and < -1 are invalid).
      
      The tigher checkes make the assertion cover more cases and is more
      likely to find bugs via syzkaller/etc.
      
      Fixes: 38321256 ("IB/core: Add support for idr types")
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      104f268d
    • L
      RDMA/uverbs: Protect from races between lookup and destroy of uobjects · 6623e3e3
      Leon Romanovsky 提交于
      The race is between lookup_get_idr_uobject and
      uverbs_idr_remove_uobj -> uverbs_uobject_put.
      
      We deliberately do not call sychronize_rcu after the idr_remove in
      uverbs_idr_remove_uobj for performance reasons, instead we call
      kfree_rcu() during uverbs_uobject_put.
      
      However, this means we can obtain pointers to uobj's that have
      already been released and must protect against krefing them
      using kref_get_unless_zero.
      
      ==================================================================
      BUG: KASAN: use-after-free in copy_ah_attr_from_uverbs.isra.2+0x860/0xa00
      Read of size 4 at addr ffff88005fda1ac8 by task syz-executor2/441
      
      CPU: 1 PID: 441 Comm: syz-executor2 Not tainted 4.15.0-rc2+ #56
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
      rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      Call Trace:
      dump_stack+0x8d/0xd4
      print_address_description+0x73/0x290
      kasan_report+0x25c/0x370
      ? copy_ah_attr_from_uverbs.isra.2+0x860/0xa00
      copy_ah_attr_from_uverbs.isra.2+0x860/0xa00
      ? uverbs_try_lock_object+0x68/0xc0
      ? modify_qp.isra.7+0xdc4/0x10e0
      modify_qp.isra.7+0xdc4/0x10e0
      ib_uverbs_modify_qp+0xfe/0x170
      ? ib_uverbs_query_qp+0x970/0x970
      ? __lock_acquire+0xa11/0x1da0
      ib_uverbs_write+0x55a/0xad0
      ? ib_uverbs_query_qp+0x970/0x970
      ? ib_uverbs_query_qp+0x970/0x970
      ? ib_uverbs_open+0x760/0x760
      ? futex_wake+0x147/0x410
      ? sched_clock_cpu+0x18/0x180
      ? check_prev_add+0x1680/0x1680
      ? do_futex+0x3b6/0xa30
      ? sched_clock_cpu+0x18/0x180
      __vfs_write+0xf7/0x5c0
      ? ib_uverbs_open+0x760/0x760
      ? kernel_read+0x110/0x110
      ? lock_acquire+0x370/0x370
      ? __fget+0x264/0x3b0
      vfs_write+0x18a/0x460
      SyS_write+0xc7/0x1a0
      ? SyS_read+0x1a0/0x1a0
      ? trace_hardirqs_on_thunk+0x1a/0x1c
      entry_SYSCALL_64_fastpath+0x18/0x85
      RIP: 0033:0x448e29
      RSP: 002b:00007f443fee0c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00007f443fee16bc RCX: 0000000000448e29
      RDX: 0000000000000078 RSI: 00000000209f8000 RDI: 0000000000000012
      RBP: 000000000070bea0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 0000000000008e98 R14: 00000000006ebf38 R15: 0000000000000000
      
      Allocated by task 1:
      kmem_cache_alloc_trace+0x16c/0x2f0
      mlx5_alloc_cmd_msg+0x12e/0x670
      cmd_exec+0x419/0x1810
      mlx5_cmd_exec+0x40/0x70
      mlx5_core_mad_ifc+0x187/0x220
      mlx5_MAD_IFC+0xd7/0x1b0
      mlx5_query_mad_ifc_gids+0x1f3/0x650
      mlx5_ib_query_gid+0xa4/0xc0
      ib_query_gid+0x152/0x1a0
      ib_query_port+0x21e/0x290
      mlx5_port_immutable+0x30f/0x490
      ib_register_device+0x5dd/0x1130
      mlx5_ib_add+0x3e7/0x700
      mlx5_add_device+0x124/0x510
      mlx5_register_interface+0x11f/0x1c0
      mlx5_ib_init+0x56/0x61
      do_one_initcall+0xa3/0x250
      kernel_init_freeable+0x309/0x3b8
      kernel_init+0x14/0x180
      ret_from_fork+0x24/0x30
      
      Freed by task 1:
      kfree+0xeb/0x2f0
      mlx5_free_cmd_msg+0xcd/0x140
      cmd_exec+0xeba/0x1810
      mlx5_cmd_exec+0x40/0x70
      mlx5_core_mad_ifc+0x187/0x220
      mlx5_MAD_IFC+0xd7/0x1b0
      mlx5_query_mad_ifc_gids+0x1f3/0x650
      mlx5_ib_query_gid+0xa4/0xc0
      ib_query_gid+0x152/0x1a0
      ib_query_port+0x21e/0x290
      mlx5_port_immutable+0x30f/0x490
      ib_register_device+0x5dd/0x1130
      mlx5_ib_add+0x3e7/0x700
      mlx5_add_device+0x124/0x510
      mlx5_register_interface+0x11f/0x1c0
      mlx5_ib_init+0x56/0x61
      do_one_initcall+0xa3/0x250
      kernel_init_freeable+0x309/0x3b8
      kernel_init+0x14/0x180
      ret_from_fork+0x24/0x30
      
      The buggy address belongs to the object at ffff88005fda1ab0
      which belongs to the cache kmalloc-32 of size 32
      The buggy address is located 24 bytes inside of
      32-byte region [ffff88005fda1ab0, ffff88005fda1ad0)
      The buggy address belongs to the page:
      page:00000000d5655c19 count:1 mapcount:0 mapping: (null)
      index:0xffff88005fda1fc0
      flags: 0x4000000000000100(slab)
      raw: 4000000000000100 0000000000000000 ffff88005fda1fc0 0000000180550008
      raw: ffffea00017f6780 0000000400000004 ffff88006c803980 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
      ffff88005fda1980: fc fc fb fb fb fb fc fc fb fb fb fb fc fc fb fb
      ffff88005fda1a00: fb fb fc fc fb fb fb fb fc fc 00 00 00 00 fc fc
      ffff88005fda1a80: fb fb fb fb fc fc fb fb fb fb fc fc fb fb fb fb
      ffff88005fda1b00: fc fc 00 00 00 00 fc fc fb fb fb fb fc fc fb fb
      ffff88005fda1b80: fb fb fc fc fb fb fb fb fc fc fb fb fb fb fc fc
      ==================================================================@
      
      Cc: syzkaller <syzkaller@googlegroups.com>
      Cc: <stable@vger.kernel.org> # 4.11
      Fixes: 38321256 ("IB/core: Add support for idr types")
      Reported-by: NNoa Osherovich <noaos@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      6623e3e3
    • J
      IB/uverbs: Hold the uobj write lock after allocate · d9dc7a35
      Jason Gunthorpe 提交于
      This clarifies the design intention that time between allocate and
      commit has the uobj exclusive to the caller. We already guarantee
      this by delaying publishing the uobj pointer via idr_insert,
      fd_install, list_add, etc.
      
      Additionally holding the usecnt lock during this period provides
      extra clarity and more protection against future mistakes.
      
      Fixes: 38321256 ("IB/core: Add support for idr types")
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      d9dc7a35
  9. 31 8月, 2017 2 次提交
    • M
      IB/core: Explicitly destroy an object while keeping uobject · 4da70da2
      Matan Barak 提交于
      When some objects are destroyed, we need to extract their status at
      destruction. After object's destruction, this status
      (e.g. events_reported) relies in the uobject. In order to have the
      latest and correct status, the underlying object should be destroyed,
      but we should keep the uobject alive and read this information off the
      uobject. We introduce a rdma_explicit_destroy function. This function
      destroys the class type object (for example, the IDR class type which
      destroys the underlying object as well) and then convert the uobject
      to be of a null class type. This uobject will then be destroyed as any
      other uobject once uverbs_finalize_object[s] is called.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Reviewed-by: NYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      4da70da2
    • M
      IB/core: Add new ioctl interface · fac9658c
      Matan Barak 提交于
      In this ioctl interface, processing the command starts from
      properties of the command and fetching the appropriate user objects
      before calling the handler.
      
      Parsing and validation is done according to a specifier declared by
      the driver's code. In the driver, all supported objects are declared.
      These objects are separated to different object namepsaces. Dividing
      objects to namespaces is done at initialization by using the higher
      bits of the object ids. This initialization can mix objects declared
      in different places to one parsing tree using in this ioctl interface.
      
      For each object we list all supported methods. Similarly to objects,
      methods are separated to method namespaces too. Namespacing is done
      similarly to the objects case. This could be used in order to add
      methods to an existing object.
      
      Each method has a specific handler, which could be either a default
      handler or a driver specific handler.
      Along with the handler, a bunch of attributes are specified as well.
      Similarly to objects and method, attributes are namespaced and hashed
      by their ids at initialization too. All supported attributes are
      subject to automatic fetching and validation. These attributes include
      the command, response and the method's related objects' ids.
      
      When these entities (objects, methods and attributes) are used, the
      high bits of the entities ids are used in order to calculate the hash
      bucket index. Then, these high bits are masked out in order to have a
      zero based index. Since we use these high bits for both bucketing and
      namespacing, we get a compact representation and O(1) array access.
      This is mandatory for efficient dispatching.
      
      Each attribute has a type (PTR_IN, PTR_OUT, IDR and FD) and a length.
      Attributes could be validated through some attributes, like:
      (*) Minimum size / Exact size
      (*) Fops for FD
      (*) Object type for IDR
      
      If an IDR/fd attribute is specified, the kernel also states the object
      type and the required access (NEW, WRITE, READ or DESTROY).
      All uobject/fd management is done automatically by the infrastructure,
      meaning - the infrastructure will fail concurrent commands that at
      least one of them requires concurrent access (WRITE/DESTROY),
      synchronize actions with device removals (dissociate context events)
      and take care of reference counting (increase/decrease) for concurrent
      actions invocation. The reference counts on the actual kernel objects
      shall be handled by the handlers.
      
       objects
      +--------+
      |        |
      |        |   methods                                                                +--------+
      |        |   ns         method      method_spec                           +-----+   |len     |
      +--------+  +------+[d]+-------+   +----------------+[d]+------------+    |attr1+-> |type    |
      | object +> |method+-> | spec  +-> +  attr_buckets  +-> |default_chain+--> +-----+   |idr_type|
      +--------+  +------+   |handler|   |                |   +------------+    |attr2|   |access  |
      |        |  |      |   +-------+   +----------------+   |driver chain|    +-----+   +--------+
      |        |  |      |                                    +------------+
      |        |  +------+
      |        |
      |        |
      |        |
      |        |
      |        |
      |        |
      |        |
      |        |
      |        |
      |        |
      +--------+
      
      [d] = Hash ids to groups using the high order bits
      
      The right types table is also chosen by using the high bits from
      the ids. Currently we have either default or driver specific groups.
      
      Once validation and object fetching (or creation) completed, we call
      the handler:
      int (*handler)(struct ib_device *ib_dev, struct ib_uverbs_file *ufile,
                     struct uverbs_attr_bundle *ctx);
      
      ctx bundles attributes of different namespaces. Each element there
      is an array of attributes which corresponds to one namespaces of
      attributes. For example, in the usually used case:
      
       ctx                               core
      +----------------------------+     +------------+
      | core:                      +---> | valid      |
      +----------------------------+     | cmd_attr   |
      | driver:                    |     +------------+
      |----------------------------+--+  | valid      |
                                      |  | cmd_attr   |
                                      |  +------------+
                                      |  | valid      |
                                      |  | obj_attr   |
                                      |  +------------+
                                      |
                                      |  drivers
                                      |  +------------+
                                      +> | valid      |
                                         | cmd_attr   |
                                         +------------+
                                         | valid      |
                                         | cmd_attr   |
                                         +------------+
                                         | valid      |
                                         | obj_attr   |
                                         +------------+
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Reviewed-by: NYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      fac9658c
  10. 30 8月, 2017 2 次提交
    • M
      IB/core: Add support to finalize objects in one transaction · f43dbebf
      Matan Barak 提交于
      The new ioctl based infrastructure either commits or rollbacks
      all objects of the method as one transaction. In order to do
      that, we introduce a notion of dealing with a collection of
      objects that are related to a specific method.
      
      This also requires adding a notion of a method and attribute.
      A method contains a hash of attributes, where each bucket
      contains several attributes. The attributes are hashed according
      to their namespace which resides in the four upper bits of the id.
      
      For example, an object could be a CQ, which has an action of CREATE_CQ.
      This action has multiple attributes. For example, the CQ's new handle
      and the comp_channel. Each layer in this hierarchy - objects, methods
      and attributes is split into namespaces. The basic example for that is
      one namespace representing the default entities and another one
      representing the driver specific entities.
      
      When declaring these methods and attributes, we actually declare
      their specifications. When a method is executed, we actually
      allocates some space to hold auxiliary information. This auxiliary
      information contains meta-data about the required objects, such
      as pointers to their type information, pointers to the uobjects
      themselves (if exist), etc.
      The specification, along with the auxiliary information we allocated
      and filled is given to the finalize_objects function.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Reviewed-by: NYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      f43dbebf
    • M
      IB/core: Add a generic way to execute an operation on a uobject · a0aa309c
      Matan Barak 提交于
      The ioctl infrastructure treats all user-objects in the same manner.
      It gets objects ids from the user-space and by using the object type
      and type attributes mentioned in the object specification, it executes
      this required method. Passing an object id from the user-space as
      an attribute is carried out in three stages. The first is carried out
      before the actual handler and the last is carried out afterwards.
      
      The different supported operations are read, write, destroy and create.
      In the first stage, the former three actions just fetches the object
      from the repository (by using its id) and locks it. The last action
      allocates a new uobject. Afterwards, the second stage is carried out
      when the handler itself carries out the required modification of the
      object. The last stage is carried out after the handler finishes and
      commits the result. The former two operations just unlock the object.
      Destroy calls the "free object" operation, taking into account the
      object's type and releases the uobject as well. Creation just adds the
      new uobject to the repository, making the object visible to the
      application.
      
      In order to abstract these details from the ioctl infrastructure
      layer, we add uverbs_get_uobject_from_context and
      uverbs_finalize_object functions which corresponds to the first
      and last stages respectively.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Reviewed-by: NYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      a0aa309c
  11. 20 4月, 2017 3 次提交
  12. 06 4月, 2017 2 次提交
    • M
      IB/core: Add support for fd objects · cf8966b3
      Matan Barak 提交于
      The completion channel we use in verbs infrastructure is FD based.
      Previously, we had a separate way to manage this object. Since we
      strive for a single way to manage any kind of object in this
      infrastructure, we conceptually treat all objects as subclasses
      of ib_uobject.
      
      This commit adds the necessary mechanism to support FD based objects
      like their IDR counterparts. FD objects release need to be synchronized
      with context release. We use the cleanup_mutex on the uverbs_file for
      that.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Reviewed-by: NYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      cf8966b3
    • M
      IB/core: Add support for idr types · 38321256
      Matan Barak 提交于
      The new ioctl infrastructure supports driver specific objects.
      Each such object type has a hot unplug function, allocation size and
      an order of destruction.
      
      When a ucontext is created, a new list is created in this ib_ucontext.
      This list contains all objects created under this ib_ucontext.
      When a ib_ucontext is destroyed, we traverse this list several time
      destroying the various objects by the order mentioned in the object
      type description. If few object types have the same destruction order,
      they are destroyed in an order opposite to their creation.
      
      Adding an object is done in two parts.
      First, an object is allocated and added to idr tree. Then, the
      command's handlers (in downstream patches) could work on this object
      and fill in its required details.
      After a successful command, the commit part is called and the user
      objects become ucontext visible. If the handler failed, alloc_abort
      should be called.
      
      Removing an uboject is done by calling lookup_get with the write flag
      and finalizing it with destroy_commit. A major change from the previous
      code is that we actually destroy the kernel object itself in
      destroy_commit (rather than just the uobject).
      
      We should make sure idr (per-uverbs-file) and list (per-ucontext) could
      be accessed concurrently without corrupting them.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Reviewed-by: NYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      38321256