1. 02 6月, 2018 5 次提交
  2. 06 4月, 2018 3 次提交
  3. 05 4月, 2018 6 次提交
  4. 04 4月, 2018 5 次提交
    • P
      RDMA: Use ib_gid_attr during GID modification · 414448d2
      Parav Pandit 提交于
      Now that ib_gid_attr contains device, port and index, simplify the
      provider APIs add_gid() and del_gid() to use device, port and index
      fields from the ib_gid_attr attributes structure.
      Signed-off-by: NParav Pandit <parav@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      414448d2
    • P
      IB/core: Refactor GID modify code for RoCE · 598ff6ba
      Parav Pandit 提交于
      Code is refactored to prepare separate functions for RoCE which can do more
      complex operations related to reference counting, while still
      maintainining code readability. This includes
      (a) Simplification to not perform netdevice checks and modifications
      for IB link layer.
      (b) Do not add RoCE GID entry which has NULL netdevice; instead return
      an error.
      (c) If GID addition fails at provider level add_gid(), do not add the
      entry in the cache and keep the entry marked as INVALID.
      (d) Simplify and reuse the ib_cache_gid_add()/del() routines so that they
      can be used even for modifying default GIDs. This avoid some code
      duplication in modifying default GIDs.
      (e) find_gid() routine refers to the data entry flags to qualify a GID
      as valid or invalid GID rather than depending on attributes and zeroness
      of the GID content.
      (f) gid_table_reserve_default() sets the GID default attribute at
      beginning while setting up the GID table. There is no need to use
      default_gid flag in low level functions such as write_gid(), add_gid(),
      del_gid(), as they never need to update the DEFAULT property of the GID
      entry while during GID table update.
      
      As as result of this refactor, reserved GID 0:0:0:0:0:0:0:0 is no longer
      searchable as described below.
      
      A unicast GID entry of 0:0:0:0:0:0:0:0 is Reserved GID as per the IB
      spec version 1.3 section 4.1.1, point (6) whose snippet is below.
      
      "The unicast GID address 0:0:0:0:0:0:0:0 is reserved - referred to as
      the Reserved GID. It shall never be assigned to any endport. It shall
      not be used as a destination address or in a global routing header
      (GRH)."
      
      GID table cache now only stores valid GID entries. Before this patch,
      Reserved GID 0:0:0:0:0:0:0:0 was searchable in the GID table using
      ib_find_cached_gid_by_port() and other similar find routines.
      
      Zero GID is no longer searchable as it shall not to be present in GRH or
      path recored entry as described in IB spec version 1.3 section 4.1.1,
      point (6), section 12.7.10 and section 12.7.20.
      
      ib_cache_update() is simplified to check link layer once, use unified
      locking scheme for all link layers, removed temporary gid table
      allocation/free logic.
      
      Additionally,
      (a) Expand ib_gid_attr to store port and index so that GID query
      routines can get port and index information from the attribute structure.
      (b) Expand ib_gid_attr to store device as well so that in future code when
      GID reference counting is done, device is used to reach back to the GID
      table entry.
      Signed-off-by: NParav Pandit <parav@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      598ff6ba
    • P
      IB/core: Simplify ib_query_gid to always refer to cache · f35faa4b
      Parav Pandit 提交于
      Currently following inconsistencies exist.
      1. ib_query_gid() returns GID from the software cache for a RoCE port
      and returns GID from the HCA for an IB port.
      This is incorrect because software GID cache is maintained regardless
      of HCA port type.
      
      2. GID is queries from the HCA via ib_query_gid and updated in the
      software cache for IB link layer. Both of them might not be in sync.
      
      ULPs such as SRP initiator, SRP target, IPoIB driver have historically
      used ib_query_gid() API to query the GID. However CM used cached version
      during CM processing, When software cache was introduced, this
      inconsitency remained.
      
      In order to simplify, improve readability and avoid link layer
      specific above inconsistencies, this patch brings following changes.
      
      1. ib_query_gid() always refers to the cache layer regardless of link
      layer.
      
      2. cache module who reads the GID entry from HCA and builds the cache,
      directly invokes the HCA provider verb's query_gid() callback function.
      
      3. ib_query_port() is being called in early stage where GID cache is not
      yet build while reading port immutable property. Therefore it needs to
      read the default GID from the HCA for IB link layer to publish the
      subnet prefix.
      Signed-off-by: NParav Pandit <parav@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      f35faa4b
    • P
      RDMA/providers: Simplify query_gid callback of RoCE providers · 0e1f9b92
      Parav Pandit 提交于
      ib_query_gid() fetches the GID from the software cache maintained in
      ib_core for RoCE ports.
      
      Therefore, simplify the provider drivers for RoCE to treat query_gid()
      callback as never called for RoCE, and only require non-RoCE devices to
      implement it.
      Signed-off-by: NParav Pandit <parav@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      0e1f9b92
    • R
      RDMA/ucma: Don't allow setting RDMA_OPTION_IB_PATH without an RDMA device · 8435168d
      Roland Dreier 提交于
      Check to make sure that ctx->cm_id->device is set before we use it.
      Otherwise userspace can trigger a NULL dereference by doing
      RDMA_USER_CM_CMD_SET_OPTION on an ID that is not bound to a device.
      
      Cc: <stable@vger.kernel.org>
      Reported-by: <syzbot+a67bc93e14682d92fc2f@syzkaller.appspotmail.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      8435168d
  5. 30 3月, 2018 5 次提交
  6. 29 3月, 2018 2 次提交
    • S
      RDMA/CMA: Add rdma_port_space to UAPI · 2253fc0c
      Steve Wise 提交于
      Since the rdma_port_space enum is being passed between user and kernel for
      user cm_id setup, we need it in a UAPI header.  So add it to
      rdma_user_cm.h.
      
      This also fixes the cm_id restrack changes which pass up the port space
      value via the RDMA_NLDEV_ATTR_RES_PS attribute.
      
      Fixes: 00313983 ("RDMA/nldev: provide detailed CM_ID information")
      Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      2253fc0c
    • R
      RDMA/ucma: Introduce safer rdma_addr_size() variants · 84652aef
      Roland Dreier 提交于
      There are several places in the ucma ABI where userspace can pass in a
      sockaddr but set the address family to AF_IB.  When that happens,
      rdma_addr_size() will return a size bigger than sizeof struct sockaddr_in6,
      and the ucma kernel code might end up copying past the end of a buffer
      not sized for a struct sockaddr_ib.
      
      Fix this by introducing new variants
      
          int rdma_addr_size_in6(struct sockaddr_in6 *addr);
          int rdma_addr_size_kss(struct __kernel_sockaddr_storage *addr);
      
      that are type-safe for the types used in the ucma ABI and return 0 if the
      size computed is bigger than the size of the type passed in.  We can use
      these new variants to check what size userspace has passed in before
      copying any addresses.
      
      Reported-by: <syzbot+6800425d54ed3ed8135d@syzkaller.appspotmail.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      84652aef
  7. 28 3月, 2018 11 次提交
    • P
      IB/core: Refer to RoCE port property to decide building cache · 190fb9c4
      Parav Pandit 提交于
      IB core maintains the GID cache entries for the GID table.
      This cache table has to be maintained regardless of HCA's
      support of GID table.
      For IB and iWarp ports, cache is created by querying the HCA.
      For RoCE cache is created based on netdev events.
      
      Therefore just refer to the RoCE port property of the {device, port} to
      decide whether to build cache by querying HCA or from netdev events.
      There is no need to check if HCA support GID table or not.
      
      ib_cache_update() referred to RoCE attribute before validating
      port. Though in all current callers port is valid, it is incorrect
      to query RoCE port property before validating the port. Therefore,
      rdma_protocol_roce() check is done after rdma_is_port_valid() verifies
      that port is valid.
      
      Fixes: 115b68aa ("IB/ocrdma: Removed GID add/del null routines")
      Reviewed-by: NDaniel Jurgens <danielj@mellanox.com>
      Signed-off-by: NParav Pandit <parav@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      190fb9c4
    • P
      IB/core: Search GID only for IB link layer · 22d24f75
      Parav Pandit 提交于
      Even though API is only used by IPoIB driver, its incorrect to refer
      RoCE GID table property to search for GID.
      
      Look for only IB link layer to search for the GID.
      
      Fixes: dbb12562 ("IB/{core, ipoib}: Simplify ib_find_gid to search only for IB link layer")
      Signed-off-by: NParav Pandit <parav@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      22d24f75
    • P
      IB/core: Refer to RoCE port property instead of GID table property · 4ab7cb4b
      Parav Pandit 提交于
      ib_find_gid_by_filter() searches GID with filter only for RoCE link
      layer regardless of HCA's support for GID table.
      Therefore, right way to lookup is compare RoCE port property and not
      the GID table property.
      
      Fixes: 99b27e3b ("IB/cache: Add ib_find_gid_by_filter cache API")
      Signed-off-by: NParav Pandit <parav@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      4ab7cb4b
    • P
      IB/core: Generate GID change event regardless of RoCE GID table property · 3401857e
      Parav Pandit 提交于
      Due to following reasons, GID table event is generated regardless of GID
      table property.
      
      1. GID table cache is maintained at ib core layer regardless of link layer.
      2. GID change event has no relation with IB link layer.
      3. GID change event also doesn't depend on whether HCA supports GID table
      or not.
      
      Fixes: f3906bd3 ("IB/core: Refactor GID cache's ib_dispatch_event")
      Signed-off-by: NParav Pandit <parav@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      3401857e
    • P
      IB/cm: Block processing alternate path handling RoCE Rx cm messages · 97c45c2c
      Parav Pandit 提交于
      Due to below reasons, it is better to not support alternate path receive
      messages for RoCE in near term.
      
      1. Alternate path for RoCE is not supported at rdmacm layer.
      2. It is not supported in uverbs/core layer for RoCE.
      3. Alternate path for IPv6 for link local address cannot resolve route
      determinstically without a valid incoming interface id whose usecase
      make sense only with dual port mode.
      4. init_av_from_path while processing LAP messages for IB and RoCE can
      lead to adding duplicate entry of AV into the port list, leads to list
      corruption.
      5. rdma-core userspace a well known userspace implementation has removed
      support of libucm which use ucm.ko module, which is the only module that
      can trigger alternate path related messages.
      6. ucm kernel module is requested to be removed from the IB core in
      patch [1].
      
      [1] https://patchwork.kernel.org/patch/10268503/Signed-off-by: NParav Pandit <parav@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      97c45c2c
    • M
      IB/core: Protect against concurrent access to hardware stats · e945130b
      Mark Bloch 提交于
      Currently access to hardware stats buffer isn't protected, this can
      result in multiple writes and reads at the same time to the same
      memory location. This can lead to providing an incorrect value to
      the user. Add a mutex to protect against it.
      
      Fixes: b40f4757 ("IB/core: Make device counter infrastructure dynamic")
      Signed-off-by: NMark Bloch <markb@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      e945130b
    • J
      RDMA/ucma: Fix uABI structure layouts for 32/64 compat · 611cb92b
      Jason Gunthorpe 提交于
      The rdma_ucm_event_resp is a different length on 32 and 64 bit compiles.
      
      The kernel requires it to be the expected length or longer so 32 bit
      builds running on a 64 bit kernel will not work.
      
      Retain full compat by having all kernels accept a struct with or without
      the trailing reserved field.
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      611cb92b
    • L
      RDMA/ucma: Check that device exists prior to accessing it · c8d3bcbf
      Leon Romanovsky 提交于
      Ensure that device exists prior to accessing its properties.
      
      Reported-by: <syzbot+71655d44855ac3e76366@syzkaller.appspotmail.com>
      Fixes: 75216638 ("RDMA/cma: Export rdma cm interface to userspace")
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      c8d3bcbf
    • L
      RDMA/ucma: Check that device is connected prior to access it · 4b658d1b
      Leon Romanovsky 提交于
      Add missing check that device is connected prior to access it.
      
      [   55.358652] BUG: KASAN: null-ptr-deref in rdma_init_qp_attr+0x4a/0x2c0
      [   55.359389] Read of size 8 at addr 00000000000000b0 by task qp/618
      [   55.360255]
      [   55.360432] CPU: 1 PID: 618 Comm: qp Not tainted 4.16.0-rc1-00071-gcaf61b1b #91
      [   55.361693] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
      [   55.363264] Call Trace:
      [   55.363833]  dump_stack+0x5c/0x77
      [   55.364215]  kasan_report+0x163/0x380
      [   55.364610]  ? rdma_init_qp_attr+0x4a/0x2c0
      [   55.365238]  rdma_init_qp_attr+0x4a/0x2c0
      [   55.366410]  ucma_init_qp_attr+0x111/0x200
      [   55.366846]  ? ucma_notify+0xf0/0xf0
      [   55.367405]  ? _get_random_bytes+0xea/0x1b0
      [   55.367846]  ? urandom_read+0x2f0/0x2f0
      [   55.368436]  ? kmem_cache_alloc_trace+0xd2/0x1e0
      [   55.369104]  ? refcount_inc_not_zero+0x9/0x60
      [   55.369583]  ? refcount_inc+0x5/0x30
      [   55.370155]  ? rdma_create_id+0x215/0x240
      [   55.370937]  ? _copy_to_user+0x4f/0x60
      [   55.371620]  ? mem_cgroup_commit_charge+0x1f5/0x290
      [   55.372127]  ? _copy_from_user+0x5e/0x90
      [   55.372720]  ucma_write+0x174/0x1f0
      [   55.373090]  ? ucma_close_id+0x40/0x40
      [   55.373805]  ? __lru_cache_add+0xa8/0xd0
      [   55.374403]  __vfs_write+0xc4/0x350
      [   55.374774]  ? kernel_read+0xa0/0xa0
      [   55.375173]  ? fsnotify+0x899/0x8f0
      [   55.375544]  ? fsnotify_unmount_inodes+0x170/0x170
      [   55.376689]  ? __fsnotify_update_child_dentry_flags+0x30/0x30
      [   55.377522]  ? handle_mm_fault+0x174/0x320
      [   55.378169]  vfs_write+0xf7/0x280
      [   55.378864]  SyS_write+0xa1/0x120
      [   55.379270]  ? SyS_read+0x120/0x120
      [   55.379643]  ? mm_fault_error+0x180/0x180
      [   55.380071]  ? task_work_run+0x7d/0xd0
      [   55.380910]  ? __task_pid_nr_ns+0x120/0x140
      [   55.381366]  ? SyS_read+0x120/0x120
      [   55.381739]  do_syscall_64+0xeb/0x250
      [   55.382143]  entry_SYSCALL_64_after_hwframe+0x21/0x86
      [   55.382841] RIP: 0033:0x7fc2ef803e99
      [   55.383227] RSP: 002b:00007fffcc5f3be8 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
      [   55.384173] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc2ef803e99
      [   55.386145] RDX: 0000000000000057 RSI: 0000000020000080 RDI: 0000000000000003
      [   55.388418] RBP: 00007fffcc5f3c00 R08: 0000000000000000 R09: 0000000000000000
      [   55.390542] R10: 0000000000000000 R11: 0000000000000217 R12: 0000000000400480
      [   55.392916] R13: 00007fffcc5f3cf0 R14: 0000000000000000 R15: 0000000000000000
      [   55.521088] Code: e5 4d 1e ff 48 89 df 44 0f b6 b3 b8 01 00 00 e8 65 50 1e ff 4c 8b 2b 49
      8d bd b0 00 00 00 e8 56 50 1e ff 41 0f b6 c6 48 c1 e0 04 <49> 03 85 b0 00 00 00 48 8d 78 08
      48 89 04 24 e8 3a 4f 1e ff 48
      [   55.525980] RIP: rdma_init_qp_attr+0x52/0x2c0 RSP: ffff8801e2c2f9d8
      [   55.532648] CR2: 00000000000000b0
      [   55.534396] ---[ end trace 70cee64090251c0b ]---
      
      Fixes: 75216638 ("RDMA/cma: Export rdma cm interface to userspace")
      Fixes: d541e455 ("IB/core: Convert ah_attr from OPA to IB when copying to user")
      Reported-by: <syzbot+7b62c837c2516f8f38c8@syzkaller.appspotmail.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      4b658d1b
    • J
      RDMA/rdma_cm: Fix use after free race with process_one_req · 9137108c
      Jason Gunthorpe 提交于
      process_one_req() can race with rdma_addr_cancel():
      
                 CPU0                                 CPU1
                 ====                                 ====
       process_one_work()
        debug_work_deactivate(work);
        process_one_req()
                                              rdma_addr_cancel()
      	                                  mutex_lock(&lock);
       			    	           set_timeout(&req->work,..);
                                                    __queue_work()
      				   	       debug_work_activate(work);
      	                                  mutex_unlock(&lock);
      
         mutex_lock(&lock);
      [..]
      	list_del(&req->list);
         mutex_unlock(&lock);
      [..]
      
         // ODEBUG explodes since the work is still queued.
         kfree(req);
      
      Causing ODEBUG to detect the use after free:
      
      ODEBUG: free active (active state 0) object type: work_struct hint: process_one_req+0x0/0x6c0 include/net/dst.h:165
      WARNING: CPU: 0 PID: 79 at lib/debugobjects.c:291 debug_print_object+0x166/0x220 lib/debugobjects.c:288
      kvm: emulating exchange as write
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 0 PID: 79 Comm: kworker/u4:3 Not tainted 4.16.0-rc6+ #361
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: ib_addr process_one_req
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x194/0x24d lib/dump_stack.c:53
       panic+0x1e4/0x41c kernel/panic.c:183
       __warn+0x1dc/0x200 kernel/panic.c:547
       report_bug+0x1f4/0x2b0 lib/bug.c:186
       fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
       fixup_bug arch/x86/kernel/traps.c:247 [inline]
       do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
       do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
       invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
      RIP: 0010:debug_print_object+0x166/0x220 lib/debugobjects.c:288
      RSP: 0000:ffff8801d966f210 EFLAGS: 00010086
      RAX: dffffc0000000008 RBX: 0000000000000003 RCX: ffffffff815acd6e
      RDX: 0000000000000000 RSI: 1ffff1003b2cddf2 RDI: 0000000000000000
      RBP: ffff8801d966f250 R08: 0000000000000000 R09: 1ffff1003b2cddc8
      R10: ffffed003b2cde71 R11: ffffffff86f39a98 R12: 0000000000000001
      R13: ffffffff86f15540 R14: ffffffff86408700 R15: ffffffff8147c0a0
       __debug_check_no_obj_freed lib/debugobjects.c:745 [inline]
       debug_check_no_obj_freed+0x662/0xf1f lib/debugobjects.c:774
       kfree+0xc7/0x260 mm/slab.c:3799
       process_one_req+0x2e7/0x6c0 drivers/infiniband/core/addr.c:592
       process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
       worker_thread+0x223/0x1990 kernel/workqueue.c:2247
       kthread+0x33c/0x400 kernel/kthread.c:238
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
      
      Fixes: 5fff41e1 ("IB/core: Fix race condition in resolving IP to MAC")
      Reported-by: <syzbot+3b4acab09b6463472d0a@syzkaller.appspotmail.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      9137108c
    • K
      net: Drop pernet_operations::async · 2f635cee
      Kirill Tkhai 提交于
      Synchronous pernet_operations are not allowed anymore.
      All are asynchronous. So, drop the structure member.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2f635cee
  8. 24 3月, 2018 1 次提交
    • P
      IB/cma: Resolve route only while receiving CM requests · 114cc9c4
      Parav Pandit 提交于
      Currently CM request for RoCE follows following flow.
      rdma_create_id()
      rdma_resolve_addr()
      rdma_resolve_route()
      For RC QPs:
      rdma_connect()
      ->cma_connect_ib()
        ->ib_send_cm_req()
          ->cm_init_av_by_path()
            ->ib_init_ah_attr_from_path()
      For UD QPs:
      rdma_connect()
      ->cma_resolve_ib_udp()
        ->ib_send_cm_sidr_req()
          ->cm_init_av_by_path()
            ->ib_init_ah_attr_from_path()
      
      In both the flows, route is already resolved before sending CM requests.
      Therefore, code is refactored to avoid resolving route second time in
      ib_cm layer.
      ib_init_ah_attr_from_path() is extended to resolve route when it is not
      yet resolved for RoCE link layer. This is achieved by caller setting
      route_resolved field in path record whenever it has route already
      resolved.
      Signed-off-by: NParav Pandit <parav@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      114cc9c4
  9. 23 3月, 2018 2 次提交