1. 28 September 2021, 2 commits
  2. 24 September 2021, 3 commits
    • RDMA/usnic: Lock VF with mutex instead of spinlock · a86cd017
      Committed by Leon Romanovsky
      The usnic VF doesn't need a lock in atomic context to create QPs, so it
      is safe to use a mutex instead of a spinlock. This change fixes the
      following smatch error.
      
      Smatch static checker warning:
      
         lib/kobject.c:289 kobject_set_name_vargs()
          warn: sleeping in atomic context
      
      Fixes: 514aee66 ("RDMA: Globally allocate and release QP memory")
      Link: https://lore.kernel.org/r/2a0e295786c127e518ebee8bb7cafcb819a625f6.1631520231.git.leonro@nvidia.com
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
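      The substitution above can be sketched in plain userspace C with POSIX threads; the struct and function names are hypothetical stand-ins for the driver's, and the point is only that work which may sleep is legal inside a mutex-protected section, while it would be forbidden under a spinlock:

```c
#include <pthread.h>

/* Hypothetical sketch: the VF lock only serializes QP setup, which
 * runs in process context and may sleep, so a sleeping lock (mutex)
 * is the right primitive.  A spinlock would make any sleeping
 * allocation inside the critical section illegal. */
struct vf_ctx {
	pthread_mutex_t lock;	/* was a spinlock before the change */
	int qp_count;
};

static int vf_create_qp(struct vf_ctx *vf)
{
	pthread_mutex_lock(&vf->lock);
	/* ...allocations that may sleep are now legal here... */
	vf->qp_count++;
	pthread_mutex_unlock(&vf->lock);
	return 0;
}
```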
    • RDMA/hns: Work around broken constant propagation in gcc 8 · 14351f08
      Committed by Jason Gunthorpe
      gcc 8.3 and 5.4 throw this:
      
      In function 'modify_qp_init_to_rtr',
      ././include/linux/compiler_types.h:322:38: error: call to '__compiletime_assert_1859' declared with attribute error: FIELD_PREP: value too large for the field
        _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
      [..]
      drivers/infiniband/hw/hns/hns_roce_common.h:91:52: note: in expansion of macro 'FIELD_PREP'
         *((__le32 *)ptr + (field_h) / 32) |= cpu_to_le32(FIELD_PREP(   \
                                                          ^~~~~~~~~~
      drivers/infiniband/hw/hns/hns_roce_common.h:95:39: note: in expansion of macro '_hr_reg_write'
       #define hr_reg_write(ptr, field, val) _hr_reg_write(ptr, field, val)
                                             ^~~~~~~~~~~~~
      drivers/infiniband/hw/hns/hns_roce_hw_v2.c:4412:2: note: in expansion of macro 'hr_reg_write'
        hr_reg_write(context, QPC_LP_PKTN_INI, lp_pktn_ini);
      
      Because gcc has miscalculated the constantness of lp_pktn_ini:
      
      	mtu = ib_mtu_enum_to_int(ib_mtu);
      	if (WARN_ON(mtu < 0)) [..]
      	lp_pktn_ini = ilog2(MAX_LP_MSG_LEN / mtu);
      
      Since mtu is limited to {256,512,1024,2048,4096}, lp_pktn_ini is between
      4 and 8, which fits in the 4-bit field in the FIELD_PREP.
      
      Work around this broken compiler by adding a 'can never be true'
      constraint on lp_pktn_ini's value, which resolves the problem.
      
      Fixes: f0cb411a ("RDMA/hns: Use new interface to modify QP context")
      Link: https://lore.kernel.org/r/0-v1-c773ecb137bc+11f-hns_gcc8_jgg@nvidia.com
      Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
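      The arithmetic and the added bound can be illustrated with a small userspace sketch; the value of MAX_LP_MSG_LEN here is an assumption inferred from the 4-to-8 range stated above, and ilog2_u32() is a stand-in for the kernel's ilog2():

```c
/* Hedged sketch of the computation plus a "can never be true" bound
 * that lets the compiler prove the result fits a 4-bit field. */
#define MAX_LP_MSG_LEN 0x10000	/* assumed value, for illustration */

static unsigned int ilog2_u32(unsigned int v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

static int compute_lp_pktn_ini(int mtu)
{
	/* Valid IB MTUs are 256..4096; the explicit bound both rejects
	 * bad input and constrains the compiler's value analysis. */
	if (mtu < 256 || mtu > 4096)
		return -1;

	return (int)ilog2_u32(MAX_LP_MSG_LEN / mtu);	/* 4..8 */
}
```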
    • RDMA/cma: Ensure rdma_addr_cancel() happens before issuing more requests · 305d568b
      Committed by Jason Gunthorpe
      The FSM can run in a circle allowing rdma_resolve_ip() to be called twice
      on the same id_priv. While this cannot happen without going through the
      work, it violates the invariant that the same address resolution
      background request cannot be active twice.
      
             CPU 1                                  CPU 2
      
      rdma_resolve_addr():
        RDMA_CM_IDLE -> RDMA_CM_ADDR_QUERY
        rdma_resolve_ip(addr_handler)  #1
      
      			 process_one_req(): for #1
                                addr_handler():
                                  RDMA_CM_ADDR_QUERY -> RDMA_CM_ADDR_BOUND
                                  mutex_unlock(&id_priv->handler_mutex);
                                  [.. handler still running ..]
      
      rdma_resolve_addr():
        RDMA_CM_ADDR_BOUND -> RDMA_CM_ADDR_QUERY
        rdma_resolve_ip(addr_handler)
          !! two requests are now on the req_list
      
      rdma_destroy_id():
       destroy_id_handler_unlock():
        _destroy_id():
         cma_cancel_operation():
          rdma_addr_cancel()
      
                                // process_one_req() self removes it
      		          spin_lock_bh(&lock);
                                 cancel_delayed_work(&req->work);
      	                   if (!list_empty(&req->list)) == true
      
      ! rdma_addr_cancel() returns after process_one_req #1 is done
      
         kfree(id_priv)
      
      			 process_one_req(): for #2
                                addr_handler():
      	                    mutex_lock(&id_priv->handler_mutex);
                                  !! Use after free on id_priv
      
      rdma_addr_cancel() expects there to be one req on the list and only
      cancels the first one. The self-removal behavior of the work only happens
      after the handler has returned. This yields a situation where the
      req_list can have two reqs for the same "handle" but rdma_addr_cancel()
      only cancels the first one.
      
      The second req remains active beyond rdma_destroy_id() and will
      use-after-free id_priv once it inevitably triggers.
      
      Fix this by remembering if the id_priv has called rdma_resolve_ip() and
      always cancel before calling it again. This ensures the req_list never
      gets more than one item in it and doesn't cost anything in the normal flow
      that never uses this strange error path.
      
      Link: https://lore.kernel.org/r/0-v1-3bc675b8006d+22-syz_cancel_uaf_jgg@nvidia.com
      Cc: stable@vger.kernel.org
      Fixes: e51060f0 ("IB: IP address based RDMA connection manager")
      Reported-by: syzbot+dc3dfba010d7671e05f5@syzkaller.appspotmail.com
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
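      The invariant the fix restores ("the req_list never holds more than one request per id") can be sketched abstractly; the struct, helpers, and counter are illustrative stand-ins, not the cma code:

```c
#include <stdbool.h>

/* Hypothetical model: "pending" counts this id's entries on the
 * req_list; "used_resolve" remembers that resolution was issued. */
struct id_ctx {
	bool used_resolve;	/* rdma_resolve_ip() was called before */
	int pending;
};

static void addr_cancel(struct id_ctx *id)
{
	if (id->pending)
		id->pending--;
}

static void resolve_addr(struct id_ctx *id)
{
	if (id->used_resolve)
		addr_cancel(id);	/* flush any old request first */
	id->used_resolve = true;
	id->pending++;			/* at most one is ever queued */
}
```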
  3. 23 September 2021, 1 commit
    • RDMA/cma: Do not change route.addr.src_addr.ss_family · bc0bdc5a
      Committed by Jason Gunthorpe
      If the state is not idle then rdma_bind_addr() will immediately fail and
      no change to global state should happen.
      
      For instance if the state is already RDMA_CM_LISTEN then this will corrupt
      the src_addr and would cause the test in cma_cancel_operation():
      
      		if (cma_any_addr(cma_src_addr(id_priv)) && !id_priv->cma_dev)
      
      to see a mangled src_addr, e.g. an IPv6 loopback address paired with an
      IPv4 family, and so fail the test.
      
      This would manifest as this trace from syzkaller:
      
        BUG: KASAN: use-after-free in __list_add_valid+0x93/0xa0 lib/list_debug.c:26
        Read of size 8 at addr ffff8881546491e0 by task syz-executor.1/32204
      
        CPU: 1 PID: 32204 Comm: syz-executor.1 Not tainted 5.12.0-rc8-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Call Trace:
         __dump_stack lib/dump_stack.c:79 [inline]
         dump_stack+0x141/0x1d7 lib/dump_stack.c:120
         print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232
         __kasan_report mm/kasan/report.c:399 [inline]
         kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
         __list_add_valid+0x93/0xa0 lib/list_debug.c:26
         __list_add include/linux/list.h:67 [inline]
         list_add_tail include/linux/list.h:100 [inline]
         cma_listen_on_all drivers/infiniband/core/cma.c:2557 [inline]
         rdma_listen+0x787/0xe00 drivers/infiniband/core/cma.c:3751
         ucma_listen+0x16a/0x210 drivers/infiniband/core/ucma.c:1102
         ucma_write+0x259/0x350 drivers/infiniband/core/ucma.c:1732
         vfs_write+0x28e/0xa30 fs/read_write.c:603
         ksys_write+0x1ee/0x250 fs/read_write.c:658
         do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Which is indicating that an rdma_id_private was destroyed without doing
      cma_cancel_listens().
      
      Instead of trying to re-use the src_addr memory to indirectly create an
      any address, build one explicitly on the stack and bind to that, as any
      other normal flow would do.
      
      Link: https://lore.kernel.org/r/0-v1-9fbb33f5e201+2a-cma_listen_jgg@nvidia.com
      Cc: stable@vger.kernel.org
      Fixes: 732d41c5 ("RDMA/cma: Make the locking for automatic state transition more clear")
      Reported-by: syzbot+6bb0528b13611047209c@syzkaller.appspotmail.com
      Tested-by: Hao Sun <sunhao.th@gmail.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
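      The stack-built wildcard address described above might look like this in plain socket terms; this is a hedged sketch of the approach, not the cma implementation:

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Build the "any" address in a local sockaddr_storage instead of
 * rewriting the id's stored src_addr, so the stored address and its
 * family are never mangled. */
static void build_any_addr(struct sockaddr_storage *out, int family)
{
	memset(out, 0, sizeof(*out));
	if (family == AF_INET) {
		struct sockaddr_in *in4 = (struct sockaddr_in *)out;

		in4->sin_family = AF_INET;
		in4->sin_addr.s_addr = htonl(INADDR_ANY);
	} else {
		struct sockaddr_in6 *in6 = (struct sockaddr_in6 *)out;

		in6->sin6_family = AF_INET6;
		in6->sin6_addr = in6addr_any;
	}
}
```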
  4. 21 September 2021, 4 commits
  5. 15 September 2021, 2 commits
    • RDMA/cma: Fix listener leak in rdma_cma_listen_on_all() failure · ca465e1f
      Committed by Tao Liu
      If cma_listen_on_all() fails it leaves the per-device ID still on the
      listen_list but the state is not set to RDMA_CM_ADDR_BOUND.
      
      When the cmid is eventually destroyed cma_cancel_listens() is not called
      due to the wrong state, however the per-device IDs are still holding the
      refcount preventing the ID from being destroyed, thus deadlocking:
      
       task:rping state:D stack:   0 pid:19605 ppid: 47036 flags:0x00000084
       Call Trace:
        __schedule+0x29a/0x780
        ? free_unref_page_commit+0x9b/0x110
        schedule+0x3c/0xa0
        schedule_timeout+0x215/0x2b0
        ? __flush_work+0x19e/0x1e0
        wait_for_completion+0x8d/0xf0
        _destroy_id+0x144/0x210 [rdma_cm]
        ucma_close_id+0x2b/0x40 [rdma_ucm]
        __destroy_id+0x93/0x2c0 [rdma_ucm]
        ? __xa_erase+0x4a/0xa0
        ucma_destroy_id+0x9a/0x120 [rdma_ucm]
        ucma_write+0xb8/0x130 [rdma_ucm]
        vfs_write+0xb4/0x250
        ksys_write+0xb5/0xd0
        ? syscall_trace_enter.isra.19+0x123/0x190
        do_syscall_64+0x33/0x40
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Ensure that cma_listen_on_all() atomically unwinds its actions under the
      lock on error.
      
      Fixes: c80a0c52 ("RDMA/cma: Add missing error handling of listen_id")
      Link: https://lore.kernel.org/r/20210913093344.17230-1-thomas.liu@ucloud.cn
      Signed-off-by: Tao Liu <thomas.liu@ucloud.cn>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
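      The unwind-under-lock pattern can be sketched abstractly; the devices are modeled as an array and every name here is hypothetical, not the cma code:

```c
/* If attaching a listener to one device fails, remove every listener
 * added so far before dropping the lock, so no half-registered ids
 * remain holding references that would later deadlock destroy. */
#define NDEV 4

static int attach(int dev, int fail_at)
{
	return dev == fail_at ? -1 : 0;
}

static int listen_on_all(int added[NDEV], int fail_at)
{
	int i;

	/* lock is held across the loop, including the unwind */
	for (i = 0; i < NDEV; i++) {
		if (attach(i, fail_at) < 0) {
			while (i-- > 0)
				added[i] = 0;	/* undo prior attaches */
			return -1;
		}
		added[i] = 1;
	}
	return 0;
}
```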
    • IB/cma: Do not send IGMP leaves for sendonly Multicast groups · 2cc74e1e
      Committed by Christoph Lameter
      ROCE uses IGMP for Multicast instead of the native Infiniband system where
      joins are required in order to post messages on the Multicast group.  On
      Ethernet one can send Multicast messages to arbitrary addresses without
      the need to subscribe to a group.
      
      So ROCE correctly does not send IGMP joins during rdma_join_multicast().
      
      For example, in cma_iboe_join_multicast() we see:
      
      	if (addr->sa_family == AF_INET) {
      		if (gid_type == IB_GID_TYPE_ROCE_UDP_ENCAP) {
      			ib.rec.hop_limit = IPV6_DEFAULT_HOPLIMIT;
      			if (!send_only) {
      				err = cma_igmp_send(ndev, &ib.rec.mgid,
      						    true);
      			}
      		}
      	} else {
      
      So the IGMP join is suppressed as it is unnecessary.
      
      However, no such check is done in destroy_mc(), and therefore leaving a
      sendonly multicast group will send an IGMP leave.
      
      This means that the following scenario can lead to a multicast receiver
      unexpectedly being unsubscribed from a MC group:
      
      1. Sender thread does a sendonly join on MC group X. No IGMP join
         is sent.
      
      2. Receiver thread does a regular join on the same MC group X.
         IGMP join is sent and the receiver begins to get messages.
      
      3. Sender thread terminates and destroys MC group X.
         IGMP leave is sent and the receiver no longer receives data.
      
      This patch adds the same logic for sendonly joins to destroy_mc() that is
      also used in cma_iboe_join_multicast().
      
      Fixes: ab15c95a ("IB/core: Support for CMA multicast join flags")
      Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2109081340540.668072@gentwo.de
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
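      The leave-side check the patch adds mirrors the join-side condition quoted above; as a hedged sketch with stand-in booleans rather than the cma data structures:

```c
#include <stdbool.h>

/* The conditions model those in cma_iboe_join_multicast(); applying
 * the same test on the destroy path means a send-only attach (which
 * never sent an IGMP join) never sends an IGMP leave either. */
static bool should_send_igmp_leave(bool ipv4, bool udp_encap, bool send_only)
{
	return ipv4 && udp_encap && !send_only;
}
```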
  6. 14 September 2021, 1 commit
  7. 08 September 2021, 5 commits
  8. 30 August 2021, 1 commit
  9. 26 August 2021, 15 commits
  10. 25 August 2021, 3 commits
  11. 24 August 2021, 3 commits