1. 26 9月, 2018 3 次提交
    • P
      RDMA/core: Set right entry state before releasing reference · 5c5702e2
      Parav Pandit 提交于
      Currently add_modify_gid() for IB link layer has followong issue
      in cache update path.
      
      When GID update event occurs, core releases reference to the GID
      table without updating its state and/or entry pointer.
      
      CPU-0                              CPU-1
      ------                             -----
      ib_cache_update()                    IPoIB ULP
         add_modify_gid()                   [..]
            put_gid_entry()
            refcnt = 0, but
            state = valid,
            entry is valid.
            (work item is not yet executed).
                                         ipoib_create_ah()
                                           rdma_create_ah()
                                              rdma_get_gid_attr() <--
                                         	Tries to acquire gid_attr
                                              which has refcnt = 0.
                                         	This is incorrect.
      
      GID entry state and entry pointer is provides the accurate GID enty
      state. Such fields must be updated with rwlock to protect against
      readers and, such fields must be in sane state before refcount can drop
      to zero. Otherwise above race condition can happen leading to
      use-after-free situation.
      
      Following backtrace has been observed when cache update for an IB port
      is triggered while IPoIB ULP is creating an AH.
      
      Therefore, when updating GID entry, first mark a valid entry as invalid
      through state and set the barrier so that no callers can acquired
      the GID entry, followed by release reference to it.
      
      refcount_t: increment on 0; use-after-free.
      WARNING: CPU: 4 PID: 29106 at lib/refcount.c:153 refcount_inc_checked+0x30/0x50
      Workqueue: ib-comp-unb-wq ib_cq_poll_work [ib_core]
      RIP: 0010:refcount_inc_checked+0x30/0x50
      RSP: 0018:ffff8802ad36f600 EFLAGS: 00010082
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000002 RSI: 0000000000000008 RDI: ffffffff86710100
      RBP: ffff8802d6e60a30 R08: ffffed005d67bf8b R09: ffffed005d67bf8b
      R10: 0000000000000001 R11: ffffed005d67bf8a R12: ffff88027620cee8
      R13: ffff8802d6e60988 R14: ffff8802d6e60a78 R15: 0000000000000202
      FS: 0000000000000000(0000) GS:ffff8802eb200000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f3ab35e5c88 CR3: 00000002ce84a000 CR4: 00000000000006e0
      IPv6: ADDRCONF(NETDEV_CHANGE): ib1: link becomes ready
      Call Trace:
      rdma_get_gid_attr+0x220/0x310 [ib_core]
      ? lock_acquire+0x145/0x3a0
      rdma_fill_sgid_attr+0x32c/0x470 [ib_core]
      rdma_create_ah+0x89/0x160 [ib_core]
      ? rdma_fill_sgid_attr+0x470/0x470 [ib_core]
      ? ipoib_create_ah+0x52/0x260 [ib_ipoib]
      ipoib_create_ah+0xf5/0x260 [ib_ipoib]
      ipoib_mcast_join_complete+0xbbe/0x2540 [ib_ipoib]
      
      Fixes: b150c386 ("IB/core: Introduce GID entry reference counts")
      Signed-off-by: NParav Pandit <parav@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      5c5702e2
    • Y
      IB/mlx5: Destroy the DEVX object upon error flow · e8ef090a
      Yishai Hadas 提交于
      Upon DEVX object creation the object must be destroyed upon a follows
      error flow.
      
      Fixes: 7efce369 ("IB/mlx5: Add obj create and destroy functionality")
      Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
      Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      e8ef090a
    • M
      IB/uverbs: Free uapi on destroy · a9360abd
      Mark Bloch 提交于
      Make sure we free struct uverbs_api once we clean the radix tree. It was
      allocated by uverbs_alloc_api().
      
      Fixes: 9ed3e5f4 ("IB/uverbs: Build the specs into a radix tree at runtime")
      Reported-by: NBart Van Assche <bvanassche@acm.org>
      Signed-off-by: NMark Bloch <markb@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      a9360abd
  2. 24 9月, 2018 1 次提交
    • S
      RDMA/bnxt_re: Fix system crash during RDMA resource initialization · de5c95d0
      Selvin Xavier 提交于
      bnxt_re_ib_reg acquires and releases the rtnl lock whenever it accesses
      the L2 driver.
      
      The following sequence can trigger a crash
      
      Acquires the rtnl_lock ->
      	Registers roce driver callback with L2 driver ->
      		release the rtnl lock
      bnxt_re acquires the rtnl_lock ->
      	Request for MSIx vectors ->
      		release the rtnl_lock
      
      Issue happens when bnxt_re proceeds with remaining part of initialization
      and L2 driver invokes bnxt_ulp_irq_stop as a part of bnxt_open_nic.
      
      The crash is in bnxt_qplib_nq_stop_irq as the NQ structures are
      not initialized yet,
      
      <snip>
      [ 3551.726647] BUG: unable to handle kernel NULL pointer dereference at (null)
      [ 3551.726656] IP: [<ffffffffc0840ee9>] bnxt_qplib_nq_stop_irq+0x59/0xb0 [bnxt_re]
      [ 3551.726674] PGD 0
      [ 3551.726679] Oops: 0002 1 SMP
      ...
      [ 3551.726822] Hardware name: Dell Inc. PowerEdge R720/08RW36, BIOS 2.4.3 07/09/2014
      [ 3551.726826] task: ffff97e30eec5ee0 ti: ffff97e3173bc000 task.ti: ffff97e3173bc000
      [ 3551.726829] RIP: 0010:[<ffffffffc0840ee9>] [<ffffffffc0840ee9>]
      bnxt_qplib_nq_stop_irq+0x59/0xb0 [bnxt_re]
      ...
      [ 3551.726872] Call Trace:
      [ 3551.726886] [<ffffffffc082cb9e>] bnxt_re_stop_irq+0x4e/0x70 [bnxt_re]
      [ 3551.726899] [<ffffffffc07d6a53>] bnxt_ulp_irq_stop+0x43/0x70 [bnxt_en]
      [ 3551.726908] [<ffffffffc07c82f4>] bnxt_reserve_rings+0x174/0x1e0 [bnxt_en]
      [ 3551.726917] [<ffffffffc07cafd8>] __bnxt_open_nic+0x368/0x9a0 [bnxt_en]
      [ 3551.726925] [<ffffffffc07cb62b>] bnxt_open_nic+0x1b/0x50 [bnxt_en]
      [ 3551.726934] [<ffffffffc07cc62f>] bnxt_setup_mq_tc+0x11f/0x260 [bnxt_en]
      [ 3551.726943] [<ffffffffc07d5f58>] bnxt_dcbnl_ieee_setets+0xb8/0x1f0 [bnxt_en]
      [ 3551.726954] [<ffffffff890f983a>] dcbnl_ieee_set+0x9a/0x250
      [ 3551.726966] [<ffffffff88fd6d21>] ? __alloc_skb+0xa1/0x2d0
      [ 3551.726972] [<ffffffff890f72fa>] dcb_doit+0x13a/0x210
      [ 3551.726981] [<ffffffff89003ff7>] rtnetlink_rcv_msg+0xa7/0x260
      [ 3551.726989] [<ffffffff88ffdb00>] ? rtnl_unicast+0x20/0x30
      [ 3551.726996] [<ffffffff88bf9dc8>] ? __kmalloc_node_track_caller+0x58/0x290
      [ 3551.727002] [<ffffffff890f7326>] ? dcb_doit+0x166/0x210
      [ 3551.727007] [<ffffffff88fd6d0d>] ? __alloc_skb+0x8d/0x2d0
      [ 3551.727012] [<ffffffff89003f50>] ? rtnl_newlink+0x880/0x880
      ...
      [ 3551.727104] [<ffffffff8911f7d5>] system_call_fastpath+0x1c/0x21
      ...
      [ 3551.727164] RIP [<ffffffffc0840ee9>] bnxt_qplib_nq_stop_irq+0x59/0xb0 [bnxt_re]
      [ 3551.727175] RSP <ffff97e3173bf788>
      [ 3551.727177] CR2: 0000000000000000
      
      Avoid this inconsistent state and  system crash by acquiring
      the rtnl lock for the entire duration of device initialization.
      Re-factor the code to remove the rtnl lock from the individual function
      and acquire and release it from the caller.
      
      Fixes: 1ac5a404 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
      Fixes: 6e04b103 ("RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes")
      Signed-off-by: NSelvin Xavier <selvin.xavier@broadcom.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      de5c95d0
  3. 21 9月, 2018 5 次提交
    • M
      IB/hfi1: Fix destroy_qp hang after a link down · b4a4957d
      Michael J. Ruhl 提交于
      rvt_destroy_qp() cannot complete until all in process packets have
      been released from the underlying hardware.  If a link down event
      occurs, an application can hang with a kernel stack similar to:
      
      cat /proc/<app PID>/stack
       quiesce_qp+0x178/0x250 [hfi1]
       rvt_reset_qp+0x23d/0x400 [rdmavt]
       rvt_destroy_qp+0x69/0x210 [rdmavt]
       ib_destroy_qp+0xba/0x1c0 [ib_core]
       nvme_rdma_destroy_queue_ib+0x46/0x80 [nvme_rdma]
       nvme_rdma_free_queue+0x3c/0xd0 [nvme_rdma]
       nvme_rdma_destroy_io_queues+0x88/0xd0 [nvme_rdma]
       nvme_rdma_error_recovery_work+0x52/0xf0 [nvme_rdma]
       process_one_work+0x17a/0x440
       worker_thread+0x126/0x3c0
       kthread+0xcf/0xe0
       ret_from_fork+0x58/0x90
       0xffffffffffffffff
      
      quiesce_qp() waits until all outstanding packets have been freed.
      This wait should be momentary.  During a link down event, the cleanup
      handling does not ensure that all packets caught by the link down are
      flushed properly.
      
      This is caused by the fact that the freeze path and the link down
      event is handled the same.  This is not correct.  The freeze path
      waits until the HFI is unfrozen and then restarts PIO.  A link down
      is not a freeze event.  The link down path cannot restart the PIO
      until link is restored.  If the PIO path is restarted before the link
      comes up, the application (QP) using the PIO path will hang (until
      link is restored).
      
      Fix by separating the linkdown path from the freeze path and use the
      link down path for link down events.
      
      Close a race condition sc_disable() by acquiring both the progress
      and release locks.
      
      Close a race condition in sc_stop() by moving the setting of the flag
      bits under the alloc lock.
      
      Cc: <stable@vger.kernel.org> # 4.9.x+
      Fixes: 77241056 ("IB/hfi1: add driver files")
      Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      b4a4957d
    • M
      IB/hfi1: Fix context recovery when PBC has an UnsupportedVL · d623500b
      Michael J. Ruhl 提交于
      If a packet stream uses an UnsupportedVL (virtual lane), the send
      engine will not send the packet, and it will not indicate that an
      error has occurred.  This will cause the packet stream to block.
      
      HFI has 8 virtual lanes available for packet streams.  Each lane can
      be enabled or disabled using the UnsupportedVL mask.  If a lane is
      disabled, adding a packet to the send context must be disallowed.
      
      The current mask for determining unsupported VLs defaults to 0 (allow
      all).  This is incorrect.  Only the VLs that are defined should be
      allowed.
      
      Determine which VLs are disabled (mtu == 0), and set the appropriate
      unsupported bit in the mask.  The correct mask will allow the send
      engine to error on the invalid VL, and error recovery will work
      correctly.
      
      Cc: <stable@vger.kernel.org> # 4.9.x+
      Fixes: 77241056 ("IB/hfi1: add driver files")
      Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: NLukasz Odzioba <lukasz.odzioba@intel.com>
      Signed-off-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      d623500b
    • M
      IB/hfi1: Invalid user input can result in crash · 94694d18
      Michael J. Ruhl 提交于
      If the number of packets in a user sdma request does not match
      the actual iovectors being sent, sdma_cleanup can be called on
      an uninitialized request structure, resulting in a crash similar
      to this:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      IP: [<ffffffffc0ae8bb7>] __sdma_txclean+0x57/0x1e0 [hfi1]
      PGD 8000001044f61067 PUD 1052706067 PMD 0
      Oops: 0000 [#1] SMP
      CPU: 30 PID: 69912 Comm: upsm Kdump: loaded Tainted: G           OE
      ------------   3.10.0-862.el7.x86_64 #1
      Hardware name: Intel Corporation S2600KPR/S2600KPR, BIOS
      SE5C610.86B.01.01.0019.101220160604 10/12/2016
      task: ffff8b331c890000 ti: ffff8b2ed1f98000 task.ti: ffff8b2ed1f98000
      RIP: 0010:[<ffffffffc0ae8bb7>]  [<ffffffffc0ae8bb7>] __sdma_txclean+0x57/0x1e0
      [hfi1]
      RSP: 0018:ffff8b2ed1f9bab0  EFLAGS: 00010286
      RAX: 0000000000008b2b RBX: ffff8b2adf6e0000 RCX: 0000000000000000
      RDX: 00000000000000a0 RSI: ffff8b2e9eedc540 RDI: ffff8b2adf6e0000
      RBP: ffff8b2ed1f9bad8 R08: 0000000000000000 R09: ffffffffc0b04a06
      R10: ffff8b331c890190 R11: ffffe6ed00bf1840 R12: ffff8b3315480000
      R13: ffff8b33154800f0 R14: 00000000fffffff2 R15: ffff8b2e9eedc540
      FS:  00007f035ac47740(0000) GS:ffff8b331e100000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000008 CR3: 0000000c03fe6000 CR4: 00000000001607e0
      Call Trace:
       [<ffffffffc0b0570d>] user_sdma_send_pkts+0xdcd/0x1990 [hfi1]
       [<ffffffff9fe75fb0>] ? gup_pud_range+0x140/0x290
       [<ffffffffc0ad3105>] ? hfi1_mmu_rb_insert+0x155/0x1b0 [hfi1]
       [<ffffffffc0b0777b>] hfi1_user_sdma_process_request+0xc5b/0x11b0 [hfi1]
       [<ffffffffc0ac193a>] hfi1_aio_write+0xba/0x110 [hfi1]
       [<ffffffffa001a2bb>] do_sync_readv_writev+0x7b/0xd0
       [<ffffffffa001bede>] do_readv_writev+0xce/0x260
       [<ffffffffa022b089>] ? tty_ldisc_deref+0x19/0x20
       [<ffffffffa02268c0>] ? n_tty_ioctl+0xe0/0xe0
       [<ffffffffa001c105>] vfs_writev+0x35/0x60
       [<ffffffffa001c2bf>] SyS_writev+0x7f/0x110
       [<ffffffffa051f7d5>] system_call_fastpath+0x1c/0x21
      Code: 06 49 c7 47 18 00 00 00 00 0f 87 89 01 00 00 5b 41 5c 41 5d 41 5e 41 5f
      5d c3 66 2e 0f 1f 84 00 00 00 00 00 48 8b 4e 10 48 89 fb <48> 8b 51 08 49 89 d4
      83 e2 0c 41 81 e4 00 e0 00 00 48 c1 ea 02
      RIP  [<ffffffffc0ae8bb7>] __sdma_txclean+0x57/0x1e0 [hfi1]
       RSP <ffff8b2ed1f9bab0>
      CR2: 0000000000000008
      
      There are two exit points from user_sdma_send_pkts().  One (free_tx)
      merely frees the slab entry and one (free_txreq) cleans the sdma_txreq
      prior to freeing the slab entry.   The free_txreq variation can only be
      called after one of the sdma_init*() variations has been called.
      
      In the panic case, the slab entry had been allocated but not inited.
      
      Fix the issue by exiting through free_tx thus avoiding sdma_clean().
      
      Cc: <stable@vger.kernel.org> # 4.9.x+
      Fixes: 77241056 ("IB/hfi1: add driver files")
      Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: NLukasz Odzioba <lukasz.odzioba@intel.com>
      Signed-off-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      94694d18
    • I
      IB/hfi1: Fix SL array bounds check · 0dbfaa9f
      Ira Weiny 提交于
      The SL specified by a user needs to be a valid SL.
      
      Add a range check to the user specified SL value which protects from
      running off the end of the SL to SC table.
      
      CC: stable@vger.kernel.org
      Fixes: 77241056 ("IB/hfi1: add driver files")
      Signed-off-by: NIra Weiny <ira.weiny@intel.com>
      Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      0dbfaa9f
    • M
      RDMA/uverbs: Fix validity check for modify QP · 4eeed368
      Majd Dibbiny 提交于
      Uverbs shouldn't enforce QP state in the command unless the user set the QP
      state bit in the attribute mask.
      
      In addition, only copy qp attr fields which have the corresponding bit set
      in the attribute mask over to the internal attr structure.
      
      Fixes: 88de869b ("RDMA/uverbs: Ensure validity of current QP state value")
      Fixes: bc38a6ab ("[PATCH] IB uverbs: core implementation")
      Signed-off-by: NMajd Dibbiny <majd@mellanox.com>
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      4eeed368
  4. 20 9月, 2018 1 次提交
  5. 14 9月, 2018 1 次提交
    • C
      ucma: fix a use-after-free in ucma_resolve_ip() · 5fe23f26
      Cong Wang 提交于
      There is a race condition between ucma_close() and ucma_resolve_ip():
      
      CPU0				CPU1
      ucma_resolve_ip():		ucma_close():
      
      ctx = ucma_get_ctx(file, cmd.id);
      
              list_for_each_entry_safe(ctx, tmp, &file->ctx_list, list) {
                      mutex_lock(&mut);
                      idr_remove(&ctx_idr, ctx->id);
                      mutex_unlock(&mut);
      		...
                      mutex_lock(&mut);
                      if (!ctx->closing) {
                              mutex_unlock(&mut);
                              rdma_destroy_id(ctx->cm_id);
      		...
                      ucma_free_ctx(ctx);
      
      ret = rdma_resolve_addr();
      ucma_put_ctx(ctx);
      
      Before idr_remove(), ucma_get_ctx() could still find the ctx
      and after rdma_destroy_id(), rdma_resolve_addr() may still
      access id_priv pointer. Also, ucma_put_ctx() may use ctx after
      ucma_free_ctx() too.
      
      ucma_close() should call ucma_put_ctx() too which tests the
      refcnt and waits for the last one releasing it. The similar
      pattern is already used by ucma_destroy_id().
      
      Reported-and-tested-by: syzbot+da2591e115d57a9cbb8b@syzkaller.appspotmail.com
      Reported-by: syzbot+cfe3c1e8ef634ba8964b@syzkaller.appspotmail.com
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      5fe23f26
  6. 13 9月, 2018 1 次提交
    • S
      RDMA/uverbs: Atomically flush and mark closed the comp event queue · 67e38168
      Steve Wise 提交于
      Currently a uverbs completion event queue is flushed of events in
      ib_uverbs_comp_event_close() with the queue spinlock held and then
      released.  Yet setting ev_queue->is_closed is not set until later in
      uverbs_hot_unplug_completion_event_file().
      
      In between the time ib_uverbs_comp_event_close() releases the lock and
      uverbs_hot_unplug_completion_event_file() acquires the lock, a completion
      event can arrive and be inserted into the event queue by
      ib_uverbs_comp_handler().
      
      This can cause a "double add" list_add warning or crash depending on the
      kernel configuration, or a memory leak because the event is never dequeued
      since the queue is already closed down.
      
      So add setting ev_queue->is_closed = 1 to ib_uverbs_comp_event_close().
      
      Cc: stable@vger.kernel.org
      Fixes: 1e7710f3 ("IB/core: Change completion channel to use the reworked objects schema")
      Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      67e38168
  7. 11 9月, 2018 1 次提交
  8. 07 9月, 2018 2 次提交
  9. 06 9月, 2018 3 次提交
    • P
      RDMA/uverbs: Fix error cleanup path of ib_uverbs_add_one() · 08e74be1
      Parav Pandit 提交于
      If ib_uverbs_create_uapi() fails, dev_num should be freed from the bitmap.
      
      Fixes: 7d96c9b1 ("IB/uverbs: Have the core code create the uverbs_root_spec")
      Signed-off-by: NParav Pandit <parav@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      08e74be1
    • S
      bnxt_re: Fix couple of memory leaks that could lead to IOMMU call traces · f40f299b
      Somnath Kotur 提交于
      1. DMA-able memory allocated for Shadow QP was not being freed.
      2. bnxt_qplib_alloc_qp_hdr_buf() had a bug wherein the SQ pointer was
         erroneously pointing to the RQ. But since the corresponding
         free_qp_hdr_buf() was correct, memory being free was less than what was
         allocated.
      
      Fixes: 1ac5a404 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
      Signed-off-by: NSomnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      f40f299b
    • A
      IB/ipoib: Avoid a race condition between start_xmit and cm_rep_handler · 816e846c
      Aaron Knister 提交于
      Inside of start_xmit() the call to check if the connection is up and the
      queueing of the packets for later transmission is not atomic which leaves
      a window where cm_rep_handler can run, set the connection up, dequeue
      pending packets and leave the subsequently queued packets by start_xmit()
      sitting on neigh->queue until they're dropped when the connection is torn
      down. This only applies to connected mode. These dropped packets can
      really upset TCP, for example, and cause multi-minute delays in
      transmission for open connections.
      
      Here's the code in start_xmit where we check to see if the connection is
      up:
      
             if (ipoib_cm_get(neigh)) {
                     if (ipoib_cm_up(neigh)) {
                             ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
                             goto unref;
                     }
             }
      
      The race occurs if cm_rep_handler execution occurs after the above
      connection check (specifically if it gets to the point where it acquires
      priv->lock to dequeue pending skb's) but before the below code snippet in
      start_xmit where packets are queued.
      
             if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
                     push_pseudo_header(skb, phdr->hwaddr);
                     spin_lock_irqsave(&priv->lock, flags);
                     __skb_queue_tail(&neigh->queue, skb);
                     spin_unlock_irqrestore(&priv->lock, flags);
             } else {
                     ++dev->stats.tx_dropped;
                     dev_kfree_skb_any(skb);
             }
      
      The patch acquires the netif tx lock in cm_rep_handler for the section
      where it sets the connection up and dequeues and retransmits deferred
      skb's.
      
      Fixes: 839fcaba ("IPoIB: Connected mode experimental support")
      Cc: stable@vger.kernel.org
      Signed-off-by: NAaron Knister <aaron.s.knister@nasa.gov>
      Tested-by: NIra Weiny <ira.weiny@intel.com>
      Reviewed-by: NIra Weiny <ira.weiny@intel.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      816e846c
  10. 05 9月, 2018 3 次提交
    • S
      iw_cxgb4: only allow 1 flush on user qps · 308aa2b8
      Steve Wise 提交于
      Once the qp has been flushed, it cannot be flushed again.  The user qp
      flush logic wasn't enforcing it however.  The bug can cause
      touch-after-free crashes like:
      
      Unable to handle kernel paging request for data at address 0x000001ec
      Faulting instruction address: 0xc008000016069100
      Oops: Kernel access of bad area, sig: 11 [#1]
      ...
      NIP [c008000016069100] flush_qp+0x80/0x480 [iw_cxgb4]
      LR [c00800001606cd6c] c4iw_modify_qp+0x71c/0x11d0 [iw_cxgb4]
      Call Trace:
      [c00800001606cd6c] c4iw_modify_qp+0x71c/0x11d0 [iw_cxgb4]
      [c00800001606e868] c4iw_ib_modify_qp+0x118/0x200 [iw_cxgb4]
      [c0080000119eae80] ib_security_modify_qp+0xd0/0x3d0 [ib_core]
      [c0080000119c4e24] ib_modify_qp+0xc4/0x2c0 [ib_core]
      [c008000011df0284] iwcm_modify_qp_err+0x44/0x70 [iw_cm]
      [c008000011df0fec] destroy_cm_id+0xcc/0x370 [iw_cm]
      [c008000011ed4358] rdma_destroy_id+0x3c8/0x520 [rdma_cm]
      [c0080000134b0540] ucma_close+0x90/0x1b0 [rdma_ucm]
      [c000000000444da4] __fput+0xe4/0x2f0
      
      So fix flush_qp() to only flush the wq once.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      308aa2b8
    • A
      IB/core: Release object lock if destroy failed · e4ff3d22
      Artemy Kovalyov 提交于
      The object lock was supposed to always be released during destroy, but
      when the destruction retry series was integrated with the destroy series
      it created a failure path that missed the unlock.
      
      Keep with convention, if destroy fails the caller must undo all locking.
      
      Fixes: 87ad80ab ("IB/uverbs: Consolidate uobject destruction")
      Signed-off-by: NArtemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      e4ff3d22
    • J
      RDMA/ucma: check fd type in ucma_migrate_id() · 0d23ba60
      Jann Horn 提交于
      The current code grabs the private_data of whatever file descriptor
      userspace has supplied and implicitly casts it to a `struct ucma_file *`,
      potentially causing a type confusion.
      
      This is probably fine in practice because the pointer is only used for
      comparisons, it is never actually dereferenced; and even in the
      comparisons, it is unlikely that a file from another filesystem would have
      a ->private_data pointer that happens to also be valid in this context.
      But ->private_data is not always guaranteed to be a valid pointer to an
      object owned by the file's filesystem; for example, some filesystems just
      cram numbers in there.
      
      Check the type of the supplied file descriptor to be safe, analogous to how
      other places in the kernel do it.
      
      Fixes: 88314e4d ("RDMA/cma: add support for rdma_migrate_id()")
      Signed-off-by: NJann Horn <jannh@google.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      0d23ba60
  11. 31 8月, 2018 8 次提交
    • A
      clk: x86: Set default parent to 48Mhz · bded6c03
      Akshu Agrawal 提交于
      System clk provided in ST soc can be set to:
      48Mhz, non-spread
      25Mhz, spread
      To get accurate rate, we need it to set it at non-spread
      option which is 48Mhz.
      Signed-off-by: NAkshu Agrawal <akshu.agrawal@amd.com>
      Reviewed-by: NDaniel Kurtz <djkurtz@chromium.org>
      Fixes: 421bf6a1 ("clk: x86: Add ST oscout platform clock")
      Signed-off-by: NStephen Boyd <sboyd@kernel.org>
      bded6c03
    • W
      i2c: sh_mobile: fix leak when using DMA bounce buffer · cebc07d8
      Wolfram Sang 提交于
      We only freed the bounce buffer after successful DMA, missing the cases
      where DMA setup may have gone wrong. Use a better location which always
      gets called after each message and use 'stop_after_dma' as a flag for a
      successful transfer.
      Signed-off-by: NWolfram Sang <wsa+renesas@sang-engineering.com>
      Reviewed-by: NNiklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
      Signed-off-by: NWolfram Sang <wsa@the-dreams.de>
      cebc07d8
    • W
      i2c: sh_mobile: define start_ch() void as it only returns 0 anyhow · 531db501
      Wolfram Sang 提交于
      After various refactoring over the years, start_ch() doesn't return
      errno anymore, so make the function return void. This saves the error
      handling when calling it which in turn eases cleanup of resources of a
      future patch.
      Signed-off-by: NWolfram Sang <wsa+renesas@sang-engineering.com>
      Reviewed-by: NNiklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
      Signed-off-by: NWolfram Sang <wsa@the-dreams.de>
      531db501
    • W
      i2c: refactor function to release a DMA safe buffer · 82fe39a6
      Wolfram Sang 提交于
      a) rename to 'put' instead of 'release' to match 'get' when obtaining
         the buffer
      b) change the argument order to have the buffer as first argument
      c) add a new argument telling the function if the message was
         transferred. This allows the function to be used also in cases
         where setting up DMA failed, so the buffer needs to be freed without
         syncing to the message buffer.
      
      Also convert the only user.
      Signed-off-by: NWolfram Sang <wsa+renesas@sang-engineering.com>
      Reviewed-by: NNiklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
      Signed-off-by: NWolfram Sang <wsa@the-dreams.de>
      82fe39a6
    • J
      i2c: algos: bit: make the error messages grepable · 1204d12a
      Jan Kundrát 提交于
      Yep, I went looking for one of these, and I wasn't able to find it
      easily.  That's worse than a line which is 82-chars long, IMHO.
      Signed-off-by: NJan Kundrát <jan.kundrat@cesnet.cz>
      Signed-off-by: NWolfram Sang <wsa@the-dreams.de>
      1204d12a
    • H
      i2c: designware: Re-init controllers with pm_disabled set on resume · 9d9a152e
      Hans de Goede 提交于
      On Bay Trail and Cherry Trail devices we set the pm_disabled flag for I2C
      busses which the OS shares with the PUNIT as these need special handling.
      Until now we called dev_pm_syscore_device(dev, true) for I2C controllers
      with this flag set to keep these I2C controllers always on.
      
      After commit 12864ff8 ("ACPI / LPSS: Avoid PM quirks on suspend and
      resume from hibernation"), this no longer works. This commit modifies
      lpss_iosf_exit_d3_state() to only run if lpss_iosf_enter_d3_state() has ran
      before it, so that it does not run on a resume from hibernate (or from S3).
      
      On these systems the conditions for lpss_iosf_enter_d3_state() to run
      never become true, so lpss_iosf_exit_d3_state() never gets called and
      the 2 LPSS DMA controllers never get forced into D0 mode, instead they
      are left in their default automatic power-on when needed mode.
      
      The not forcing of D0 mode for the DMA controllers enables these systems
      to properly enter S0ix modes, which is a good thing.
      
      But after entering S0ix modes the I2C controller connected to the PMIC
      no longer works, leading to e.g. broken battery monitoring.
      
      The _PS3 method for this I2C controller looks like this:
      
                  Method (_PS3, 0, NotSerialized)  // _PS3: Power State 3
                  {
                      If ((((PMID == 0x04) || (PMID == 0x05)) || (PMID == 0x06)))
                      {
                          Return (Zero)
                      }
      
                      PSAT |= 0x03
                      Local0 = PSAT /* \_SB_.I2C5.PSAT */
                  }
      
      Where PMID = 0x05, so we enter the Return (Zero) path on these systems.
      
      So even if we were to not call dev_pm_syscore_device(dev, true) the
      I2C controller will be left in D0 rather then be switched to D3.
      
      Yet on other Bay and Cherry Trail devices S0ix is not entered unless *all*
      I2C controllers are in D3 mode. This combined with the I2C controller no
      longer working now that we reach S0ix states on these systems leads to me
      believing that the PUNIT itself puts the I2C controller in D3 when all
      other conditions for entering S0ix states are true.
      
      Since now the I2C controller is put in D3 over a suspend/resume we must
      re-initialize it afterwards and that does indeed fix it no longer working.
      
      This commit implements this fix by:
      
      1) Making the suspend_late callback a no-op if pm_disabled is set and
      making the resume_early callback skip the clock re-enable (since it now was
      not disabled) while still doing the necessary I2C controller re-init.
      
      2) Removing the dev_pm_syscore_device(dev, true) call, so that the suspend
      and resume callbacks are actually called. Normally this would cause the
      ACPI pm code to call _PS3 putting the I2C controller in D3, wreaking havoc
      since it is shared with the PUNIT, but in this special case the _PS3 method
      is a no-op so we can safely allow a "fake" suspend / resume.
      
      Fixes: 12864ff8 ("ACPI / LPSS: Avoid PM quirks on suspend and resume ...")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=200861
      Cc: 4.15+ <stable@vger.kernel.org> # 4.15+
      Signed-off-by: NHans de Goede <hdegoede@redhat.com>
      Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Acked-by: NJarkko Nikula <jarkko.nikula@linux.intel.com>
      Signed-off-by: NWolfram Sang <wsa@the-dreams.de>
      9d9a152e
    • M
      i2c: i801: Allow ACPI AML access I/O ports not reserved for SMBus · 7fd6d98b
      Mika Westerberg 提交于
      Commit 7ae81952cda ("i2c: i801: Allow ACPI SystemIO OpRegion to conflict
      with PCI BAR") made it possible for AML code to access SMBus I/O ports
      by installing custom SystemIO OpRegion handler and blocking i80i driver
      access upon first AML read/write to this OpRegion.
      
      However, while ThinkPad T560 does have SystemIO OpRegion declared under
      the SMBus device, it does not access any of the SMBus registers:
      
          Device (SMBU)
          {
              ...
      
              OperationRegion (SMBP, PCI_Config, 0x50, 0x04)
              Field (SMBP, DWordAcc, NoLock, Preserve)
              {
                  ,   5,
                  TCOB,   11,
                  Offset (0x04)
              }
      
              Name (TCBV, 0x00)
              Method (TCBS, 0, NotSerialized)
              {
                  If ((TCBV == 0x00))
                  {
                  TCBV = (\_SB.PCI0.SMBU.TCOB << 0x05)
                  }
      
                  Return (TCBV) /* \_SB_.PCI0.SMBU.TCBV */
              }
      
              OperationRegion (TCBA, SystemIO, TCBS (), 0x10)
              Field (TCBA, ByteAcc, NoLock, Preserve)
              {
                  Offset (0x04),
                  ,   9,
                  CPSC,   1
              }
          }
      
      Problem with the current approach is that it blocks all I/O port access
      and because this system has touchpad connected to the SMBus controller
      after first AML access (happens during suspend/resume cycle) the
      touchpad fails to work anymore.
      
      Fix this so that we allow ACPI AML I/O port access if it does not touch
      the region reserved for the SMBus.
      
      Fixes: 7ae81952cda ("i2c: i801: Allow ACPI SystemIO OpRegion to conflict with PCI BAR")
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=200737Reported-by: NYussuf Khalil <dev@pp3345.net>
      Signed-off-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Reviewed-by: NJean Delvare <jdelvare@suse.de>
      Signed-off-by: NWolfram Sang <wsa@the-dreams.de>
      7fd6d98b
    • R
      of: add node name compare helper functions · f42b0e18
      Rob Herring 提交于
      In preparation to remove device_node.name pointer, add helper functions
      for node name comparisons which are a common pattern throughout the kernel.
      
      Cc: Frank Rowand <frowand.list@gmail.com>
      Signed-off-by: NRob Herring <robh@kernel.org>
      f42b0e18
  12. 30 8月, 2018 3 次提交
  13. 29 8月, 2018 7 次提交
  14. 28 8月, 2018 1 次提交