1. 11 Jun 2019, 1 commit
    • net: rds: fix memory leak in rds_ib_flush_mr_pool · 7700d5af
      Zhu Yanjun authored
      [ Upstream commit 85cb928787eab6a2f4ca9d2a798b6f3bed53ced1 ]
      
      When the following tests have run for several hours, the problem occurs.
      
      Server:
          rds-stress -r 1.1.1.16 -D 1M
      Client:
          rds-stress -r 1.1.1.14 -s 1.1.1.16 -D 1M -T 30
      
      The following will occur.
      
      "
      Starting up....
      tsks   tx/s   rx/s  tx+rx K/s    mbi K/s    mbo K/s tx us/c   rtt us  cpu %
        1      0      0       0.00       0.00       0.00    0.00 0.00 -1.00
        1      0      0       0.00       0.00       0.00    0.00 0.00 -1.00
        1      0      0       0.00       0.00       0.00    0.00 0.00 -1.00
        1      0      0       0.00       0.00       0.00    0.00 0.00 -1.00
      "
      From vmcore, we can find that clean_list is NULL.
      
      From the source code, rds_mr_flushd calls rds_ib_mr_pool_flush_worker.
      Then rds_ib_mr_pool_flush_worker calls
      "
       rds_ib_flush_mr_pool(pool, 0, NULL);
      "
      Then in function
      "
      int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
                               int free_all, struct rds_ib_mr **ibmr_ret)
      "
      ibmr_ret is NULL.
      
      In the source code,
      "
      ...
      list_to_llist_nodes(pool, &unmap_list, &clean_nodes, &clean_tail);
      if (ibmr_ret)
              *ibmr_ret = llist_entry(clean_nodes, struct rds_ib_mr, llnode);
      
      /* more than one entry in llist nodes */
      if (clean_nodes->next)
              llist_add_batch(clean_nodes->next, clean_tail, &pool->clean_list);
      ...
      "
      When ibmr_ret is NULL, the llist_entry() assignment is not executed, yet
      only clean_nodes->next, not clean_nodes itself, is added to clean_list.
      So the first node, clean_nodes, is discarded and can never be used again.
      Since the flush workqueue runs periodically, one node is leaked on every
      flush, until eventually clean_list is empty and the stall shown above
      occurs. A sketch of the corrected logic follows.
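      
      A minimal sketch of the corrected logic implied by the description above:
      consume the first node only when it is actually handed to the caller,
      otherwise return the whole chain, clean_nodes included, to clean_list.
      "
      list_to_llist_nodes(pool, &unmap_list, &clean_nodes, &clean_tail);
      if (ibmr_ret) {
              *ibmr_ret = llist_entry(clean_nodes, struct rds_ib_mr, llnode);
              /* the caller took the first node; it must not go back on the list */
              clean_nodes = clean_nodes->next;
      }
      /* put all remaining nodes, including the first one when ibmr_ret
       * is NULL, back on clean_list */
      if (clean_nodes)
              llist_add_batch(clean_nodes, clean_tail, &pool->clean_list);
      "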
      
      Fixes: 1bc144b6 ("net, rds, Replace xlist in net/rds/xlist.h with llist")
      Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 02 May 2019, 1 commit
    • net: rds: exchange of 8K and 1M pool · ed1866aa
      Zhu Yanjun authored
      [ Upstream commit 4b9fc7146249a6e0e3175d0acc033fdcd2bfcb17 ]
      
      Before commit 490ea596 ("RDS: IB: move FMR code to its own file"),
      when dirty_count was greater than 9/10 of max_items of the 8K pool,
      the 1M pool was used instead, and vice versa. That commit removed this
      exchange. When we run the following tests,
      
      Server:
        rds-stress -r 1.1.1.16 -D 1M
      
      Client:
        rds-stress -r 1.1.1.14 -s 1.1.1.16 -D 1M
      
      The following will appear.
      "
      connecting to 1.1.1.16:4000
      negotiated options, tasks will start in 2 seconds
      Starting up..header from 1.1.1.166:4001 to id 4001 bogus
      ..
      tsks  tx/s  rx/s tx+rx K/s  mbi K/s  mbo K/s tx us/c  rtt us  cpu %
         1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
         1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
         1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
         1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
         1    0    0     0.00     0.00     0.00    0.00 0.00 -1.00
      ...
      "
      So this exchange between the 8K and 1M pools is added back; a sketch follows.
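      
      A sketch of the restored exchange, assuming the field names of the RDS
      MR-pool code (RDS_MR_8K_MSG_SIZE, mr_8k_pool, mr_1m_pool, pool_type,
      dirty_count, max_items); the exact upstream placement is in the MR
      allocation path.
      "
      if (npages <= RDS_MR_8K_MSG_SIZE)
              pool = rds_ibdev->mr_8k_pool;
      else
              pool = rds_ibdev->mr_1m_pool;
      
      /* switch pools if the preferred one is reaching its upper limit */
      if (atomic_read(&pool->dirty_count) > pool->max_items * 9 / 10) {
              if (pool->pool_type == RDS_IB_MR_8K_POOL)
                      pool = rds_ibdev->mr_1m_pool;
              else
                      pool = rds_ibdev->mr_8k_pool;
      }
      "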
      
      Fixes: commit 490ea596 ("RDS: IB: move FMR code to its own file")
      Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  3. 02 Aug 2018, 1 commit
  4. 27 Jul 2018, 1 commit
    • RDS: RDMA: Fix the NULL-ptr deref in rds_ib_get_mr · 9e630bcb
      Avinash Repaka authored
      Registration of a memory region (MR) through FRMR/fastreg (unlike FMR)
      needs a connection/qp. With a proxy qp, this dependency on a connection
      will be removed, but that needs more infrastructure patches, which are a
      work in progress.
      
      As an intermediate fix, get_mr returns EOPNOTSUPP when connection
      details are not populated. MR registration through sendmsg() will
      continue to work even with fast registration, since the connection in
      that case is formed upfront. A sketch of the check follows.
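      
      A sketch of that intermediate check, assuming a conn argument threaded
      down into rds_ib_get_mr() and the use_fastreg/c_transport_data names
      from the RDS IB code:
      "
      struct rds_ib_connection *ic = NULL;
      
      if (conn)
              ic = conn->c_transport_data;
      
      /* fastreg/FRMR needs a qp; without a connection it cannot register */
      if (rds_ibdev->use_fastreg && !ic) {
              ret = -EOPNOTSUPP;
              goto out;
      }
      "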
      
      This patch fixes the following crash:
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 1 PID: 4244 Comm: syzkaller468044 Not tainted 4.16.0-rc6+ #361
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:rds_ib_get_mr+0x5c/0x230 net/rds/ib_rdma.c:544
      RSP: 0018:ffff8801b059f890 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: ffff8801b07e1300 RCX: ffffffff8562d96e
      RDX: 000000000000000d RSI: 0000000000000001 RDI: 0000000000000068
      RBP: ffff8801b059f8b8 R08: ffffed0036274244 R09: ffff8801b13a1200
      R10: 0000000000000004 R11: ffffed0036274243 R12: ffff8801b13a1200
      R13: 0000000000000001 R14: ffff8801ca09fa9c R15: 0000000000000000
      FS:  00007f4d050af700(0000) GS:ffff8801db300000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f4d050aee78 CR3: 00000001b0d9b006 CR4: 00000000001606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       __rds_rdma_map+0x710/0x1050 net/rds/rdma.c:271
       rds_get_mr_for_dest+0x1d4/0x2c0 net/rds/rdma.c:357
       rds_setsockopt+0x6cc/0x980 net/rds/af_rds.c:347
       SYSC_setsockopt net/socket.c:1849 [inline]
       SyS_setsockopt+0x189/0x360 net/socket.c:1828
       do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x42/0xb7
      RIP: 0033:0x4456d9
      RSP: 002b:00007f4d050aedb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 00000000006dac3c RCX: 00000000004456d9
      RDX: 0000000000000007 RSI: 0000000000000114 RDI: 0000000000000004
      RBP: 00000000006dac38 R08: 00000000000000a0 R09: 0000000000000000
      R10: 0000000020000380 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007fffbfb36d6f R14: 00007f4d050af9c0 R15: 0000000000000005
      Code: fa 48 c1 ea 03 80 3c 02 00 0f 85 cc 01 00 00 4c 8b bb 80 04 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7f 68 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 9c 01 00 00 4d 8b 7f 68 48 b8 00 00 00 00 00
      RIP: rds_ib_get_mr+0x5c/0x230 net/rds/ib_rdma.c:544 RSP: ffff8801b059f890
      ---[ end trace 7e1cea13b85473b0 ]---
      
      Reported-by: syzbot+b51c77ef956678a65834@syzkaller.appspotmail.com
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 24 Jul 2018, 2 commits
  6. 06 Oct 2017, 1 commit
  7. 05 Jul 2017, 1 commit
  8. 15 Jun 2016, 1 commit
  9. 11 Jun 2016, 1 commit
  10. 03 Mar 2016, 5 commits
  11. 06 Oct 2015, 4 commits
  12. 01 Oct 2015, 1 commit
  13. 26 Aug 2015, 6 commits
  14. 15 Jul 2015, 1 commit
    • rds: rds_ib_device.refcount overflow · 4fabb594
      Wengang Wang authored
      Fixes: 3e0249f9 ("RDS/IB: add refcount tracking to struct rds_ib_device")
      
      A drop of rds_ib_device.refcount is missing when rds_ib_alloc_fmr fails
      (MR pool running out). This leads to the refcount overflowing.
      
      The BUG_ON at line 117 (see below) fires. From vmcore:
      s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is
      -2147475448. That is evidence the MR pool was used up, so
      rds_ib_alloc_fmr very likely returned ERR_PTR(-EAGAIN).
      
      115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
      116 {
      117         BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
      118         if (atomic_dec_and_test(&rds_ibdev->refcount))
      119                 queue_work(rds_wq, &rds_ibdev->free_work);
      120 }
      
      The fix is to drop the refcount when rds_ib_alloc_fmr fails, as sketched below.
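      
      A sketch of the fix in the rds_ib_get_mr() path:
      "
      ibmr = rds_ib_alloc_fmr(rds_ibdev);
      if (IS_ERR(ibmr)) {
              /* drop the reference taken on rds_ibdev earlier in this function */
              rds_ib_dev_put(rds_ibdev);
              return ibmr;
      }
      "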
      Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
      Reviewed-by: Haggai Eran <haggaie@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  15. 27 Aug 2014, 1 commit
  16. 16 Sep 2011, 1 commit
  17. 01 Feb 2011, 1 commit
    • rds/ib: use system_wq instead of rds_ib_fmr_wq · c534a107
      Tejun Heo authored
      With cmwq, there's no reason to use dedicated rds_ib_fmr_wq - it's not
      in the memory reclaim path and the maximum number of concurrent work
      items is bound by the number of devices.  Drop it and use system_wq
      instead. This makes rds_ib_fmr_init/exit() no-ops; both are removed.
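      
      The conversion itself is a one-liner wherever the flush work is queued;
      roughly:
      "
      /* before: dedicated queue */
      queue_delayed_work(rds_ib_fmr_wq, &pool->flush_worker, 10);
      /* after: system_wq is fine, since the work is not in the memory
       * reclaim path and needs no more concurrency than there are devices */
      queue_delayed_work(system_wq, &pool->flush_worker, 10);
      "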
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Andy Grover <andy.grover@oracle.com>
  18. 21 Oct 2010, 1 commit
  19. 20 Sep 2010, 1 commit
  20. 09 Sep 2010, 8 commits
    • RDS/IB: protect the list of IB devices · ea819867
      Zach Brown authored
      The RDS IB device list wasn't protected by any locking.  Traversal in
      both the get_mr and FMR flushing paths could race with addition and
      removal.
      
      List manipulation is done with RCU primitives and is protected by the
      write side of a rwsem.  The list traversal in the get_mr fast path is
      protected by a rcu read critical section.  The FMR list traversal is
      more problematic because it can block while traversing the list.  We
      protect this with the read side of the rwsem.
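      
      A sketch of the two traversal patterns, assuming the rds_ib_devices
      list and an rds_ib_devices_lock rwsem as in net/rds/ib.c:
      "
      /* get_mr fast path: non-blocking, so an RCU read section suffices */
      rcu_read_lock();
      list_for_each_entry_rcu(rds_ibdev, &rds_ib_devices, list) {
              /* match the device and take a reference */
      }
      rcu_read_unlock();
      
      /* FMR flush path: may block, so hold the read side of the rwsem */
      down_read(&rds_ib_devices_lock);
      list_for_each_entry(rds_ibdev, &rds_ib_devices, list) {
              /* flush this device's MR pool; may sleep */
      }
      up_read(&rds_ib_devices_lock);
      "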
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • RDS: flush fmrs before allocating new ones · 8576f374
      Chris Mason authored
      Flushing FMRs is somewhat expensive, and is currently kicked off when
      the interrupt handler notices that we are getting low.  The result of
      this is that FMR flushing only happens from the interrupt cpus.
      
      This spreads the load more effectively by triggering flushes just before
      we allocate a new FMR.
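      
      A sketch of the trigger on the allocation side (threshold and queue
      name illustrative):
      "
      /* kick off a flush just before allocating when dirty MRs pile up,
       * instead of relying only on the interrupt-handler CPUs to notice */
      if (atomic_read(&pool->dirty_count) >= pool->max_items / 10)
              queue_delayed_work(rds_ib_fmr_wq, &pool->flush_worker, 10);
      "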
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • RDS: remove __init and __exit annotation · ef87b7ea
      Zach Brown authored
      The trivial amount of memory saved isn't worth the cost of dealing with section
      mismatches.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • RDS/IB: create a work queue for FMR flushing · 515e079d
      Zach Brown authored
      This patch moves the FMR flushing work into its own multi-threaded work queue.
      This is to maintain performance in preparation for returning the main krdsd
      work queue back to a single threaded work queue to avoid deep-rooted
      concurrency bugs.
      
      This is also good because it further separates FMRs, which might be removed
      some day, from the rest of the code base.
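      
      A sketch of the setup, with the queue name as an assumption (the
      rds_ib_fmr_init/exit names match the later system_wq commit above):
      "
      static struct workqueue_struct *rds_ib_fmr_wq;
      
      int rds_ib_fmr_init(void)
      {
              rds_ib_fmr_wq = create_workqueue("rds_fmr_flushd");
              return rds_ib_fmr_wq ? 0 : -ENOMEM;
      }
      
      void rds_ib_fmr_exit(void)
      {
              destroy_workqueue(rds_ib_fmr_wq);
      }
      "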
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • RDS/IB: destroy connections on rmmod · 8aeb1ba6
      Zach Brown authored
      IB connections were not being destroyed during rmmod.
      
      First, the IB device removal callback was recently changed to disconnect
      connections that used the removed device rather than destroying them. So
      connections still attached to devices during rmmod were not being destroyed.
      
      Second, rds_ib_destroy_nodev_conns() was being called before connections are
      disassociated with devices.  It would almost never find connections in the
      nodev list.
      
      We first get rid of rds_ib_destroy_conns(), which is no longer called, and
      refactor the existing caller into the main body of the function and get rid of
      the list and lock wrappers.
      
      Then we call rds_ib_destroy_nodev_conns() *after* ib_unregister_client() has
      removed the IB device from all the conns and put the conns on the nodev list.
      
      The result is that IB connections are destroyed by rmmod.
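      
      The resulting teardown order looks roughly like this (the rds_ib_client
      name is an assumption; the two callee names are from the text above):
      "
      void rds_ib_exit(void)
      {
              /* detach devices; their conns end up on the nodev list */
              ib_unregister_client(&rds_ib_client);
              /* now this actually finds and destroys those conns */
              rds_ib_destroy_nodev_conns();
              /* remaining teardown unchanged */
      }
      "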
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • RDS: whitespace · c9455d99
      Andy Grover authored
    • RDS: use delayed work for the FMR flushes · 7a0ff5db
      Chris Mason authored
      Using a delayed work queue helps us make sure a healthy number of FMRs
      have queued up over the limit.  It makes for a large improvement in RDMA
      iops.
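      
      The change is essentially the following (queue name and delay value
      illustrative; flush_worker becomes a struct delayed_work):
      "
      /* before: flush work ran as soon as it was queued */
      queue_work(rds_wq, &pool->flush_worker);
      /* after: a short delay lets a healthy batch of FMRs accumulate past
       * the limit, so each flush does more useful work */
      queue_delayed_work(rds_wq, &pool->flush_worker, 10);
      "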
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • rds: recycle FMRs through lockless lists · 6fa70da6
      Chris Mason authored
      FMR allocation and recycling is performance critical and fairly lock
      intensive.  The current code has a per connection lock that all
      processes bang on and it becomes a major bottleneck on large systems.
      
      This changes things to use a number of cmpxchg based lists instead,
      allowing us to go through the whole FMR lifecycle without locking inside
      RDS.
      
      Zach Brown pointed out that our usage of cmpxchg for xlist removal is
      racy if someone manages to remove an FMR struct and add it back to the
      list while another CPU still sees that FMR's address at the head of the list.
      
      The second CPU might assume the list hasn't changed when in fact any
      number of operations might have happened in between the deletion and
      reinsertion.
      
      This commit maintains a per-cpu count of CPUs that are currently in
      xlist removal, and establishes a grace period to make sure that
      nobody can see an entry we have just removed from the list. A sketch
      of the mechanism follows.
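      
      A sketch of the grace-period mechanism described above (identifier
      names assumed; a per-cpu flag is set around the cmpxchg-based removal
      and polled before a removed entry may be reused or freed):
      "
      static DEFINE_PER_CPU(unsigned long, clean_list_grace);
      #define CLEAN_LIST_BUSY_BIT 0
      
      /* removal side: mark this CPU busy while it may hold a stale head */
      flag = &get_cpu_var(clean_list_grace);
      set_bit(CLEAN_LIST_BUSY_BIT, flag);
      head = xlist_del_head(&pool->clean_list);
      clear_bit(CLEAN_LIST_BUSY_BIT, flag);
      put_cpu_var(clean_list_grace);
      
      /* reuse side: wait until no CPU is inside a removal */
      static void wait_clean_list_grace(void)
      {
              int cpu;
              unsigned long *flag;
      
              for_each_online_cpu(cpu) {
                      flag = &per_cpu(clean_list_grace, cpu);
                      while (test_bit(CLEAN_LIST_BUSY_BIT, flag))
                              cpu_relax();
              }
      }
      "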
      Signed-off-by: Chris Mason <chris.mason@oracle.com>