1. 06 Oct, 2017 1 commit
  2. 05 Jul, 2017 1 commit
  3. 15 Jun, 2016 1 commit
  4. 11 Jun, 2016 1 commit
  5. 03 Mar, 2016 5 commits
  6. 06 Oct, 2015 4 commits
  7. 01 Oct, 2015 1 commit
  8. 26 Aug, 2015 6 commits
  9. 15 Jul, 2015 1 commit
    • rds: rds_ib_device.refcount overflow · 4fabb594
      Committed by Wengang Wang
      Fixes: 3e0249f9 ("RDS/IB: add refcount tracking to struct rds_ib_device")
      
      A drop of rds_ib_device.refcount is missing when rds_ib_alloc_fmr
      fails (the MR pool running out). This leads to the refcount overflowing.
      
      The BUG_ON complaint at line 117 (see below) is seen. From the vmcore:
      s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448.
      That is evidence that the MR pool is used up, so rds_ib_alloc_fmr is very likely
      to return ERR_PTR(-EAGAIN).
      
      115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
      116 {
      117         BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
      118         if (atomic_dec_and_test(&rds_ibdev->refcount))
      119                 queue_work(rds_wq, &rds_ibdev->free_work);
      120 }
      
      The fix is to drop the refcount when rds_ib_alloc_fmr fails; a sketch of
      the repaired error path follows the sign-offs below.
      Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
      Reviewed-by: Haggai Eran <haggaie@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
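      A minimal sketch of the shape of the fix, assuming the failing path sits
      in rds_ib_get_mr; apart from rds_ib_dev_put (quoted above), the helper
      names and surrounding logic are condensed assumptions, not the exact hunk:

      void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents,
                          struct rds_sock *rs, u32 *key_ret)
      {
              struct rds_ib_device *rds_ibdev;
              struct rds_ib_mr *ibmr;

              /* the lookup takes a reference on the device */
              rds_ibdev = rds_ib_get_device(rs->rs_bound_addr);
              if (!rds_ibdev)
                      return ERR_PTR(-ENODEV);

              ibmr = rds_ib_alloc_fmr(rds_ibdev);
              if (IS_ERR(ibmr)) {
                      /* the fix: balance the reference taken above on the
                       * error path instead of leaking it each time the
                       * pool is depleted */
                      rds_ib_dev_put(rds_ibdev);
                      return ibmr;
              }

              /* hypothetical helper standing in for the FMR mapping step */
              *key_ret = rds_ib_map_and_get_rkey(rds_ibdev, ibmr, sg, nents);
              return ibmr;
      }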
  10. 27 Aug, 2014 1 commit
  11. 16 Sep, 2011 1 commit
  12. 01 Feb, 2011 1 commit
    • rds/ib: use system_wq instead of rds_ib_fmr_wq · c534a107
      Committed by Tejun Heo
      With cmwq, there's no reason to use dedicated rds_ib_fmr_wq - it's not
      in the memory reclaim path and the maximum number of concurrent work
      items is bound by the number of devices.  Drop it and use system_wq
      instead.  This makes rds_ib_fmr_init/exit() noops.  Both removed
      (see the sketch below).
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Andy Grover <andy.grover@oracle.com>
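      A hedged before/after sketch of the conversion; the call site and the
      10-jiffy delay are illustrative, not the exact hunks:

      /* before: a dedicated workqueue, created and destroyed by
       * rds_ib_fmr_init()/exit():
       *
       *     rds_ib_fmr_wq = create_workqueue("rds_ib_fmr");
       *     queue_delayed_work(rds_ib_fmr_wq, &pool->flush_worker, 10);
       *
       * after: no setup or teardown at all - the shared cmwq pool
       * provides the concurrency */
      queue_delayed_work(system_wq, &pool->flush_worker, 10);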
  13. 21 Oct, 2010 1 commit
  14. 20 Sep, 2010 1 commit
  15. 09 Sep, 2010 14 commits
    • RDS/IB: protect the list of IB devices · ea819867
      Committed by Zach Brown
      The RDS IB device list wasn't protected by any locking.  Traversal in
      both the get_mr and FMR flushing paths could race with addition and
      removal.
      
      List manipulation is done with RCU primitives and is protected by the
      write side of a rwsem.  The list traversal in the get_mr fast path is
      protected by an RCU read-side critical section.  The FMR list traversal
      is more problematic because it can block while traversing the list.  We
      protect this with the read side of the rwsem (see the sketch below).
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
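      A hedged sketch of that locking scheme; the lock, list, and helper names
      are assumptions based on the description rather than the exact patch:

      static DECLARE_RWSEM(rds_ib_devices_lock);
      static LIST_HEAD(rds_ib_devices);

      /* writers take the rwsem write side and use RCU-aware list ops,
       * so both classes of readers stay safe */
      static void rds_ib_add_device(struct rds_ib_device *rds_ibdev)
      {
              down_write(&rds_ib_devices_lock);
              list_add_tail_rcu(&rds_ibdev->list, &rds_ib_devices);
              up_write(&rds_ib_devices_lock);
      }

      /* the FMR flush traversal can sleep, so it holds the rwsem read
       * side instead of an RCU read-side critical section */
      static void rds_ib_flush_all_pools(void)
      {
              struct rds_ib_device *rds_ibdev;

              down_read(&rds_ib_devices_lock);
              list_for_each_entry(rds_ibdev, &rds_ib_devices, list)
                      rds_ib_flush_mr_pool(rds_ibdev->mr_pool, 0);
              up_read(&rds_ib_devices_lock);
      }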
    • RDS: flush fmrs before allocating new ones · 8576f374
      Committed by Chris Mason
      Flushing FMRs is somewhat expensive, and is currently kicked off when
      the interrupt handler notices that we are getting low.  The result of
      this is that FMR flushing only happens on the interrupt CPUs.
      
      This spreads the load more effectively by triggering flushes just before
      we allocate a new FMR (see the sketch below).
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
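      A hedged sketch of the idea; the threshold test, field names, and the
      reuse helper are illustrative assumptions:

      static struct rds_ib_mr *rds_ib_alloc_fmr(struct rds_ib_device *rds_ibdev)
      {
              struct rds_ib_mr_pool *pool = rds_ibdev->mr_pool;

              /* instead of waiting for the interrupt handler to notice
               * the pool is low, flush right before allocating, on
               * whatever CPU the allocating process is running */
              if (atomic_read(&pool->item_count) + 1 > pool->max_items_soft)
                      rds_ib_flush_mr_pool(pool, 0);

              /* hypothetical helper: reuse a recycled FMR or allocate one */
              return rds_ib_reuse_or_alloc_fmr(pool);
      }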
    • RDS: remove __init and __exit annotation · ef87b7ea
      Committed by Zach Brown
      The trivial amount of memory saved isn't worth the cost of dealing with section
      mismatches.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
    • RDS/IB: create a work queue for FMR flushing · 515e079d
      Committed by Zach Brown
      This patch moves the FMR flushing work into its own multi-threaded work
      queue (see the sketch below).  This is to maintain performance in
      preparation for returning the main krdsd work queue back to a single
      threaded work queue to avoid deep-rooted concurrency bugs.
      
      This is also good because it further separates FMRs, which might be removed
      some day, from the rest of the code base.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
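      A hedged sketch of the setup this implies; create_workqueue was the
      multi-threaded (one worker per CPU) variant of the era, and these are
      the init/exit hooks that a later commit (c534a107, listed above) turns
      into noops:

      static struct workqueue_struct *rds_ib_fmr_wq;

      int rds_ib_fmr_init(void)
      {
              rds_ib_fmr_wq = create_workqueue("rds_ib_fmr");
              if (!rds_ib_fmr_wq)
                      return -ENOMEM;
              return 0;
      }

      void rds_ib_fmr_exit(void)
      {
              destroy_workqueue(rds_ib_fmr_wq);
      }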
    • RDS/IB: destroy connections on rmmod · 8aeb1ba6
      Committed by Zach Brown
      IB connections were not being destroyed during rmmod.
      
      First, the IB device removal callback was recently changed to disconnect
      connections that used the removed device rather than destroy them.  So
      connections still tied to devices during rmmod were not being destroyed.
      
      Second, rds_ib_destroy_nodev_conns() was being called before connections
      were disassociated from their devices, so it would almost never find
      connections on the nodev list.
      
      We first get rid of rds_ib_destroy_conns(), which is no longer called,
      refactor the existing caller into the main body of the function, and get
      rid of the list and lock wrappers.
      
      Then we call rds_ib_destroy_nodev_conns() *after* ib_unregister_client() has
      removed the IB device from all the conns and put the conns on the nodev list.
      
      The result is that IB connections are destroyed by rmmod (see the
      ordering sketch below).
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
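      A hedged sketch of the resulting teardown ordering; the exit path is
      condensed and the surrounding calls are assumptions:

      void rds_ib_exit(void)
      {
              /* unregistering the client runs the removal callback for
               * every device, disassociating conns from their devices
               * and moving them onto the nodev list */
              ib_unregister_client(&rds_ib_client);

              /* only now does the nodev list actually hold the IB
               * connections, so destroying them here finds them */
              rds_ib_destroy_nodev_conns();

              rds_trans_unregister(&rds_ib_transport);
      }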
    • RDS: whitespace · c9455d99
      Committed by Andy Grover
    • RDS: use delayed work for the FMR flushes · 7a0ff5db
      Committed by Chris Mason
      Using a delayed work queue helps us make sure a healthy number of FMRs
      have queued up over the limit before we flush.  It makes for a large
      improvement in RDMA iops (see the sketch below).
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
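      A hedged before/after sketch; the queue name and the 10-jiffy delay are
      illustrative of the change, not the exact hunk:

      /* before: the flush was queued to run as soon as possible
       *
       *     queue_work(rds_wq, &pool->flush_worker);
       *
       * after: a short delay lets FMRs accumulate past the limit, so
       * each flush batch is larger and the per-FMR cost is amortized */
      INIT_DELAYED_WORK(&pool->flush_worker, rds_ib_mr_pool_flush_worker);
      queue_delayed_work(rds_wq, &pool->flush_worker, 10);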
    • rds: recycle FMRs through lockless lists · 6fa70da6
      Committed by Chris Mason
      FMR allocation and recycling is performance critical and fairly lock
      intensive.  The current code has a per connection lock that all
      processes bang on and it becomes a major bottleneck on large systems.
      
      This changes things to use a number of cmpxchg based lists instead,
      allowing us to go through the whole FMR lifecycle without locking inside
      RDS.
      
      Zach Brown pointed out that our usage of cmpxchg for xlist removal is
      racy if someone manages to remove and re-add an FMR struct to the list
      while another CPU still sees the FMR's address at the head of the list
      (the classic ABA problem).
      
      The second CPU might assume the list hasn't changed when in fact any
      number of operations might have happened in between the deletion and
      reinsertion.
      
      This commit maintains a per cpu count of CPUs that are currently
      in xlist removal, and establishes a grace period to make sure that
      nobody can see an entry we have just removed from the list (see the
      sketch below).
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
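      A hedged sketch of the lockless push and the hazard described above;
      the real xlist code also supports pushing whole chains, so this is only
      the minimal single-entry shape:

      struct xlist_head {
              struct xlist_head *next;
      };

      /* lock-free push: point the new entry at the current head, then try
       * to swing the head pointer over with cmpxchg, retrying on contention */
      static void xlist_add(struct xlist_head *entry, struct xlist_head *head)
      {
              struct xlist_head *cur;

              do {
                      cur = head->next;
                      entry->next = cur;
              } while (cmpxchg(&head->next, cur, entry) != cur);
      }

      /* lock-free pop: this is the racy side -- between reading cur and the
       * cmpxchg, cur could be removed and re-added (ABA), which is why the
       * commit adds a per-CPU "in removal" count and a grace period */
      static struct xlist_head *xlist_del_head(struct xlist_head *head)
      {
              struct xlist_head *cur, *next;

              do {
                      cur = head->next;
                      if (!cur)
                              return NULL;
                      next = cur->next;
              } while (cmpxchg(&head->next, cur, next) != cur);

              return cur;
      }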
    • RDS/IB: add refcount tracking to struct rds_ib_device · 3e0249f9
      Committed by Zach Brown
      The RDS IB client .remove callback used to free the rds_ibdev for the given
      device unconditionally.  This could race with other users of the struct.
      This patch adds refcounting so that we only free the rds_ibdev once all of
      its users are done (see the get/put sketch below).
      
      Many rds_ibdev users are tied to connections.  We give the connection a
      reference and change these users to reference the device in the connection
      instead of looking it up in the IB client data.  The only user of the IB client
      data remaining is the first lookup of the device as connections are built up.
      
      Incrementing the reference count of a device found in the IB client data could
      race with final freeing so we use an RCU grace period to make sure that freeing
      won't happen until those lookups are done.
      
      MRs need the rds_ibdev to get at the pool that they're freed into.  They
      exist outside a connection and many MRs can reference different devices
      from one socket, so it was natural to have each MR hold a reference.  MR
      refs can be dropped from interrupt handlers and final device teardown can
      block, so we push it off to a work struct.  Pool teardown had to be fixed
      to cancel its pending work instead of deadlocking waiting for all queued
      work, including itself, to finish.
      
      MRs get their reference from the global device list, which gets a reference.
      It is left unprotected by locks and remains racy.  A simple global lock would
      be a significant bottleneck.  More scalable (complicated) locking should be
      done carefully in a later patch.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
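      The get/put pair this introduces has the following shape; the put side
      is quoted verbatim in the 2015 overflow fix earlier in this list, and the
      get side is its assumed counterpart:

      static inline void rds_ib_dev_get(struct rds_ib_device *rds_ibdev)
      {
              atomic_inc(&rds_ibdev->refcount);
      }

      void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
      {
              BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
              /* final teardown can block, and the last put can come from
               * interrupt context, so freeing is pushed to a work struct */
              if (atomic_dec_and_test(&rds_ibdev->refcount))
                      queue_work(rds_wq, &rds_ibdev->free_work);
      }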
    • rds: Use RCU for the bind lookup searches · 38a4e5e6
      Committed by Chris Mason
      The RDS bind lookups are somewhat expensive in terms of CPU
      time and locking overhead.  This commit changes them into a
      faster RCU-based hash table instead of the rbtrees they were using
      before (a lookup sketch follows below).
      
      On large NUMA systems it is a significant improvement.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
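      A hedged sketch of what an RCU-ized bind lookup looks like, written in
      the modern hlist_for_each_entry_rcu idiom; the bucket count, hash, and
      field names are assumptions:

      #define BIND_HASH_SIZE 1024
      static struct hlist_head bind_hash_table[BIND_HASH_SIZE];

      static struct hlist_head *hash_to_bucket(__be32 addr, __be16 port)
      {
              return bind_hash_table +
                     (jhash_2words((__force u32)addr, (__force u32)port, 0) &
                      (BIND_HASH_SIZE - 1));
      }

      static struct rds_sock *rds_bind_lookup(__be32 addr, __be16 port)
      {
              struct rds_sock *rs;
              struct hlist_head *head = hash_to_bucket(addr, port);

              rcu_read_lock();
              hlist_for_each_entry_rcu(rs, head, rs_bound_node) {
                      if (rs->rs_bound_addr == addr &&
                          rs->rs_bound_port == port) {
                              rds_sock_addref(rs);    /* pin before leaving RCU */
                              rcu_read_unlock();
                              return rs;
                      }
              }
              rcu_read_unlock();
              return NULL;
      }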
    • RDS/IB: add _to_node() macros for numa and use {k,v}malloc_node() · e4c52c98
      Committed by Andy Grover
      Allocate send/recv rings in memory that is node-local to the HCA.
      This significantly helps performance (see the sketch below).
      Signed-off-by: Andy Grover <andy.grover@oracle.com>
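      A hedged sketch of the pattern; the ibdev_to_node helper shown is an
      assumption modeled on the commit title, and exact definitions may differ:

      #include <linux/vmalloc.h>

      /* hypothetical helper: NUMA node of the device behind the HCA */
      #define ibdev_to_node(ibdev) dev_to_node((ibdev)->dma_device)

      /* allocate the receive ring on the HCA's node rather than wherever
       * the calling CPU happens to be */
      ic->i_recvs = vmalloc_node(ic->i_recv_ring.w_nr *
                                 sizeof(struct rds_ib_recv_work),
                                 ibdev_to_node(ic->i_cm_id->device));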
    • 4a81802b
    • rds: rcu-ize rds_ib_get_device() · 764f2dd9
      Committed by Chris Mason
      rds_ib_get_device is called very often as we turn an
      IP address into a corresponding device structure.  It currently
      takes a global spinlock as it walks different lists to find active
      devices.
      
      This commit changes the lists over to RCU, which isn't very complex
      because they are not updated very often at all (see the sketch below).
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
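      A hedged sketch of the RCU-ized lookup; the list and field names follow
      the RDS IB code of the era, but treat the details as assumptions:

      struct rds_ib_device *rds_ib_get_device(__be32 ipaddr)
      {
              struct rds_ib_device *rds_ibdev;
              struct rds_ib_ipaddr *i_ipaddr;

              rcu_read_lock();
              list_for_each_entry_rcu(rds_ibdev, &rds_ib_devices, list) {
                      list_for_each_entry_rcu(i_ipaddr,
                                              &rds_ibdev->ipaddr_list, list) {
                              if (i_ipaddr->ipaddr == ipaddr) {
                                      /* pin the device before leaving the
                                       * RCU read-side critical section */
                                      atomic_inc(&rds_ibdev->refcount);
                                      rcu_read_unlock();
                                      return rds_ibdev;
                              }
                      }
              }
              rcu_read_unlock();
              return NULL;
      }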
    • RDS: Implement atomic operations · 15133f6e
      Committed by Andy Grover
      Implement a CMSG-based interface to do FADD (fetch-and-add) and CSWP
      (compare-and-swap) ops (see the userspace sketch below).
      
      Alter send routines to handle atomic ops.
      
      Add atomic counters to stats.
      
      Add xmit_atomic() to struct rds_transport
      
      Inline rds_ib_send_unmap_rdma into unmap_rm
      Signed-off-by: Andy Grover <andy.grover@oracle.com>
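      A hedged userspace sketch of issuing a FADD over this interface; the
      struct rds_atomic_args field names follow the current <linux/rds.h>
      uapi, which may differ from the layout in the patch as first merged:

      #include <string.h>
      #include <stdint.h>
      #include <sys/socket.h>
      #include <netinet/in.h>
      #include <linux/rds.h>

      #ifndef SOL_RDS
      #define SOL_RDS 276
      #endif

      /* fetch-and-add on a remote MR named by an RDMA cookie; the pre-add
       * value is written back to local_addr when the op completes */
      static ssize_t rds_fadd(int fd, struct sockaddr_in *dest,
                              rds_rdma_cookie_t cookie,
                              uint64_t remote_addr, uint64_t local_addr,
                              uint64_t add)
      {
              struct rds_atomic_args args;
              char cbuf[CMSG_SPACE(sizeof(args))];
              struct msghdr msg;
              struct cmsghdr *cmsg;

              memset(&args, 0, sizeof(args));
              args.cookie = cookie;
              args.remote_addr = remote_addr;
              args.local_addr = local_addr;
              args.fadd.add = add;

              memset(cbuf, 0, sizeof(cbuf));
              memset(&msg, 0, sizeof(msg));
              msg.msg_name = dest;
              msg.msg_namelen = sizeof(*dest);
              msg.msg_control = cbuf;
              msg.msg_controllen = sizeof(cbuf);

              cmsg = CMSG_FIRSTHDR(&msg);
              cmsg->cmsg_level = SOL_RDS;
              cmsg->cmsg_type = RDS_CMSG_ATOMIC_FADD;
              cmsg->cmsg_len = CMSG_LEN(sizeof(args));
              memcpy(CMSG_DATA(cmsg), &args, sizeof(args));

              return sendmsg(fd, &msg, 0);
      }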