1. 09 Nov 2013, 1 commit
    • IB/cma: Use cached gids · 29f27e84
      Submitted by Doug Ledford
      The cma_acquire_dev function was changed by commit 3c86aa70
      ("RDMA/cm: Add RDMA CM support for IBoE devices") to use find_gid_port()
      because multiport devices might have either IB or IBoE formatted gids.
      The old function assumed that all ports on the same device used the
      same GID format.
      
      However, when it was changed to use find_gid_port(), we inadvertently
      lost usage of the GID cache.  This turned out to be a very costly
      change.  In our testing, each iteration through each index of the GID
      table takes roughly 35us.  When you have multiple devices in a system,
      and the GID you are looking for is on one of the later devices, the
      code loops through all of the GID indexes on all of the early devices
      before it finally succeeds on the target device.  This pathological
      search behavior combined with 35us per GID table index retrieval
      results in results such as the following from the cmtime application
      that's part of the latest librdmacm git repo:
      
      ib1:
      step              total ms     max ms     min us  us / conn
      create id    :       29.42       0.04       1.00       2.94
      bind addr    :   186705.66      19.00   18556.00   18670.57
      resolve addr :       41.93       9.68     619.00       4.19
      resolve route:      486.93       0.48     101.00      48.69
      create qp    :     4021.95       6.18     330.00     402.20
      connect      :    68350.39   68588.17   24632.00    6835.04
      disconnect   :     1460.43     252.65 -1862269.00     146.04
      destroy      :       41.16       0.04       2.00       4.12
      
      ib0:
      step              total ms     max ms     min us  us / conn
      create id    :       28.61       0.68       1.00       2.86
      bind addr    :     2178.86       2.95     201.00     217.89
      resolve addr :       51.26      16.85     845.00       5.13
      resolve route:      620.08       0.43      92.00      62.01
      create qp    :     3344.40       6.36     273.00     334.44
      connect      :     6435.99    6368.53    7844.00     643.60
      disconnect   :     5095.38     321.90     757.00     509.54
      destroy      :       37.13       0.02       2.00       3.71
      
      Clearly, both the bind address and connect operations suffer
      a huge penalty for being anything other than the default
      GID on the first port in the system.
      
      After applying this patch, the numbers now look like this:
      
      ib1:
      step              total ms     max ms     min us  us / conn
      create id    :       30.15       0.03       1.00       3.01
      bind addr    :       80.27       0.04       7.00       8.03
      resolve addr :       43.02      13.53     589.00       4.30
      resolve route:      482.90       0.45     100.00      48.29
      create qp    :     3986.55       5.80     330.00     398.66
      connect      :     7141.53    7051.29    5005.00     714.15
      disconnect   :     5038.85     193.63     918.00     503.88
      destroy      :       37.02       0.04       2.00       3.70
      
      ib0:
      step              total ms     max ms     min us  us / conn
      create id    :       34.27       0.05       1.00       3.43
      bind addr    :       26.45       0.04       1.00       2.64
      resolve addr :       38.25      10.54     760.00       3.82
      resolve route:      604.79       0.43      97.00      60.48
      create qp    :     3314.95       6.34     273.00     331.49
      connect      :    12399.26   12351.10    8609.00    1239.93
      disconnect   :     5096.76     270.72    1015.00     509.68
      destroy      :       37.10       0.03       2.00       3.71
      
      It's worth noting that we still suffer a bit of a penalty on
      connect to the wrong device, but the penalty is much less than
      it used to be.  Follow-on patches deal with this penalty.
      
      Many thanks to Neil Horman, whose help tracking down the source of
      the slow function allowed us to identify that the original patch
      mentioned above backed out cache usage, and to quantify just how
      much that impacted the system.
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      29f27e84
  2. 13 Aug 2013, 1 commit
  3. 31 Jul 2013, 3 commits
  4. 15 Jul 2013, 1 commit
  5. 21 Jun 2013, 20 commits
  6. 29 May 2013, 1 commit
  7. 28 Feb 2013, 2 commits
    • hlist: drop the node parameter from iterators · b67bfe0d
      Submitted by Sasha Levin
      I'm not sure why, but the hlist for each entry iterators were conceived
      differently from the list ones.  The list iterators look like:
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure. Not only
      do they not really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate.
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small number of places were using the 'node' parameter;
       these were modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
      
      The semantic patch, which is mostly the work of Peter Senna Tschudin, is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foundation.org: redo intrusive kvm changes]
      Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b67bfe0d
    • IB/core: convert to idr_alloc() · 3b069c5d
      Submitted by Tejun Heo
      Convert to the much saner new idr interface.
      
      v2: Mike triggered WARN_ON() in idr_preload() because send_mad(),
          which may be used from non-process context, was calling
          idr_preload() unconditionally.  Preload iff @gfp_mask has
          __GFP_WAIT.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Sean Hefty <sean.hefty@intel.com>
      Reported-by: "Marciniszyn, Mike" <mike.marciniszyn@intel.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Sean Hefty <sean.hefty@intel.com>
      Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3b069c5d
  8. 30 Nov 2012, 1 commit
    • RDMA/cm: Change return value from find_gid_port() · 63f05be2
      Submitted by shefty
      Problem reported by Dan Carpenter <dan.carpenter@oracle.com>:
      
      The patch 3c86aa70: "RDMA/cm: Add RDMA CM support for IBoE
      devices" from Oct 13, 2010, leads to the following warning:
      net/sunrpc/xprtrdma/svc_rdma_transport.c:722 svc_rdma_create()
      	 error: passing non neg 1 to ERR_PTR
      
      This bug would result in a NULL dereference.  svc_rdma_create() is
      supposed to return ERR_PTRs or valid pointers, but instead it returns
      ERR_PTRs, valid pointers and 1.
      
      The call tree is:
      
      svc_rdma_create()
         => rdma_bind_addr()
            => cma_acquire_dev()
               => find_gid_port()
      
      rdma_bind_addr() should return a valid errno.  Fix this by having
      find_gid_port() also return a valid errno.  If we can't find the
      specified GID on a given port, return -EADDRNOTAVAIL, rather than
      -EAGAIN, to better indicate the error.  We also drop using the
      special return value of '1' and instead pass through the error
      returned by the underlying verbs call.  On such errors, rather
      than aborting the search,  we simply continue to check the next
      device/port.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      63f05be2
  9. 07 Oct 2012, 1 commit
  10. 05 Oct 2012, 1 commit
  11. 01 Oct 2012, 1 commit
  12. 28 Jul 2012, 1 commit
  13. 09 Jul 2012, 4 commits
  14. 20 Jun 2012, 1 commit
  15. 09 May 2012, 1 commit
    • RDMA/cma: Fix lockdep false positive recursive locking · b6cec8aa
      Submitted by Sean Hefty
      The following lockdep problem was reported by Or Gerlitz <ogerlitz@mellanox.com>:
      
          [ INFO: possible recursive locking detected ]
          3.3.0-32035-g1b2649e-dirty #4 Not tainted
          ---------------------------------------------
          kworker/5:1/418 is trying to acquire lock:
         (&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa0138a41>] rdma_destroy_id+0x33/0x1f0 [rdma_cm]
      
          but task is already holding lock:
         (&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa0135130>] cma_disable_callback+0x24/0x45 [rdma_cm]
      
          other info that might help us debug this:
           Possible unsafe locking scenario:
      
                 CPU0
                 ----
            lock(&id_priv->handler_mutex);
            lock(&id_priv->handler_mutex);
      
           *** DEADLOCK ***
      
           May be due to missing lock nesting notation
      
          3 locks held by kworker/5:1/418:
         #0:  (ib_cm){.+.+.+}, at: [<ffffffff81042ac1>] process_one_work+0x210/0x4a6
         #1:  ((&(&work->work)->work)){+.+.+.}, at: [<ffffffff81042ac1>] process_one_work+0x210/0x4a6
         #2:  (&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa0135130>] cma_disable_callback+0x24/0x45 [rdma_cm]
      
          stack backtrace:
          Pid: 418, comm: kworker/5:1 Not tainted 3.3.0-32035-g1b2649e-dirty #4
          Call Trace:
           [<ffffffff8102b0fb>] ? console_unlock+0x1f4/0x204
           [<ffffffff81068771>] __lock_acquire+0x16b5/0x174e
           [<ffffffff8106461f>] ? save_trace+0x3f/0xb3
           [<ffffffff810688fa>] lock_acquire+0xf0/0x116
           [<ffffffffa0138a41>] ? rdma_destroy_id+0x33/0x1f0 [rdma_cm]
           [<ffffffff81364351>] mutex_lock_nested+0x64/0x2ce
           [<ffffffffa0138a41>] ? rdma_destroy_id+0x33/0x1f0 [rdma_cm]
           [<ffffffff81065a78>] ? trace_hardirqs_on_caller+0x11e/0x155
           [<ffffffff81065abc>] ? trace_hardirqs_on+0xd/0xf
           [<ffffffffa0138a41>] rdma_destroy_id+0x33/0x1f0 [rdma_cm]
           [<ffffffffa0139c02>] cma_req_handler+0x418/0x644 [rdma_cm]
           [<ffffffffa012ee88>] cm_process_work+0x32/0x119 [ib_cm]
           [<ffffffffa0130299>] cm_req_handler+0x928/0x982 [ib_cm]
           [<ffffffffa01302f3>] ? cm_req_handler+0x982/0x982 [ib_cm]
           [<ffffffffa0130326>] cm_work_handler+0x33/0xfe5 [ib_cm]
           [<ffffffff81065a78>] ? trace_hardirqs_on_caller+0x11e/0x155
           [<ffffffffa01302f3>] ? cm_req_handler+0x982/0x982 [ib_cm]
           [<ffffffff81042b6e>] process_one_work+0x2bd/0x4a6
           [<ffffffff81042ac1>] ? process_one_work+0x210/0x4a6
           [<ffffffff813669f3>] ? _raw_spin_unlock_irq+0x2b/0x40
           [<ffffffff8104316e>] worker_thread+0x1d6/0x350
           [<ffffffff81042f98>] ? rescuer_thread+0x241/0x241
           [<ffffffff81046a32>] kthread+0x84/0x8c
           [<ffffffff8136e854>] kernel_thread_helper+0x4/0x10
           [<ffffffff81366d59>] ? retint_restore_args+0xe/0xe
           [<ffffffff810469ae>] ? __init_kthread_worker+0x56/0x56
           [<ffffffff8136e850>] ? gs_change+0xb/0xb
      
      The actual locking is fine, since we're dealing with different locks,
      but from the same lock class.  cma_disable_callback() acquires the
      listening id mutex, whereas rdma_destroy_id() acquires the mutex for
      the new connection id.  To fix this, delay the call to
      rdma_destroy_id() until we've released the listening id mutex.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      b6cec8aa