提交 · ea819867b788728aca60717e4fdacb3df771f670 · openanolis / cloud-kernel

09 9月, 2010 13 次提交

RDS/IB: protect the list of IB devices · ea819867

由 Zach Brown 提交于 7月 15, 2010

The RDS IB device list wasn't protected by any locking.  Traversal in
both the get_mr and FMR flushing paths could race with additon and
removal.

List manipulation is done with RCU primatives and is protected by the
write side of a rwsem.  The list traversal in the get_mr fast path is
protected by a rcu read critical section.  The FMR list traversal is
more problematic because it can block while traversing the list.  We
protect this with the read side of the rwsem.
Signed-off-by: NZach Brown <zach.brown@oracle.com>

ea819867

RDS: remove __init and __exit annotation · ef87b7ea

由 Zach Brown 提交于 7月 09, 2010

The trivial amount of memory saved isn't worth the cost of dealing with section
mismatches.
Signed-off-by: NZach Brown <zach.brown@oracle.com>

ef87b7ea

RDS/IB: create a work queue for FMR flushing · 515e079d

由 Zach Brown 提交于 7月 06, 2010

This patch moves the FMR flushing work in to its own mult-threaded work queue.
This is to maintain performance in preparation for returning the main krdsd
work queue back to a single threaded work queue to avoid deep-rooted
concurrency bugs.

This is also good because it further separates FMRs, which might be removed
some day, from the rest of the code base.
Signed-off-by: NZach Brown <zach.brown@oracle.com>

515e079d

RDS/IB: destroy connections on rmmod · 8aeb1ba6

由 Zach Brown 提交于 6月 25, 2010

IB connections were not being destroyed during rmmod.

First, recently IB device removal callback was changed to disconnect
connections that used the removing device rather than destroying them. So
connections with devices during rmmod were not being destroyed.

Second, rds_ib_destroy_nodev_conns() was being called before connections are
disassociated with devices. It would almost never find connections in the
nodev list.

We first get rid of rds_ib_destroy_conns(), which is no longer called, and
refactor the existing caller into the main body of the function and get rid of
the list and lock wrappers.

Then we call rds_ib_destroy_nodev_conns() *after* ib_unregister_client() has
removed the IB device from all the conns and put the conns on the nodev list.

The result is that IB connections are destroyed by rmmod.
Signed-off-by: NZach Brown <zach.brown@oracle.com>

8aeb1ba6

RDS/IB: wait for IB dev freeing work to finish during rmmod · 24fa163a

由 Zach Brown 提交于 6月 25, 2010

The RDS IB client removal callback can queue work to drop the final reference
to an IB device. We have to make sure that this function has returned before
we complete rmmod or the work threads can try to execute freed code.
Signed-off-by: NZach Brown <zach.brown@oracle.com>

24fa163a

RDS/IB: disconnect when IB devices are removed · fc19de38

由 Zach Brown 提交于 5月 24, 2010

Currently IB device removal destroys connections which are associated with the
device.  This prevents connections from being re-established when replacement
devices are added.

Instead we'll queue shutdown work on the connections as their devices are
removed.  When we see that devices are added we triger connection attempts on
all connections that don't currently have a device.

The result is that RDS sockets can resume device-independent work (bcopy, not
RDMA) across IB device removal and restoration.
Signed-off-by: NZach Brown <zach.brown@oracle.com>

fc19de38

RDS/IB: add refcount tracking to struct rds_ib_device · 3e0249f9

由 Zach Brown 提交于 5月 18, 2010

The RDS IB client .remove callback used to free the rds_ibdev for the given
device unconditionally. This could race other users of the struct. This patch
adds refcounting so that we only free the rds_ibdev once all of its users are
done.

Many rds_ibdev users are tied to connections. We give the connection a
reference and change these users to reference the device in the connection
instead of looking it up in the IB client data. The only user of the IB client
data remaining is the first lookup of the device as connections are built up.

Incrementing the reference count of a device found in the IB client data could
race with final freeing so we use an RCU grace period to make sure that freeing
won't happen until those lookups are done.

MRs need the rds_ibdev to get at the pool that they're freed in to. They exist
outside a connection and many MRs can reference different devices from one
socket, so it was natural to have each MR hold a reference. MR refs can be
dropped from interrupt handlers and final device teardown can block so we push
it off to a work struct. Pool teardown had to be fixed to cancel its pending
work instead of deadlocking waiting for all queued work, including itself, to
finish.

MRs get their reference from the global device list, which gets a reference.
It is left unprotected by locks and remains racy. A simple global lock would
be a significant bottleneck. More scalable (complicated) locking should be
done carefully in a later patch.
Signed-off-by: NZach Brown <zach.brown@oracle.com>

3e0249f9

RDS/IB: add _to_node() macros for numa and use {k,v}malloc_node() · e4c52c98

由 Andy Grover 提交于 4月 23, 2010

Allocate send/recv rings in memory that is node-local to the HCA.
This significantly helps performance.
Signed-off-by: NAndy Grover <andy.grover@oracle.com>

e4c52c98

rds: rcu-ize rds_ib_get_device() · 764f2dd9

由 Chris Mason 提交于 4月 22, 2010

rds_ib_get_device is called very often as we turn an
ip address into a corresponding device structure.  It currently
take a global spinlock as it walks different lists to find active
devices.

This commit changes the lists over to RCU, which isn't very complex
because they are not updated very often at all.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

764f2dd9

RDS: Stop supporting old cong map sending method · 77dd550e

由 Andy Grover 提交于 3月 22, 2010

We now ask the transport to give us a rm for the congestion
map, and then we handle it normally. Previously, the
transport defined a function that we would call to send
a congestion map.

Convert TCP and loop transports to new cong map method.
Signed-off-by: NAndy Grover <andy.grover@oracle.com>

77dd550e

A
RDS: inc_purge() transport function unused - remove it · 809fa148
由 Andy Grover 提交于 1月 12, 2010
```
Signed-off-by: NAndy Grover <andy.grover@oracle.com>
```
809fa148

RDS: Base init_depth and responder_resources on hw values · 40589e74

由 Andy Grover 提交于 1月 12, 2010

Instead of using a constant for initiator_depth and
responder_resources, read the per-QP values when the
device is enumerated, and then use these values when creating
the connection.
Signed-off-by: NAndy Grover <andy.grover@oracle.com>

40589e74

RDS: Implement atomic operations · 15133f6e

由 Andy Grover 提交于 1月 12, 2010

Implement a CMSG-based interface to do FADD and CSWP ops.

Alter send routines to handle atomic ops.

Add atomic counters to stats.

Add xmit_atomic() to struct rds_transport

Inline rds_ib_send_unmap_rdma into unmap_rm
Signed-off-by: NAndy Grover <andy.grover@oracle.com>

15133f6e

30 3月, 2010 1 次提交

include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6

由 Tejun Heo 提交于 3月 24, 2010

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: NTejun Heo <tj@kernel.org>
Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

5a0e3ad6

20 11月, 2009 1 次提交

RDMA/cm: fix loopback address support · 6f8372b6

由 Sean Hefty 提交于 11月 19, 2009

The RDMA CM is intended to support the use of a loopback address
when establishing a connection; however, the behavior of the CM
when loopback addresses are used is confusing and does not always
work, depending on whether loopback was specified by the server,
the client, or both.

The defined behavior of rdma_bind_addr is to associate an RDMA
device with an rdma_cm_id, as long as the user specified a non-
zero address.  (ie they weren't just trying to reserve a port)
Currently, if the loopback address is passed to rdam_bind_addr,
no device is associated with the rdma_cm_id.  Fix this.

If a loopback address is specified by the client as the destination
address for a connection, it will fail to establish a connection.
This is true even if the server is listing across all addresses or
on the loopback address itself.  The issue is that the server tries
to translate the IP address carried in the REQ message to a local
net_device address, which fails.  The translation is not needed in
this case, since the REQ carries the actual HW address that should
be used.

Finally, cleanup loopback support to be more transport neutral.
Replace separate calls to get/set the sgid and dgid from the
device address to a single call that behaves correctly depending
on the format of the device address.  And support both IPv4 and
IPv6 address formats.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>

[ Fixed RDS build by s/ib_addr_get/rdma_addr_get/  - Roland ]
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

6f8372b6

24 8月, 2009 1 次提交

RDS: Track transports via an array, not a list · 335776bd

由 Andy Grover 提交于 8月 21, 2009

Now that transports can be loaded in arbitrary order,
it is important for rds_trans_get_preferred() to look
for them in a particular order, instead of walking the list
until it finds a transport that works for a given address.
Now, each transport registers for a specific transport slot,
and these are ordered so that preferred transports come first,
and then if they are not loaded, other transports are queried.
Signed-off-by: NAndy Grover <andy.grover@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

335776bd

20 7月, 2009 2 次提交

RDS/IB: Always use PAGE_SIZE for FMR page size · a870d627

由 Andy Grover 提交于 7月 17, 2009

While FMRs allow significant flexibility in what size of pages they can use,
we really just want FMR pages to match CPU page size. Roland says we can
count on this always being supported, so this simplifies things.
Signed-off-by: NAndy Grover <andy.grover@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a870d627

RDS: Set retry_count to 2 and make modifiable via modparam · 3ba23ade

由 Andy Grover 提交于 7月 17, 2009

This will be default cause IB connections to failover faster,
but allow a longer retry count to be used if desired.
Signed-off-by: NAndy Grover <andy.grover@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3ba23ade

10 4月, 2009 1 次提交

ERR_PTR() dereference in net/rds/ib.c · 94713bab

由 Dan Carpenter 提交于 4月 09, 2009

rdma_create_id() doesn't return NULL, only ERR_PTR().

Found by smatch (http://repo.or.cz/w/smatch.git).  Compile tested.

regards,
dan carpenter
Signed-off-by: NDan Carpenter <error27@gmail.com>
Signed-off-by: NAndy Grover <andy.grover@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

94713bab

02 4月, 2009 1 次提交

RDS: Rewrite connection cleanup, fixing oops on rmmod · 745cbcca

由 Andy Grover 提交于 4月 01, 2009

This fixes a bug where a connection was unexpectedly
not on *any* list while being destroyed. It also
cleans up some code duplication and regularizes some
function names.

* Grab appropriate lock in conn_free() and explain in comment
* Ensure via locking that a conn is never not on either
  a dev's list or the nodev list
* Add rds_xx_remove_conn() to match rds_xx_add_conn()
* Make rds_xx_add_conn() return void
* Rename remove_{,nodev_}conns() to
  destroy_{,nodev_}conns() and unify their implementation
  in a helper function
* Document lock ordering as nodev conn_lock before
  dev_conn_lock
Reported-by: NYosef Etigin <yosefe@voltaire.com>
Signed-off-by: NAndy Grover <andy.grover@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

745cbcca

27 2月, 2009 1 次提交

RDS/IB: Infiniband transport · ec16227e

由 Andy Grover 提交于 2月 24, 2009

Registers as an RDS transport and an IB client, and uses IB CM
API to allocate ids, queue pairs, and the rest of that fun stuff.
Signed-off-by: NAndy Grover <andy.grover@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ec16227e

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功