Commits · dcfd041c8710320d59fce322fd901bddaf912ae8 · openanolis / cloud-kernel

03 Mar, 2016 1 commit

RDS: IB: Remove the RDS_IB_SEND_OP dependency · dcfd041c

Committed by santosh.shilimkar@oracle.com on Mar 01, 2016

This helps to combine the asynchronous fastreg MR completion handler
with the send completion handler.

No functional change.
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dcfd041c

08 Oct, 2015 1 commit

IB: split struct ib_send_wr · e622f2f4

Committed by Christoph Hellwig on Oct 08, 2015

This patch splits up struct ib_send_wr so that all non-trivial verbs
use their own structure which embeds struct ib_send_wr.  This dramatically
shrinks the size of a WR for most common operations:

sizeof(struct ib_send_wr) (old):	96

sizeof(struct ib_send_wr):		48
sizeof(struct ib_rdma_wr):		64
sizeof(struct ib_atomic_wr):		96
sizeof(struct ib_ud_wr):		88
sizeof(struct ib_fast_reg_wr):		88
sizeof(struct ib_bind_mw_wr):		96
sizeof(struct ib_sig_handover_wr):	80

And with Sagi's pending MR rework the fast registration WR will also be
down to a reasonable size:

sizeof(struct ib_fastreg_wr):		64
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt]
Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc]
Tested-by: Haggai Eran <haggaie@mellanox.com>
Tested-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>

e622f2f4
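
The embedding pattern that e622f2f4 describes can be illustrated with a small user-space sketch. The names `send_wr`, `rdma_wr` and `to_rdma_wr` are hypothetical stand-ins chosen for this example, not the kernel's definitions, and the fields are reduced to a minimum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the small, generic work request. */
struct send_wr {
	struct send_wr *next;
	uint64_t wr_id;
	int opcode;
};

/* An RDMA-specific request embeds the generic one as a member. */
struct rdma_wr {
	struct send_wr wr;
	uint64_t remote_addr;
	uint32_t rkey;
};

/* Recover the enclosing structure from a pointer to the embedded member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct rdma_wr *to_rdma_wr(struct send_wr *wr)
{
	assert(wr != NULL);
	return container_of(wr, struct rdma_wr, wr);
}

/* Returns 1 if the round trip through the generic pointer is lossless. */
static int embed_roundtrip_ok(void)
{
	struct rdma_wr r = { .wr = { .opcode = 1 }, .remote_addr = 0x1000, .rkey = 7 };
	struct send_wr *generic = &r.wr;	/* what a generic post path sees */

	return to_rdma_wr(generic) == &r && to_rdma_wr(generic)->rkey == 7;
}
```

Only operations that need the extra fields pay for them, which is the size reduction the commit message reports for the plain send case.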

06 Oct, 2015 3 commits

RDS: IB: split mr pool to improve 8K messages performance · 06766513

Committed by Santosh Shilimkar on Sep 10, 2015

8K message sizes are a pretty important use case for current RDS
workloads, so we make provision to have 8K MRs available from the pool.
Based on the number of SGs in the RDS message, we pick a pool to use.

Also, to make sure that we don't underutilise MRs when, say, 8K messages
are dominating, which could lead to the 8K pool being exhausted, we fall
back to the 1M pool until the 8K pool recovers for use.

This helps to at least push ~55 kB/s bidirectional data which
is a nice improvement.
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

06766513
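
A rough sketch of the pool selection with fallback described in 06766513, written as user-space C. The `mr_pool` type, the `pick_pool` helper and the threshold of two scatter/gather entries (an 8K message on 4K pages) are assumptions made for illustration and are not taken from the RDS sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool: only tracks how many MRs are still free. */
struct mr_pool {
	const char *name;
	unsigned int free_mrs;
};

/* Assumed threshold: an 8K message spans at most two 4K pages. */
#define SMALL_MSG_MAX_SG 2

/*
 * Pick a pool by scatter/gather count. If the small pool is exhausted,
 * fall back to the large one. Returns NULL when neither has a free MR.
 */
static struct mr_pool *pick_pool(struct mr_pool *small, struct mr_pool *large,
				 unsigned int nr_sg)
{
	assert(small && large);
	if (nr_sg <= SMALL_MSG_MAX_SG && small->free_mrs > 0) {
		small->free_mrs--;
		return small;
	}
	if (large->free_mrs > 0) {
		large->free_mrs--;
		return large;
	}
	return NULL;
}
```

With one free MR in the small pool, the first small request is served from it and the next small request falls through to the large pool.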

RDS: IB: split send completion handling and do batch ack · 0c28c045

Committed by Santosh Shilimkar on Sep 06, 2015

Similar to what we did with receive CQ completion handling, we split
the transmit completion handler so that it lets us implement batched
work completion handling.

We re-use the cq_poll routine and make use of RDS_IB_SEND_OP to
identify the send vs receive completion event handler invocation.
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

0c28c045

RDS: IB: ack more receive completions to improve performance · f4f943c9

Committed by Santosh Shilimkar on Sep 06, 2015

For better performance, we split the receive completion IRQ handler. That
lets us acknowledge several WCE events in one call. We also limit the WC
to max 32 to avoid latency. Acknowledging several completions in one call
instead of several calls each time will provide better performance since
fewer mutual exclusion lock operations are performed.

In the next patch, send completion is also split and re-uses poll_cq(),
hence the code is moved to ib_cm.c
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

f4f943c9
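
The batching idea in f4f943c9 can be sketched in user space as follows. `fake_cq`, `fake_poll` and `drain_in_batches` are hypothetical names for this example. Only the cap of 32 comes from the commit message, and the queue is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BATCH 32	/* cap taken from the commit message */

/* Hypothetical completion queue: just a count of pending completions. */
struct fake_cq {
	unsigned int pending;
};

/* Take up to max completions off the queue, like one poll call would. */
static unsigned int fake_poll(struct fake_cq *cq, unsigned int max)
{
	unsigned int n = cq->pending < max ? cq->pending : max;

	cq->pending -= n;
	return n;
}

/*
 * Drain the queue in batches. Returns the number of non-empty batches,
 * i.e. how many times an acknowledgement would be issued.
 */
static unsigned int drain_in_batches(struct fake_cq *cq)
{
	unsigned int batches = 0;

	assert(cq != NULL);
	while (fake_poll(cq, MAX_BATCH) > 0)
		batches++;
	return batches;
}
```

Seventy pending completions are handled in three batches (32, 32, 6) rather than seventy separate acknowledgements.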

01 Oct, 2015 1 commit

RDS: use kfree_rcu in rds_ib_remove_ipaddr · 59fe4606

Committed by Santosh Shilimkar on Feb 03, 2012

synchronize_rcu() unnecessarily slows down the socket shutdown
path. It is used just to kfree() the IP addresses in rds_ib_remove_ipaddr(),
which is a perfect use case for kfree_rcu().

So let's use that to gain some speedup.
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

59fe4606

31 Aug, 2015 1 commit

rds/ib: Remove ib_get_dma_mr calls · e5580242

Committed by Jason Gunthorpe on Jul 30, 2015

The pd now has a local_dma_lkey member which completely replaces
ib_get_dma_mr; use it instead.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

e5580242

26 Aug, 2015 2 commits

RDS: push FMR pool flush work to its own worker · ad1d7dc0

Committed by santosh.shilimkar@oracle.com on Aug 25, 2015

RDS FMR flush operation is heavy and it also races with connect/reconnect,
which happens a lot with RDS. FMR flush being on the common rds_wq aggravates
the problem. Let's push the RDS FMR pool flush work to its own worker.
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ad1d7dc0

RDS: make sure we post recv buffers · 73ce4317

Committed by santosh.shilimkar@oracle.com on Aug 22, 2015

If we get an ENOMEM during rds_ib_recv_refill, we might never come
back and refill again later. This patch makes sure to kick krdsd into
helping out.

To achieve this we add the RDS_RECV_REFILL flag and update in the refill
path based on that so that at least some thread will keep posting
receive buffers.

Since krdsd and softirq both might race for refill, we decide to
schedule on the work queue based on ring_low instead of ring_empty.
Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

73ce4317
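
One way to picture the refill flag from 73ce4317 is a test-and-set guard that lets exactly one context refill at a time. The sketch below uses C11 atomics in user space. The `conn` structure, the helper names and the "more than half unposted" low-water rule are assumptions for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical connection state: one flag marks "a refill is in progress". */
struct conn {
	atomic_bool refill_busy;
	unsigned int ring_free;	/* receive slots with no buffer posted */
	unsigned int ring_size;
};

static void conn_init(struct conn *c, unsigned int ring_size)
{
	assert(ring_size > 0);
	atomic_init(&c->refill_busy, false);
	c->ring_size = ring_size;
	c->ring_free = 0;
}

/* Try to become the single refiller; false means someone else already is. */
static bool refill_acquire(struct conn *c)
{
	return !atomic_exchange(&c->refill_busy, true);
}

static void refill_release(struct conn *c)
{
	atomic_store(&c->refill_busy, false);
}

/* Assumed rule: ask the worker for help once over half the ring is unposted. */
static bool ring_low(const struct conn *c)
{
	return c->ring_free > c->ring_size / 2;
}
```

Checking a low-water mark instead of an empty ring gives the worker time to help before the receive ring runs dry.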

22 Jun, 2015 1 commit

net: rds: use for_each_sg() for scatterlist parsing · d2a9ec64

Committed by Fabian Frederick on Jun 16, 2015

This patch also renames sg to sglist and aligns function parameters.
See Documentation/DMA-API.txt - Part Id for scatterlist details
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

d2a9ec64

19 May, 2015 1 commit

RDS: Switch to generic logging helpers · 3c88f3dc

Committed by Sagi Grimberg on May 18, 2015

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

3c88f3dc

24 Nov, 2014 1 commit

rds: switch ->inc_copy_to_user() to passing iov_iter · c310e72c

Committed by Al Viro on Nov 20, 2014

instances get considerably simpler from that...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c310e72c

20 Nov, 2012 1 commit

net: rds: use this_cpu_* per-cpu helper · ae4b46e9

Committed by Shan Wei on Nov 12, 2012

Signed-off-by: Shan Wei <davidshan@tencent.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ae4b46e9

30 May, 2012 1 commit

rds_rdma: don't assume infiniband device is PCI · a0c6ffbc

Committed by Thadeu Lima de Souza Cascardo on May 28, 2012

RDS code assumes that the struct ib_device dma_device member, which is a
pointer, points to a struct device embedded in a struct pci_dev.

This is not the case for ehca, for example, which is an OF driver, and
makes dma_device point to a struct device embedded in a struct
platform_device.

This will make the system crash when rds_rdma is loaded in a system
with ehca, since it will try to access the bus member of a non-existent
struct pci_dev.

The only reason rds_rdma uses the struct pci_dev is to get the NUMA node
the device is attached to. Using dev_to_node for that is much better,
since it won't assume which bus the infiniband is attached to.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Cc: dledford@redhat.com
Cc: Jes.Sorensen@redhat.com
Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Acked-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a0c6ffbc

07 Jun, 2011 1 commit

net: remove interrupt.h inclusion from netdevice.h · a6b7a407

Committed by Alexey Dobriyan on Jun 06, 2011

* remove interrupt.h inclusion from netdevice.h -- not needed
* fixup fallout, add interrupt.h and hardirq.h back where needed.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a6b7a407

01 Feb, 2011 1 commit

rds/ib: use system_wq instead of rds_ib_fmr_wq · c534a107

Committed by Tejun Heo on Feb 01, 2011

With cmwq, there's no reason to use dedicated rds_ib_fmr_wq - it's not
in the memory reclaim path and the maximum number of concurrent work
items is bound by the number of devices.  Drop it and use system_wq
instead.  This makes rds_ib_fmr_init/exit() noops.  Both removed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andy Grover <andy.grover@oracle.com>

c534a107

21 Oct, 2010 1 commit

rds: make local functions/variables static · ff51bf84

Committed by stephen hemminger on Oct 19, 2010

The RDS protocol has lots of functions that should be
declared static. rds_message_get/add_version_extension is
removed since it is defined but never used.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ff51bf84

09 Sep, 2010 22 commits

RDS/IB: print string constants in more places · 59f740a6

Committed by Zach Brown on Aug 03, 2010

This prints the constant identifier for work completion status and rdma
cm event types, like we already do for IB event types.

A core string array helper is added that each string type uses.
Signed-off-by: Zach Brown <zach.brown@oracle.com>

59f740a6
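
A string array helper of the kind 59f740a6 mentions might look like the following user-space sketch. The `str_array` signature and the `demo_status` table are hypothetical and chosen only to show the bounds and gap handling.

```c
#include <assert.h>
#include <stddef.h>

/* Look up a name by index, tolerating gaps and out-of-range values. */
static const char *str_array(const char *const *array, size_t elements,
			     size_t index)
{
	assert(array != NULL);
	if (index < elements && array[index])
		return array[index];
	return "unknown";
}

/* Hypothetical status codes; value 1 is deliberately left undefined. */
enum demo_status { DEMO_OK = 0, DEMO_RETRY = 2, DEMO_FATAL = 3 };

/* Designated initializers leave a NULL hole at index 1. */
static const char *const demo_status_names[] = {
	[DEMO_OK]    = "DEMO_OK",
	[DEMO_RETRY] = "DEMO_RETRY",
	[DEMO_FATAL] = "DEMO_FATAL",
};

static const char *demo_status_str(size_t status)
{
	return str_array(demo_status_names,
			 sizeof(demo_status_names) / sizeof(demo_status_names[0]),
			 status);
}
```

Each enum-to-string function then becomes a thin wrapper around the shared helper, so holes and out-of-range values print as "unknown" instead of dereferencing a bad pointer.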

RDS/IB: protect the list of IB devices · ea819867

Committed by Zach Brown on Jul 15, 2010

The RDS IB device list wasn't protected by any locking.  Traversal in
both the get_mr and FMR flushing paths could race with addition and
removal.

List manipulation is done with RCU primitives and is protected by the
write side of a rwsem.  The list traversal in the get_mr fast path is
protected by a rcu read critical section.  The FMR list traversal is
more problematic because it can block while traversing the list.  We
protect this with the read side of the rwsem.
Signed-off-by: Zach Brown <zach.brown@oracle.com>

ea819867

RDS/IB: track signaled sends · f046011c

Committed by Zach Brown on Jul 14, 2010

We're seeing bugs today where IB connection shutdown clears the send
ring while the tasklet is processing completed sends.  Implementation
details cause this to dereference a null pointer.  Shutdown needs to
wait for send completion to stop before tearing down the connection.  We
can't simply wait for the ring to empty because it may contain
unsignaled sends that will never be processed.

This patch tracks the number of signaled sends that we've posted and
waits for them to complete.  It also makes sure that the tasklet has
finished executing.
Signed-off-by: Zach Brown <zach.brown@oracle.com>

f046011c
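
The counting scheme in f046011c can be sketched with a single atomic counter: increment when a signaled send is posted, decrement on its completion, and allow teardown only at zero. The `send_tracker` type and function names below are hypothetical, and the real code additionally waits for the tasklet to finish, which this sketch does not model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-connection counter of signaled sends still in flight. */
struct send_tracker {
	atomic_int signaled_inflight;
};

static void tracker_init(struct send_tracker *t)
{
	atomic_init(&t->signaled_inflight, 0);
}

/* Called when posting a send; only signaled sends produce a completion. */
static void on_post(struct send_tracker *t, bool signaled)
{
	if (signaled)
		atomic_fetch_add(&t->signaled_inflight, 1);
}

/* Called from the completion path for each signaled completion. */
static void on_completion(struct send_tracker *t)
{
	int prev = atomic_fetch_sub(&t->signaled_inflight, 1);

	assert(prev > 0);	/* a completion without a matching post is a bug */
	(void)prev;
}

/* Shutdown may proceed only once no signaled send is outstanding. */
static bool safe_to_teardown(struct send_tracker *t)
{
	return atomic_load(&t->signaled_inflight) == 0;
}
```

Unsignaled sends never touch the counter, which is why waiting on it cannot hang the way waiting for an empty ring could.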

RDS: remove __init and __exit annotation · ef87b7ea

Committed by Zach Brown on Jul 09, 2010

The trivial amount of memory saved isn't worth the cost of dealing with section
mismatches.
Signed-off-by: Zach Brown <zach.brown@oracle.com>

ef87b7ea

RDS/IB: create a work queue for FMR flushing · 515e079d

Committed by Zach Brown on Jul 06, 2010

This patch moves the FMR flushing work into its own multi-threaded work queue.
This is to maintain performance in preparation for returning the main krdsd
work queue back to a single threaded work queue to avoid deep-rooted
concurrency bugs.

This is also good because it further separates FMRs, which might be removed
some day, from the rest of the code base.
Signed-off-by: Zach Brown <zach.brown@oracle.com>

515e079d

RDS/IB: destroy connections on rmmod · 8aeb1ba6

Committed by Zach Brown on Jun 25, 2010

IB connections were not being destroyed during rmmod.

First, recently IB device removal callback was changed to disconnect
connections that used the removing device rather than destroying them. So
connections with devices during rmmod were not being destroyed.

Second, rds_ib_destroy_nodev_conns() was being called before connections are
disassociated with devices. It would almost never find connections in the
nodev list.

We first get rid of rds_ib_destroy_conns(), which is no longer called, and
refactor the existing caller into the main body of the function and get rid of
the list and lock wrappers.

Then we call rds_ib_destroy_nodev_conns() *after* ib_unregister_client() has
removed the IB device from all the conns and put the conns on the nodev list.

The result is that IB connections are destroyed by rmmod.
Signed-off-by: Zach Brown <zach.brown@oracle.com>

8aeb1ba6

RDS/IB: Make ib_recv_refill return void · b6fb0df1

Committed by Andy Grover on Jun 23, 2010

Signed-off-by: Andy Grover <andy.grover@oracle.com>

b6fb0df1

rds: more FMRs are faster · eabb7322

Committed by Chris Mason on Jun 11, 2010

When we add more FMRs, we flush them less often and so we go faster.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

eabb7322

RDS/IB: Add caching of frags and incs · 33244125

Committed by Chris Mason on May 26, 2010

This patch is based heavily on an initial patch by Chris Mason.
Instead of freeing slab memory and pages, it keeps them, and
funnels them back to be reused.

The lock minimization strategy uses xchg and cmpxchg atomic ops
for manipulation of pointers to list heads. We anchor the lists with a
pointer to a list_head struct instead of a static list_head struct.
We just have to carefully use the existing primitives with
the difference between a pointer and a static head struct.

For example, 'list_empty()' means that our anchor pointer points to a list with
a single item instead of meaning that our static head element doesn't point to
any list items.

Original patch by Chris, with significant mods and fixes by Andy and Zach.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: Zach Brown <zach.brown@oracle.com>

33244125
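
The idea of anchoring a cache with an atomically swapped pointer, as 33244125 describes, can be sketched in user space with C11 atomics. This is a simplified stand-in: it uses a singly linked chain rather than the kernel's circular `list_head`, and the `cache`, `node` and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node {
	struct node *next;
	int value;
};

/* The anchor is a pointer to the first node, not a static list head. */
struct cache {
	_Atomic(struct node *) anchor;
};

static void cache_init(struct cache *c)
{
	atomic_init(&c->anchor, (struct node *)NULL);
}

/* Publish a node with a compare-and-swap retry loop. */
static void cache_put(struct cache *c, struct node *n)
{
	struct node *old = atomic_load(&c->anchor);

	assert(n != NULL);
	do {
		n->next = old;	/* old is refreshed on each failed attempt */
	} while (!atomic_compare_exchange_weak(&c->anchor, &old, n));
}

/* Detach the whole chain in one exchange; the caller then owns it. */
static struct node *cache_take_all(struct cache *c)
{
	return atomic_exchange(&c->anchor, (struct node *)NULL);
}

static int chain_length(const struct node *n)
{
	int len = 0;

	for (; n; n = n->next)
		len++;
	return len;
}
```

Because a consumer only ever detaches the entire chain, it never has to pop a single element concurrently with producers, which keeps the scheme lock-free without per-node synchronization.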

RDS: Use page_remainder_alloc() for recv bufs · 0b088e00

Committed by Andy Grover on May 24, 2010

Instead of splitting up a page into RDS_FRAG_SIZE chunks
ourselves, ask rds_page_remainder_alloc() to do it. While it
is possible PAGE_SIZE > FRAG_SIZE, on x86en it isn't, so having
duplicate "carve up a page into buffers" code seems excessive.

The other modification this spawns is the use of a single
struct scatterlist in rds_page_frag instead of a bare page ptr.
This causes verbosity to increase in some places, and decrease
in others.

Finally, I decided to unify the lifetimes and alloc/free of
rds_page_frag and its page. This is a nice simplification in itself,
but will be extra-nice once we come to adding cmason's recycling
patch.
Signed-off-by: Andy Grover <andy.grover@oracle.com>

0b088e00

RDS/IB: add refcount tracking to struct rds_ib_device · 3e0249f9

Committed by Zach Brown on May 18, 2010

The RDS IB client .remove callback used to free the rds_ibdev for the given
device unconditionally. This could race other users of the struct. This patch
adds refcounting so that we only free the rds_ibdev once all of its users are
done.

Many rds_ibdev users are tied to connections. We give the connection a
reference and change these users to reference the device in the connection
instead of looking it up in the IB client data. The only user of the IB client
data remaining is the first lookup of the device as connections are built up.

Incrementing the reference count of a device found in the IB client data could
race with final freeing so we use an RCU grace period to make sure that freeing
won't happen until those lookups are done.

MRs need the rds_ibdev to get at the pool that they're freed into. They exist
outside a connection and many MRs can reference different devices from one
socket, so it was natural to have each MR hold a reference. MR refs can be
dropped from interrupt handlers and final device teardown can block so we push
it off to a work struct. Pool teardown had to be fixed to cancel its pending
work instead of deadlocking waiting for all queued work, including itself, to
finish.

MRs get their reference from the global device list, which gets a reference.
It is left unprotected by locks and remains racy. A simple global lock would
be a significant bottleneck. More scalable (complicated) locking should be
done carefully in a later patch.
Signed-off-by: Zach Brown <zach.brown@oracle.com>

3e0249f9
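
The reference counting described in 3e0249f9 follows a familiar get/put shape, sketched here in user-space C. The `dev` structure and its `freed` flag are hypothetical stand-ins. The RCU grace period and the deferred teardown work from the commit message are not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device object; 'freed' stands in for the real teardown. */
struct dev {
	atomic_int refcount;
	bool freed;
};

static void dev_init(struct dev *d)
{
	atomic_init(&d->refcount, 1);	/* the creator holds the first reference */
	d->freed = false;
}

static void dev_get(struct dev *d)
{
	int prev = atomic_fetch_add(&d->refcount, 1);

	assert(prev > 0);	/* taking a reference on a dead object is a bug */
	(void)prev;
}

/* Drop a reference; the last put performs the teardown. */
static void dev_put(struct dev *d)
{
	if (atomic_fetch_sub(&d->refcount, 1) == 1)
		d->freed = true;
}
```

In this picture the remove callback merely drops the creator's reference, so a connection or MR that still holds one keeps the object alive until its own put.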

RDS/IB: add _to_node() macros for numa and use {k,v}malloc_node() · e4c52c98

Committed by Andy Grover on Apr 23, 2010

Allocate send/recv rings in memory that is node-local to the HCA.
This significantly helps performance.
Signed-off-by: Andy Grover <andy.grover@oracle.com>

e4c52c98

RDS: Move atomic stats from general to ib-specific area · 51e2cba8

Committed by Andy Grover on Mar 29, 2010

Signed-off-by: Andy Grover <andy.grover@oracle.com>

51e2cba8

RDS: Refill recv ring directly from tasklet · f17a1a55

Committed by Andy Grover on Mar 18, 2010

Performance is better if we use allocations that don't block
to refill the receive ring. Since the whole reason we were
kicking out to the worker thread was so we could do blocking
allocs, we no longer need to do this.

Remove gfp params from rds_ib_recv_refill(); we always use
GFP_NOWAIT.
Signed-off-by: Andy Grover <andy.grover@oracle.com>

f17a1a55

RDS: Perform unmapping ops in stages · ff3d7d36

Committed by Andy Grover on Mar 01, 2010

Previously, RDS would wait until the final send WR had completed
and then handle cleanup. With silent ops, we do not know
if an atomic, rdma, or data op will be last. This patch
handles any of these cases by keeping a pointer to the last
op in the message in m_last_op.

When the TX completion event fires, rds dispatches to per-op-type
cleanup functions, and then does whole-message cleanup, if the
last op equalled m_last_op.

This patch also moves towards having op-specific functions take
the op struct, instead of the overall rm struct.

rds_ib_connection has a pointer to keep track of a partially
completed data send operation. This patch changes it from an
rds_message pointer to the narrower rm_data_op pointer, and
modifies places that use this pointer as needed.
Signed-off-by: Andy Grover <andy.grover@oracle.com>

ff3d7d36
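
The staged cleanup in ff3d7d36 (per-op cleanup first, whole-message cleanup only when the completed op is the one recorded as last) can be sketched as below. The `message` layout, the `unmap_calls` counters and the function names are hypothetical. Only the idea of comparing against a stored last-op pointer comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum op_type { OP_DATA, OP_RDMA, OP_ATOMIC, OP_TYPES };

struct op {
	enum op_type type;
};

/* Hypothetical message: one op of each kind plus a pointer to the last one. */
struct message {
	struct op data, rdma, atomic;
	struct op *last_op;
	unsigned int unmap_calls[OP_TYPES];	/* per-type cleanup counters */
	bool cleaned_up;
};

/* Per-op-type cleanup; the counter stands in for a type-specific unmap. */
static void unmap_op(struct message *m, const struct op *op)
{
	m->unmap_calls[op->type]++;
}

/* Completion handler: clean the op, then the message if it was the last op. */
static void on_send_complete(struct message *m, struct op *op)
{
	assert(m != NULL && op != NULL);
	unmap_op(m, op);
	if (op == m->last_op)
		m->cleaned_up = true;
}
```

Recording the last op at submit time means the completion path does not need to know which kinds of ops a given message happened to carry.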

RDS: Remove struct rds_rdma_op · f8b3aaf2

Committed by Andy Grover on Mar 01, 2010

A big changeset, but it's all pretty dumb.

struct rds_rdma_op was already embedded in struct rm_rdma_op.
Remove rds_rdma_op and put its members in rm_rdma_op. Rename
members with "op_" prefix instead of "r_", for consistency.

Of course this breaks a lot, so fixup the code accordingly.
Signed-off-by: Andy Grover <andy.grover@oracle.com>

f8b3aaf2

RDS: Implement silent atomics · 241eef3e

Committed by Andy Grover on Jan 19, 2010

Signed-off-by: Andy Grover <andy.grover@oracle.com>

241eef3e

RDS: Remove unsignaled_bytes sysctl · 1d34f175

Committed by Andy Grover on Jan 14, 2010

Removed unsignaled_bytes sysctl and code to signal
based on it. I believe unsignaled_wrs is more than
sufficient for our purposes.
Signed-off-by: Andy Grover <andy.grover@oracle.com>

1d34f175

RDS/IB: Remove ib_[header/data]_sge() functions · 919ced4c

Committed by Andy Grover on Jan 13, 2010

These functions were to cope with differently ordered
sg entries depending on RDS 3.0 or 3.1+. Now that
we've dropped 3.0 compatibility we no longer need them.

Also, modify usage sites for these to refer to sge[0] or [1]
directly. Reorder code to initialize header sgs first.
Signed-off-by: Andy Grover <andy.grover@oracle.com>

919ced4c

RDS: inc_purge() transport function unused - remove it · 809fa148

Committed by Andy Grover on Jan 12, 2010

Signed-off-by: Andy Grover <andy.grover@oracle.com>

809fa148

RDS: Base init_depth and responder_resources on hw values · 40589e74

Committed by Andy Grover on Jan 12, 2010

Instead of using a constant for initiator_depth and
responder_resources, read the per-QP values when the
device is enumerated, and then use these values when creating
the connection.
Signed-off-by: Andy Grover <andy.grover@oracle.com>

40589e74

RDS: Implement atomic operations · 15133f6e

Committed by Andy Grover on Jan 12, 2010

Implement a CMSG-based interface to do FADD and CSWP ops.

Alter send routines to handle atomic ops.

Add atomic counters to stats.

Add xmit_atomic() to struct rds_transport

Inline rds_ib_send_unmap_rdma into unmap_rm
Signed-off-by: Andy Grover <andy.grover@oracle.com>

15133f6e

31 Oct, 2009 1 commit

RDS/IB+IW: Move recv processing to a tasklet · d521b63b

Committed by Andy Grover on Oct 30, 2009

Move receive processing from the event handler to a tasklet.
This should help prevent the hangcheck timer from going off
when RDS is under heavy load.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d521b63b

openanolis / cloud-kernel · synced successfully over 1 year ago