提交 · db6526dcb51b054961a2d96ba43dec23e38818b3 · OpenHarmony / kernel_linux

06 10月, 2015 2 次提交

RDS: use rds_send_xmit() state instead of RDS_LL_SEND_FULL · db6526dc

由 Santosh Shilimkar 提交于 9月 11, 2015

In Transport indepedent rds_sendmsg(), we shouldn't make decisions based
on RDS_LL_SEND_FULL which is used to manage the ring for RDMA based
transports. We can safely issue rds_send_xmit() and the using its
return value take decision on deferred work. This will also fix
the scenario where at times we are seeing connections stuck with
the LL_SEND_FULL bit getting set and never cleared.

We kick krdsd after any time we see -ENOMEM or -EAGAIN from the
ring allocation code.
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>

db6526dc

RDS: defer the over_batch work to send worker · 4bebdd7a

由 Santosh Shilimkar 提交于 9月 10, 2015

Current process gives up if its send work over the batch limit.
The work queue will get  kicked to finish off any other requests.
This fixes remainder condition from commit 443be0e5 ("RDS: make
sure not to loop forever inside rds_send_xmit").

The restart condition is only for the case where we reached to
over_batch code for some other reason so just retrying again
before giving up.

While at it, make sure we use already available 'send_batch_count'
parameter instead of magic value. The batch count threshold value
of 1024 came via commit 443be0e5 ("RDS: make sure not to loop
forever inside rds_send_xmit"). The idea is to process as big a
batch as we can but at the same time we don't hold other waiting
processes for send. Hence back-off after the send_batch_count
limit (1024) to avoid soft-lock ups.
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>

4bebdd7a

01 10月, 2015 4 次提交

RDS: Use per-bucket rw lock for bind hash-table · 9b9acde7

由 Santosh Shilimkar 提交于 2月 11, 2014

One global lock protecting hash-tables with 1024 buckets isn't
efficient and it shows up in a massive systems with truck
loads of RDS sockets serving multiple databases. The
perf data clearly highlights the contention on the rw
lock in these massive workloads.

When the contention gets worse, the code gets into a state where
it decides to back off on the lock. So while it has disabled interrupts,
it sits and backs off on this lock get. This causes the system to
become sluggish and eventually all sorts of bad things happen.

The simple fix is to move the lock into the hash bucket and
use per-bucket lock to improve the scalability.
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>

9b9acde7

RDS: fix rds_sock reference bug while doing bind · 28126959

由 Santosh Shilimkar 提交于 2月 03, 2012

One need to take rds socket reference while using it and release it
once done with it. rds_add_bind() code path does not do that so
lets fix it.
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>

28126959

RDS: make socket bind/release locking scheme simple and more efficient · 8b0a6b46

由 Santosh Shilimkar 提交于 2月 03, 2012

RDS bind and release locking scheme is very inefficient. It
uses RCU for maintaining the bind hash-table which is great but
it also needs to hold spinlock for [add/remove]_bound(). So
overall usecase, the hash-table concurrent speedup doesn't pay off.
In fact blocking nature of synchronize_rcu() makes the RDS
socket shutdown too slow which hurts RDS performance since
connection shutdown and re-connect happens quite often to
maintain the RC part of the protocol.

So we make the locking scheme simpler and more efficient by
replacing spin_locks with reader/writer locks and getting rid
off rcu for bind hash-table.

In subsequent patch, we also covert the global lock with per-bucket
lock to reduce the global lock contention.
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>

8b0a6b46

RDS: use kfree_rcu in rds_ib_remove_ipaddr · 59fe4606

由 Santosh Shilimkar 提交于 2月 03, 2012

synchronize_rcu() slowing down un-necessarily the socket shutdown
path. It is used just kfree() the ip addresses in rds_ib_remove_ipaddr()
which is perfect usecase for kfree_rcu();

So lets use that to gain some speedup.
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>

59fe4606

10 9月, 2015 1 次提交

RDS: verify the underlying transport exists before creating a connection · 74e98eb0

由 Sasha Levin 提交于 9月 08, 2015

There was no verification that an underlying transport exists when creating
a connection, this would cause dereferencing a NULL ptr.

It might happen on sockets that weren't properly bound before attempting to
send a message, which will cause a NULL ptr deref:

[135546.047719] kasan: GPF could be caused by NULL-ptr deref or user memory accessgeneral protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[135546.051270] Modules linked in:
[135546.051781] CPU: 4 PID: 15650 Comm: trinity-c4 Not tainted 4.2.0-next-20150902-sasha-00041-gbaa1222-dirty #2527
[135546.053217] task: ffff8800835bc000 ti: ffff8800bc708000 task.ti: ffff8800bc708000
[135546.054291] RIP: __rds_conn_create (net/rds/connection.c:194)
[135546.055666] RSP: 0018:ffff8800bc70fab0  EFLAGS: 00010202
[135546.056457] RAX: dffffc0000000000 RBX: 0000000000000f2c RCX: ffff8800835bc000
[135546.057494] RDX: 0000000000000007 RSI: ffff8800835bccd8 RDI: 0000000000000038
[135546.058530] RBP: ffff8800bc70fb18 R08: 0000000000000001 R09: 0000000000000000
[135546.059556] R10: ffffed014d7a3a23 R11: ffffed014d7a3a21 R12: 0000000000000000
[135546.060614] R13: 0000000000000001 R14: ffff8801ec3d0000 R15: 0000000000000000
[135546.061668] FS:  00007faad4ffb700(0000) GS:ffff880252000000(0000) knlGS:0000000000000000
[135546.062836] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[135546.063682] CR2: 000000000000846a CR3: 000000009d137000 CR4: 00000000000006a0
[135546.064723] Stack:
[135546.065048]  ffffffffafe2055c ffffffffafe23fc1 ffffed00493097bf ffff8801ec3d0008
[135546.066247]  0000000000000000 00000000000000d0 0000000000000000 ac194a24c0586342
[135546.067438]  1ffff100178e1f78 ffff880320581b00 ffff8800bc70fdd0 ffff880320581b00
[135546.068629] Call Trace:
[135546.069028] ? __rds_conn_create (include/linux/rcupdate.h:856 net/rds/connection.c:134)
[135546.069989] ? rds_message_copy_from_user (net/rds/message.c:298)
[135546.071021] rds_conn_create_outgoing (net/rds/connection.c:278)
[135546.071981] rds_sendmsg (net/rds/send.c:1058)
[135546.072858] ? perf_trace_lock (include/trace/events/lock.h:38)
[135546.073744] ? lockdep_init (kernel/locking/lockdep.c:3298)
[135546.074577] ? rds_send_drop_to (net/rds/send.c:976)
[135546.075508] ? __might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3795)
[135546.076349] ? __might_fault (mm/memory.c:3795)
[135546.077179] ? rds_send_drop_to (net/rds/send.c:976)
[135546.078114] sock_sendmsg (net/socket.c:611 net/socket.c:620)
[135546.078856] SYSC_sendto (net/socket.c:1657)
[135546.079596] ? SYSC_connect (net/socket.c:1628)
[135546.080510] ? trace_dump_stack (kernel/trace/trace.c:1926)
[135546.081397] ? ring_buffer_unlock_commit (kernel/trace/ring_buffer.c:2479 kernel/trace/ring_buffer.c:2558 kernel/trace/ring_buffer.c:2674)
[135546.082390] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
[135546.083410] ? trace_event_raw_event_sys_enter (include/trace/events/syscalls.h:16)
[135546.084481] ? do_audit_syscall_entry (include/trace/events/syscalls.h:16)
[135546.085438] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
[135546.085515] rds_ib_laddr_check(): addr 36.74.25.172 ret -99 node type -1
Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

74e98eb0

06 9月, 2015 1 次提交

RDS: rds_conn_lookup() should factor in the struct net for a match · 8f384c01

由 Sowmini Varadhan 提交于 9月 03, 2015

Only return a conn if the rds_conn_net(conn) matches the struct
net passed to rds_conn_lookup().

Fixes: 467fa153 ("RDS-TCP: Support multiple RDS-TCP listen endpoints,
       one per netns.")
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8f384c01

31 8月, 2015 4 次提交

IB/core: Make ib_dealloc_pd return void · 7dd78647

由 Jason Gunthorpe 提交于 8月 05, 2015

The majority of callers never check the return value, and even if they
did, they can't do anything about a failure.

All possible failure cases represent a bug in the caller, so just
WARN_ON inside the function instead.

This fixes a few random errors:
 net/rd/iw.c infinite loops while it fails. (racing with EBUSY?)

This also lays the ground work to get rid of error return from the
drivers. Most drivers do not error, the few that do are broken since
it cannot be handled.

Since uverbs can legitimately make use of EBUSY, open code the
check.
Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

7dd78647

rds/ib: Remove ib_get_dma_mr calls · e5580242

由 Jason Gunthorpe 提交于 7月 30, 2015

The pd now has a local_dma_lkey member which completely replaces
ib_get_dma_mr, use it instead.
Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

e5580242

RDS: Convert to ib_alloc_mr · fc279959

由 Sagi Grimberg 提交于 7月 30, 2015

Signed-off-by: NSagi Grimberg <sagig@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

fc279959

IB/core: lock client data with lists_rwsem · 7c1eb45a

由 Haggai Eran 提交于 7月 30, 2015

An ib_client callback that is called with the lists_rwsem locked only for
read is protected from changes to the IB client lists, but not from
ib_unregister_device() freeing its client data. This is because
ib_unregister_device() will remove the device from the device list with
lists_rwsem locked for write, but perform the rest of the cleanup,
including the call to remove() without that lock.

Mark client data that is undergoing de-registration with a new going_down
flag in the client data context. Lock the client data list with lists_rwsem
for write in addition to using the spinlock, so that functions calling the
callback would be able to lock only lists_rwsem for read and let callbacks
sleep.

Since ib_unregister_client() now marks the client data context, no need for
remove() to search the context again, so pass the client data directly to
remove() callbacks.
Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: NHaggai Eran <haggaie@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

7c1eb45a

26 8月, 2015 21 次提交

RDS: remove superfluous from rds_ib_alloc_fmr() · 27241214

由 santosh.shilimkar@oracle.com 提交于 8月 25, 2015

Memory allocated for 'ibmr' uses kzalloc_node() which already
initialises the memory to zero. There is no need to do
memset() 0 on that memory.
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

27241214

RDS: flush the FMR pool less often · ef5217a6

由 santosh.shilimkar@oracle.com 提交于 8月 25, 2015

FMR flush is an expensive and time consuming operation. Reduce the
frequency of FMR pool flush by 50% so that more FMR work gets accumulated
for more efficient flushing.
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ef5217a6

RDS: push FMR pool flush work to its own worker · ad1d7dc0

由 santosh.shilimkar@oracle.com 提交于 8月 25, 2015

RDS FMR flush operation and also it races with connect/reconect
which happes a lot with RDS. FMR flush being on common rds_wq aggrevates
the problem. Lets push RDS FMR pool flush work to its own worker.
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ad1d7dc0

RDS: fix fmr pool dirty_count · 6116c203

由 Wengang Wang 提交于 8月 25, 2015

In rds_ib_flush_mr_pool(), dirty_count accounts the clean ones
which is wrong. This can lead to a negative dirty count value.

Lets fix it.
Signed-off-by: NWengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6116c203

RDS: Fix rds MR reference count in rds_rdma_unuse() · 3f6b3143

由 santosh.shilimkar@oracle.com 提交于 8月 25, 2015

rds_rdma_unuse() drops the mr reference count which it hasn't
taken. Correct way of removing mr is to remove mr from the tree
and then rdma_destroy_mr() it first, then rds_mr_put() to decrement
its reference count. Whichever thread holds last reference will free
the mr via rds_mr_put()

This bug was triggering weird null pointer crashes. One if the trace
for it is captured below.

BUG: unable to handle kernel NULL pointer dereference at
0000000000000104
IP: [<ffffffffa0899471>] rds_ib_free_mr+0x31/0x130 [rds_rdma]
PGD 4366fa067 PUD 4366f9067 PMD 0
Oops: 0000 [#1] SMP

[...]

task: ffff88046da6a000 ti: ffff88046da6c000 task.ti: ffff88046da6c000
RIP: 0010:[<ffffffffa0899471>]  [<ffffffffa0899471>]
rds_ib_free_mr+0x31/0x130 [rds_rdma]
RSP: 0018:ffff88046fa43bd8  EFLAGS: 00010286
RAX: 0000000071d38b80 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff880079e7ff40
RBP: ffff88046fa43bf8 R08: 0000000000000000 R09: 0000000000000000
R10: ffff88046fa43ca8 R11: ffff88046a802ed8 R12: ffff880079e7fa40
R13: 0000000000000000 R14: ffff880079e7ff40 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88046fa40000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000104 CR3: 00000004366fb000 CR4: 00000000000006e0
Stack:
 ffff880079e7fa40 ffff880671d38f08 ffff880079e7ff40 0000000000000296
 ffff88046fa43c28 ffffffffa087a38b ffff880079e7fa40 ffff880671d38f10
 0000000000000000 0000000000000292 ffff88046fa43c48 ffffffffa087a3b6
Call Trace:
 <IRQ>
 [<ffffffffa087a38b>] rds_destroy_mr+0x8b/0xa0 [rds]
 [<ffffffffa087a3b6>] __rds_put_mr_final+0x16/0x30 [rds]
 [<ffffffffa087a492>] rds_rdma_unuse+0xc2/0x120 [rds]
 [<ffffffffa08766d3>] rds_recv_incoming_exthdrs+0x83/0xa0 [rds]
 [<ffffffffa0876782>] rds_recv_incoming+0x92/0x200 [rds]
 [<ffffffffa0895269>] rds_ib_process_recv+0x259/0x320 [rds_rdma]
 [<ffffffffa08962a8>] rds_ib_recv_tasklet_fn+0x1a8/0x490 [rds_rdma]
 [<ffffffff810dcd78>] ? __remove_hrtimer+0x58/0x90
 [<ffffffff810799e1>] tasklet_action+0xb1/0xc0
 [<ffffffff81079b52>] __do_softirq+0xe2/0x290
 [<ffffffff81079df6>] irq_exit+0xa6/0xb0
 [<ffffffff81613915>] do_IRQ+0x65/0xf0
 [<ffffffff816118ab>] common_interrupt+0x6b/0x6b
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3f6b3143

RDS: fix the dangling reference to rds_ib_incoming_slab · ba54d3ce

由 santosh.shilimkar@oracle.com 提交于 8月 25, 2015

On rds_ib_frag_slab allocation failure, ensure rds_ib_incoming_slab
is not pointing to the detsroyed memory.
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ba54d3ce

rds: Fix improper gfp_t usage. · b01d04aa

由 David S. Miller 提交于 8月 25, 2015

>> net/rds/ib_recv.c:382:28: sparse: incorrect type in initializer (different base types)
   net/rds/ib_recv.c:382:28:    expected int [signed] can_wait
   net/rds/ib_recv.c:382:28:    got restricted gfp_t
   net/rds/ib_recv.c:828:23: sparse: cast to restricted __le64
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b01d04aa

RDS: check for valid cm_id before initiating connection · ae05368a

由 santosh.shilimkar@oracle.com 提交于 8月 22, 2015

Connection could have been dropped while the route is being resolved
so check for valid cm_id before initiating the connection.
Reviewed-by: NAjaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ae05368a

RDS: return EMSGSIZE for oversize requests before processing/queueing · 06e8941e

由 Mukesh Kacker 提交于 8月 22, 2015

rds_send_queue_rm() allows for the "current datagram" being queued
to exceed SO_SNDBUF thresholds by checking bytes queued without
counting in length of current datagram. (Since sk_sndbuf is set
to twice requested SO_SNDBUF value as a kernel heuristic this
is usually fine!)

If this "current datagram" squeezing past the threshold is itself
many times the size of the sk_sndbuf threshold itself then even
twice the SO_SNDBUF does not save us and it gets queued but
cannot be transmitted. Threads block and deadlock and device
becomes unusable. The check for this datagram not exceeding
SNDBUF thresholds (EMSGSIZE) is not done on this datagram as
that check is only done if queueing attempt fails.
(Datagrams that follow this datagram fail queueing attempts, go
through the check and eventually trip EMSGSIZE error but zero
length datagrams silently fail!)

This fix moves the check for datagrams exceeding SNDBUF limits
before any processing or queueing is attempted and returns EMSGSIZE
early in the rds_sndmsg() code. This change also ensures that all
datagrams get checked for exceeding SNDBUF/sk_sndbuf size limits
and the large datagrams that exceed those limits do not get to
rds_send_queue_rm() code for processing.
Signed-off-by: NMukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

06e8941e

RDS: make sure rds_send_drop_to properly takes the m_rs_lock · dfcec251

由 santosh.shilimkar@oracle.com 提交于 8月 22, 2015

rds_send_drop_to() is used during socket tear down to find all the
messages on the socket and flush them . It can race with the
acking code unless it takes the m_rs_lock on each and every message.

This plugs a hole where we didn't take m_rs_lock on any message that
didn't have the RDS_MSG_ON_CONN set. Taking m_rs_lock avoids
double frees and other memory corruptions as the ack code trusts
the message m_rs pointer on a socket that had actually been freed.

We must take m_rs_lock to access m_rs. Because of lock nesting and
rs access, we also need to acquire rs_lock.
Reviewed-by: NAjaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dfcec251

RDS: Don't destroy the rdma id until after we're done using it · 1c3be624

由 Santosh Shilimkar 提交于 8月 22, 2015

During connection resets, we are destroying the rdma id too soon. We can't
destroy it when it is still in use. So lets move rdma_destroy_id() after
we clear the rings.
Reviewed-by: NAjaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1c3be624

RDS: Fix assertion level from fatal to warning · 5c240fa2

由 santosh.shilimkar@oracle.com 提交于 8月 22, 2015

Fix the asserion level since its not fatal and can be hit
in normal execution paths. There is no need to take the
system down.

We keep the WARN_ON() to detect the condition if we get
here with bad pages.
Reviewed-by: NAjaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5c240fa2

RDS: Make sure we do a signaled send for large-send · 3049147c

由 santosh.shilimkar@oracle.com 提交于 8月 22, 2015

WR(Work Requests )always generate a WC(Work Completion) with
signaled send. Default RDS ib code is setup for un-signaled
completion. Since RDS connction is persistent, we can end up
sending the data even after large-send when the remote end is
not active(for any reason).

By doing  a signaled send at least once per large-send,
we can at least detect the problem in work completion
handler there by avoiding sending more data to
inactive remote.
Reviewed-by: NAjaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3049147c

RDS: Mark message mapped before transmit · 4f73113c

由 santosh.shilimkar@oracle.com 提交于 8月 22, 2015

rds_send_xmit() marks the rds message map flag after
xmit_[rdma/atomic]() which is clearly wrong.  We need
to maintain the ownership between transport and rds.

Also take care of error path.
Reviewed-by: NAjaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4f73113c

RDS: add a sock_destruct callback debug aid · 0df5f9a6

由 santosh.shilimkar@oracle.com 提交于 8月 22, 2015

This helps to detect the accidental processes/apps trying to destroy
the RDS socket which they are sharing with other processes/apps.
Reviewed-by: NAjaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0df5f9a6

RDS: check for congestion updates during rds_send_xmit · 0c484240

由 santosh.shilimkar@oracle.com 提交于 8月 22, 2015

Ensure we don't keep sending the data if the link is congested.
Reviewed-by: NAjaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0c484240

RDS: make sure we post recv buffers · 73ce4317

由 santosh.shilimkar@oracle.com 提交于 8月 22, 2015

If we get an ENOMEM during rds_ib_recv_refill, we might never come
back and refill again later. Patch makes sure to kick krdsd into
helping out.

To achieve this we add RDS_RECV_REFILL flag and update in the refill
path based on that so that at least some therad will keep posting
receive buffers.

Since krdsd and softirq both might race for refill, we decide to
schedule on work queue based on ring_low instead of ring_empty.
Reviewed-by: NAjaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

73ce4317

RDS: don't update ip address tables if the address hasn't changed · e1f475a7

由 santosh.shilimkar@oracle.com 提交于 8月 22, 2015

If the ip address tables hasn't changed, there is no need to remove
them only to be added back again.

Lets fix it.
Reviewed-by: NAjaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e1f475a7

RDS: destroy the ib state earlier during shutdown · 1bc7b863

由 santosh.shilimkar@oracle.com 提交于 8月 22, 2015

Destroy ib state early during shutdown. Otherwise we can get callbacks
after the QP isn't really able to handle them.
Reviewed-by: NAjaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1bc7b863

RDS: always free recv frag as we free its ring entry · 43962dd7

由 santosh.shilimkar@oracle.com 提交于 8月 22, 2015

We were still seeing rare occurrences of the WARN_ON(recv->r_frag) which
indicates that the recv refill path was finding allocated frags in ring
entries that were marked free. These were usually followed by OOM crashes.
They only seem to be occurring in the presence of completion errors and
connection resets.

This patch ensures that we free the frag as we mark the ring entry free.
This should stop the refill path from finding allocated frags in ring
entries that were marked free.
Reviewed-by: NAjaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

43962dd7

RDS: restore return value in rds_cmsg_rdma_args() · 1d2e3f39

由 santosh.shilimkar@oracle.com 提交于 8月 22, 2015

In rds_cmsg_rdma_args() 'ret' is used by rds_pin_pages() which returns
number of pinned pages on success. And the same value is returned to the
caller of rds_cmsg_rdma_args() on success which is not intended.

Commit f4a3fc03 ("RDS: Clean up error handling in rds_cmsg_rdma_args")
removed the 'ret = 0' line which broke RDS RDMA mode.

Fix it by restoring the return value on rds_pin_pages() success
keeping the clean-up in place.
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1d2e3f39

08 8月, 2015 2 次提交

RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns. · 467fa153

由 Sowmini Varadhan 提交于 8月 05, 2015

Register pernet subsys init/stop functions that will set up
and tear down per-net RDS-TCP listen endpoints. Unregister
pernet subusys functions on 'modprobe -r' to clean up these
end points.

Enable keepalive on both accept and connect socket endpoints.
The keepalive timer expiration will ensure that client socket
endpoints will be removed as appropriate from the netns when
an interface is removed from a namespace.

Register a device notifier callback that will clean up all
sockets (and thus avoid the need to wait for keepalive timeout)
when the loopback device is unregistered from the netns indicating
that the netns is getting deleted.
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

467fa153

RDS-TCP: Make RDS-TCP work correctly when it is set up in a netns other than init_net · d5a8ac28

由 Sowmini Varadhan 提交于 8月 05, 2015

Open the sockets calling sock_create_kern() with the correct struct net
pointer, and use that struct net pointer when verifying the
address passed to rds_bind().
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d5a8ac28

04 8月, 2015 1 次提交

rds: fix an integer overflow test in rds_info_getsockopt() · 468b732b

由 Dan Carpenter 提交于 8月 01, 2015

"len" is a signed integer.  We check that len is not negative, so it
goes from zero to INT_MAX.  PAGE_SIZE is unsigned long so the comparison
is type promoted to unsigned long.  ULONG_MAX - 4095 is a higher than
INT_MAX so the condition can never be true.

I don't know if this is harmful but it seems safe to limit "len" to
INT_MAX - 4095.

Fixes: a8c879a7 ('RDS: Info and stats')
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

468b732b

15 7月, 2015 1 次提交

rds: rds_ib_device.refcount overflow · 4fabb594

由 Wengang Wang 提交于 7月 06, 2015

Fixes: 3e0249f9 ("RDS/IB: add refcount tracking to struct rds_ib_device")

There lacks a dropping on rds_ib_device.refcount in case rds_ib_alloc_fmr
failed(mr pool running out). this lead to the refcount overflow.

A complain in line 117(see following) is seen. From vmcore:
s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448.
That is the evidence the mr pool is used up. so rds_ib_alloc_fmr is very likely
to return ERR_PTR(-EAGAIN).

115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
116 {
117         BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0);
118         if (atomic_dec_and_test(&rds_ibdev->refcount))
119                 queue_work(rds_wq, &rds_ibdev->free_work);
120 }

fix is to drop refcount when rds_ib_alloc_fmr failed.
Signed-off-by: NWengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: NHaggai Eran <haggaie@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

4fabb594

04 7月, 2015 1 次提交

net-RDS: Delete an unnecessary check before the function call "module_put" · f6e1c916

由 Markus Elfring 提交于 7月 02, 2015

The module_put() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.
Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f6e1c916

22 6月, 2015 1 次提交

net: rds: use for_each_sg() for scatterlist parsing · d2a9ec64

由 Fabian Frederick 提交于 6月 16, 2015

This patch also renames sg to sglist and aligns function parameters.
See Documentation/DMA-API.txt - Part Id for scatterlist details
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d2a9ec64

13 6月, 2015 1 次提交

IB/core: Change ib_create_cq to use struct ib_cq_init_attr · 8e37210b

由 Matan Barak 提交于 6月 11, 2015

Currently, ib_create_cq uses cqe and comp_vecotr instead
of the extendible ib_cq_init_attr struct.

Earlier patches already changed the vendors to work with
ib_cq_init_attr. This patch changes the consumers too.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

8e37210b

OpenHarmony / kernel_linux 上一次同步 4 年多

OpenHarmony / kernel_linux
上一次同步 4 年多