提交 · fb43d17210baa538e58fc83d2d0f8a32399db73b · openanolis / cloud-kernel

06 2月, 2016 3 次提交

T
SUNRPC: Use the multipath iterator to assign a transport to each task · fb43d172
由 Trond Myklebust 提交于 1月 30, 2016
```
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
```
fb43d172

SUNRPC: Make rpc_clnt store the multipath iterators · ad01b2c6

由 Trond Myklebust 提交于 1月 30, 2016

This is a pre-patch for the RPC multipath code. It sets up the storage in
struct rpc_clnt for the multipath code.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

ad01b2c6

SUNRPC: Add a structure to track multiple transports · 80b14d5e

由 Trond Myklebust 提交于 2月 14, 2015

In order to support multipathing/trunking we will need the ability to
track multiple transports. This patch sets up a basic structure for
doing so.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

80b14d5e

01 2月, 2016 3 次提交

SUNRPC: Make freeing of struct xprt rcu-safe · fda1bfef

由 Trond Myklebust 提交于 2月 14, 2015

Have it call kfree_rcu() to ensure that we can use it on rcu-protected
lists.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

fda1bfef

SUNRPC: Uninline xprt_get(); It isn't performance critical. · 30c5116b

由 Trond Myklebust 提交于 2月 24, 2015

Also allow callers to pass NULL arguments to xprt_get() and xprt_put().
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

30c5116b

T
SUNRPC: Remove unused function rpc_task_reset_client · 58f13692
由 Trond Myklebust 提交于 1月 30, 2016
```
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
```
58f13692

23 1月, 2016 1 次提交

wrappers for ->i_mutex access · 5955102c

由 Al Viro 提交于 1月 22, 2016

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

5955102c

20 1月, 2016 11 次提交

svc_rdma: use local_dma_lkey · 5fe1043d

由 Christoph Hellwig 提交于 1月 07, 2016

We now alwasy have a per-PD local_dma_lkey available.  Make use of that
fact in svc_rdma and stop registering our own MR.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Acked-by: NJ. Bruce Fields <bfields@redhat.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

5fe1043d

svcrdma: Add class for RDMA backwards direction transport · 5d252f90

由 Chuck Lever 提交于 1月 07, 2016

To support the server-side of an NFSv4.1 backchannel on RDMA
connections, add a transport class that enables backward
direction messages on an existing forward channel connection.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Acked-by: NBruce Fields <bfields@fieldses.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

5d252f90

svcrdma: Define maximum number of backchannel requests · 03fe9931

由 Chuck Lever 提交于 1月 07, 2016

Extra resources for handling backchannel requests have to be
pre-allocated when a transport instance is created. Set up
additional fields in svcxprt_rdma to track these resources.

The max_requests fields are elements of the RPC-over-RDMA
protocol, so they should be u32. To ensure that unsigned
arithmetic is used everywhere, some other fields in the
svcxprt_rdma struct are updated.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Acked-by: NBruce Fields <bfields@fieldses.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

03fe9931

svcrdma: Make map_xdr non-static · ba986c96

由 Chuck Lever 提交于 1月 07, 2016

Pre-requisite to use map_xdr in the backchannel code.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Acked-by: NBruce Fields <bfields@fieldses.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

ba986c96

svcrdma: Remove last two __GFP_NOFAIL call sites · 78da2b3c

由 Chuck Lever 提交于 1月 07, 2016

Clean up.

These functions can otherwise fail, so check for page allocation
failures too.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Acked-by: NBruce Fields <bfields@fieldses.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

78da2b3c

svcrdma: Add gfp flags to svc_rdma_post_recv() · 39b09a1a

由 Chuck Lever 提交于 1月 07, 2016

svc_rdma_post_recv() allocates pages for receive buffers on-demand.
It uses GFP_KERNEL so the allocator tries hard, and may sleep. But
I'm about to add a call to svc_rdma_post_recv() from a function
that may not sleep.

Since all svc_rdma_post_recv() call sites can tolerate its failure,
allow it to fail if the page allocator returns nothing. Longer term,
receive buffers, being a finite resource per-connection, should be
pre-allocated and re-used.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Acked-by: NBruce Fields <bfields@fieldses.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

39b09a1a

svcrdma: Remove unused req_map and ctxt kmem_caches · 71810ef3

由 Chuck Lever 提交于 1月 07, 2016

Clean up.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Acked-by: NBruce Fields <bfields@fieldses.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

71810ef3

svcrdma: Improve allocation of struct svc_rdma_req_map · 2fe81b23

由 Chuck Lever 提交于 1月 07, 2016

To ensure this allocation cannot fail and will not sleep,
pre-allocate the req_map structures per-connection.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Acked-by: NBruce Fields <bfields@fieldses.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

2fe81b23

svcrdma: Improve allocation of struct svc_rdma_op_ctxt · cc886c9f

由 Chuck Lever 提交于 1月 07, 2016

When the maximum payload size of NFS READ and WRITE was increased
by commit cc9a903d ("svcrdma: Change maximum server payload back
to RPCSVC_MAXPAYLOAD"), the size of struct svc_rdma_op_ctxt
increased to over 6KB (on x86_64). That makes allocating one of
these from a kmem_cache more likely to fail in situations when
system memory is exhausted.

Since I'm about to add a caller where this allocation must always
work _and_ it cannot sleep, pre-allocate ctxts for each connection.

Another motivation for this change is that NFSv4.x servers are
required by specification not to drop NFS requests. Pre-allocating
memory resources reduces the likelihood of a drop.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Acked-by: NBruce Fields <bfields@fieldses.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

cc886c9f

svcrdma: Clean up process_context() · ced4ac0c

由 Chuck Lever 提交于 1月 07, 2016

Be sure the completed ctxt is put in every path.

The xprt enqueue can take a while, so put the completed ctxt back
in circulation _before_ enqueuing the xprt.

Remove/disable debugging.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Acked-by: NBruce Fields <bfields@fieldses.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

ced4ac0c

svcrdma: Clean up rdma_create_xprt() · 3d61677c

由 Chuck Lever 提交于 1月 07, 2016

kzalloc is used here, so setting the atomic fields to zero is
unnecessary. sc_ord is set again in handle_connect_req. The other
fields are re-initialized in svc_rdma_accept().
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Acked-by: NBruce Fields <bfields@fieldses.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

3d61677c

15 1月, 2016 1 次提交

kmemcg: account certain kmem allocations to memcg · 5d097056

由 Vladimir Davydov 提交于 1月 14, 2016

Mark those kmem allocations that are known to be easily triggered from
userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
memcg.  For the list, see below:

 - threadinfo
 - task_struct
 - task_delay_info
 - pid
 - cred
 - mm_struct
 - vm_area_struct and vm_region (nommu)
 - anon_vma and anon_vma_chain
 - signal_struct
 - sighand_struct
 - fs_struct
 - files_struct
 - fdtable and fdtable->full_fds_bits
 - dentry and external_name
 - inode for all filesystems. This is the most tedious part, because
   most filesystems overwrite the alloc_inode method.

The list is far from complete, so feel free to add more objects.
Nevertheless, it should be close to "account everything" approach and
keep most workloads within bounds.  Malevolent users will be able to
breach the limit, but this was possible even with the former "account
everything" approach (simply because it did not account everything in
fact).

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NVladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5d097056

07 1月, 2016 2 次提交

Revert "svcrdma: Do not send XDR roundup bytes for a write chunk" · 3daa020f

由 J. Bruce Fields 提交于 12月 23, 2015

This reverts commit 6f18dc89.

Just as one example, it appears this code could do the wrong thing in
the case of a two-byte NFS READ that crosses a page boundary.

Chuck says: "In that case, nfsd would pass down an xdr_buf that has one
byte in a page, one byte in another page, and a two-byte XDR pad. The
logic introduced by this optimization would be fooled, and neither the
second byte nor the XDR pad would be written to the client."

Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

3daa020f

SUNRPC: Fixup socket wait for memory · 13331a55

由 Trond Myklebust 提交于 1月 06, 2016

We're seeing hangs in the NFS client code, with loops of the form:

 RPC: 30317 xmit incomplete (267368 left of 524448)
 RPC: 30317 call_status (status -11)
 RPC: 30317 call_transmit (status 0)
 RPC: 30317 xprt_prepare_transmit
 RPC: 30317 xprt_transmit(524448)
 RPC:       xs_tcp_send_request(267368) = -11
 RPC: 30317 xmit incomplete (267368 left of 524448)
 RPC: 30317 call_status (status -11)
 RPC: 30317 call_transmit (status 0)
 RPC: 30317 xprt_prepare_transmit
 RPC: 30317 xprt_transmit(524448)

Turns out commit ceb5d58b ("net: fix sock_wake_async() rcu protection")
moved SOCKWQ_ASYNC_NOSPACE out of sock->flags and into sk->sk_wq->flags,
however it never tried to fix up the code in net/sunrpc.

The new idiom is to use the flags in the RCU protected struct socket_wq.
While we're at it, clear out the now redundant places where we set/clear
SOCKWQ_ASYNC_NOSPACE and SOCK_NOSPACE. In principle, sk_stream_wait_memory()
is supposed to set these for us, so we only need to clear them in the
particular case of our ->write_space() callback.

Fixes: ceb5d58b ("net: fix sock_wake_async() rcu protection")
Cc: Eric Dumazet <edumazet@google.com>
Cc: stable@vger.kernel.org # 4.4
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

13331a55

31 12月, 2015 1 次提交

SUNRPC: Fix a missing break in rpc_anyaddr() · 0b161e63

由 Trond Myklebust 提交于 12月 30, 2015

The missing break means that we always return EAFNOSUPPORT when
faced with a request for an IPv6 loopback address.

Reported-by: coverity (CID 401987)
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

0b161e63

28 12月, 2015 1 次提交

SUNRPC: drop unused xs_reclassify_socketX() helpers · d1358917

由 Stefan Hajnoczi 提交于 12月 02, 2015

xs_reclassify_socket4() and friends used to be called directly.
xs_reclassify_socket() is called instead nowadays.

The xs_reclassify_socketX() helper functions are empty when
CONFIG_DEBUG_LOCK_ALLOC is not defined.  Drop them since they have no
callers.

Note that AF_LOCAL still calls xs_reclassify_socketu() directly but is
easily converted to generic xs_reclassify_socket().
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

d1358917

23 12月, 2015 2 次提交

sunrpc: Add a function to close temporary transports immediately · c3d4879e

由 Scott Mayhew 提交于 12月 11, 2015

Add a function svc_age_temp_xprts_now() to close temporary transports
whose xpt_local matches the address passed in server_addr immediately
instead of waiting for them to be closed by the timer function.

The function is intended to be used by notifier_blocks that will be
added to nfsd and lockd that will run when an ip address is deleted.

This will eliminate the ACK storms and client hangs that occur in
HA-NFS configurations where nfsd & lockd is left running on the cluster
nodes all the time and the NFS 'service' is migrated back and forth
within a short timeframe.
Signed-off-by: NScott Mayhew <smayhew@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

c3d4879e

xprtrdma: Avoid calling ib_query_device · e3e45b1b

由 Or Gerlitz 提交于 12月 18, 2015

Instead, use the cached copy of the attributes present on the device.
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

e3e45b1b

19 12月, 2015 12 次提交

xprtrdma: Revert commit ('xprtrdma: Cap req_cqinit'). · 26ae9d1c