Commits · b1691bc03d4eddb959234409167bef9be9e62d74 · openanolis / cloud-kernel

10 Dec, 2014 · 13 commits

sunrpc: convert to lockless lookup of queued server threads · b1691bc0

Committed by Jeff Layton on Nov 21, 2014

Testing has shown that the pool->sp_lock can be a bottleneck on a busy
server. Every time data is received on a socket, the server must take
that lock in order to dequeue a thread from the sp_threads list.

Address this problem by eliminating the sp_threads list (which contains
threads that are currently idle) and replacing it with a RQ_BUSY flag in
svc_rqst. This allows us to walk the sp_all_threads list under the
rcu_read_lock and find a suitable thread for the xprt by doing a
test_and_set_bit.

Note that we do still have a potential atomicity problem however with
this approach. We don't want svc_xprt_do_enqueue to set the
rqst->rq_xprt pointer unless a test_and_set_bit of RQ_BUSY returned
zero (which indicates that the thread was idle). But, by the time we
check that, the bit could be flipped by a waking thread.

To address this, we acquire a new per-rqst spinlock (rq_lock) and take
that before doing the test_and_set_bit. If that returns false, then we
can set rq_xprt and drop the spinlock. Then, when the thread wakes up,
it must set the bit under the same spinlock and can trust that if it was
already set then the rq_xprt is also properly set.

With this scheme, the case where we have an idle thread no longer needs
to take the highly contended pool->sp_lock at all, and that removes the
bottleneck.

That still leaves one issue: What of the case where we walk the whole
sp_all_threads list and don't find an idle thread? Because the search is
lockless, it's possible for the queueing to race with a thread that is
going to sleep. To address that, we queue the xprt and then search again.

If we find an idle thread at that point, we can't attach the xprt to it
directly since that might race with a different thread waking up and
finding it. All we can do is wake the idle thread back up and let it
attempt to find the now-queued xprt.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Tested-by: Chris Worley <chris.worley@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
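
The claim step described in this commit can be sketched in user space. This is only an illustration built on C11 atomics, not the kernel code: `struct worker`, `claim_idle_worker`, and `release_worker` are hypothetical names, `atomic_flag` stands in for the RQ_BUSY bit, and both the RCU list walk and the rq_lock handling are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for svc_rqst; not the kernel layout. */
struct worker {
    atomic_flag busy;   /* plays the role of the RQ_BUSY bit */
    int xprt;           /* plays the role of rq_xprt; 0 means "none" */
};

/*
 * Walk all workers and claim the first idle one for `xprt`.
 * atomic_flag_test_and_set returns the previous value, so "false" means
 * the worker was idle and now belongs to the caller.
 * Returns the worker's index, or -1 if every worker is busy.
 */
int claim_idle_worker(struct worker *w, size_t n, int xprt)
{
    for (size_t i = 0; i < n; i++) {
        if (!atomic_flag_test_and_set(&w[i].busy)) {
            w[i].xprt = xprt;
            return (int)i;
        }
    }
    return -1;
}

/* Hand the worker back: forget the transport, then clear the busy flag. */
void release_worker(struct worker *w)
{
    w->xprt = 0;
    atomic_flag_clear(&w->busy);
}
```

Because the flag is claimed with a single atomic operation, two callers racing for the same worker cannot both see it as idle. The -1 return corresponds to the "no idle thread found" case that the commit handles by queueing the xprt and searching again.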

sunrpc: fix potential races in pool_stats collection · 403c7b44

Committed by Jeff Layton on Nov 21, 2014

In a later patch, we'll be removing some spinlocking around the socket
and thread queueing code in order to fix some contention problems. At
that point, the stats counters will no longer be protected by the
sp_lock.

Change the counters to atomic_long_t fields, except for the
"sockets_queued" counter which will still be manipulated under a
spinlock.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Tested-by: Chris Worley <chris.worley@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
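
As a rough user-space analogue of moving counters out from under a lock, the sketch below uses C11 `atomic_long` in place of the kernel's atomic_long_t. The struct and function names are hypothetical and the field names are illustrative only.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stats block; not the kernel's svc_pool_stats layout. */
struct pool_stats {
    atomic_long packets;
    atomic_long threads_woken;
};

void stats_init(struct pool_stats *s)
{
    atomic_init(&s->packets, 0);
    atomic_init(&s->threads_woken, 0);
}

/* Increment without holding any lock; relaxed order is enough for a counter. */
void stats_note_packet(struct pool_stats *s)
{
    atomic_fetch_add_explicit(&s->packets, 1, memory_order_relaxed);
}

long stats_packets(struct pool_stats *s)
{
    return atomic_load_explicit(&s->packets, memory_order_relaxed);
}
```

Each increment is indivisible on its own, so no update is lost even when callers no longer serialize on a shared spinlock.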

sunrpc: add a rcu_head to svc_rqst and use kfree_rcu to free it · 81244386

Committed by Jeff Layton on Nov 21, 2014

...also make the manipulation of sp_all_threads list use RCU-friendly
functions.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Tested-by: Chris Worley <chris.worley@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

sunrpc: require svc_create callers to pass in meaningful shutdown routine · 0b5707e4

Committed by Jeff Layton on Nov 19, 2014

Currently all svc_create callers pass in NULL for the shutdown parm,
which then gets fixed up to be svc_rpcb_cleanup if the service uses
rpcbind.

Simplify this by instead having the only caller that requires it
(lockd) pass in svc_rpcb_cleanup and get rid of the special casing.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

sunrpc: have svc_wake_up only deal with pool 0 · ceff739c

Committed by Jeff Layton on Nov 19, 2014

The way that svc_wake_up works is a bit inefficient. It walks all of the
available pools for a service and either wakes up a task in each one or
sets the SP_TASK_PENDING flag in each one.

When svc_wake_up is called, there is no need to wake up more than one
thread to do this work. In practice, only lockd currently uses this
function and it's single threaded anyway. Thus, this just boils down to
doing a wake up of a thread in pool 0 or setting a single flag.

Eliminate the for loop in this function and change it to just operate on
pool 0. Also update the comments that sit above it and get rid of some
code that has been commented out for years now.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

sunrpc: convert sp_task_pending flag to use atomic bitops · 4d5db3f5

Committed by Jeff Layton on Nov 19, 2014

In a later patch, we'll want to be able to handle this flag without
holding the sp_lock. Change this field to an unsigned long flags
field, and declare a new flag in it that can be managed with atomic
bitops.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
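
A pending flag kept in an unsigned long word and handled with atomic bit operations might look like the following in user space. This is a sketch with C11 atomics rather than the kernel's set_bit/test_and_clear_bit helpers; `TASK_PENDING_BIT`, `struct pool`, and both functions are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit number, standing in for the kernel's SP_TASK_PENDING. */
enum { TASK_PENDING_BIT = 0 };

struct pool {
    atomic_ulong flags;
};

/* Mark work as pending; safe to call without any lock held. */
void pool_set_pending(struct pool *p)
{
    atomic_fetch_or(&p->flags, 1UL << TASK_PENDING_BIT);
}

/* Atomically clear the bit and report whether it was set beforehand. */
bool pool_test_and_clear_pending(struct pool *p)
{
    unsigned long mask = 1UL << TASK_PENDING_BIT;
    unsigned long old = atomic_fetch_and(&p->flags, ~mask);

    return (old & mask) != 0;
}
```

The test and the clear happen in one atomic step, so a pending wakeup is consumed by exactly one caller.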

sunrpc: move rq_cachetype field to better optimize space · 62978b3c

Committed by Jeff Layton on Nov 19, 2014

There are a couple of holes in the svc_rqst struct on x86_64. Move the
rq_cachetype field to a different location to eliminate both of them.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
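
For readers unfamiliar with struct holes, here is a small illustration of how field order affects padding. The two structs are hypothetical and unrelated to the real svc_rqst fields; exact sizes depend on the ABI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout with small fields on either side of a pointer. */
struct with_holes {
    char  a;      /* padding usually follows to align `p` */
    void *p;
    char  b;      /* tail padding usually follows */
};

/* Same fields, reordered so the two chars sit next to each other. */
struct repacked {
    void *p;
    char  a;
    char  b;
};
```

On a typical x86_64 ABI the first layout occupies 24 bytes and the second 16, because the two one-byte fields share a single padded slot after the pointer.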

sunrpc: move rq_splice_ok flag into rq_flags · 779fb0f3

Committed by Jeff Layton on Nov 19, 2014

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

sunrpc: move rq_dropme flag into rq_flags · 78b65eb3

Committed by Jeff Layton on Nov 19, 2014

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

sunrpc: move rq_usedeferral flag to rq_flags · 30660e04

Committed by Jeff Layton on Nov 19, 2014

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

sunrpc: move rq_local field to rq_flags · 7501cc2b

Committed by Jeff Layton on Nov 19, 2014

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

sunrpc: add a generic rq_flags field to svc_rqst and move rq_secure to it · 4d152e2c

Committed by Jeff Layton on Nov 19, 2014

In a later patch, we're going to need some atomic bit flags. Since that
field will need to be an unsigned long, we mitigate that space
consumption by migrating some other bitflags to the new field. Start
with the rq_secure flag.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

Merge tag 'nfs-for-3.19-1' into nfsd for-3.19 branch · 2941b0e9

Committed by J. Bruce Fields on Dec 09, 2014

Mainly what I need is 860a0d9e "sunrpc: add some tracepoints in
svc_rqst handling functions", which subsequent server rpc patches from
jlayton depend on. I'm merging this later tag on the assumption that's
more likely to be a tested and stable point.

02 Dec, 2014 · 3 commits

nfsd: minor off by one checks in __write_versions() · 818f2f57

Committed by Dan Carpenter on Nov 27, 2014

My static checker complains that if "len == remaining" then it means we
have truncated the last character off the version string.

The intent of the code is that we print as many versions as we can
without truncating a version.  Then we put a newline at the end.  If the
newline can't fit we return -EINVAL.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
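
The boundary condition described here follows from how snprintf reports its result: the return value excludes the terminating NUL, so a return equal to the space remaining already means the output was cut short. The helper below is a hypothetical user-space sketch of that check (`append_whole` is an invented name), not the code in __write_versions().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Append `word` to buf at offset *used only if it fits completely,
 * including the terminating NUL. On failure the previous contents are
 * restored and false is returned.
 */
bool append_whole(char *buf, size_t size, size_t *used, const char *word)
{
    if (*used >= size)
        return false;               /* no room even for a NUL */

    size_t remaining = size - *used;
    int len = snprintf(buf + *used, remaining, "%s", word);

    /* len == remaining already means the last character was cut off. */
    if (len < 0 || (size_t)len >= remaining) {
        buf[*used] = '\0';          /* undo the partial write */
        return false;
    }
    *used += (size_t)len;
    return true;
}
```

Using `>=` rather than `>` in the comparison is the whole point: with `>`, a word exactly as long as the remaining space would be accepted with its final character missing.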

sunrpc: release svc_pool_map reference when serv allocation fails · 067f96ef

Committed by Jeff Layton on Nov 19, 2014

Currently, it leaks when the allocation fails.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

sunrpc: eliminate the XPT_DETACHED flag · 8d65ef76

Committed by Jeff Layton on Nov 17, 2014

All it does is indicate whether a xprt has already been deleted from
a list or not, which is unnecessary since we use list_del_init and it's
always set and checked under the sv_lock anyway.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
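
Why list_del_init makes a separate "detached" flag redundant can be seen with a minimal list. The code below is a user-space sketch modeled on the kernel's list_head idea, with hypothetical names (`list_node`, `list_remove_init`, and so on); it is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list node. */
struct list_node {
    struct list_node *next, *prev;
};

void list_init(struct list_node *n)
{
    n->next = n;
    n->prev = n;
}

/* A node that points at itself is on no list (or is an empty head). */
bool list_is_empty(const struct list_node *n)
{
    return n->next == n;
}

void list_add_after(struct list_node *n, struct list_node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink n and re-initialize it so that it points at itself again. */
void list_remove_init(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}
```

After removal the node itself records that it is detached, and removing it a second time is harmless, so no extra flag is needed as long as callers hold the same lock.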

28 Nov, 2014 · 2 commits

sunrpc: add a debugfs rpc_xprt directory with an info file in it · 388f0c77

Committed by Jeff Layton on Nov 26, 2014

Add a new directory hierarchy under the debugfs sunrpc/ directory:

    sunrpc/
        rpc_xprt/
            <xprt id>/

Within that directory, we can put files that give info about the
xprts. We do have the (minor) problem that there is no succinct,
unique identifier for rpc_xprts. So we generate them synthetically
with a static atomic_t counter.

For now, this directory just holds an "info" file, but we may add
other files to it in the future.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
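
Generating synthetic identifiers from a static atomic counter can be sketched as follows. This uses C11 `atomic_uint` in user space in place of the kernel's atomic_t; `next_xprt_id` is a hypothetical name, and the starting value is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Return an id that is unique within the process. The counter has static
 * storage, so it starts at zero; atomic_fetch_add returns the value held
 * before the increment.
 */
unsigned int next_xprt_id(void)
{
    static atomic_uint counter;

    return atomic_fetch_add(&counter, 1u);
}
```

Concurrent callers each get a distinct value without taking a lock, which is all a directory name under debugfs needs.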

sunrpc: add debugfs file for displaying client rpc_task queue · b4b9d2cc

Committed by Jeff Layton on Nov 26, 2014

It's possible to get a dump of the RPC task queue by writing a value to
/proc/sys/sunrpc/rpc_debug. If you write any value to that file, you get
a dump of the RPC client task list into the log buffer. This is a rather
inconvenient interface however, and makes it hard to get immediate info
about the task queue.

Add a new directory hierarchy under debugfs:

    sunrpc/
        rpc_clnt/
            <clientid>/

Within each clientid directory we create a new "tasks" file that will
dump info similar to what shows up in the log buffer, but with a few
small differences -- we avoid printing raw kernel addresses in favor of
symbolic names and the XID is also displayed.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

27 Nov, 2014 · 2 commits

Merge tag 'nfs-rdma-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma into linux-next · ea526413

Committed by Trond Myklebust on Nov 26, 2014

Pull NFS client RDMA changes for 3.19 from Anna Schumaker:
 "NFS: Client side changes for RDMA

  These patches include various bugfixes and cleanups for using NFS over RDMA,
  including better error handling and performance improvements by using pad
  optimization.

  Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>"

* tag 'nfs-rdma-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma:
  xprtrdma: Display async errors
  xprtrdma: Enable pad optimization
  xprtrdma: Re-write rpcrdma_flush_cqs()
  xprtrdma: Refactor tasklet scheduling
  xprtrdma: unmap all FMRs during transport disconnect
  xprtrdma: Cap req_cqinit
  xprtrdma: Return an errno from rpcrdma_register_external()

Merge tag 'nfs-cel-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma into linux-next · 1702562d

Committed by Trond Myklebust on Nov 26, 2014

Pull additional NFS client changes for 3.19 from Anna Schumaker:
  "NFS: Generic client side changes from Chuck

  These patches include fixes for iostats and SETCLIENTID in addition to
  cleaning up the nfs4_init_callback() function.

  Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>"

* tag 'nfs-cel-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma:
  NFS: Clean up nfs4_init_callback()
  NFS: SETCLIENTID XDR buffer sizes are incorrect
  SUNRPC: serialize iostats updates

26 Nov, 2014 · 12 commits

nfs: Add DEALLOCATE support · 624bd5b7

Committed by Anna Schumaker on Nov 25, 2014

This patch adds support for using the NFS v4.2 operation DEALLOCATE to
punch holes in a file.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

nfs: Add ALLOCATE support · f4ac1674

Committed by Anna Schumaker on Nov 25, 2014

This patch adds support for using the NFS v4.2 operation ALLOCATE to
preallocate data in a file.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

NFS: Clean up nfs4_init_callback() · c2ef47b7