Commits · 2f425878b6a71571341dcd3f9e9d1a6f6355da9c · openeuler / raspberrypi-kernel

04 Apr, 2009 1 commit

nfsd: don't use the deferral service, return NFS4ERR_DELAY · 2f425878

Authored by Andy Adamson on Apr 03, 2009

On an NFSv4.1 server cache miss that causes an upcall, NFS4ERR_DELAY will be
returned. It is up to the NFSv4.1 client to resend only the operations that
have not been processed.

Initialize rq_usedeferral to 1 in svc_process(). It will be turned off in
nfsd4_proc_compound() only when NFSv4.1 Sessions are used.

Note: this isn't an adequate solution on its own. It's acceptable as a way
to get some minimal 4.1 up and working, but we're going to have to find a
way to avoid returning DELAY in all common cases before 4.1 can really be
considered ready.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: reverse rq_nodeferral negative logic]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[sunrpc: initialize rq_usedeferral]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

2f425878

19 Mar, 2009 2 commits

knfsd: add file to export stats about nfsd pools · 03cf6c9f

Authored by Greg Banks on Jan 13, 2009

Add /proc/fs/nfsd/pool_stats to export to userspace various
statistics about the operation of rpc server thread pools.

This patch is based on a forward-ported version of
knfsd-add-pool-thread-stats which has been shipping in the SGI
"Enhanced NFS" product since 2006 and which was previously
posted:

http://article.gmane.org/gmane.linux.nfs/10375

It has also been updated thus:

 * moved EXPORT_SYMBOL() to near the function it exports
 * made the new struct seq_operations const
 * used SEQ_START_TOKEN instead of ((void *)1)
 * merged fix from SGI PV 990526 "sunrpc: use dprintk instead of
   printk in svc_pool_stats_*()" by Harshula Jayasuriya.
 * merged fix from SGI PV 964001 "Crash reading pool_stats before
   nfsds are started".
Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: Harshula Jayasuriya <harshula@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

03cf6c9f

knfsd: avoid overloading the CPU scheduler with enormous load averages · 59a252ff

Authored by Greg Banks on Jan 13, 2009

Avoid overloading the CPU scheduler with enormous load averages
when handling high call-rate NFS loads. When the knfsd bottom half
is made aware of an incoming call by the socket layer, it tries to
choose an nfsd thread and wake it up. As long as there are idle
threads, one will be woken up.

If there are a lot of nfsd threads (a sensible configuration when
the server is disk-bound or is running an HSM), there will be many
more nfsd threads than CPUs to run them. Under a high call-rate
low service-time workload, the result is that almost every nfsd is
runnable, but only a handful are actually able to run. This situation
causes two significant problems:

1. The CPU scheduler takes over 10% of each CPU, which is robbing
the nfsd threads of valuable CPU time.

2. At a high enough load, the nfsd threads starve userspace threads
of CPU time, to the point where daemons like portmap and rpc.mountd
do not schedule for tens of seconds at a time. Clients attempting
to mount an NFS filesystem time out at the very first step (opening
a TCP connection to portmap) because portmap cannot wake up from
select() and call accept() in time.

Disclaimer: these effects were observed on a SLES9 kernel; modern
kernels' schedulers may behave more gracefully.

The solution is simple: keep in each svc_pool a counter of the number
of threads which have been woken but have not yet run, and do not wake
any more if that count reaches an arbitrary small threshold.

Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16
synthetic client threads simulating an rsync (i.e. recursive directory
listing) workload reading from an i386 RH9 install image (161480
regular files in 10841 directories) on the server. That tree is small
enough to fit in the server's RAM so no disk traffic was involved.
This setup gives a sustained call rate in excess of 60000 calls/sec
before being CPU-bound on the server. The server was running 128 nfsds.

Profiling showed schedule() taking 6.7% of every CPU, and __wake_up()
taking 5.2%. This patch drops those contributions to 3.0% and 2.2%.
Load average was over 120 before the patch, and 20.9 after.

This patch is a forward-ported version of knfsd-avoid-nfsd-overload
which has been shipping in the SGI "Enhanced NFS" product since 2006.
It has been posted before:

http://article.gmane.org/gmane.linux.nfs/10374

Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

59a252ff
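
The throttling idea in the commit above (count the threads that have been woken but have not yet run, and stop waking more once the count reaches a small threshold) can be sketched in userspace C. This is only an illustration under assumptions: `pool_sketch`, `pool_try_wake`, `pool_thread_running`, and the value of `WAKE_PENDING_MAX` are hypothetical names and numbers, not the kernel's, and the sketch omits the locking a real pool would need.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical value: the commit only says "an arbitrary small threshold". */
#define WAKE_PENDING_MAX 5

/* Hypothetical stand-in for the per-pool bookkeeping. */
struct pool_sketch {
    int idle_threads; /* threads asleep and available to be woken */
    int nrwaking;     /* threads woken but not yet running */
};

/* Called when a request arrives. Returns true if a thread was woken. */
static bool pool_try_wake(struct pool_sketch *p)
{
    assert(p->idle_threads >= 0 && p->nrwaking >= 0);
    if (p->idle_threads == 0)
        return false; /* nobody to wake */
    if (p->nrwaking >= WAKE_PENDING_MAX)
        return false; /* enough threads are already on their way */
    p->idle_threads--;
    p->nrwaking++;
    return true;
}

/* Called by a woken thread once it is actually running. */
static void pool_thread_running(struct pool_sketch *p)
{
    assert(p->nrwaking > 0);
    p->nrwaking--;
}
```

With 128 idle threads and a burst of 100 requests, only `WAKE_PENDING_MAX` wakeups are issued until one of the woken threads reports that it is running.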

07 Jan, 2009 1 commit

sunrpc: add sv_maxconn field to svc_serv (try ) · c9233eb7

Authored by Jeff Layton on Oct 20, 2008

svc_check_conn_limits() attempts to prevent denial of service attacks
by having the service close old connections once it reaches a
threshold. This threshold is based on the number of threads in the
service:

	(serv->sv_nrthreads + 3) * 20

Once we reach this, we drop the oldest connections and a printk pops
to warn the admin that they should increase the number of threads.

Increasing the number of threads isn't an option however for services
like lockd. We don't want to eliminate this check entirely for such
services but we need some way to increase this limit.

This patch adds a sv_maxconn field to the svc_serv struct. When it's
set to 0, we use the current method to calculate the max number of
connections. RPC services can then set this on an as-needed basis.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

c9233eb7
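
The limit selection described in the commit above can be written out as a short sketch. The field names `sv_nrthreads` and `sv_maxconn` and the formula `(sv_nrthreads + 3) * 20` come from the commit message; the struct `serv_sketch` and the helper `conn_limit` are hypothetical stand-ins, not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for svc_serv. */
struct serv_sketch {
    unsigned int sv_nrthreads; /* number of service threads */
    unsigned int sv_maxconn;   /* 0 means "derive the limit from the thread count" */
};

/* Connection limit: an explicit sv_maxconn wins, otherwise fall back to
 * the thread-based formula quoted in the commit message. */
static unsigned int conn_limit(const struct serv_sketch *serv)
{
    assert(serv != NULL);
    if (serv->sv_maxconn != 0)
        return serv->sv_maxconn;
    return (serv->sv_nrthreads + 3) * 20;
}
```

A single-threaded service such as lockd would get a limit of 80 connections from the formula, which is why an explicit override is useful there.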

30 Sep, 2008 3 commits

SUNRPC: Make svc_addr's argument a constant · 5344b12d

Authored by Chuck Lever on Aug 27, 2008

Clean up: Add extra type safety and squelch a few compiler complaints
in upcoming patches.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

5344b12d

SUNRPC: Support IPv6 when registering kernel RPC services · a26cfad6

Authored by Chuck Lever on Aug 18, 2008

In order to advertise NFS-related services on IPv6 interfaces via
rpcbind, the kernel RPC server implementation must use
rpcb_v4_register() instead of rpcb_register().

A new kernel build option allows distributions to use the legacy
v2 call until they integrate an appropriate user-space rpcbind
daemon that can support IPv6 RPC services.

I tried adding some automatic logic to fall back if registering
with a v4 protocol request failed, but there are too many corner
cases.  So I just made it a compile-time switch that distributions
can throw when they've replaced portmapper with rpcbind.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

a26cfad6

SUNRPC: Add address family field to svc_serv data structure · e851db5b

Authored by Chuck Lever on Jun 30, 2008

Introduce and initialize an address family field in the svc_serv structure.

This field will determine what family to use for the service's listener
sockets and what families are advertised via the local rpcbind daemon.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

e851db5b

24 Jun, 2008 2 commits

sunrpc: remove sv_kill_signal field from svc_serv struct · a75c5d01

Authored by Jeff Layton on Jun 10, 2008

Since we no longer make any distinction between shutdown signals with
nfsd, it becomes easier to just standardize on a particular signal
to use to bring it down (SIGINT, in this case).
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

a75c5d01

knfsd: convert knfsd to kthread API · 9867d76c

Authored by Jeff Layton on Jun 10, 2008

This patch is rather large, but I couldn't figure out a way to break it
up that would remain bisectable. It does several things:

- change svc_thread_fn typedef to better match what kthread_create expects
- change svc_pool_map_set_cpumask to be more kthread friendly. Make it
  take a task arg and get rid of the "oldmask"
- have svc_set_num_threads call kthread_create directly
- eliminate __svc_create_thread
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

9867d76c

24 Apr, 2008 1 commit

SUNRPC: remove svc_create_thread() · 8774282c

Authored by Jeff Layton on Apr 07, 2008

Now that the nfs4 callback thread uses the kthread API, there are no
more users of svc_create_thread(). Remove it.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

8774282c

11 Feb, 2008 1 commit

nfsd: clean up svc_reserve_auth() · fbb7878c

Authored by J. Bruce Fields on Feb 07, 2008

This is a void function attempting to return the return value from
another void function, which seems harmless but extremely weird, and
apparently makes some compilers complain.

While we're there, clean up a little (e.g. the switch statement had a
minor style problem and seemed overkill as long as there's only one
case).

Thanks to Trond for noticing this.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>

fbb7878c

02 Feb, 2008 6 commits

SUNRPC: spin svc_rqst initialization to its own function · 0113ab34

Authored by Jeff Layton on Jan 29, 2008

Move the initialization in __svc_create_thread that happens prior to
thread creation to a new function. Export the function to allow
services to have better control over the svc_rqst structs.

Also rearrange the rqstp initialization to prevent NULL pointer
dereferences in svc_exit_thread in case allocations fail.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

0113ab34

svc: Add transport hdr size for defer/revisit · 260c1d12

Authored by Tom Tucker on Dec 30, 2007

Some transports have a header in front of the RPC header. The current
defer/revisit processing considers only the iov_len and arg_len to
determine how much to back up when saving the original request
to revisit. Add a field to the rqstp structure to save the size
of the transport header so svc_defer can correctly compute
the start of a request.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

260c1d12

svc: Removing remaining references to rq_sock in rqstp · 57b1d3ba

Authored by Tom Tucker on Dec 30, 2007

This functionally empty patch removes rq_sock and the unnamed union
from the rqstp structure.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

57b1d3ba

svc: Make deferral processing xprt independent · 8c7b0172

Authored by Tom Tucker on Dec 30, 2007

This patch moves the transport independent sk_deferred list to the svc_xprt
structure and updates the svc_deferred_req structure to keep pointers to
svc_xprt's directly. The deferral processing code is also moved out of the
transport dependent recvfrom functions and into the generic svc_recv path.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

8c7b0172

svc: Add transport specific xpo_release function · 5148bf4e

Authored by Tom Tucker on Dec 30, 2007

The svc_sock_release function releases pages allocated to a thread. For
UDP this frees the receive skb. For RDMA it will post a receive WR
and bump the client credit count.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

5148bf4e

svc: Change the svc_sock in the rqstp structure to a transport · 9f29868b

Authored by Tom Tucker on Dec 30, 2007

The rqstp structure contains a pointer to the transport for the
RPC request. This functionally trivial patch adds an unnamed union
with pointers to both svc_sock and svc_xprt. Ultimately the
union will be removed and only the rq_xprt field will remain. This
allows incrementally extracting transport independent interfaces without
one gigundo patch.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

9f29868b

18 Jul, 2007 2 commits

knfsd: nfsd: set rq_client to ip-address-determined-domain · 3ab4d8b1

Authored by J. Bruce Fields on Jul 17, 2007

We want it to be possible for users to restrict exports both by IP address and
by pseudoflavor. The pseudoflavor information has previously been passed
using special auth_domains stored in the rq_client field. After the preceding
patch that stored the pseudoflavor in rq_pflavor, that's now superfluous; so
now we use rq_client for the ip information, as auth_null and auth_unix do.

However, we keep around the special auth_domain in the rq_gssclient field for
backwards compatibility purposes, so we can still do upcalls using the old
"gss/pseudoflavor" auth_domain if upcalls using the unix domain fail to give
us an appropriate export. This allows us to continue supporting old mountd.

In fact, for this first patch, we always use the "gss/pseudoflavor"
auth_domain (and only it) if it is available; thus rq_client is ignored in the
auth_gss case, and this patch on its own makes no change in behavior; that
will be left to later patches.

Note on idmap: I'm almost tempted to just replace the auth_domain in the idmap
upcall by a dummy value--no version of idmapd has ever used it, and it's
unlikely anyone really wants to perform idmapping differently depending on
where the client is (they may want to perform *credential* mapping
differently, but that's a different matter--the idmapper just handles id's
used in getattr and setattr). But I'm updating the idmapd code anyway, just
out of general backwards-compatibility paranoia.
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3ab4d8b1

knfsd: nfsd4: store pseudoflavor in request · c4170583

Authored by Andy Adamson on Jul 17, 2007

Add a new field to the svc_rqst structure to record the pseudoflavor that the
request was made with.  For now we record the pseudoflavor but don't use it
for anything.
Signed-off-by: Andy Adamson <andros@citi.umich.edu>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c4170583

10 Jul, 2007 1 commit

sendfile: convert nfsd to splice_direct_to_actor() · cf8208d0

Authored by Jens Axboe on Jun 12, 2007

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

cf8208d0

10 May, 2007 1 commit

RPC: add wrapper for svc_reserve to account for checksum · cd123012

Authored by Jeff Layton on May 09, 2007

When the kernel calls svc_reserve to downsize the expected size of an RPC
reply, it fails to account for the possibility of a checksum at the end of
the packet.  If a client mounts an NFSv2/3 filesystem with sec=krb5i/p and
does I/O, then you'll generally see messages similar to this in the server's
ring buffer:

RPC request reserved 164 but used 208

While I was never able to verify it, I suspect that this problem is also
the root cause of some oopses I've seen under these conditions:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=227726

This is probably also a problem for other sec= types and for NFSv4.  The
large reserved size for NFSv4 compound packets seems to generally paper
over the problem, however.

This patch adds a wrapper for svc_reserve that accounts for the possibility
of a checksum.  It also fixes up the appropriate callers of svc_reserve to
call the wrapper.  For now, it just uses a hardcoded value that I
determined via testing.  That value may need to be revised upward as things
change, or we may want to eventually add a new auth_op that attempts to
calculate this somehow.

Unfortunately, there doesn't seem to be a good way to reliably determine
the expected checksum length prior to actually calculating it, particularly
with schemes like spkm3.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cd123012

07 Mar, 2007 1 commit

[PATCH] knfsd: remove CONFIG_IPV6 ifdefs from sunrpc server code · 5a05ed73

Authored by NeilBrown on Mar 06, 2007

They don't really save that much, and aren't worth the hassle.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

5a05ed73

13 Feb, 2007 5 commits

[PATCH] knfsd: SUNRPC: support IPv6 addresses in RPC server's UDP receive path · 95756482

Authored by Chuck Lever on Feb 12, 2007

Add support for IPv6 addresses in the RPC server's UDP receive path.

[akpm@linux-foundation.org: cleanups]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

95756482

[PATCH] knfsd: SUNRPC: Make rq_daddr field address-version independent · 73df0dba

Authored by Chuck Lever on Feb 12, 2007

The rq_daddr field must support larger addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

73df0dba

[PATCH] knfsd: SUNRPC: Provide room in svc_rqst for larger addresses · 27459f09

Authored by Chuck Lever on Feb 12, 2007

Expand the rq_addr field to allow it to contain larger addresses.

Specifically, we replace a 'sockaddr_in' with a 'sockaddr_storage', then
everywhere the 'sockaddr_in' was referenced, we use instead an accessor
function (svc_addr_in) which safely casts the _storage to _in.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

27459f09
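
The pattern in the commit above (store a `sockaddr_storage`, and reach the IPv4 view only through an accessor that does the cast) might look roughly like this in userspace C. The commit names the kernel accessor `svc_addr_in`; the struct `rqst_sketch` and the function `rqst_addr_in` below are hypothetical stand-ins, and the family check is an addition for the sketch rather than something the commit describes.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical, trimmed-down stand-in for svc_rqst. */
struct rqst_sketch {
    struct sockaddr_storage rq_addr; /* large enough for IPv4 or IPv6 */
};

/* Accessor that narrows the generic storage to an IPv4 address.
 * Callers never cast rq_addr themselves. */
static struct sockaddr_in *rqst_addr_in(struct rqst_sketch *rqstp)
{
    /* Sketch-only sanity check: the stored address must really be IPv4. */
    assert(rqstp->rq_addr.ss_family == AF_INET);
    return (struct sockaddr_in *)&rqstp->rq_addr;
}
```

Keeping the cast in one place means that a later switch to family-aware accessors touches a single function instead of every caller.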

[PATCH] knfsd: SUNRPC: Use sockaddr_storage to store address in svc_deferred_req · 24422222

Authored by Chuck Lever on Feb 12, 2007

Sockaddr_storage will allow us to store arbitrary socket addresses in the
svc_deferred_req struct.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

24422222

[PATCH] knfsd: SUNRPC: Add a function to format the address in an svc_rqst for printing · ad06e4bd

Authored by Chuck Lever on Feb 12, 2007

There are loads of places where the RPC server assumes that the rq_addr field
contains an IPv4 address.  Top among these are error and debugging messages
that display the server's IP address.

Let's refactor the address printing into a separate function that's smart
enough to figure out the difference between IPv4 and IPv6 addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ad06e4bd
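
A family-aware address formatter of the kind the commit above describes can be sketched in userspace with the POSIX `inet_ntop()` call. The function name `format_addr` is hypothetical, and this is not the kernel helper the patch adds; it only shows the dispatch on address family.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Format the address held in ss into buf, choosing the IPv4 or IPv6
 * presentation based on ss_family. Returns buf on success, or NULL if
 * inet_ntop() fails (for example, when buf is too small). */
static const char *format_addr(const struct sockaddr_storage *ss,
                               char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    switch (ss->ss_family) {
    case AF_INET:
        return inet_ntop(AF_INET,
                         &((const struct sockaddr_in *)ss)->sin_addr,
                         buf, (socklen_t)len);
    case AF_INET6:
        return inet_ntop(AF_INET6,
                         &((const struct sockaddr_in6 *)ss)->sin6_addr,
                         buf, (socklen_t)len);
    default:
        /* Unknown family: report it rather than guess at a layout. */
        snprintf(buf, len, "unknown address family %d", (int)ss->ss_family);
        return buf;
    }
}
```

A buffer of `INET6_ADDRSTRLEN` bytes is large enough for either address family.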

27 Jan, 2007 1 commit

[PATCH] knfsd: fix an NFSD bug with full sized, non-page-aligned reads · 250f3915

Authored by NeilBrown on Jan 26, 2007

NFSd assumes that the largest number of pages that will be needed for a
request+response is 2+N where N pages is the size of the largest permitted
read/write request.  The '2' are 1 for the non-data part of the request, and 1
for the non-data part of the reply.

However, when a read request is not page-aligned, and we choose to use
->sendfile to send it directly from the page cache, we may need N+1 pages to
hold the whole reply.  This can overflow an array and cause an Oops.

This patch increases the size of the array for holding pages by one and makes
sure that entry is NULL when it is not in use.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

250f3915
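
The N versus N+1 arithmetic in the commit above can be checked with a small helper that counts how many pages a byte range touches. The function name `pages_spanned` and the page size used in the usage note are assumptions for illustration; they are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Number of pages touched by the byte range [offset, offset + len). */
static size_t pages_spanned(size_t offset, size_t len, size_t page_size)
{
    assert(page_size > 0);
    if (len == 0)
        return 0;
    size_t first = offset / page_size;           /* page holding the first byte */
    size_t last = (offset + len - 1) / page_size; /* page holding the last byte */
    return last - first + 1;
}
```

Assuming 4096-byte pages, a 32768-byte read starting at offset 0 touches 8 pages, while the same read starting at offset 100 touches 9, which is the extra page the array has to leave room for.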

21 Oct, 2006 1 commit

[PATCH] fix svc_procfunc declaration · 7111c66e

Authored by Al Viro on Oct 19, 2006

svc_procfunc instances return __be32, not int
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Acked-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

7111c66e

06 Oct, 2006 1 commit

[PATCH] knfsd: tidy up meaning of 'buffer size' in nfsd/sunrpc · c6b0a9f8

Authored by NeilBrown on Oct 06, 2006

There is some confusion about the meaning of 'bufsz' for a sunrpc server.
In some cases it is the largest message that can be sent or received.  In
other cases it is the largest 'payload' that can be included in an NFS
message.

In either case, it is not possible for both the request and the reply to be
this large.  One of the request or reply may only be one page long, which
fits nicely with NFS.

So we remove 'bufsz' and replace it with two numbers: 'max_payload' and
'max_mesg'.  Max_payload is the size that the server requests.  It is used
by the server to check the max size allowed on a particular connection:
depending on the protocol a lower limit might be used.

max_mesg is the largest single message that can be sent or received.  It is
calculated as the max_payload, rounded up to a multiple of PAGE_SIZE, and
with PAGE_SIZE added for overhead.  Only one of the request and reply may be
this size.  The other must be at most one page.

Cc: Greg Banks <gnb@sgi.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

c6b0a9f8
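
The max_mesg calculation described in the commit above (max_payload rounded up to a multiple of PAGE_SIZE, plus one more PAGE_SIZE) can be written out as a short sketch. The function name `max_mesg_for` is hypothetical, and the page size is passed in as a parameter rather than taken from the kernel's PAGE_SIZE.

```c
#include <assert.h>
#include <stddef.h>

/* Largest single message for a given payload limit: round the payload up
 * to a whole number of pages, then add one page for the non-payload part. */
static size_t max_mesg_for(size_t max_payload, size_t page_size)
{
    assert(page_size > 0);
    size_t rounded = (max_payload + page_size - 1) / page_size * page_size;
    return rounded + page_size;
}
```

Assuming 4096-byte pages, a 32768-byte payload limit gives a 36864-byte max_mesg, and a 1000-byte limit gives 8192 bytes.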

04 Oct, 2006 4 commits

[PATCH] knfsd: register all RPC programs with portmapper by default · bc5fea42

Authored by Olaf Kirch on Oct 04, 2006

The NFSACL patches introduced support for multiple RPC services listening on
the same transport.  However, only the first of these services was registered
with portmapper.  This was perfectly fine for nfsacl, as you traditionally do
not want these to show up in a portmapper listing.

The patch below changes the default behavior to always register all services
listening on a given transport, but retains the old behavior for nfsacl
services.
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

bc5fea42

[PATCH] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP · 7adae489

Authored by Greg Banks on Oct 04, 2006

The limit over UDP remains at 32K.  Also, make some of the apparently
arbitrary sizing constants clearer.

The biggest change here involves replacing NFSSVC_MAXBLKSIZE by a function of
the rqstp.  This allows it to be different for different protocols (udp/tcp)
and also allows it to depend on the server's declared sv_bufsz.

Note that we don't actually increase sv_bufsz for nfs yet.  That comes next.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

7adae489

[PATCH] knfsd: Avoid excess stack usage in svc_tcp_recvfrom · 3cc03b16

Authored by NeilBrown on Oct 04, 2006

..  by allocating the array of 'kvec' in 'struct svc_rqst'.

As we plan to increase RPCSVC_MAXPAGES from 8 up to 256, we can no longer
allocate an array of this size on the stack.  So we allocate it in 'struct
svc_rqst'.

However svc_rqst contains (indirectly) an array of the same type and size
(actually several, but they are in a union).  So rather than waste space, we
move those arrays out of the separately allocated union and into svc_rqst to
share with the kvec moved out of svc_tcp_recvfrom (various arrays are used at
different times, so there is no conflict).
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

3cc03b16

[PATCH] knfsd: Replace two page lists in struct svc_rqst with one · 44524359

Authored by NeilBrown on Oct 04, 2006

We are planning to increase RPCSVC_MAXPAGES from about 8 to about 256.  This
means we need to be a bit careful about arrays of size RPCSVC_MAXPAGES.

struct svc_rqst contains two such arrays.  However, there are never more
than RPCSVC_MAXPAGES pages in the two arrays together, so only one array is
needed.

The two arrays are for the pages holding the request, and the pages holding
the reply.  Instead of two arrays, we can simply keep an index into where the
first reply page is.

This patch also removes a number of small inline functions that probably
serve to obscure what is going on rather than clarify it, and open-codes the
needed functionality.

Also remove the 'rq_restailpage' variable as it is *always* 0.  i.e.  if the
response 'xdr' structure has a non-empty tail it is always in the same pages
as the head.

 check counters are initialised and incremented properly
 check for consistent usage of ++ etc
 maybe extract some inlines for common approach
 general review
Signed-off-by: Neil Brown <neilb@suse.de>
Cc: Magnus Maatta <novell@kiruna.se>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

44524359
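
The "one array plus an index to the first reply page" layout from the commit above could be sketched as follows. Everything here is a hypothetical illustration: `rqst_pages_sketch`, `resp_start`, `MAXPAGES_SKETCH`, and both helpers are invented names, and the sketch assumes that request pages occupy the front of the array and reply pages the rest.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical small value; the commit discusses raising the real limit. */
#define MAXPAGES_SKETCH 8

/* One shared page array instead of separate request and reply arrays. */
struct rqst_pages_sketch {
    void *pages[MAXPAGES_SKETCH];
    size_t resp_start; /* index of the first reply page */
};

/* Pages before resp_start hold the request. */
static size_t request_page_count(const struct rqst_pages_sketch *r)
{
    assert(r->resp_start <= MAXPAGES_SKETCH);
    return r->resp_start;
}

/* Pages from resp_start onward are available to the reply. */
static void **reply_pages(struct rqst_pages_sketch *r, size_t *count)
{
    assert(r->resp_start <= MAXPAGES_SKETCH);
    *count = MAXPAGES_SKETCH - r->resp_start;
    return &r->pages[r->resp_start];
}
```

Because the two regions share one array, the combined page count can never exceed the array size, which is the property the commit relies on.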

02 Oct, 2006 6 commits

[PATCH] knfsd: make rpc threads pools numa aware · bfd24160

Authored by Greg Banks on Oct 02, 2006

Actually implement multiple pools.  On NUMA machines, allocate a svc_pool per
NUMA node; on SMP a svc_pool per CPU; otherwise a single global pool.  Enqueue
sockets on the svc_pool corresponding to the CPU on which the socket bh is run
(i.e.  the NIC interrupt CPU).  Threads have their cpu mask set to limit them
to the CPUs in the svc_pool that owns them.

This is the patch that allows an Altix to scale NFS traffic linearly
beyond 4 CPUs and 4 NICs.

Incorporates changes and feedback from Neil Brown, Trond Myklebust, and
Christoph Hellwig.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

bfd24160
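
The three pool layouts named in the commit above (one pool per NUMA node, one per CPU, or a single global pool) imply a mapping from the CPU that ran the socket bottom half to a pool index. The sketch below is an assumption-heavy illustration: the enum, the function `pool_for_cpu`, and the premise that CPUs are numbered contiguously within each node are all hypothetical, and the kernel builds its real mapping from topology information.

```c
#include <assert.h>

/* Hypothetical names for the three layouts described in the commit. */
enum pool_mode_sketch {
    POOL_SKETCH_GLOBAL,  /* a single pool for the whole service */
    POOL_SKETCH_PERCPU,  /* one pool per CPU (SMP) */
    POOL_SKETCH_PERNODE  /* one pool per NUMA node */
};

/* Pick the pool for a request whose socket bottom half ran on `cpu`.
 * Assumes CPUs are numbered contiguously within each node. */
static unsigned int pool_for_cpu(enum pool_mode_sketch mode,
                                 unsigned int cpu,
                                 unsigned int cpus_per_node)
{
    switch (mode) {
    case POOL_SKETCH_PERCPU:
        return cpu;
    case POOL_SKETCH_PERNODE:
        assert(cpus_per_node > 0);
        return cpu / cpus_per_node;
    default:
        return 0; /* global mode: everything shares pool 0 */
    }
}
```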

[PATCH] knfsd: add svc_set_num_threads · a7455442

Authored by Greg Banks on Oct 02, 2006

Currently knfsd keeps its own list of all nfsd threads in nfssvc.c; add a new
way of managing the list of all threads in a svc_serv.  Add
svc_create_pooled() to allow creation of a svc_serv whose threads are managed
by the sunrpc code.  Add svc_set_num_threads() to manage the number of threads
in a service, either per-pool or globally across the service.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

a7455442

[PATCH] knfsd: add svc_get · 9a24ab57

Authored by Greg Banks on Oct 02, 2006

add svc_get() for those occasions when we need to temporarily bump up
svc_serv->sv_nrthreads as a pseudo refcount.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

9a24ab57

[PATCH] knfsd: split svc_serv into pools · 3262c816

Authored by Greg Banks on Oct 02, 2006

Split out the list of idle threads and pending sockets from svc_serv into a
new svc_pool structure, and allocate a fixed number (in this patch, 1) of
pools per svc_serv. The new structure contains a lock which takes over
several of the duties of svc_serv->sv_lock, which is now relegated to
protecting only sv_tempsocks, sv_permsocks, and sv_tmpcnt in svc_serv.

The point is to move the hottest fields out of svc_serv and into svc_pool,
allowing a following patch to arrange for a svc_pool per NUMA node or per CPU.
This is a major step towards making the NFS server NUMA-friendly.
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

3262c816

[PATCH] knfsd: move tempsock aging to a timer · 36bdfc8b

Authored by Greg Banks on Oct 02, 2006

Following are 11 patches from Greg Banks which combine to make knfsd more
NUMA-aware.  They reduce contention on 'global' data structures and create
some data structures that can be node-local.

knfsd threads are bound to a particular node, and the thread to handle a new
request is chosen from the threads that are attached to the node that received
the interrupt.

The distribution of threads across nodes can be controlled by a new file in
the 'nfsd' filesystem, though the default approach of an even spread is
probably fine for most sites.

Some (old) numbers that show the efficacy of these patches: N == number of
NICs == number of CPUs == number of clients.  Number of NUMA nodes == N/2

N    Throughput, MiB/s    CPU usage, % (max=N*100)
     Before    After      Before    After
---  ------    -----      ------    -----
4    312       435        350       228
6    500       656        501       418
8    562       804        690       589

This patch:

Move the aging of RPC/TCP connection sockets from the main svc_recv() loop to
a timer which uses a mark-and-sweep algorithm every 6 minutes.  This reduces
the amount of work that needs to be done in the main RPC loop and the length
of time we need to hold the (effectively global) svc_serv->sv_lock.

[akpm@osdl.org: cleanup]
Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

36bdfc8b
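
The mark-and-sweep aging mentioned in the commit above can be illustrated with a two-pass flag scheme: each timer run closes connections still marked old from the previous run and marks the rest, while any activity clears the mark. This is a sketch under assumptions, not the kernel's implementation; `conn_sketch`, `conn_touch`, and `age_connections` are hypothetical names, and the real code works on the service's temporary socket list under its lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-connection state. */
struct conn_sketch {
    bool old;    /* set by the previous sweep, cleared by activity */
    bool closed; /* set once the connection has been aged out */
};

/* Called whenever the connection sees traffic. */
static void conn_touch(struct conn_sketch *c)
{
    c->old = false;
}

/* One timer run: sweep connections still marked old, mark the others.
 * Returns the number of connections closed in this run. */
static size_t age_connections(struct conn_sketch *conns, size_t n)
{
    size_t closed = 0;

    assert(conns != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (conns[i].closed)
            continue;
        if (conns[i].old) {
            conns[i].closed = true; /* idle for a whole interval */
            closed++;
        } else {
            conns[i].old = true;    /* candidate for the next run */
        }
    }
    return closed;
}
```

A connection therefore survives as long as it sees traffic at least once between two consecutive timer runs.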

[PATCH] knfsd: Drop 'serv' option to svc_recv and svc_process · 6fb2b47f

Authored by NeilBrown on Oct 02, 2006

It isn't needed as it is available in rqstp->rq_server, and dropping it allows
some local vars to be dropped.

[akpm@osdl.org: build fix]
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

6fb2b47f