1. 04 Apr 2009, 1 commit
  2. 19 Mar 2009, 2 commits
    • knfsd: add file to export stats about nfsd pools · 03cf6c9f
      Greg Banks committed
      Add /proc/fs/nfsd/pool_stats to export to userspace various
      statistics about the operation of rpc server thread pools.
      
      This patch is based on a forward-ported version of
      knfsd-add-pool-thread-stats which has been shipping in the SGI
      "Enhanced NFS" product since 2006 and which was previously
      posted:
      
      http://article.gmane.org/gmane.linux.nfs/10375
      
      It has also been updated thus:
      
       * moved EXPORT_SYMBOL() to near the function it exports
       * made the new struct seq_operations const
       * used SEQ_START_TOKEN instead of ((void *)1)
       * merged fix from SGI PV 990526 "sunrpc: use dprintk instead of
         printk in svc_pool_stats_*()" by Harshula Jayasuriya.
       * merged fix from SGI PV 964001 "Crash reading pool_stats before
         nfsds are started".
      Signed-off-by: Greg Banks <gnb@sgi.com>
      Signed-off-by: Harshula Jayasuriya <harshula@sgi.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
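The SEQ_START_TOKEN change and the "crash reading pool_stats before nfsds are started" fix follow a common seq_file pattern: a sentinel element produces the header line, and the iterator must cope with a pool table that does not exist yet. The block below is only a userspace simulation of that pattern under simplifying assumptions; `struct pool_stats`, `struct pool_table`, `stats_start()`, `stats_next()`, `stats_show()`, `render_stats()`, `START_TOKEN` and the column names are all hypothetical, not the kernel's seq_file API or the real pool_stats layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sentinel standing in for SEQ_START_TOKEN: a unique non-NULL
 * pointer meaning "emit the header line" rather than a real record. */
static char start_token_storage;
#define START_TOKEN ((void *)&start_token_storage)

/* Hypothetical per-pool counters; the real pool_stats columns may differ. */
struct pool_stats {
	unsigned int id;
	unsigned long packets;
	unsigned long threads_woken;
};

struct pool_table {
	struct pool_stats *pools; /* NULL until the server threads exist */
	size_t npools;
};

/* Position 0 is the header token; positions 1..npools map to pools. */
static void *stats_start(struct pool_table *t, size_t pos)
{
	if (pos == 0)
		return START_TOKEN;
	if (t->pools == NULL || pos > t->npools)
		return NULL; /* no table yet, or past the end: stop iterating */
	return &t->pools[pos - 1];
}

static void *stats_next(struct pool_table *t, size_t *pos)
{
	++*pos;
	return stats_start(t, *pos);
}

/* Append one line for the current element to buf. */
static void stats_show(char *buf, size_t len, void *p)
{
	size_t used = strlen(buf);

	if (p == START_TOKEN) {
		snprintf(buf + used, len - used, "# pool packets threads-woken\n");
		return;
	}
	const struct pool_stats *ps = p;
	snprintf(buf + used, len - used, "%u %lu %lu\n",
		 ps->id, ps->packets, ps->threads_woken);
}

/* Drive start/next/show the way a seq_file read loop would. */
static void render_stats(struct pool_table *t, char *buf, size_t len)
{
	size_t pos = 0;

	assert(len > 0);
	buf[0] = '\0';
	for (void *p = stats_start(t, pos); p != NULL; p = stats_next(t, &pos))
		stats_show(buf, len, p);
}
```

With an empty (NULL) pool table, this sketch prints only the header line instead of dereferencing a missing table, which is the behaviour the merged crash fix is after.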
    • knfsd: avoid overloading the CPU scheduler with enormous load averages · 59a252ff
      Greg Banks committed
      Avoid overloading the CPU scheduler with enormous load averages
      when handling high call-rate NFS loads.  When the knfsd bottom half
      is made aware of an incoming call by the socket layer, it tries to
      choose an nfsd thread and wake it up.  As long as there are idle
      threads, one will be woken up.
      
      If there are a lot of nfsd threads (a sensible configuration when
      the server is disk-bound or is running an HSM), there will be many
      more nfsd threads than CPUs to run them.  Under a high call-rate,
      low service-time workload, the result is that almost every nfsd is
      runnable, but only a handful are actually able to run.  This situation
      causes two significant problems:
      
      1. The CPU scheduler takes over 10% of each CPU, which is robbing
         the nfsd threads of valuable CPU time.
      
      2. At a high enough load, the nfsd threads starve userspace threads
         of CPU time, to the point where daemons like portmap and rpc.mountd
         do not schedule for tens of seconds at a time.  Clients attempting
         to mount an NFS filesystem time out at the very first step (opening
         a TCP connection to portmap) because portmap cannot wake up from
         select() and call accept() in time.
      
      Disclaimer: these effects were observed on a SLES9 kernel; modern
      kernels' schedulers may behave more gracefully.
      
      The solution is simple: keep in each svc_pool a counter of the number
      of threads which have been woken but have not yet run, and do not wake
      any more if that count reaches an arbitrary small threshold.
      
      Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16
      synthetic client threads simulating an rsync (i.e. recursive directory
      listing) workload reading from an i386 RH9 install image (161480
      regular files in 10841 directories) on the server.  That tree is small
      enough to fit in the server's RAM, so no disk traffic was involved.
      This setup gives a sustained call rate in excess of 60000 calls/sec
      before being CPU-bound on the server.  The server was running 128 nfsds.
      
      Profiling showed schedule() taking 6.7% of every CPU, and __wake_up()
      taking 5.2%.  This patch drops those contributions to 3.0% and 2.2%
      respectively.  Load average was over 120 before the patch, and 20.9 after.
      
      This patch is a forward-ported version of knfsd-avoid-nfsd-overload
      which has been shipping in the SGI "Enhanced NFS" product since 2006.
      It has been posted before:
      
      http://article.gmane.org/gmane.linux.nfs/10374
      Signed-off-by: Greg Banks <gnb@sgi.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
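The throttling rule described above (a per-pool count of threads that have been woken but have not yet run, with no further wakeups once it reaches a small threshold) can be sketched as follows. This assumes a single pool and ignores locking; `struct pool`, `pool_wake_one()`, `pool_thread_running()` and the value of `MAX_WAKING` are hypothetical, since the commit only calls the threshold "arbitrary small".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical threshold; the commit does not state the value used. */
#define MAX_WAKING 5

/* Hypothetical, simplified stand-in for the relevant svc_pool state. */
struct pool {
	int nr_idle;    /* threads sleeping, waiting for work */
	int nr_waking;  /* threads woken but not yet running */
};

/* Called when a new call arrives. Returns true if a thread was woken,
 * false if the call should stay queued for a thread already on its way. */
static bool pool_wake_one(struct pool *p)
{
	if (p->nr_idle == 0)
		return false;           /* no idle thread to wake */
	if (p->nr_waking >= MAX_WAKING)
		return false;           /* enough wakeups already in flight */
	p->nr_idle--;
	p->nr_waking++;
	return true;
}

/* Called by a woken thread once it actually gets the CPU. */
static void pool_thread_running(struct pool *p)
{
	assert(p->nr_waking > 0);
	p->nr_waking--;
}
```

With 128 idle nfsds, a burst of calls in this model wakes at most `MAX_WAKING` threads at a time instead of one thread per incoming call; more are woken only as earlier ones start running.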
  3. 07 Jan 2009, 1 commit
    • sunrpc: add sv_maxconn field to svc_serv (try #3) · c9233eb7
      Jeff Layton committed
      svc_check_conn_limits() attempts to prevent denial of service attacks
      by having the service close old connections once it reaches a
      threshold. This threshold is based on the number of threads in the
      service:
      
      	(serv->sv_nrthreads + 3) * 20
      
      Once we reach this, we drop the oldest connections and a printk pops
      to warn the admin that they should increase the number of threads.
      
      Increasing the number of threads isn't an option, however, for services
      like lockd. We don't want to eliminate this check entirely for such
      services, but we need some way to increase this limit.
      
      This patch adds a sv_maxconn field to the svc_serv struct. When it's
      set to 0, we use the current method to calculate the max number of
      connections. RPC services can then set this on an as-needed basis.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Acked-by: Neil Brown <neilb@suse.de>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
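The connection limit described in the sv_maxconn commit reduces to a small function. This is a sketch of the described behaviour, not the kernel code: `struct serv_limits` and `max_connections()` are hypothetical names, while `sv_nrthreads` and `sv_maxconn` are the fields the commit mentions.

```c
#include <assert.h>

/* Hypothetical view of the two svc_serv fields involved. */
struct serv_limits {
	unsigned int sv_nrthreads;
	unsigned int sv_maxconn; /* 0 means "derive the limit from sv_nrthreads" */
};

/* Maximum connections before the oldest ones are dropped. */
static unsigned int max_connections(const struct serv_limits *s)
{
	assert(s != 0);
	if (s->sv_maxconn != 0)
		return s->sv_maxconn;             /* explicit per-service override */
	return (s->sv_nrthreads + 3) * 20;    /* the original heuristic */
}
```

For a service running a single thread, as lockd usually does, the default works out to (1 + 3) * 20 = 80 connections, which is why a per-service override is more useful than adding threads.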
  4. 30 Sep 2008, 3 commits
  5. 24 Jun 2008, 2 commits
  6. 24 Apr 2008, 1 commit
  7. 11 Feb 2008, 1 commit
    • nfsd: clean up svc_reserve_auth() · fbb7878c
      J. Bruce Fields committed
      This is a void function attempting to return the return value from
      another void function, which seems harmless but extremely weird, and
      apparently makes some compilers complain.
      
      While we're there, clean up a little (e.g. the switch statement had a
      minor style problem and seemed overkill as long as there's only one
      case).
      
      Thanks to Trond for noticing this.
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
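To illustrate the svc_reserve_auth() clean-up, here is a self-contained sketch of its before/after shape, assuming a simplified request type. `struct req`, `reserve()`, `reserve_auth()`, `FLAVOR_GSS` and `GSS_SLACK` are hypothetical stand-ins for svc_rqst, svc_reserve(), svc_reserve_auth() and the GSS allowance, and the slack value is invented. ISO C does not allow `return expr;` in a function returning void, even when `expr` itself has type void; GCC accepts it as an extension, which is why only some compilers complain.

```c
#include <assert.h>

#define FLAVOR_GSS 6    /* RPCSEC_GSS flavor number; macro name is hypothetical */
#define GSS_SLACK 128   /* hypothetical extra space reserved for GSS */

struct req {            /* hypothetical stand-in for struct svc_rqst */
	int flavor;
	int reserved;
};

static void reserve(struct req *r, int space)
{
	r->reserved = space;
}

/*
 * Before, schematically:
 *     switch (r->flavor) {
 *     case FLAVOR_GSS:
 *         return reserve(r, space + GSS_SLACK);  // "returns" a void value
 *     default:
 *         return reserve(r, space);
 *     }
 * After: one plain call, and an if instead of a one-case switch.
 */
static void reserve_auth(struct req *r, int space)
{
	int added = 0;

	if (r->flavor == FLAVOR_GSS)
		added = GSS_SLACK;
	reserve(r, space + added);
}
```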
  8. 02 Feb 2008, 6 commits
  9. 18 Jul 2007, 2 commits
    • knfsd: nfsd: set rq_client to ip-address-determined-domain · 3ab4d8b1
      J. Bruce Fields committed
      We want it to be possible for users to restrict exports both by IP address and
      by pseudoflavor.  The pseudoflavor information has previously been passed
      using special auth_domains stored in the rq_client field.  After the preceding
      patch that stored the pseudoflavor in rq_pflavor, that's now superfluous; so
      now we use rq_client for the ip information, as auth_null and auth_unix do.
      
      However, we keep around the special auth_domain in the rq_gssclient field for
      backwards compatibility purposes, so we can still do upcalls using the old
      "gss/pseudoflavor" auth_domain if upcalls using the unix domain fail to give
      us an appropriate export.  This allows us to continue supporting old mountd.
      
      In fact, for this first patch, we always use the "gss/pseudoflavor"
      auth_domain (and only it) if it is available; thus rq_client is ignored in the
      auth_gss case, and this patch on its own makes no change in behavior; that
      will be left to later patches.
      
      Note on idmap: I'm almost tempted to just replace the auth_domain in the idmap
      upcall by a dummy value--no version of idmapd has ever used it, and it's
      unlikely anyone really wants to perform idmapping differently depending on
      where the client is (they may want to perform *credential* mapping
      differently, but that's a different matter--the idmapper just handles id's
      used in getattr and setattr).  But I'm updating the idmapd code anyway, just
      out of general backwards-compatibility paranoia.
      Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
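The domain-selection rule for this first rq_client patch (use the "gss/pseudoflavor" domain whenever it is available, and only otherwise the IP-based one) can be written as a one-line choice. `struct auth_domain_s`, `struct rqst_domains` and `export_lookup_domain()` are hypothetical simplifications; only the field names `rq_client` and `rq_gssclient` come from the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, name-only stand-in for struct auth_domain. */
struct auth_domain_s {
	const char *name;
};

struct rqst_domains {
	struct auth_domain_s *rq_client;    /* domain derived from the client IP */
	struct auth_domain_s *rq_gssclient; /* legacy "gss/<pseudoflavor>" domain, or NULL */
};

/* Behaviour of the first patch as described: the legacy GSS domain wins
 * whenever it exists, so rq_client only matters for non-GSS requests. */
static const struct auth_domain_s *export_lookup_domain(const struct rqst_domains *r)
{
	assert(r != NULL);
	if (r->rq_gssclient != NULL)
		return r->rq_gssclient;
	return r->rq_client;
}
```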
    • knfsd: nfsd4: store pseudoflavor in request · c4170583
      Andy Adamson committed
      Add a new field to the svc_rqst structure to record the pseudoflavor that the
      request was made with.  For now we record the pseudoflavor but don't use it
      for anything.
      Signed-off-by: Andy Adamson <andros@citi.umich.edu>
      Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 10 Jul 2007, 1 commit
  11. 10 May 2007, 1 commit
    • RPC: add wrapper for svc_reserve to account for checksum · cd123012
      Jeff Layton committed
      When the kernel calls svc_reserve to downsize the expected size of an RPC
      reply, it fails to account for the possibility of a checksum at the end of
      the packet.  If a client mounts an NFSv2/3 filesystem with sec=krb5i/p and
      does I/O, then you'll generally see messages similar to this in the
      server's ring buffer:
      
      RPC request reserved 164 but used 208
      
      While I was never able to verify it, I suspect that this problem is also
      the root cause of some oopses I've seen under these conditions:
      
      https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=227726
      
      This is probably also a problem for other sec= types and for NFSv4.  The
      large reserved size for NFSv4 compound packets seems to generally paper
      over the problem, however.
      
      This patch adds a wrapper for svc_reserve that accounts for the possibility
      of a checksum.  It also fixes up the appropriate callers of svc_reserve to
      call the wrapper.  For now, it just uses a hardcoded value that I
      determined via testing.  That value may need to be revised upward as things
      change, or we may want to eventually add a new auth_op that attempts to
      calculate this somehow.
      
      Unfortunately, there doesn't seem to be a good way to reliably determine
      the expected checksum length prior to actually calculating it, particularly
      with schemes like spkm3.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Acked-by: Neil Brown <neilb@suse.de>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
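The failure mode in the checksum commit (a reply that outgrows its reservation) and the wrapper's remedy can be modelled with two small functions. This is an illustration only: `reserve_with_checksum()`, `reply_overflowed()` and `CHECKSUM_SLACK` are hypothetical, and the slack value is a placeholder because the commit does not state the value it determined by testing.

```c
#include <assert.h>
#include <stdio.h>

/* Placeholder allowance for a trailing checksum; NOT the patch's value. */
#define CHECKSUM_SLACK 64

/* Reservation as a checksum-aware wrapper around svc_reserve() might size it. */
static int reserve_with_checksum(int space, int uses_checksum)
{
	return uses_checksum ? space + CHECKSUM_SLACK : space;
}

/* Returns 1 (and logs, in the style of the message quoted above) if the
 * reply outgrew its reservation, 0 otherwise. */
static int reply_overflowed(int reserved, int used)
{
	if (used > reserved) {
		fprintf(stderr, "RPC request reserved %d but used %d\n", reserved, used);
		return 1;
	}
	return 0;
}
```

With the numbers from the quoted message, a 164-byte reservation is exceeded by a 208-byte reply; adding the assumed 64-byte allowance gives 228 bytes, which covers it.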
  12. 07 Mar 2007, 1 commit
  13. 13 Feb 2007, 5 commits
  14. 27 Jan 2007, 1 commit
    • [PATCH] knfsd: fix an NFSD bug with full sized, non-page-aligned reads · 250f3915
      NeilBrown committed
      NFSd assumes that the largest number of pages that will be needed for a
      request+response is 2+N, where N pages is the size of the largest permitted
      read/write request.  The '2' is one page for the non-data part of the
      request and one for the non-data part of the reply.
      
      However, when a read request is not page-aligned, and we choose to use
      ->sendfile to send it directly from the page cache, we may need N+1 pages to
      hold the whole reply.  This can overflow an array and cause an Oops.

      This patch increases the size of the array for holding pages by one and makes
      sure that entry is NULL when it is not in use.
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
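The N+1 case in the non-page-aligned read fix is plain arithmetic: a byte range that starts part-way into a page spills into one more page than its length alone suggests. A sketch, assuming 4096-byte pages; `PG_SIZE` and `pages_spanned()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define PG_SIZE 4096UL  /* assumed page size for the example */

/* Number of pages touched by the byte range [offset, offset + len). */
static size_t pages_spanned(size_t offset, size_t len)
{
	size_t start = offset % PG_SIZE;  /* offset within the first page */

	if (len == 0)
		return 0;
	return (start + len + PG_SIZE - 1) / PG_SIZE;
}
```

A full-sized read of 8 pages that starts at offset 100 spans 9 pages, one more than the N data pages the 2+N budget allowed for.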
  15. 21 Oct 2006, 1 commit
  16. 06 Oct 2006, 1 commit
    • [PATCH] knfsd: tidy up up meaning of 'buffer size' in nfsd/sunrpc · c6b0a9f8
      NeilBrown committed
      There is some confusion about the meaning of 'bufsz' for a sunrpc server.
      In some cases it is the largest message that can be sent or received.  In
      other cases it is the largest 'payload' that can be included in a NFS
      message.
      
      In either case, it is not possible for both the request and the reply to be
      this large.  One of the two will be at most one page long, which fits nicely
      with NFS.
      
      So we remove 'bufsz' and replace it with two numbers: 'max_payload' and
      'max_mesg'.  Max_payload is the size that the server requests.  It is used
      by the server to check the max size allowed on a particular connection:
      depending on the protocol a lower limit might be used.
      
      max_mesg is the largest single message that can be sent or received.  It is
      calculated as the max_payload, rounded up to a multiple of PAGE_SIZE, and
      with PAGE_SIZE added for overhead.  Only one of the request and reply may be
      this size.  The other must be at most one page.
      
      Cc: Greg Banks <gnb@sgi.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
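The max_mesg rule from the buffer-size commit, written out as a sketch under the assumption of 4096-byte pages; `PG_SIZE` and `max_mesg_for()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define PG_SIZE 4096UL /* assumed page size for the example */

/* max_mesg as described: max_payload rounded up to a whole number of pages,
 * plus one extra page for overhead. */
static size_t max_mesg_for(size_t max_payload)
{
	size_t rounded = (max_payload + PG_SIZE - 1) / PG_SIZE * PG_SIZE;

	return rounded + PG_SIZE;
}
```

For example, a 32K max_payload gives 32768 + 4096 = 36864 bytes, and a 1000-byte payload rounds up to one page and gives 8192.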
  17. 04 Oct 2006, 4 commits
    • [PATCH] knfsd: register all RPC programs with portmapper by default · bc5fea42
      Olaf Kirch committed
      The NFSACL patches introduced support for multiple RPC services listening on
      the same transport.  However, only the first of these services was registered
      with portmapper.  This was perfectly fine for nfsacl, as you traditionally do
      not want these to show up in a portmapper listing.
      
      The patch below changes the default behavior to always register all services
      listening on a given transport, but retains the old behavior for nfsacl
      services.
      Signed-off-by: Olaf Kirch <okir@suse.de>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
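The registration policy from the portmapper commit can be modelled as "register every program on the transport unless it opts out". The per-program opt-out flag is an assumption about the mechanism used to keep nfsacl's old behaviour; `struct rpc_program_s`, its `hidden` flag and `register_all()` are invented for illustration, and the function only records what would be registered rather than contacting a real portmapper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one RPC program sharing a transport. */
struct rpc_program_s {
	unsigned int prog;   /* RPC program number */
	bool hidden;         /* true for services (like nfsacl) kept out of portmap */
};

/* Records into `registered` each program that would be registered;
 * returns how many were recorded. */
static size_t register_all(const struct rpc_program_s *progs, size_t n,
			   unsigned int *registered, size_t cap)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (progs[i].hidden)
			continue;        /* keep the old, unregistered behaviour */
		assert(count < cap);
		registered[count++] = progs[i].prog;
	}
	return count;
}
```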
    • [PATCH] knfsd: Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP · 7adae489
      Greg Banks committed
      The limit over UDP remains at 32K.  Also, make some of the apparently
      arbitrary sizing constants clearer.
      
      The biggest change here involves replacing NFSSVC_MAXBLKSIZE by a function of
      the rqstp.  This allows it to be different for different protocols (udp/tcp)
      and also allows it to depend on the server's declared sv_bufsiz.
      
      Note that we don't actually increase sv_bufsz for nfs yet.  That comes next.
      Signed-off-by: Greg Banks <gnb@melbourne.sgi.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
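Replacing a fixed NFSSVC_MAXBLKSIZE with a per-request value can be sketched as a function of the transport and the service's declared buffer size. `max_blksize()` and the two macros are hypothetical names; the 32K (UDP) and 1MB (TCP) figures come from the commit text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Transport ceilings taken from the commit message. */
#define MAXPAYLOAD_UDP (32 * 1024)
#define MAXPAYLOAD_TCP (1024 * 1024)

/* Per-request maximum block size: the transport's ceiling, further capped
 * by the buffer size the service declared. */
static size_t max_blksize(bool is_udp, size_t declared_bufsz)
{
	size_t max = is_udp ? MAXPAYLOAD_UDP : MAXPAYLOAD_TCP;

	if (declared_bufsz < max)
		max = declared_bufsz;
	return max;
}
```

Because the service's declared size still caps the result, the TCP ceiling only takes effect once the buffer size is actually raised, which the commit leaves to a follow-up.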
    • [PATCH] knfsd: Avoid excess stack usage in svc_tcp_recvfrom · 3cc03b16
      NeilBrown committed
      ..  by allocating the array of 'kvec' in 'struct svc_rqst'.
      
      As we plan to increase RPCSVC_MAXPAGES from 8 up to 256, we can no longer
      allocate an array of this size on the stack.  So we allocate it in 'struct
      svc_rqst'.
      
      However svc_rqst contains (indirectly) an array of the same type and size
      (actually several, but they are in a union).  So rather than waste space, we
      move those arrays out of the separately allocated union and into svc_rqst to
      share with the kvec moved out of svc_tcp_recvfrom (various arrays are used at
      different times, so there is no conflict).
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] knfsd: Replace two page lists in struct svc_rqst with one · 44524359
      NeilBrown committed
      We are planning to increase RPCSVC_MAXPAGES from about 8 to about 256.  This
      means we need to be a bit careful about arrays of size RPCSVC_MAXPAGES.
      
      struct svc_rqst contains two such arrays.  However, there are never more
      than RPCSVC_MAXPAGES pages in the two arrays together, so only one array is
      needed.
      
      The two arrays are for the pages holding the request, and the pages holding
      the reply.  Instead of two arrays, we can simply keep an index into where the
      first reply page is.
      
      This patch also removes a number of small inline functions that probably
      serve to obscure what is going on rather than clarify it, and opencodes the
      needed functionality.
      
      Also remove the 'rq_restailpage' variable, as it is *always* 0; i.e. if the
      response 'xdr' structure has a non-empty tail, it is always in the same pages
      as the head.
      
       check counters are initialised and incremented properly
       check for consistent usage of ++ etc
       maybe extra some inlines for common approach
       general review
      Signed-off-by: Neil Brown <neilb@suse.de>
      Cc: Magnus Maatta <novell@kiruna.se>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
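The single-array layout from the page-list commit can be sketched with hypothetical names: request pages occupy the front of one array, and an index marks where the reply pages begin. `struct rqst_pages`, `respages`, `npages` and the helper functions are illustrative, and `MAXPAGES` is kept small for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAXPAGES 8 /* small illustrative value; the text plans about 256 */

/* Hypothetical stand-in for the page bookkeeping in struct svc_rqst. */
struct rqst_pages {
	void *pages[MAXPAGES]; /* request pages first, then reply pages */
	size_t respages;       /* index of the first reply page */
	size_t npages;         /* total pages in use */
};

/* Pages belonging to the request: pages[0 .. respages). */
static size_t request_page_count(const struct rqst_pages *r)
{
	return r->respages;
}

/* Pages belonging to the reply: pages[respages .. npages). */
static size_t reply_page_count(const struct rqst_pages *r)
{
	assert(r->respages <= r->npages && r->npages <= MAXPAGES);
	return r->npages - r->respages;
}

/* Address of the i-th reply page slot. */
static void **reply_page(struct rqst_pages *r, size_t i)
{
	assert(r->respages + i < MAXPAGES);
	return &r->pages[r->respages + i];
}
```

Since request and reply together never exceed the page limit, one array of `MAXPAGES` slots plus an index covers both, halving the storage compared with two full-sized arrays.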
  18. 02 Oct 2006, 6 commits