1. 09 Jan, 2022 11 commits
    • nfsd: Replace use of rwsem with errseq_t · 555dbf1a
      Trond Myklebust committed
      The nfsd_file nf_rwsem is currently being used to separate file write
      and commit instances to ensure that we catch errors and apply them to
      the correct write/commit.
      We can improve scalability at the expense of a little accuracy (some
      extra false positives) by replacing the nf_rwsem with more careful
      use of the errseq_t mechanism to track errors across the different
      operations.
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      [ cel: rebased on zero-verifier fix ]
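The trade-off described in this commit message (no lock, but some extra false positives) can be illustrated with a toy model. This is a minimal sketch loosely inspired by the sample/check pattern of errseq_t; the class `ErrSeq` and its methods are hypothetical simplifications, not the kernel API, and the real mechanism tracks more state than a bare counter.

```python
class ErrSeq:
    """Toy error-sequence cursor: a counter plus the most recent errno."""

    def __init__(self):
        self.counter = 0
        self.err = 0

    def set(self, err):
        # Record a new error and bump the counter so earlier samples go stale.
        self.counter += 1
        self.err = err

    def sample(self):
        # Remember the current position before starting an operation.
        return self.counter

    def check(self, since):
        # Report the latest error if anything was recorded after `since`.
        return self.err if self.counter != since else 0


eseq = ErrSeq()
write_a = eseq.sample()  # WRITE A starts
write_b = eseq.sample()  # WRITE B starts concurrently, without any lock
eseq.set(5)              # writeback of B's data fails with EIO (5)

print("A sees:", eseq.check(write_a))            # A is blamed too (false positive)
print("B sees:", eseq.check(write_b))            # B sees its own error
print("later op sees:", eseq.check(eseq.sample()))  # sampled after the error
```

Because both operations sampled before the error was recorded, both report it. That is the accuracy given up in exchange for not serializing writes and commits behind a read/write semaphore.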
    • NFSD: Fix verifier returned in stable WRITEs · f11ad7aa
      Chuck Lever committed
      RFC 8881 explains the purpose of the write verifier this way:
      
      > The final portion of the result is the field writeverf. This field
      > is the write verifier and is a cookie that the client can use to
      > determine whether a server has changed instance state (e.g., server
      > restart) between a call to WRITE and a subsequent call to either
      > WRITE or COMMIT.
      
      But then it says:
      
      > This cookie MUST be unchanged during a single instance of the
      > NFSv4.1 server and MUST be unique between instances of the NFSv4.1
      > server. If the cookie changes, then the client MUST assume that
      > any data written with an UNSTABLE4 value for committed and an old
      > writeverf in the reply has been lost and will need to be
      > recovered.
      
      RFC 1813 has similar language for NFSv3. NFSv2 does not have a write
      verifier since it doesn't implement the COMMIT procedure.
      
      Since commit 19e0663f ("nfsd: Ensure sampling of the write
      verifier is atomic with the write"), the Linux NFS server has
      returned a boot-time-based verifier for UNSTABLE WRITEs, but a zero
      verifier for FILE_SYNC and DATA_SYNC WRITEs. FILE_SYNC and DATA_SYNC
      WRITEs are not followed up with a COMMIT, so there's no need for
      clients to compare verifiers for stable writes.
      
      However, by returning a different verifier for stable and unstable
      writes, the above commit puts the Linux NFS server a step farther
      out of compliance with the first MUST above. At least one NFS client
      (FreeBSD) noticed the difference, making this a potential
      regression.
      Reported-by: Rick Macklem <rmacklem@uoguelph.ca>
      Link: https://lore.kernel.org/linux-nfs/YQXPR0101MB096857EEACF04A6DF1FC6D9BDD749@YQXPR0101MB0968.CANPRD01.PROD.OUTLOOK.COM/T/
      Fixes: 19e0663f ("nfsd: Ensure sampling of the write verifier is atomic with the write")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
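The compliance problem described in this commit message can be shown with a small model of a client that compares verifiers across replies. This is only a sketch: `reply_verifier`, `looks_like_restart`, and the `zero_for_stable` switch are hypothetical names used to contrast the two behaviours, and the 8-byte value stands in for the boot-time-based verifier.

```python
ZERO_VERIFIER = bytes(8)


def reply_verifier(boot_verifier, stable, zero_for_stable=False):
    """Return the verifier a server puts in a WRITE reply.

    zero_for_stable=True models the behaviour the commit removes:
    a zero verifier for FILE_SYNC and DATA_SYNC WRITEs.
    """
    if stable and zero_for_stable:
        return ZERO_VERIFIER
    return boot_verifier


def looks_like_restart(previous, current):
    # A client that follows the RFC text treats any change as a new
    # server instance and assumes unstable data has been lost.
    return previous != current


boot = b"\x01\x02\x03\x04\x05\x06\x07\x08"

# Old behaviour: an UNSTABLE write followed by a stable one looks like a restart.
old = looks_like_restart(reply_verifier(boot, False, True),
                         reply_verifier(boot, True, True))

# Fixed behaviour: the verifier is the same for every stability level.
new = looks_like_restart(reply_verifier(boot, False),
                         reply_verifier(boot, True))

print("old behaviour triggers recovery:", old)
print("fixed behaviour triggers recovery:", new)
```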
    • nfsd: Retry once in nfsd_open on an -EOPENSTALE return · 12bcbd40
      Jeff Layton committed
      If we get back -EOPENSTALE from an NFSv4 open, then we either got some
      unhandled error or the inode we got back was not the same as the one
      associated with the dentry.
      
      We really have no recourse in that situation other than to retry the
      open, and if it fails to just return nfserr_stale back to the client.
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
      Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
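The retry policy in this commit message (retry the open once, then report a stale handle) can be sketched as follows. The exception classes and `open_with_one_retry` are hypothetical stand-ins for -EOPENSTALE, nfserr_stale, and the open path; this is not the kernel code.

```python
class OpenStale(Exception):
    """Stand-in for an -EOPENSTALE return from the open."""


class StaleHandle(Exception):
    """Stand-in for returning nfserr_stale to the client."""


def open_with_one_retry(do_open):
    """Call do_open(); on OpenStale retry exactly once, then give up."""
    try:
        return do_open()
    except OpenStale:
        pass  # first failure: fall through to the single retry
    try:
        return do_open()
    except OpenStale:
        raise StaleHandle("nfserr_stale") from None


attempts = []


def flaky_open():
    # Fails on the first call, succeeds on the second.
    attempts.append(1)
    if len(attempts) == 1:
        raise OpenStale()
    return "file"


print(open_with_one_retry(flaky_open), "after", len(attempts), "attempts")
```

Bounding the retry at one attempt keeps the request from looping if the dentry and inode keep disagreeing.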
    • nfsd: Add errno mapping for EREMOTEIO · a2694e51
      Jeff Layton committed
      The NFS client can occasionally return EREMOTEIO when signalling issues
      with the server.  ...map to NFSERR_IO.
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
      Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
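An errno-to-NFS-status table of the kind this commit extends can be sketched like this. Only the EREMOTEIO to NFSERR_IO entry comes from the commit message; the function name mirrors nfserrno() for readability, and the fallback (warn, then report an I/O error) is an assumption made for the sketch.

```python
import errno

NFS_OK = 0
NFSERR_IO = 5  # NFS3ERR_IO in RFC 1813

# EREMOTEIO is 121 on Linux; fall back to that value on other platforms.
EREMOTEIO = getattr(errno, "EREMOTEIO", 121)

ERRNO_TO_NFSERR = {
    0: NFS_OK,
    errno.EIO: NFSERR_IO,
    EREMOTEIO: NFSERR_IO,  # the mapping added by this commit
}


def nfserrno(err, warn=print):
    """Map a (negative) errno to an NFS status code."""
    code = abs(err)
    if code in ERRNO_TO_NFSERR:
        return ERRNO_TO_NFSERR[code]
    # Assumed fallback: complain about the unmapped value, then
    # report a generic I/O error.
    warn(f"non-standard errno: {err}")
    return NFSERR_IO


print(nfserrno(-EREMOTEIO))  # mapped, no warning
print(nfserrno(-9999))       # unmapped: warns, then falls back
```

Adding an explicit table entry changes nothing for the client in this sketch, but it silences the warning path for an errno that is known to occur.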
    • nfsd: map EBADF · b3d0db70
      Peng Tao committed
      Now that we have open file cache, it is possible that another client
      deletes the file and DP will not know about it. Then IO to MDS would
      fail with BADSTATEID and knfsd would start state recovery, which
      should fail as well and then nfs read/write will fail with EBADF.
      And it triggers a WARN() in nfserrno().
      
      -----------[ cut here ]------------
      WARNING: CPU: 0 PID: 13529 at fs/nfsd/nfsproc.c:758 nfserrno+0x58/0x70 [nfsd]()
      nfsd: non-standard errno: -9
      modules linked in: nfsv3 nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_connt
      pata_acpi floppy
      CPU: 0 PID: 13529 Comm: nfsd Tainted: G        W       4.1.5-00307-g6e6579b #7
      Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
       0000000000000000 00000000464e6c9c ffff88079085fba8 ffffffff81789936
       0000000000000000 ffff88079085fc00 ffff88079085fbe8 ffffffff810a08ea
       ffff88079085fbe8 ffff88080f45c900 ffff88080f627d50 ffff880790c46a48
      Call Trace:
       [<ffffffff81789936>] dump_stack+0x45/0x57
       [<ffffffff810a08ea>] warn_slowpath_common+0x8a/0xc0
       [<ffffffff810a0975>] warn_slowpath_fmt+0x55/0x70
       [<ffffffff81252908>] ? splice_direct_to_actor+0x148/0x230
       [<ffffffffa02fb8c0>] ? fsid_source+0x60/0x60 [nfsd]
       [<ffffffffa02f9918>] nfserrno+0x58/0x70 [nfsd]
       [<ffffffffa02fba57>] nfsd_finish_read+0x97/0xb0 [nfsd]
       [<ffffffffa02fc7a6>] nfsd_splice_read+0x76/0xa0 [nfsd]
       [<ffffffffa02fcca1>] nfsd_read+0xc1/0xd0 [nfsd]
       [<ffffffffa0233af2>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
       [<ffffffffa03073da>] nfsd3_proc_read+0xba/0x150 [nfsd]
       [<ffffffffa02f7a03>] nfsd_dispatch+0xc3/0x210 [nfsd]
       [<ffffffffa0233af2>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
       [<ffffffffa0232913>] svc_process_common+0x453/0x6f0 [sunrpc]
       [<ffffffffa0232cc3>] svc_process+0x113/0x1b0 [sunrpc]
       [<ffffffffa02f740f>] nfsd+0xff/0x170 [nfsd]
       [<ffffffffa02f7310>] ? nfsd_destroy+0x80/0x80 [nfsd]
       [<ffffffff810bf3a8>] kthread+0xd8/0xf0
       [<ffffffff810bf2d0>] ? kthread_create_on_node+0x1b0/0x1b0
       [<ffffffff817912a2>] ret_from_fork+0x42/0x70
       [<ffffffff810bf2d0>] ? kthread_create_on_node+0x1b0/0x1b0
      Signed-off-by: Peng Tao <tao.peng@primarydata.com>
      Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Fix zero-length NFSv3 WRITEs · 6a2f7744
      Chuck Lever committed
      The Linux NFS server currently responds to a zero-length NFSv3 WRITE
      request with NFS3ERR_IO. It responds to a zero-length NFSv4 WRITE
      with NFS4_OK and count of zero.
      
      RFC 1813 says of the WRITE procedure's @count argument:
      
      count
               The number of bytes of data to be written. If count is
               0, the WRITE will succeed and return a count of 0,
               barring errors due to permissions checking.
      
      RFC 8881 has similar language for NFSv4, though NFSv4 removed the
      explicit @count argument because that value is already contained in
      the opaque payload array.
      
      The synthetic client pynfs's WRT4 and WRT15 tests do emit zero-
      length WRITEs to exercise this spec requirement. Commit fdec6114
      ("nfsd4: zero-length WRITE should succeed") addressed the same
      problem there with the same fix.
      
      But interestingly the Linux NFS client does not appear to emit zero-
      length WRITEs, instead squelching them. I'm not aware of a test that
      can generate such WRITEs for NFSv3, so I wrote a naive C program to
      generate a zero-length WRITE and test this fix.
      
      Fixes: 8154ef27 ("NFSD: Clean up legacy NFS WRITE argument XDR decoders")
      Reported-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
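The RFC 1813 rule quoted in this commit message (a count of 0 succeeds and returns a count of 0) can be sketched with an in-memory file. The function `nfs3_write` and the bytearray backing store are hypothetical, and permission checking is left out entirely.

```python
NFS3_OK = 0


def nfs3_write(file_bytes, offset, data):
    """Write data at offset; return (status, count written)."""
    if len(data) == 0:
        # Zero-length WRITE: succeed with a count of 0 and leave the
        # file untouched, instead of failing with an I/O error.
        return NFS3_OK, 0
    end = offset + len(data)
    if end > len(file_bytes):
        # Extend the file, zero-filling any gap before the new data.
        file_bytes.extend(bytes(end - len(file_bytes)))
    file_bytes[offset:end] = data
    return NFS3_OK, len(data)


f = bytearray(b"hello")
print(nfs3_write(f, 0, b""))   # zero-length write
print(nfs3_write(f, 5, b"!"))  # ordinary write
print(bytes(f))
```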
    • nfsd4: add refcount for nfsd4_blocked_lock · 47446d74
      Vasily Averin committed
      nbl allocated in nfsd4_lock can be released in several ways:
      directly in nfsd4_lock(), via nfs4_laundromat(), via another nfs
      command RELEASE_LOCKOWNER or via nfsd4_callback.
      This structure should be refcounted to be used and released correctly
      in all these cases.
      
      Refcount is initialized to 1 during allocation and is incremented
      when nbl is added into nbl_list/nbl_lru lists.
      
      Usually nbl is linked into both lists together, so only one refcount
      is used for both lists.
      
      However nfsd4_lock() should keep in mind that nbl can be present
      in one of lists only. This can happen if nbl was handled already
      by nfs4_laundromat/nfsd4_callback/etc.
      
      Refcount is decremented if vfs_lock_file() returns FILE_LOCK_DEFERRED,
      because nbl can be handled already by nfs4_laundromat/nfsd4_callback/etc.
      
      Refcount is not changed in find_blocked_lock() because it reuses the counter
      released after removing nbl from lists.
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfs: block notification on fs with its own ->lock · 40595cdc
      J. Bruce Fields committed
      NFSv4.1 supports an optional lock notification feature which notifies
      the client when a lock becomes available.  (Normally NFSv4 clients just
      poll for locks if necessary.)  To make that work, we need to request a
      blocking lock from the filesystem.
      
      We turned that off for NFS in commit f657f8ee ("nfs: don't atempt
      blocking locks on nfs reexports") [sic] because it actually blocks the
      nfsd thread while waiting for the lock.
      
      Thanks to Vasily Averin for pointing out that NFS isn't the only
      filesystem with that problem.
      
      Any filesystem that leaves ->lock NULL will use posix_lock_file(), which
      does the right thing.  Simplest is just to assume that any filesystem
      that defines its own ->lock is not safe to request a blocking lock from.
      
      So, this patch mostly reverts commit f657f8ee ("nfs: don't atempt
      blocking locks on nfs reexports") [sic] and commit b840be2f ("lockd:
      don't attempt blocking locks on nfs reexports"), and instead uses a
      check of ->lock (Vasily's suggestion) to decide whether to support
      blocking lock notifications on a given filesystem.  Also add a little
      documentation.
      
      Perhaps someday we could add back an export flag to allow
      filesystems with "good" ->lock methods to support blocking lock
      notifications.
      Reported-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      [ cel: Description rewritten to address checkpatch nits ]
      [ cel: Fixed warning when SUNRPC debugging is disabled ]
      [ cel: Fixed NULL check ]
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Vasily Averin <vvs@virtuozzo.com>
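The decision rule in this commit message (only request a blocking lock from filesystems that leave ->lock unset) can be sketched as a simple predicate. `Filesystem`, `lock_method`, and `supports_blocking_lock_notification` are hypothetical names chosen for the sketch.

```python
class Filesystem:
    def __init__(self, name, lock_method=None):
        self.name = name
        # None models a filesystem that leaves ->lock NULL and therefore
        # falls back to the generic POSIX lock path.
        self.lock_method = lock_method


def supports_blocking_lock_notification(fs):
    # Conservative rule: any filesystem that defines its own lock method
    # is assumed unsafe for blocking lock requests, because the request
    # might block the calling server thread.
    return fs.lock_method is None


local_fs = Filesystem("local")
reexported_nfs = Filesystem("nfs", lock_method=lambda *args: None)

for fs in (local_fs, reexported_nfs):
    print(fs.name, supports_blocking_lock_notification(fs))
```

The rule errs on the side of disabling notifications; a filesystem with a well-behaved lock method is treated the same as one that blocks.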
    • NFSD: De-duplicate nfsd4_decode_bitmap4() · cd2e999c
      Chuck Lever committed
      Clean up. Trond points out that xdr_stream_decode_uint32_array()
      does the same thing as nfsd4_decode_bitmap4().
      Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: improve stateid access bitmask documentation · 3dcd1d8a
      J. Bruce Fields committed
      The use of the bitmaps is confusing.  Add a cross-reference to make it
      easier to find the existing comment.  Add an updated reference with URL
      to make it quicker to look up.  And a bit more editorializing about the
      value of this.
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Combine XDR error tracepoints · 70e94d75
      Chuck Lever committed
      Clean up: The garbage_args and cant_encode tracepoints report the
      same information as each other, so combine them into a single
      tracepoint class to reduce code duplication and slightly reduce the
      size of trace.o.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
  2. 14 Dec, 2021 25 commits
    • SUNRPC: Remove low signal-to-noise tracepoints · 5089f3d9
      Chuck Lever committed
      I'm about to add more information to the server-side SUNRPC
      tracepoints, so I'm going to offset the increased trace log
      consumption by getting rid of some tracepoints that fire frequently
      but don't offer much value.
      
      trace_svc_xprt_received() was useful for debugging, perhaps, but
      is not generally informative.
      
      trace_svc_handle_xprt() reports largely the same information as
      trace_svc_xdr_recvfrom().
      
      As a clean-up, rename trace_svc_xprt_do_enqueue() to match
      svc_xprt_dequeue().
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: simplify per-net file cache management · 1463b38e
      NeilBrown committed
      We currently have a 'laundrette' for closing cached files - a different
      work-item for each network-namespace.
      
      These 'laundrettes' (aka struct nfsd_fcache_disposal) are currently on a
      list, and are freed using rcu.
      
      The list is not necessary as we have a per-namespace structure (struct
      nfsd_net) which can hold a link to the nfsd_fcache_disposal.
      The use of kfree_rcu is also unnecessary as the cache is cleaned of all
      files associated with a given namespace, and no new files can be added,
      before the nfsd_fcache_disposal is freed.
      
      So add a '->fcache_disposal' link to nfsd_net, and discard the list
      management and rcu usage.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Fix inconsistent indenting · 1e37d0e5
      Jiapeng Chong committed
      Eliminate the following smatch warning:
      
      fs/nfsd/nfs4xdr.c:4766 nfsd4_encode_read_plus_hole() warn: inconsistent
      indenting.
      Reported-by: Abaci Robot <abaci@linux.alibaba.com>
      Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Remove be32_to_cpu() from DRC hash function · 7578b2f6
      Chuck Lever committed
      Commit 7142b98d ("nfsd: Clean up drc cache in preparation for
      global spinlock elimination"), billed as a clean-up, added
      be32_to_cpu() to the DRC hash function without explanation. That
      commit removed two comments that state that byte-swapping in the
      hash function is unnecessary without explaining whether there was
      a need for that change.
      
      On some Intel CPUs, the swab32 instruction is known to cause a CPU
      pipeline stall. be32_to_cpu() does not add extra randomness, since
      the hash multiplication is done /before/ shifting to the high-order
      bits of the result.
      
      As a micro-optimization, remove the unnecessary transform from the
      DRC hash function.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
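To experiment with the claim in this commit message, the following sketch buckets a run of sequential XIDs with and without a byte swap. It assumes a multiplicative hash of the hash_32() style (multiply, then keep the high-order bits) and assumes 0x61C88647 as the multiplier; both are assumptions for illustration, and `bucket_counts` is a hypothetical helper.

```python
from collections import Counter

GOLDEN_RATIO_32 = 0x61C88647  # assumed multiplier for the sketch


def hash_32(val, bits):
    # Multiply with 32-bit wraparound, then keep the top `bits` bits.
    return ((val * GOLDEN_RATIO_32) & 0xFFFFFFFF) >> (32 - bits)


def swab32(val):
    # Reverse the byte order of a 32-bit value.
    return int.from_bytes(val.to_bytes(4, "little"), "big")


def bucket_counts(xids, bits, byteswap):
    # Count how many XIDs land in each hash bucket.
    return Counter(hash_32(swab32(x) if byteswap else x, bits) for x in xids)


xids = range(0x1000, 0x1000 + 4096)
for byteswap in (False, True):
    counts = bucket_counts(xids, 8, byteswap)
    print("byteswap:", byteswap,
          "buckets used:", len(counts),
          "largest bucket:", max(counts.values()))
```

Comparing the two lines of output for different XID ranges gives a rough feel for whether the swap changes how evenly entries spread across buckets.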
    • NFS: switch the callback service back to non-pooled. · 23a1a573
      NeilBrown committed
      Now that thread management is consistent there is no need for
      nfs-callback to use svc_create_pooled() as introduced in Commit
      df807fff ("NFSv4.x/callback: Create the callback service through
      svc_create_pooled").  So switch back to svc_create().
      
      If service pools were configured, but the number of threads was left at
      '1', nfs callback may not work reliably when svc_create_pooled() is used.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • lockd: use svc_set_num_threads() for thread start and stop · 6b044fba
      NeilBrown committed
      svc_set_num_threads() does everything that lockd_start_svc() does, except
      set sv_maxconn.  It also (when passed 0) finds the threads and
      stops them with kthread_stop().
      
      So move the setting for sv_maxconn, and use svc_set_num_threads().
      
      We now don't need nlmsvc_task.
      
      Now that we use svc_set_num_threads() it makes sense to set svo_module.
      This requests that the thread exits with module_put_and_exit().
      Also fix the documentation for svo_module to make this explicit.
      
      svc_prepare_thread is now only used where it is defined, so it can be
      made static.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • SUNRPC: always treat sv_nrpools==1 as "not pooled" · 93aa619e
      NeilBrown committed
      Currently 'pooled' services hold a reference on the pool_map, and
      'unpooled' services do not.
      svc_destroy() uses the presence of ->svo_function (via
      svc_serv_is_pooled()) to determine if the reference should be dropped.
      There is no direct correlation between being pooled and the use of
      svo_function, though in practice, lockd is the only non-pooled service,
      and the only one not to use svo_function.
      
      This is untidy and would cause problems if we changed lockd to use
      svc_set_num_threads(), which requires the use of ->svo_function.
      
      So change the test for "is the service pooled" to "is sv_nrpools > 1".
      
      This means that when svc_pool_map_get() returns 1, it must NOT take a
      reference to the pool.
      
      We discard svc_serv_is_pooled(), and test sv_nrpools directly.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
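The invariant this commit establishes (a pool-map reference is held if and only if the service has more than one pool) can be sketched as follows. `PoolMap` and `Service` are hypothetical models, not the SUNRPC structures.

```python
class PoolMap:
    def __init__(self, npools):
        self.npools = npools
        self.users = 0

    def get(self):
        # Only take a reference when the result means "pooled".
        # When this returns 1, no reference is taken.
        if self.npools > 1:
            self.users += 1
        return self.npools

    def put(self):
        self.users -= 1


class Service:
    def __init__(self, pool_map=None):
        self.pool_map = pool_map
        self.nrpools = pool_map.get() if pool_map else 1

    def destroy(self):
        # The test for "is the service pooled" is simply nrpools > 1,
        # which matches exactly the cases where get() took a reference.
        if self.nrpools > 1:
            self.pool_map.put()


pooled_map = PoolMap(4)
serv = Service(pooled_map)
print("users while pooled service exists:", pooled_map.users)
serv.destroy()
print("users after destroy:", pooled_map.users)
```

Tying both the get and the put to the same condition is what removes the dependence on an unrelated field to decide whether the reference should be dropped.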
    • SUNRPC: move the pool_map definitions (back) into svc.c · cf0e124e
      NeilBrown committed
      These definitions are not used outside of svc.c, and there is no
      evidence that they ever have been.  So move them into svc.c
      and make the declarations 'static'.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • lockd: rename lockd_create_svc() to lockd_get() · ecd3ad68
      NeilBrown committed
      lockd_create_svc() already does an svc_get() if the service already
      exists, so it is more like a "get" than a "create".
      
      So:
       - Move the increment of nlmsvc_users into the function as well
       - rename to lockd_get().
      
      It is now the inverse of lockd_put().
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • lockd: introduce lockd_put() · 865b6740
      NeilBrown committed
      There is some cleanup that is duplicated in lockd_down() and the failure
      path of lockd_up().
      Factor these out into a new lockd_put() and call it from both places.
      
      lockd_put() does *not* take the mutex - that must be held by the caller.
      It decrements nlmsvc_users and if that reaches zero, it cleans up.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
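The user-count pattern behind lockd_put(), and the matching get described in the rename commit listed above it, can be sketched as a pair of methods. The `Lockd` class is a hypothetical model: the caller is expected to hold the mutex around both calls, as the commit message states for lockd_put().

```python
import threading


class Lockd:
    def __init__(self):
        self.users = 0
        self.serv = None
        self.cleanups = 0

    def get(self):
        # Caller holds the mutex. Create the service on first use,
        # then count this user.
        if self.serv is None:
            self.serv = object()
        self.users += 1
        return self.serv

    def put(self):
        # Caller holds the mutex. Drop one user; clean up on the last.
        self.users -= 1
        if self.users == 0:
            self.serv = None
            self.cleanups += 1


mutex = threading.Lock()
lockd = Lockd()

with mutex:
    lockd.get()
    lockd.get()
with mutex:
    lockd.put()
    print("service alive:", lockd.serv is not None)
with mutex:
    lockd.put()
    print("service alive:", lockd.serv is not None)
```

Having one put routine means the normal shutdown path and the failure path of bring-up share the same cleanup code.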
    • lockd: move svc_exit_thread() into the thread · 6a4e2527
      NeilBrown committed
      The normal place to call svc_exit_thread() is from the thread itself
      just before it exits.
      Do this for lockd.
      
      This means that nlmsvc_rqst is not used outside of lockd_start_svc(),
      so it can be made local to that function, and renamed to 'rqst'.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • lockd: move lockd_start_svc() call into lockd_create_svc() · b73a2972
      NeilBrown committed
      lockd_start_svc() only needs to be called once, just after the svc is
      created.  If the start fails, the svc is discarded too.
      
      It thus makes sense to call lockd_start_svc() from lockd_create_svc().
      This allows us to remove the test against nlmsvc_rqst at the start of
      lockd_start_svc() - it must always be NULL.
      
      lockd_up() only held an extra reference on the svc until a thread was
      created - then it dropped it.  The thread - and thus the extra reference
      - will remain until kthread_stop() is called.
      Now that the thread is created in lockd_create_svc(), the extra
      reference can be dropped there.  So the 'serv' variable is no longer
      needed in lockd_up().
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • lockd: simplify management of network status notifiers · 5a8a7ff5
      NeilBrown committed
      Now that the network status notifiers use nlmsvc_serv rather than
      nlmsvc_rqst the management can be simplified.
      
      Notifier unregistration synchronises with any pending notifications so
      providing we unregister before nlm_serv is freed no further interlock
      is required.
      
      So we move the unregister call to just before the thread is killed
      (which destroys the service) and just before the service is destroyed in
      the failure-path of lockd_up().
      
      Then nlm_ntf_refcnt and nlm_ntf_wq can be removed.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • lockd: introduce nlmsvc_serv · 2840fe86
      NeilBrown committed
      lockd has two globals - nlmsvc_task and nlmsvc_rqst - but mostly it
      wants the 'struct svc_serv', and when it doesn't want it exactly it can
      get to what it wants from the serv.
      
      This patch is a first step to removing nlmsvc_task and nlmsvc_rqst.  It
      introduces nlmsvc_serv to store the 'struct svc_serv*'.  This is set as
      soon as the serv is created, and cleared only when it is destroyed.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: simplify locking for network notifier. · d057cfec
      NeilBrown committed
      nfsd currently maintains an open-coded read/write semaphore (refcount
      and wait queue) for each network namespace to ensure the nfs service
      isn't shut down while the notifier is running.
      
      This is excessive.  As there is unlikely to be contention between
      notifiers and they run without sleeping, a single spinlock is sufficient
      to avoid problems.
      Signed-off-by: NeilBrown <neilb@suse.de>
      [ cel: ensure nfsd_notifier_lock is static ]
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • SUNRPC: discard svo_setup and rename svc_set_num_threads_sync() · 3ebdbe52
      NeilBrown committed
      The ->svo_setup callback serves no purpose.  It is always called from
      within the same module that chooses which callback is needed.  So
      discard it and call the relevant function directly.
      
      Now that svc_set_num_threads() is no longer used remove it and rename
      svc_set_num_threads_sync() to remove the "_sync" suffix.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Make it possible to use svc_set_num_threads_sync · 3409e4f1
      NeilBrown committed
      nfsd cannot currently use svc_set_num_threads_sync.  It instead
      uses svc_set_num_threads which does *not* wait for threads to all
      exit, and has a separate mechanism (nfsd_shutdown_complete) to wait
      for completion.
      
      The reason that nfsd is unlike other services is that nfsd threads can
      exit separately from svc_set_num_threads being called - they die on
      receipt of SIGKILL.  Also, when the last thread exits, the service must
      be shut down (sockets closed).
      
      For this, the nfsd_mutex needs to be taken, and as that mutex needs to
      be held while svc_set_num_threads is called, the one cannot wait for
      the other.
      
      This patch changes the nfsd thread so that it can drop the ref on the
      service without blocking on nfsd_mutex, so that svc_set_num_threads_sync
      can be used:
       - if it can drop a non-last reference, it does that.  This does not
         trigger shutdown and does not require a mutex.  This will likely
         happen for all but the last thread signalled, and for all threads
         being shut down by nfsd_shutdown_threads()
       - if it can get the mutex without blocking (trylock), it does that
         and then drops the reference.  This will likely happen for the
         last thread killed by SIGKILL
       - Otherwise there might be an unrelated task holding the mutex,
         possibly in another network namespace, or nfsd_shutdown_threads()
         might be just about to get a reference on the service, after which
         we can drop ours safely.
         We cannot conveniently get wakeup notifications on these events,
         and we are unlikely to need to, so we sleep briefly and check again.
      
      With this we can discard nfsd_shutdown_complete and
      nfsd_complete_shutdown(), and switch to svc_set_num_threads_sync.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
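The three cases listed in this commit message (drop a non-last reference, or take the mutex with a trylock, or sleep briefly and check again) can be sketched as a loop. `Service`, `drop_thread_ref`, and the poll interval are hypothetical; a Python `threading.Lock` stands in for nfsd_mutex.

```python
import threading
import time


class Service:
    def __init__(self, refs):
        self.refs = refs
        self.shut_down = False
        self._lock = threading.Lock()

    def put_unless_last(self):
        # Drop a reference only if it is not the final one.
        with self._lock:
            if self.refs > 1:
                self.refs -= 1
                return True
            return False

    def put(self):
        # Drop a reference; the final drop shuts the service down.
        with self._lock:
            self.refs -= 1
            if self.refs == 0:
                self.shut_down = True


def drop_thread_ref(serv, service_mutex, poll_interval=0.005):
    """Drop an exiting thread's reference without blocking on the mutex."""
    while True:
        # Case 1: a non-last reference needs no mutex and no shutdown.
        if serv.put_unless_last():
            return "non-last"
        # Case 2: last reference, mutex available without blocking.
        if service_mutex.acquire(blocking=False):
            try:
                serv.put()
            finally:
                service_mutex.release()
            return "last"
        # Case 3: someone else holds the mutex; sleep briefly and retry.
        time.sleep(poll_interval)


serv = Service(refs=2)
mutex = threading.Lock()
print(drop_thread_ref(serv, mutex), serv.refs)
print(drop_thread_ref(serv, mutex), serv.refs, serv.shut_down)
```

Polling is crude, but it means the exiting thread never waits on the mutex while the mutex holder is waiting for threads to exit.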
    • NFSD: narrow nfsd_mutex protection in nfsd thread · 9d3792ae
      NeilBrown committed
      There is nothing happening in the start of nfsd() that requires
      protection by the mutex, so don't take it until shutting down the thread
      - which does still require protection - but only for nfsd_put().
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • SUNRPC: use sv_lock to protect updates to sv_nrthreads. · 2a36395f
      NeilBrown committed
      Using sv_lock means we don't need to hold the service mutex over these
      updates.
      
      In particular, svc_exit_thread() no longer requires synchronisation, so
      threads can exit asynchronously.
      
      Note that we could use an atomic_t, but as there are many more read
      sites than writes, that would add unnecessary noise to the code.
      Some reads are already racy, and there is no need for them to not be.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: make nfsd_stats.th_cnt atomic_t · 9b6c8c9b
      NeilBrown committed
      This allows us to move the updates for th_cnt out of the mutex.
      This is a step towards reducing mutex coverage in nfsd().
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • SUNRPC: stop using ->sv_nrthreads as a refcount · ec52361d
      NeilBrown committed
      The use of sv_nrthreads as a general refcount results in clumsy code, as
      is seen by various comments needed to explain the situation.
      
      This patch introduces a 'struct kref' and uses that for reference
      counting, leaving sv_nrthreads to be a pure count of threads.  The kref
      is managed particularly in svc_get() and svc_put(), and also nfsd_put().
      
      svc_destroy() now takes a pointer to the embedded kref, rather than to
      the serv.
      
      nfsd allows the svc_serv to exist with ->sv_nrthreads being zero.  This
      happens when a transport is created before the first thread is started.
      To support this, a 'keep_active' flag is introduced which holds a ref on
      the svc_serv.  This is set when any listening socket is successfully
      added (unless there are running threads), and cleared when the number of
      threads is set.  So when the last thread exits, the nfs_serv will be
      destroyed.
      The use of 'keep_active' replaces previous code which checked if there
      were any permanent sockets.
      
      We no longer clear ->rq_server when nfsd() exits.  This was done
      to prevent svc_exit_thread() from calling svc_destroy().
      Instead we take an extra reference to the svc_serv to prevent
      svc_destroy() from being called.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
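The separation this commit describes (a reference count that is distinct from the thread count, plus a 'keep_active' reference that holds the service alive before any thread exists) can be modelled roughly as below. The `Serv` class and its methods are hypothetical, and the sketch assumes that each running thread holds one reference; the real lifetime rules are more involved.

```python
class Serv:
    def __init__(self):
        self.refs = 0          # models the reference count
        self.nrthreads = 0     # pure count of running threads
        self.keep_active = False
        self.destroyed = False

    def get(self):
        self.refs += 1
        return self

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            self.destroyed = True

    def add_listener(self):
        # A listening socket added before any thread runs pins the
        # service through the keep_active reference.
        if self.nrthreads == 0 and not self.keep_active:
            self.keep_active = True
            self.get()

    def set_threads(self, n):
        # Start threads (this sketch only grows the count); each thread
        # holds its own reference.
        for _ in range(n - self.nrthreads):
            self.get()
            self.nrthreads += 1
        # Once the thread count has been set, the pin is released.
        if self.keep_active:
            self.keep_active = False
            self.put()

    def thread_exit(self):
        self.nrthreads -= 1
        self.put()


serv = Serv()
serv.add_listener()
print("threads:", serv.nrthreads, "refs:", serv.refs)  # alive with zero threads
serv.set_threads(2)
serv.thread_exit()
serv.thread_exit()
print("destroyed after last thread exits:", serv.destroyed)
```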
    • SUNRPC/NFSD: clean up get/put functions. · 8c62d127
      NeilBrown committed
      svc_destroy() is poorly named - it doesn't necessarily destroy the svc,
      it might just reduce the ref count.
      nfsd_destroy() is poorly named for the same reason.
      
      This patch:
       - removes the refcount functionality from svc_destroy(), moving it to
         a new svc_put().  Almost all previous callers of svc_destroy() now
         call svc_put().
       - renames nfsd_destroy() to nfsd_put() and improves the code, using
         the new svc_destroy() rather than svc_put()
       - removes a few comments that explain the importance of balanced
         get/put calls.  This should be obvious.
      
      The only non-trivial part of this is that svc_destroy() would call
      svc_sock_update() on a non-final decrement.  It can no longer do that,
      and svc_put() isn't really a good place for it.  This call is now made
      from svc_exit_thread() which seems like a good place.  This makes the
      call *before* sv_nrthreads is decremented rather than after.  This
      is not particularly important as the call just sets a flag which
      causes sv_nrthreads to be checked later.  A subsequent patch will
      improve the ordering.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • SUNRPC: change svc_get() to return the svc. · df5e49c8
      NeilBrown committed
      It is common for 'get' functions to return the object that was 'got',
      and there are a couple of places where users of svc_get() would be a
      little simpler if svc_get() did that.
      
      Make it so.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: handle errors better in write_ports_addfd() · 89b24336
      NeilBrown committed
      If write_ports_add() fails, we shouldn't destroy the serv, unless we had
      only just created it.  So if there are any permanent sockets already
      attached, leave the serv in place.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Fix sparse warning · c2f1c4bd
      Chuck Lever committed
      /home/cel/src/linux/linux/fs/nfsd/nfs4proc.c:1539:24: warning: incorrect type in assignment (different base types)
      /home/cel/src/linux/linux/fs/nfsd/nfs4proc.c:1539:24:    expected restricted __be32 [usertype] status
      /home/cel/src/linux/linux/fs/nfsd/nfs4proc.c:1539:24:    got int
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
  3. 13 Dec, 2021 4 commits
    • Linux 5.16-rc5 · 2585cf9d
      Linus Torvalds committed
    • Merge tag 'usb-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 90d9fbc1
      Linus Torvalds committed
      Pull USB fixes from Greg KH:
       "Here are some small USB fixes for 5.16-rc5.  They include:
      
         - gadget driver fixes for reported issues
      
         - xhci fixes for reported problems.
      
         - config endpoint parsing fixes for where we got bitfields wrong
      
        Most of these have been in linux-next, the remaining few were not, but
        got lots of local testing in my systems and in some cloud testing
        infrastructures"
      
      * tag 'usb-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: core: config: using bit mask instead of individual bits
        usb: core: config: fix validation of wMaxPacketValue entries
        USB: gadget: zero allocate endpoint 0 buffers
        USB: gadget: detect too-big endpoint 0 requests
        xhci: avoid race between disable slot command and host runtime suspend
        xhci: Remove CONFIG_USB_DEFAULT_PERSIST to prevent xHCI from runtime suspending
        Revert "usb: dwc3: dwc3-qcom: Enable tx-fifo-resize property by default"
    • Merge tag 'char-misc-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 8d7ed104
      Linus Torvalds committed
      Pull char/misc driver fixes from Greg KH:
       "Here are a bunch of small char/misc and other driver subsystem fixes.
      
        Included in here are:
      
         - iio driver fixes for reported problems
      
         - phy driver fixes for a number of reported problems
      
         - mhi resume bugfix for broken hardware
      
         - nvmem driver fix
      
         - rtsx driver fix for irq issues
      
         - fastrpc packet parsing fix
      
        All of these have been in linux-next for a while with no reported
        issues"
      
      * tag 'char-misc-5.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (33 commits)
        bus: mhi: core: Add support for forced PM resume
        iio: trigger: stm32-timer: fix MODULE_ALIAS
        misc: rtsx: Avoid mangling IRQ during runtime PM
        nvmem: eeprom: at25: fix FRAM byte_len
        misc: fastrpc: fix improper packet size calculation
        MAINTAINERS: add maintainer for Qualcomm FastRPC driver
        bus: mhi: pci_generic: Fix device recovery failed issue
        iio: adc: stm32: fix null pointer on defer_probe error
        phy: HiSilicon: Fix copy and paste bug in error handling
        dt-bindings: phy: zynqmp-psgtr: fix USB phy name
        phy: ti: omap-usb2: Fix the kernel-doc style
        phy: qualcomm: ipq806x-usb: Fix kernel-doc style
        iio: at91-sama5d2: Fix incorrect sign extension
        iio: adc: axp20x_adc: fix charging current reporting on AXP22x
        iio: gyro: adxrs290: fix data signedness
        phy: ti: tusb1210: Fix the kernel-doc warn
        phy: qualcomm: usb-hsic: Fix the kernel-doc warn
        phy: qualcomm: qmp: Add missing struct documentation
        phy: mvebu-cp110-utmi: Fix kernel-doc warns
        iio: ad7768-1: Call iio_trigger_notify_done() on error
        ...
    • Merge tag 'timers-urgent-2021-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c7fc5126
      Linus Torvalds committed
      Pull timer fixes from Thomas Gleixner:
       "Two fixes for clock chip drivers:
      
         - A regression fix for the Designware APB timer. A recent change to
           the error checking code transformed the error condition wrongly so
           it turned into a fail if good condition.
      
         - Fix a clang build fail of the ARM architected timer driver"
      
      * tag 'timers-urgent-2021-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource/drivers/arm_arch_timer: Force inlining of erratum_set_next_event_generic()
        clocksource/drivers/dw_apb_timer_of: Fix probe failure