1. 04 Apr, 2018 22 commits
    • nfsd: fix incorrect umasks · 880a3a53
      J. Bruce Fields committed
      We're neglecting to clear the umask after it's set, which can cause a
      later unrelated rpc to (incorrectly) use the same umask if it happens to
      be processed by the same thread.
      
      There's a more subtle problem here too:
      
      An NFSv4 compound request is decoded all in one pass before any
      operations are executed.
      
      Currently we're setting current->fs->umask at the time we decode the
      compound.  In theory a single compound could contain multiple creates
      each setting a umask.  In that case we'd end up using whichever umask
      was passed in the *last* operation as the umask for all the creates,
      whether that was correct or not.
      
      So, we should just be saving the umask at decode time and waiting to set
      it until we actually process the corresponding operation.
      
      In practice it's unlikely any client would do multiple creates in a
      single compound.  And even if it did they'd likely be from the same
      process (hence carry the same umask).  So this is a little academic, but
      we should get it right anyway.
      
      Fixes: 47057abd (nfsd: add support for the umask attribute)
      Cc: stable@vger.kernel.org
      Reported-by: Lucash Stach <l.stach@pengutronix.de>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
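The decode-then-execute ordering described in this commit can be modeled in user space. The following is a minimal sketch, not the kernel patch itself: `struct create_op`, `thread_umask` (a stand-in for `current->fs->umask`), `decode_compound()` and `execute_compound()` are all hypothetical names chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical model of one decoded create operation in a compound. */
struct create_op {
    unsigned int saved_umask;    /* umask carried by this op, saved at decode time */
    unsigned int requested_mode; /* mode requested by the client */
    unsigned int result_mode;    /* filled in when the op is executed */
};

/* Stand-in for current->fs->umask: state owned by the worker thread. */
unsigned int thread_umask;

/* Decode pass: record each op's umask, but leave the thread state alone. */
void decode_compound(struct create_op *ops, const unsigned int *umasks,
                     const unsigned int *modes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        ops[i].saved_umask = umasks[i];
        ops[i].requested_mode = modes[i];
        ops[i].result_mode = 0;
    }
}

/* Execute pass: apply the op's own umask, then clear it so that a later
 * unrelated request handled by the same thread does not inherit it. */
void execute_compound(struct create_op *ops, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        thread_umask = ops[i].saved_umask;
        ops[i].result_mode = ops[i].requested_mode & ~thread_umask;
        thread_umask = 0;
    }
}
```

With two creates carrying umasks 022 and 077 in one compound, each op in this model is executed with its own umask, and the thread-level value is back to zero afterwards.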
    • sunrpc: remove incorrect HMAC request initialization · f3aefb6a
      Eric Biggers committed
      make_checksum_hmac_md5() is allocating an HMAC transform and doing
      crypto API calls in the following order:
      
          crypto_ahash_init()
          crypto_ahash_setkey()
          crypto_ahash_digest()
      
      This is wrong because it makes no sense to init() the request before a
      key has been set, given that the initial state depends on the key.  And
      digest() is short for init() + update() + final(), so in this case
      there's no need to explicitly call init() at all.
      
      Before commit 9fa68f62 ("crypto: hash - prevent using keyed hashes
      without setting key") the extra init() had no real effect, at least for
      the software HMAC implementation.  (There are also hardware drivers that
      implement HMAC-MD5, and it's not immediately obvious how gracefully they
      handle init() before setkey().)  But now the crypto API detects this
      incorrect initialization and returns -ENOKEY.  This is breaking NFS
      mounts in some cases.
      
      Fix it by removing the incorrect call to crypto_ahash_init().
      Reported-by: Michael Young <m.a.young@durham.ac.uk>
      Fixes: 9fa68f62 ("crypto: hash - prevent using keyed hashes without setting key")
      Fixes: fffdaef2 ("gss_krb5: Add support for rc4-hmac encryption")
      Cc: stable@vger.kernel.org
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
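Why init() before setkey() fails can be illustrated with a toy keyed hash. This is only a user-space model under stated assumptions: the `toy_*` functions and `TOY_ENOKEY` are hypothetical and are not the kernel crypto API, and the mixing step is not a real HMAC. What it does show is the ordering rule from the commit message: the initial state depends on the key, and digest() already performs init() + update() + final().

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_ENOKEY 126 /* illustrative stand-in for the kernel's ENOKEY */

/* Hypothetical keyed-hash context. */
struct toy_hmac {
    bool key_set;
    unsigned int key;
    unsigned int state;
};

/* init(): the starting state is derived from the key, so a key is required. */
int toy_init(struct toy_hmac *h)
{
    if (!h->key_set)
        return -TOY_ENOKEY;
    h->state = h->key;
    return 0;
}

void toy_setkey(struct toy_hmac *h, unsigned int key)
{
    h->key = key;
    h->key_set = true;
}

/* digest(): init() + update() + final() in one call. */
int toy_digest(struct toy_hmac *h, const unsigned char *data, size_t len,
               unsigned int *out)
{
    int err = toy_init(h);

    if (err)
        return err;
    for (size_t i = 0; i < len; i++)
        h->state = h->state * 31u + data[i]; /* toy mixing step */
    *out = h->state;
    return 0;
}
```

In this model the correct call sequence is simply `toy_setkey()` followed by `toy_digest()`; an explicit `toy_init()` before the key is set returns `-TOY_ENOKEY`.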
    • NFSD: Clean up legacy NFS SYMLINK argument XDR decoders · 38a70315
      Chuck Lever committed
      Move common code in NFSD's legacy SYMLINK decoders into a helper.
      The immediate benefits include:
      
       - one fewer data copy on transports that support DDP
       - consistent error checking across all versions
       - reduction of code duplication
       - support for both legal forms of SYMLINK requests on RDMA
         transports for all versions of NFS (in particular, NFSv2, for
         completeness)
      
      In the long term, this helper is an appropriate spot to perform a
      per-transport call-out to fill the pathname argument using, say,
      RDMA Reads.
      
      Filling the pathname in the proc function also means that eventually
      the incoming filehandle can be interpreted so that filesystem-
      specific memory can be allocated as a sink for the pathname
      argument, rather than using anonymous pages.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • NFSD: Clean up legacy NFS WRITE argument XDR decoders · 8154ef27
      Chuck Lever committed
      Move common code in NFSD's legacy NFS WRITE decoders into a helper.
      The immediate benefit is reduction of code duplication and some nice
      micro-optimizations (see below).
      
      In the long term, this helper can perform a per-transport call-out
      to fill the rq_vec (say, using RDMA Reads).
      
      The legacy WRITE decoders and procs are changed to work like NFSv4,
      which constructs the rq_vec just before it is about to call
      vfs_writev.
      
      Why? Calling a transport call-out from the proc instead of the XDR
      decoder means that the incoming FH can be resolved to a particular
      filesystem and file. This would allow pages from the backing file to
      be presented to the transport to be filled, rather than presenting
      anonymous pages and copying or flipping them into the file's page
      cache later.
      
      I also prefer using the pages in rq_arg.pages, instead of pulling
      the data pages directly out of the rqstp::rq_pages array. This is
      currently the way the NFSv3 write decoder works, but the other two
      do not seem to take this approach. Fixing this removes the only
      reference to rq_pages found in NFSD, eliminating an NFSD assumption
      about how transports use the pages in rq_pages.
      
      Lastly, avoid setting up the first element of rq_vec as a zero-
      length buffer. This happens with an RDMA transport when a normal
      Read chunk is present because the data payload is in rq_arg's
      page list (none of it is in the head buffer).
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
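The last point of this commit, not emitting a zero-length first vector element when the whole payload sits in the page list, can be sketched in user space. This is an illustrative model only: `toy_build_vec()`, `TOY_PAGE_SIZE` and the parameter layout are assumptions, not the helper the patch adds.

```c
#include <stddef.h>
#include <sys/uio.h>

#define TOY_PAGE_SIZE 4096u /* assumed page size for the sketch */

/* Build an iovec array from an optional head buffer plus a page list.
 * A head of length zero is skipped instead of becoming vec[0]. */
size_t toy_build_vec(struct iovec *vec, size_t max_vecs,
                     void *head, size_t head_len,
                     void **pages, size_t n_pages, size_t total_len)
{
    size_t n = 0;
    size_t remaining = total_len;

    if (head_len > 0 && remaining > 0 && n < max_vecs) {
        size_t take = head_len < remaining ? head_len : remaining;

        vec[n].iov_base = head;
        vec[n].iov_len = take;
        n++;
        remaining -= take;
    }
    for (size_t i = 0; i < n_pages && remaining > 0 && n < max_vecs; i++) {
        size_t take = remaining < TOY_PAGE_SIZE ? remaining : TOY_PAGE_SIZE;

        vec[n].iov_base = pages[i];
        vec[n].iov_len = take;
        n++;
        remaining -= take;
    }
    return n; /* number of vector elements filled in */
}
```

When `head_len` is zero, as in the RDMA Read chunk case described above, the first element of the resulting vector points directly at the first page.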
    • nfsd: Trace NFSv4 COMPOUND execution · fff4080b
      Chuck Lever committed
      This helps record the identity and timing of the ops in each NFSv4
      COMPOUND, replacing dprintk calls that did much the same thing.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd: Add I/O trace points in the NFSv4 read proc · 87c5942e
      Chuck Lever committed
      NFSv4 read compound processing invokes nfsd_splice_read and
      nfsd_readv directly, so the trace points currently in nfsd_read are
      not invoked for NFSv4 reads.
      
      Move the NFSD READ trace points to common helpers so that NFSv4
      reads are captured.
      
      Also, record any local I/O error that occurs, the total count of
      bytes that were actually returned, and whether splice or vectored
      read was used.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd: Add I/O trace points in the NFSv4 write path · d890be15
      Chuck Lever committed
      NFSv4 write compound processing invokes nfsd_vfs_write directly. The
      trace points currently in nfsd_write are not effective for NFSv4
      writes.
      
      Move the trace points into the shared nfsd_vfs_write() helper.
      
      After the I/O, we also want to record any local I/O error that
      might have occurred, and the total count of bytes that were actually
      moved (rather than the requested number).
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd: Add "nfsd_" to trace point names · f394b62b
      Chuck Lever committed
      Follow naming convention used in client and in sunrpc layers.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd: Record request byte count, not count of vectors · 79e0b4e2
      Chuck Lever committed
      Byte count is more helpful to know than vector count.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd: Fix NFSD trace points · afa720a0
      Chuck Lever committed
      nfsd-1915  [003] 77915.780959: write_opened:
      	[FAILED TO PARSE] xid=3286130958 fh=0 offset=154624 len=1
      nfsd-1915  [003] 77915.780960: write_io_done:
      	[FAILED TO PARSE] xid=3286130958 fh=0 offset=154624 len=1
      nfsd-1915  [003] 77915.780964: write_done:
      	[FAILED TO PARSE] xid=3286130958 fh=0 offset=154624 len=1
      
      Byte swapping and knfsd_fh_hash() are not available in "trace-cmd
      report", where the print format string is actually used. These
      data transformations have to be done during the TP_fast_assign step.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
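The rule stated in this commit, doing data transformations when the record is captured rather than in the print format, can be modeled in plain C. The sketch below is not the kernel's TRACE_EVENT machinery: `struct write_event`, `toy_fh_hash()` (a stand-in for knfsd_fh_hash(), here an FNV-1a hash), `record_write_event()` and `format_write_event()` are hypothetical names.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical trace record: holds only values that are ready to print. */
struct write_event {
    uint32_t xid;     /* already converted to host byte order */
    uint32_t fh_hash; /* already hashed */
    uint64_t offset;
    uint32_t len;
};

/* Stand-in for knfsd_fh_hash(): FNV-1a over the file handle bytes. */
uint32_t toy_fh_hash(const unsigned char *fh, size_t n)
{
    uint32_t hash = 2166136261u;

    for (size_t i = 0; i < n; i++) {
        hash ^= fh[i];
        hash *= 16777619u;
    }
    return hash;
}

/* Capture step (the TP_fast_assign analogue): byte swap and hash here. */
void record_write_event(struct write_event *ev, uint32_t xid_be,
                        const unsigned char *fh, size_t fh_len,
                        uint64_t offset, uint32_t len)
{
    ev->xid = ntohl(xid_be);
    ev->fh_hash = toy_fh_hash(fh, fh_len);
    ev->offset = offset;
    ev->len = len;
}

/* Print step (the TP_printk analogue): plain fields only, no helper calls,
 * so a user-space formatter could reproduce it. */
int format_write_event(char *buf, size_t size, const struct write_event *ev)
{
    return snprintf(buf, size, "xid=0x%08x fh_hash=0x%08x offset=%llu len=%u",
                    (unsigned int)ev->xid, (unsigned int)ev->fh_hash,
                    (unsigned long long)ev->offset, (unsigned int)ev->len);
}
```

Because the format step touches only stored integers, a tool that has the format string but none of the kernel helpers can still render the record.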
    • svc: Report xprt dequeue latency · 55f5088c
      Chuck Lever committed
      Record the time between when a rqstp is enqueued on a transport
      and when it is dequeued. This includes how long the rqstp waits on
      the queue and how long it takes the kernel scheduler to wake a
      nfsd thread to service it.
      
      The svc_xprt_dequeue trace point is altered to include the number
      of microseconds between xprt_enqueue and xprt_dequeue.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
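The latency measurement described above amounts to stamping the transport at enqueue time and subtracting at dequeue time. A minimal sketch, assuming nanosecond timestamps supplied by the caller; `struct toy_xprt` and both functions are hypothetical and do not reflect the kernel's actual field or clock choice.

```c
#include <stdint.h>

/* Hypothetical transport carrying its own enqueue timestamp. */
struct toy_xprt {
    uint64_t enqueue_ns;
};

/* Record when the transport was placed on the queue. */
void toy_xprt_enqueue(struct toy_xprt *xprt, uint64_t now_ns)
{
    xprt->enqueue_ns = now_ns;
}

/* Return the queue wait in whole microseconds. This covers both the time
 * spent on the queue and the time taken to wake a service thread. */
uint64_t toy_xprt_dequeue_latency_us(const struct toy_xprt *xprt,
                                     uint64_t now_ns)
{
    return (now_ns - xprt->enqueue_ns) / 1000;
}
```

A trace consumer could then filter or build a histogram on the returned microsecond value.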
    • sunrpc: Report per-RPC execution stats · aaba72cd
      Chuck Lever committed
      Introduce a mechanism to report the server-side execution latency of
      each RPC. The goal is to enable user space to filter the trace
      record for latency outliers, build histograms, etc.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • sunrpc: Re-purpose trace_svc_process · 0b9547bf
      Chuck Lever committed
      Currently, trace_svc_process has two call sites:
      
      1. Just after a call to svc_send. svc_send already invokes
         trace_svc_send with the same arguments just before returning
      
      2. Just before a call to svc_drop. svc_drop already invokes
         trace_svc_drop with the same arguments just after it is called
      
      Therefore trace_svc_process does not provide any additional
      information not already provided by these other trace points.
      
      However, it would be useful to record the incoming RPC procedure.
      So reuse trace_svc_process for this purpose.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • sunrpc: Save remote presentation address in svc_xprt for trace events · ece200dd
      Chuck Lever committed
      TP_printk defines a format string that is passed to user space for
      converting raw trace event records to something human-readable.
      
      My user space's printf (Oracle Linux 7), however, does not have a
      %pI format specifier. The result is that what is supposed to be an
      IP address in the output of "trace-cmd report" is just a string that
      says the field couldn't be displayed.
      
      To fix this, adopt the same approach as the client: maintain a
      pre-formatted presentation address for occasions when %pI is not
      available.
      
      The location of the trace_svc_send trace point is adjusted so that
      rqst->rq_xprt is not NULL when the trace event is recorded.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • sunrpc: Simplify trace_svc_recv · 41f306d0
      Chuck Lever committed
      There doesn't seem to be a lot of value in calling trace_svc_recv
      in the failing case.
      
      1. There are two very common cases: one is the transport is not
      ready, and the other is shutdown. Neither is terribly interesting.
      
      2. The trace record for the failing case contains nothing but
      the status code.
      
      Therefore the trace point call site in the error exit is removed.
      Since the trace point is now recording a length instead of a
      status, rename the status field and remove the case that records a
      zero XID.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • sunrpc: Simplify do_enqueue tracing · 7dbb53ba
      Chuck Lever committed
      There are three cases where svc_xprt_do_enqueue() returns without
      waking an nfsd thread:
      
      1. There is no work to do
      
      2. The transport is already busy
      
      3. There are no available nfsd threads
      
      Only 3. is truly interesting. Move the trace point so it records
      that there was work to do and either an nfsd thread was awoken, or
      a free one could not be found.
      
      As an additional clean up, remove a redundant comment and a couple
      of dprintk call sites.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • sunrpc: Move trace_svc_xprt_dequeue() · caa3e106
      Chuck Lever committed
      Reduce the amount of noise generated by trace_svc_xprt_dequeue by
      moving it to the end of svc_get_next_xprt. This generates exactly
      one trace event when a ready xprt is found, rather than spurious
      events when there is no work to do. The empty events contain no
      information that can't be obtained simply by tracing function calls
      to svc_xprt_dequeue.
      
      A small additional benefit is simplification of the svc_xprt_event
      trace class, which no longer has to handle the case when the @xprt
      parameter is NULL.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • sunrpc: Update show_svc_xprt_flags() to include recently added flags · 03edb90f
      Chuck Lever committed
      XPT_KILL_TEMP was added by commit 546125d1 ("sunrpc: don't call
      sleeping functions from the notifier block callbacks"), and
      XPT_CONG_CTRL was added by commit 362142b2 ("sunrpc: flag
      transports as having congestion control").
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svc: Simplify ->xpo_secure_port · 989f881e
      Chuck Lever committed
      Clean up: Instead of returning a value that is used to set or clear
      a bit, just make ->xpo_secure_port mangle that bit, and return void.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
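The shape of this cleanup, a callback that updates the flag itself instead of returning a value for the caller to translate into a bit, can be sketched as follows. Everything here is hypothetical: `struct toy_rqst`, `TOY_RQ_SECURE`, and the "port below 1024" test are assumptions used only to make the example concrete.

```c
#define TOY_RQ_SECURE 1UL /* hypothetical flag bit */

/* Hypothetical request carrying a flag word and the peer's port. */
struct toy_rqst {
    unsigned long flags;
    unsigned short remote_port;
};

/* Before: the callback returns a value and the caller sets or clears the bit. */
int toy_secure_port_old(const struct toy_rqst *rqst)
{
    return rqst->remote_port < 1024;
}

/* After: the callback updates the bit directly and returns void. */
void toy_secure_port(struct toy_rqst *rqst)
{
    if (rqst->remote_port < 1024)
        rqst->flags |= TOY_RQ_SECURE;
    else
        rqst->flags &= ~TOY_RQ_SECURE;
}
```

The caller in the second form needs no conditional of its own, which is the simplification the commit message describes.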
    • sunrpc: Remove unneeded pointer dereference · 63a1b156
      Chuck Lever committed
      Clean up: Noticed during code inspection that there is already a
      local automatic variable "xprt" so dereferencing rqst->rq_xprt
      again is unnecessary.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd: use correct enum type in decode_cb_op_status · 47299f79
      Stefan Agner committed
      Use enum nfs_cb_opnum4 in decode_cb_op_status. This fixes warnings
      seen with clang:
        fs/nfsd/nfs4callback.c:451:36: warning: implicit conversion from
            enumeration type 'enum nfs_cb_opnum4' to different enumeration
            type 'enum nfs_opnum4' [-Wenum-conversion]
              status = decode_cb_op_status(xdr, OP_CB_SEQUENCE, &cb->cb_seq_status);
                       ~~~~~~~~~~~~~~~~~~~      ^~~~~~~~~~~~~~
      Signed-off-by: Stefan Agner <stefan@agner.ch>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
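The warning quoted above arises when a constant of one enumeration is passed to a parameter declared with a different enumeration. A reduced sketch of the fix, using made-up enums (`toy_opnum`, `toy_cb_opnum`) and a made-up function rather than the real NFS definitions:

```c
/* Two unrelated enumerations whose values may overlap numerically. */
enum toy_opnum {
    TOY_OP_ACCESS = 3
};

enum toy_cb_opnum {
    TOY_CB_GETATTR = 3,
    TOY_CB_SEQUENCE = 11
};

/* The parameter is declared with the callback enum, so passing
 * TOY_CB_SEQUENCE involves no implicit enum-to-enum conversion.
 * Declaring it as enum toy_opnum instead would draw clang's
 * -Wenum-conversion warning at each call site. */
int toy_decode_cb_op_status(enum toy_cb_opnum expected, unsigned int wire_op)
{
    return wire_op == (unsigned int)expected ? 0 : -1;
}
```

The generated code is the same either way in this model; the change only makes the declared type match what callers actually pass.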
    • nfsd: fix boolreturn.cocci warnings · 51d87bc2
      Fengguang Wu committed
      fs/nfsd/nfs4state.c:926:8-9: WARNING: return of 0/1 in function 'nfs4_delegation_exists' with return type bool
      fs/nfsd/nfs4state.c:2955:9-10: WARNING: return of 0/1 in function 'nfsd4_compound_in_session' with return type bool
      
       Return statements in functions returning bool should use
       true/false instead of 1/0.
      Generated by: scripts/coccinelle/misc/boolreturn.cocci
      
      Fixes: 68b18f52 ("nfsd: make nfs4_get_existing_delegation less confusing")
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      [bfields: also fix -EAGAIN]
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
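The convention the Coccinelle script enforces is small enough to show directly. The function below is a hypothetical example, not code from fs/nfsd/nfs4state.c: a function declared to return bool uses `true` and `false` rather than 1 and 0.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lookup: report whether `wanted` appears in `ids`. */
bool toy_delegation_exists(const int *ids, size_t n, int wanted)
{
    for (size_t i = 0; i < n; i++) {
        if (ids[i] == wanted)
            return true;  /* not "return 1" */
    }
    return false;         /* not "return 0" */
}
```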
  2. 21 Mar, 2018 11 commits
  3. 20 Mar, 2018 7 commits