1. 17 Aug 2021, 1 commit
    • SUNRPC: Add svc_rqst_replace_page() API · 2f0f88f4
      Committed by Chuck Lever
      Replacing a page in rq_pages[] requires a get_page(), which is a
      bus-locked operation, and a put_page(), which can be even more
      costly.
      
      To reduce the cost of replacing a page in rq_pages[], batch the
      put_page() operations by collecting "freed" pages in a pagevec,
      and then release those pages when the pagevec is full. This
      pagevec is also emptied when each RPC completes.
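The batching idea can be sketched in userspace, assuming a simplified page with a plain integer reference count. `struct page_batch`, `replace_page()`, `batch_release()` and `BATCH_SIZE` are hypothetical stand-ins for the kernel's pagevec and svc_rqst_replace_page(), not their real definitions. The sketch only shows that replaced pages are queued and then released in bulk when the batch fills or when the RPC completes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical batch capacity; the kernel's pagevec size may differ. */
#define BATCH_SIZE 4

/* Stand-in for struct page: only a reference count is modeled. */
struct page {
	int refcount;
};

/* Stand-in for a pagevec: pages whose final put has been deferred. */
struct page_batch {
	size_t nr;
	struct page *pages[BATCH_SIZE];
};

static unsigned long release_calls; /* counts bulk release passes */

/* Drop one reference on every queued page in a single pass. */
static void batch_release(struct page_batch *pb)
{
	if (pb->nr == 0)
		return;
	for (size_t i = 0; i < pb->nr; i++)
		pb->pages[i]->refcount--;
	pb->nr = 0;
	release_calls++;
}

/* Replace *slot with @newpage: take the new reference immediately,
 * but queue the old page and release the queue only when it is full. */
static void replace_page(struct page **slot, struct page *newpage,
			 struct page_batch *pb)
{
	newpage->refcount++;
	if (*slot) {
		assert(pb->nr < BATCH_SIZE);
		pb->pages[pb->nr++] = *slot;
		if (pb->nr == BATCH_SIZE)
			batch_release(pb);
	}
	*slot = newpage;
}
```

When the RPC completes, the caller would invoke batch_release() once more so that no deferred reference outlives the request.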
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
  2. 23 Apr 2021, 1 commit
  3. 07 Mar 2021, 1 commit
  4. 25 Jan 2021, 1 commit
  5. 01 Dec 2020, 2 commits
    • SUNRPC: Prepare for xdr_stream-style decoding on the server-side · 5191955d
      Committed by Chuck Lever
      A "permanent" struct xdr_stream is allocated in struct svc_rqst so
      that it is usable by all server-side decoders. A per-rqst scratch
      buffer is also allocated to handle decoding XDR data items that
      cross page boundaries.
      
      To demonstrate how it will be used, add the first call site for the
      new svcxdr_init_decode() API.
      
      As an additional part of the overall conversion, add symbolic
      constants for successful and failed XDR operations. Returning "0" is
      overloaded. Sometimes it means something failed, but sometimes it
      means success. To make it more clear when XDR decoding functions
      succeed or fail, introduce symbolic constants.
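A minimal userspace sketch of the scratch-buffer idea, assuming a receive buffer split into two non-contiguous segments. When a 4-byte XDR item straddles the boundary, its bytes are copied into a per-request scratch area so the decoder always sees contiguous memory, and the decoder reports its outcome with named constants instead of a bare 0. All names here (`struct sketch_stream`, `sketch_inline_decode()`, `SKETCH_XDR_DECODE_OK`/`SKETCH_XDR_DECODE_FAIL`) are hypothetical; the kernel's actual constants and xdr_stream helpers may be named and structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical symbolic results, so "0" is never ambiguous. */
#define SKETCH_XDR_DECODE_OK   true
#define SKETCH_XDR_DECODE_FAIL false

/* A receive buffer made of two non-contiguous segments ("pages"). */
struct sketch_stream {
	const uint8_t *seg[2];
	size_t seg_len[2];
	size_t cur;          /* index of the current segment */
	size_t off;          /* offset within the current segment */
	uint8_t scratch[8];  /* per-request scratch space */
};

/* Return @n contiguous bytes, copying through the scratch buffer when
 * the item crosses a segment boundary. NULL means not enough data. */
static const uint8_t *sketch_inline_decode(struct sketch_stream *xdr, size_t n)
{
	size_t avail = xdr->seg_len[xdr->cur] - xdr->off;
	const uint8_t *p;

	if (n <= avail) {
		/* Fast path: the item lies entirely within one segment. */
		p = xdr->seg[xdr->cur] + xdr->off;
		xdr->off += n;
		return p;
	}
	/* Slow path: the item spills into the next segment, if any. */
	if (xdr->cur + 1 >= 2 || n > sizeof(xdr->scratch) ||
	    n - avail > xdr->seg_len[xdr->cur + 1])
		return NULL;
	memcpy(xdr->scratch, xdr->seg[xdr->cur] + xdr->off, avail);
	memcpy(xdr->scratch + avail, xdr->seg[xdr->cur + 1], n - avail);
	xdr->cur++;
	xdr->off = n - avail;
	return xdr->scratch;
}

/* Decode one big-endian XDR unsigned int. */
static bool sketch_decode_u32(struct sketch_stream *xdr, uint32_t *val)
{
	const uint8_t *p = sketch_inline_decode(xdr, 4);

	if (!p)
		return SKETCH_XDR_DECODE_FAIL;
	*val = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
	return SKETCH_XDR_DECODE_OK;
}
```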
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • SUNRPC: Rename svc_encode_read_payload() · 03493bca
      Committed by Chuck Lever
      Clean up: "result payload" is a less confusing name for these
      payloads. "READ payload" reflects only the NFS usage.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
  6. 18 May 2020, 1 commit
  7. 12 May 2020, 1 commit
  8. 17 Mar 2020, 2 commits
    • SUNRPC: Teach server to use xprt_sock_sendmsg for socket sends · da1661b9
      Committed by Chuck Lever
      xprt_sock_sendmsg uses the more efficient iov_iter-enabled kernel
      socket API, and is a pre-requisite for server send-side support for
      TLS.
      
      Note that svc_process no longer needs to reserve a word for the
      stream record marker, since the TCP transport now provides the
      record marker automatically in a separate buffer.
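For context, the record marker comes from RPC record marking over TCP (RFC 5531): a 4-byte big-endian word whose high bit flags the last fragment and whose low 31 bits give the fragment length. The sketch below, with hypothetical function names, builds that marker in its own small buffer and places it ahead of the unmodified reply in a two-element iovec, so the reply itself needs no reserved leading word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* RFC 5531 record marking: last-fragment flag plus 31-bit length. */
#define SKETCH_LAST_FRAGMENT 0x80000000u
#define SKETCH_FRAG_LEN_MASK 0x7fffffffu

/* Encode the marker for a single-fragment message, big-endian. */
static void encode_record_marker(uint8_t rm[4], uint32_t msglen)
{
	uint32_t v = SKETCH_LAST_FRAGMENT | (msglen & SKETCH_FRAG_LEN_MASK);

	rm[0] = (uint8_t)(v >> 24);
	rm[1] = (uint8_t)(v >> 16);
	rm[2] = (uint8_t)(v >> 8);
	rm[3] = (uint8_t)v;
}

/* Gather list: marker in a separate buffer, then the reply as-is.
 * Returns the total number of bytes to be sent. */
static size_t build_send_iov(struct iovec iov[2], uint8_t rm[4],
			     void *reply, size_t len)
{
	encode_record_marker(rm, (uint32_t)len);
	iov[0].iov_base = rm;
	iov[0].iov_len = 4;
	iov[1].iov_base = reply;
	iov[1].iov_len = len;
	return iov[0].iov_len + iov[1].iov_len;
}
```

The resulting array could be handed to writev(2) on a connected socket; this sketch does not claim that xprt_sock_sendmsg() lays out its buffers exactly this way.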
      
      The dprintk() in svc_send_common is also removed. It didn't seem
      crucial for field troubleshooting. If more is needed there, a trace
      point could be added in xprt_sock_sendmsg().
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: Fix NFSv4 READ on RDMA when using readv · 41205539
      Committed by Chuck Lever
      svcrdma expects that the payload falls precisely into the xdr_buf
      page vector. This does not seem to be the case for
      nfsd4_encode_readv().
      
      This code is called only when fops->splice_read is missing or when
      RQ_SPLICE_OK is clear, so it's not a noticeable problem in many
      common cases.
      
      Add a new transport method, ->xpo_read_payload, so that when a READ
      payload does not fit exactly in rq_res's page vector, the XDR
      encoder can inform the RPC transport exactly where that payload is,
      without the payload's XDR pad.
      
      That way, when a Write chunk is present, the transport knows what
      byte range in the Reply message is supposed to be matched with the
      chunk.
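XDR pads opaque data to a multiple of 4 bytes, so the byte range the transport must match against a Write chunk is the payload alone, with the pad excluded. A hypothetical sketch of that bookkeeping: `struct payload_range` and `encode_read_payload()` are illustrative names, not the ->xpo_read_payload signature.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_XDR_UNIT 4  /* XDR alignment unit in bytes */

/* Number of zero bytes XDR appends after @n bytes of opaque data. */
static size_t xdr_pad_size(size_t n)
{
	return (SKETCH_XDR_UNIT - (n % SKETCH_XDR_UNIT)) % SKETCH_XDR_UNIT;
}

/* Where a READ payload sits in the reply, excluding its XDR pad. */
struct payload_range {
	size_t offset;  /* byte offset of the first payload byte */
	size_t length;  /* payload length, pad NOT included */
};

/* Record the exact payload range for the transport and return the
 * reply offset of whatever follows the payload and its pad. */
static size_t encode_read_payload(size_t offset, size_t length,
				  struct payload_range *out)
{
	out->offset = offset;
	out->length = length;
	return offset + length + xdr_pad_size(length);
}
```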
      
      Note that the Linux NFS server implementation of NFS/RDMA can
      currently handle only one Write chunk per RPC-over-RDMA message.
      This simplifies the implementation of this fix.
      
      Fixes: b0420980 ("nfsd4: allow exotic read compounds")
      Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=198053
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
  9. 31 Oct 2019, 1 commit
  10. 25 Sep 2019, 1 commit
  11. 18 Jul 2019, 1 commit
  12. 21 May 2019, 1 commit
  13. 24 Apr 2019, 4 commits
  14. 14 Feb 2019, 1 commit
    • SUNRPC: Remove rpc_xprt::tsh_size · 067fb11b
      Committed by Chuck Lever
      tsh_size was added to accommodate transports that send a pre-amble
      before each RPC message. However, this assumes the pre-amble is
      fixed in size, which isn't true for some transports. That makes
      tsh_size not very generic.
      
      Also I'd like to make the estimation of RPC send and receive
      buffer sizes more precise. tsh_size doesn't currently appear to be
      accounted for at all by call_allocate.
      
      Therefore let's just remove the tsh_size concept, and make the only
      transports that have a non-zero tsh_size employ a direct approach.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  15. 28 Dec 2018, 3 commits
    • sunrpc: make visible processing error in bc_svc_process() · 8f7766c8
      Committed by Vasily Averin
      Force bc_svc_process() to generate a debug message after processing errors.
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • sunrpc: remove unused xpo_prep_reply_hdr callback · 64e20ba2
      Committed by Vasily Averin
      xpo_prep_reply_hdr is no longer used.
      
      It was defined for the TCP transport only; however, it cannot be
      called indirectly, so let's move it into its caller and
      remove the unused callback.
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • sunrpc: use-after-free in svc_process_common() · d4b09acf
      Committed by Vasily Averin
      If a node has NFSv4.1+ mounts inside several net namespaces,
      it can lead to a use-after-free in svc_process_common():
      
      svc_process_common()
              /* Setup reply header */
              rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr(rqstp); <<< HERE
      
      svc_process_common() can use an incorrect rqstp->rq_xprt:
      its caller, bc_svc_process(), takes it from serv->sv_bc_xprt.
      The problem is that serv is a global structure, but sv_bc_xprt
      is assigned per net namespace.
      
      According to Trond, the whole "let's set up rqstp->rq_xprt
      for the back channel" is nothing but a giant hack in order
      to work around the fact that svc_process_common() uses it
      to find the xpt_ops, and perform a couple of (meaningless
      for the back channel) tests of xpt_flags.
      
      All we really need in svc_process_common() is to be able to run
      rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr()
      
      J. Bruce Fields points out that this xpo_prep_reply_hdr() call
      is an awfully roundabout way just to do "svc_putnl(resv, 0);"
      in the TCP case.
      
      This patch does not initialize rqstp->rq_xprt in bc_svc_process();
      instead, it calls svc_process_common() with rqstp->rq_xprt = NULL.
      
      To adjust the reply header, svc_process_common() now just checks
      rqstp->rq_prot and calls svc_tcp_prep_reply_hdr() in the TCP case.
      
      To handle the rqstp->rq_xprt = NULL case in functions called from
      svc_process_common(), the patch introduces a net namespace pointer,
      svc_rqst->rq_bc_net, and adjusts the SVC_NET() definition.
      Some other functions were also adapted to properly handle this case.
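The two ideas in this fix, resolving the namespace without a transport and choosing reply-header preparation by protocol rather than through xpt_ops, can be modeled in a few lines. This is a simplified userspace model: the structs are stand-ins, `rqst_net()` plays the role of the adjusted SVC_NET(), and `resv_words` merely counts the word that svc_putnl(resv, 0) would write.

```c
#include <assert.h>
#include <stddef.h>
#include <netinet/in.h>  /* IPPROTO_TCP */

struct net { int id; };               /* stand-in for a network namespace */
struct xprt { struct net *xpt_net; }; /* stand-in for struct svc_xprt */

struct rqst {
	struct xprt *rq_xprt;    /* NULL for backchannel requests */
	struct net *rq_bc_net;   /* namespace recorded by the backchannel */
	int rq_prot;             /* e.g. IPPROTO_TCP */
	unsigned int resv_words; /* words reserved in the reply header */
};

/* Resolve the namespace without dereferencing a NULL transport. */
static struct net *rqst_net(const struct rqst *r)
{
	return r->rq_xprt ? r->rq_xprt->xpt_net : r->rq_bc_net;
}

/* Branch on the protocol directly instead of calling through
 * rq_xprt->xpt_ops; on TCP, reserve one word for the record marker. */
static void prep_reply_hdr(struct rqst *r)
{
	if (r->rq_prot == IPPROTO_TCP)
		r->resv_words++;
}
```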
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Cc: stable@vger.kernel.org
      Fixes: 23c20ecd ("NFS: callback up - users counting cleanup")
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  16. 10 Aug 2018, 2 commits
    • NFSD: Handle full-length symlinks · 11b4d66e
      Committed by Chuck Lever
      I've given up on the idea of zero-copy handling of SYMLINK on the
      server side. This is because the Linux VFS symlink API requires the
      symlink pathname to be in a NUL-terminated kmalloc'd buffer. The
      NUL-termination is going to be problematic (watching out for
      landing on a page boundary and dealing with a 4096-byte pathname).
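A sketch of the copy this commit accepts in place of zero-copy, assuming the pathname arrives spread across several page-sized segments: gather it into a freshly allocated buffer of len + 1 bytes so that even a 4096-byte pathname still has room for its NUL. `fill_symlink_pathname()`, `struct seg` and `SKETCH_PATH_MAX` are hypothetical, and rejecting embedded NUL bytes is an assumption added for safety, not a documented detail of the kernel helper.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PATH_MAX 4096  /* assumption: mirrors Linux PATH_MAX */

/* One contiguous piece of the incoming pathname. */
struct seg {
	const char *base;
	size_t len;
};

/* Gather @len bytes from @segs into a new NUL-terminated buffer.
 * On failure, returns NULL and stores a negative errno in *err. */
static char *fill_symlink_pathname(const struct seg *segs, size_t nsegs,
				   size_t len, int *err)
{
	size_t remaining = len;
	char *buf, *p;

	if (len == 0) {
		*err = -EINVAL;
		return NULL;
	}
	if (len > SKETCH_PATH_MAX) {
		*err = -ENAMETOOLONG;
		return NULL;
	}
	buf = malloc(len + 1);  /* +1: room for NUL even at 4096 bytes */
	if (!buf) {
		*err = -ENOMEM;
		return NULL;
	}
	p = buf;
	for (size_t i = 0; i < nsegs && remaining; i++) {
		size_t n = segs[i].len < remaining ? segs[i].len : remaining;

		memcpy(p, segs[i].base, n);
		p += n;
		remaining -= n;
	}
	/* Reject short input and embedded NULs (which would truncate). */
	if (remaining || memchr(buf, '\0', len)) {
		free(buf);
		*err = -EINVAL;
		return NULL;
	}
	buf[len] = '\0';
	*err = 0;
	return buf;
}
```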
      
      I don't believe that SYMLINK creation is on a performance path or is
      requested frequently enough that it will cause noticeable CPU cache
      pollution due to data copies.
      
      There will be two places where a transport callout will be necessary
      to fill in the rqstp: one will be in the svc_fill_symlink_pathname()
      helper that is used by NFSv2 and NFSv3, and the other will be in
      nfsd4_decode_create().
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • NFSD: Refactor the generic write vector fill helper · 3fd9557a
      Committed by Chuck Lever
      fill_in_write_vector() is nearly the same logic as
      svc_fill_write_vector(), but there are a few differences so that
      the former can handle multiple WRITE payloads in a single COMPOUND.
      
      svc_fill_write_vector() can be adjusted so that it can be used in
      the NFSv4 WRITE code path too. Instead of assuming the pages are
      coming from rq_args.pages, have the caller pass in the page list.
      
      The immediate benefit is a reduction of code duplication. It also
      prevents the NFSv4 WRITE decoder from passing an empty vector
      element when the transport has provided the payload in the xdr_buf's
      page array.
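A minimal sketch of a caller-supplied page list driving the vector fill, assuming 4096-byte pages and a head buffer that may hold the first part of the payload. `fill_write_vector()` is a hypothetical name, not the kernel's svc_fill_write_vector() signature; the detail it illustrates is that a zero-length head produces no iovec element.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define SKETCH_PAGE_SIZE 4096

/* Describe @total payload bytes: first the part in the head buffer,
 * then pages from the caller's list. Returns the element count. */
static size_t fill_write_vector(struct iovec *vec, size_t maxvec,
				void *head, size_t head_len,
				void **pages, size_t total)
{
	size_t nvec = 0, i = 0;

	/* A zero-length head (payload entirely in pages) is skipped. */
	if (head_len && nvec < maxvec) {
		if (head_len > total)
			head_len = total;
		vec[nvec].iov_base = head;
		vec[nvec].iov_len = head_len;
		nvec++;
		total -= head_len;
	}
	while (total && nvec < maxvec) {
		size_t n = total < SKETCH_PAGE_SIZE ? total : SKETCH_PAGE_SIZE;

		vec[nvec].iov_base = pages[i++];
		vec[nvec].iov_len = n;
		nvec++;
		total -= n;
	}
	return nvec;
}
```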
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  17. 04 Apr 2018, 3 commits
    • NFSD: Clean up legacy NFS SYMLINK argument XDR decoders · 38a70315
      Committed by Chuck Lever
      Move common code in NFSD's legacy SYMLINK decoders into a helper.
      The immediate benefits include:
      
       - one fewer data copy on transports that support DDP
       - consistent error checking across all versions
       - reduction of code duplication
       - support for both legal forms of SYMLINK requests on RDMA
         transports for all versions of NFS (in particular, NFSv2, for
         completeness)
      
      In the long term, this helper is an appropriate spot to perform a
      per-transport call-out to fill the pathname argument using, say,
      RDMA Reads.
      
      Filling the pathname in the proc function also means that eventually
      the incoming filehandle can be interpreted so that filesystem-
      specific memory can be allocated as a sink for the pathname
      argument, rather than using anonymous pages.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • NFSD: Clean up legacy NFS WRITE argument XDR decoders · 8154ef27
      Committed by Chuck Lever
      Move common code in NFSD's legacy NFS WRITE decoders into a helper.
      The immediate benefits are a reduction of code duplication and some
      nice micro-optimizations (see below).
      
      In the long term, this helper can perform a per-transport call-out
      to fill the rq_vec (say, using RDMA Reads).
      
      The legacy WRITE decoders and procs are changed to work like NFSv4,
      which constructs the rq_vec just before it is about to call
      vfs_writev.
      
      Why? Calling a transport call-out from the proc instead of the XDR
      decoder means that the incoming FH can be resolved to a particular
      filesystem and file. This would allow pages from the backing file to
      be presented to the transport to be filled, rather than presenting
      anonymous pages and copying or flipping them into the file's page
      cache later.
      
      I also prefer using the pages in rq_arg.pages, instead of pulling
      the data pages directly out of the rqstp::rq_pages array. This is
      currently the way the NFSv3 write decoder works, but the other two
      do not seem to take this approach. Fixing this removes the only
      reference to rq_pages found in NFSD, eliminating an NFSD assumption
      about how transports use the pages in rq_pages.
      
      Lastly, avoid setting up the first element of rq_vec as a zero-
      length buffer. This happens with an RDMA transport when a normal
      Read chunk is present because the data payload is in rq_arg's
      page list (none of it is in the head buffer).
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • sunrpc: Re-purpose trace_svc_process · 0b9547bf
      Committed by Chuck Lever
      Currently, trace_svc_process has two call sites:
      
      1. Just after a call to svc_send. svc_send already invokes
         trace_svc_send with the same arguments just before returning
      
      2. Just before a call to svc_drop. svc_drop already invokes
         trace_svc_drop with the same arguments just after it is called
      
      Therefore trace_svc_process does not provide any additional
      information not already provided by these other trace points.
      
      However, it would be useful to record the incoming RPC procedure.
      So reuse trace_svc_process for this purpose.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  18. 31 Oct 2017, 1 commit
    • treewide: Fix function prototypes for module_param_call() · e4dca7b7
      Committed by Kees Cook
      Several function prototypes for the set/get functions defined by
      module_param_call() have slightly wrong argument types. This fixes
      those in an effort to clean up the calls when running under type-enforced
      compiler instrumentation for CFI. This is the result of running the
      following semantic patch:
      
      @match_module_param_call_function@
      declarer name module_param_call;
      identifier _name, _set_func, _get_func;
      expression _arg, _mode;
      @@
      
       module_param_call(_name, _set_func, _get_func, _arg, _mode);
      
      @fix_set_prototype
       depends on match_module_param_call_function@
      identifier match_module_param_call_function._set_func;
      identifier _val, _param;
      type _val_type, _param_type;
      @@
      
       int _set_func(
      -_val_type _val
      +const char * _val
       ,
      -_param_type _param
      +const struct kernel_param * _param
       ) { ... }
      
      @fix_get_prototype
       depends on match_module_param_call_function@
      identifier match_module_param_call_function._get_func;
      identifier _val, _param;
      type _val_type, _param_type;
      @@
      
       int _get_func(
      -_val_type _val
      +char * _val
       ,
      -_param_type _param
      +const struct kernel_param * _param
       ) { ... }
      
      Two additional by-hand changes are included for places where the above
      Coccinelle script didn't notice them:
      
      	drivers/platform/x86/thinkpad_acpi.c
      	fs/lockd/svc.c
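The prototypes the semantic patch converges on match the set and get members of the kernel's struct kernel_param_ops. The userspace sketch below uses a stand-in struct kernel_param and a hypothetical port parameter (loosely in the spirit of fs/lockd/svc.c) to show handlers whose types match the function-pointer types exactly, so they can be stored without casts. Under CFI, an indirect call whose pointer type differs from the target's real prototype is the kind of call the instrumentation is meant to reject.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-in for struct kernel_param: only ->arg is modeled. */
struct kernel_param {
	void *arg;
};

/* The exact prototypes the commit converts set/get functions to. */
typedef int (*param_set_fn)(const char *val, const struct kernel_param *kp);
typedef int (*param_get_fn)(char *buffer, const struct kernel_param *kp);

struct param_ops_sketch {
	param_set_fn set;
	param_get_fn get;
};

/* Hypothetical handler: parse a TCP/UDP port number. */
static int set_port(const char *val, const struct kernel_param *kp)
{
	char *end;
	long v = strtol(val, &end, 10);

	if (*val == '\0' || *end != '\0' || v < 0 || v > 65535)
		return -EINVAL;
	*(int *)kp->arg = (int)v;
	return 0;
}

/* Hypothetical handler: print the current value, return its length. */
static int get_port(char *buffer, const struct kernel_param *kp)
{
	return sprintf(buffer, "%d", *(int *)kp->arg);
}

/* No casts needed: the handler types match the pointer types. */
static const struct param_ops_sketch port_ops = { set_port, get_port };
```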
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Jessica Yu <jeyu@kernel.org>
  19. 18 Oct 2017, 1 commit
    • sunrpc: Convert timers to use timer_setup() · ff861c4d
      Committed by Kees Cook
      In preparation for unconditionally passing the struct timer_list pointer to
      all timer callbacks, switch to using the new timer_setup() and from_timer()
      to pass the timer pointer explicitly.
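from_timer() is essentially container_of(): the callback receives the struct timer_list pointer and recovers the structure that embeds it. A userspace sketch, with container_of and from_timer redefined locally (GCC/Clang, using `__typeof__`) and `struct sketch_xprt`, `sketch_timer_setup()` and `sketch_timeout()` as hypothetical stand-ins; expiry is simulated by calling the callback directly.

```c
#include <assert.h>
#include <stddef.h>

/* Local userspace re-definitions of the kernel helpers. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define from_timer(var, timer_ptr, field) \
	container_of(timer_ptr, __typeof__(*(var)), field)

/* Stand-in: only the callback pointer is modeled. */
struct timer_list {
	void (*function)(struct timer_list *t);
};

/* Hypothetical stand-in for timer_setup(): records the callback only. */
static void sketch_timer_setup(struct timer_list *t,
			       void (*fn)(struct timer_list *),
			       unsigned int flags)
{
	(void)flags;
	t->function = fn;
}

struct sketch_xprt {
	int timeouts;
	struct timer_list timer;  /* embedded, deliberately not first */
};

/* The callback gets only the timer pointer and finds its container. */
static void sketch_timeout(struct timer_list *t)
{
	struct sketch_xprt *xprt = from_timer(xprt, t, timer);

	xprt->timeouts++;
}
```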
      
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Anna Schumaker <anna.schumaker@netapp.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: linux-nfs@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 25 Aug 2017, 1 commit
  21. 14 Jul 2017, 7 commits
  22. 29 Jun 2017, 1 commit
    • sunrpc: Disable splice for krb5i · 06eb8a56
      Committed by Chuck Lever
      Running a multi-threaded 8KB fio test (70/30 mix), three or four out
      of twelve of the jobs fail when using krb5i. The failure is an EIO
      on a read.
      
      Troubleshooting confirmed the EIO results when the client fails to
      verify the MIC of an NFS READ reply. Bruce suggested the problem
      could be due to the data payload changing between the time the
      reply's MIC was computed on the server and the time the reply was
      actually sent.
      
      krb5p gets around this problem by disabling RQ_SPLICE_OK. Use the
      same mechanism for krb5i RPCs.
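A toy model of the failure mode, assuming the spliced reply still references a live page-cache page when its checksum is computed. `toy_mic()` is an FNV-1a hash standing in for a real GSS MIC (it provides no security), and `encode_reply()`/`verify_reply()` are hypothetical. Disabling splice corresponds to copying the payload before the checksum is computed, so later changes to the page cannot invalidate it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a hash standing in for a MIC; NOT cryptographic. */
static uint32_t toy_mic(const uint8_t *buf, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 16777619u;
	}
	return h;
}

struct reply {
	const uint8_t *payload;  /* what will actually be transmitted */
	size_t len;
	uint32_t mic;            /* computed when the reply is encoded */
};

/* With @splice, the reply points at the live page; otherwise the
 * payload is first copied into @copybuf. */
static void encode_reply(struct reply *r, const uint8_t *page, size_t len,
			 uint8_t *copybuf, bool splice)
{
	if (!splice) {
		memcpy(copybuf, page, len);
		page = copybuf;
	}
	r->payload = page;
	r->len = len;
	r->mic = toy_mic(page, len);
}

/* Client side: recompute over what was sent and compare. */
static bool verify_reply(const struct reply *r)
{
	return toy_mic(r->payload, r->len) == r->mic;
}
```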
      
      "iozone -i0 -i1 -s128m -y1k -az -I", export is tmpfs, mount is
      sec=krb5i,vers=3,proto=rdma. The important numbers are the
      read / reread column.
      
      Here's without the RQ_SPLICE_OK patch:
      
                    kB  reclen    write  rewrite    read    reread
                131072       1     7546     7929     8396     8267
                131072       2    14375    14600    15843    15639
                131072       4    19280    19248    21303    21410
                131072       8    32350    31772    35199    34883
                131072      16    36748    37477    49365    51706
                131072      32    55669    56059    57475    57389
                131072      64    74599    75190    74903    75550
                131072     128    99810   101446   102828   102724
                131072     256   122042   122612   124806   125026
                131072     512   137614   138004   141412   141267
                131072    1024   146601   148774   151356   151409
                131072    2048   180684   181727   293140   292840
                131072    4096   206907   207658   552964   549029
                131072    8192   223982   224360   454493   473469
                131072   16384   228927   228390   654734   632607
      
      And here's with it:
      
                    kB  reclen    write  rewrite    read    reread
                131072       1     7700     7365     7958     8011
                131072       2    13211    13303    14937    14414
                131072       4    19001    19265    20544    20657
                131072       8    30883    31097    34255    33566
                131072      16    36868    34908    51499    49944
                131072      32    56428    55535    58710    56952
                131072      64    73507    74676    75619    74378
                131072     128   100324   101442   103276   102736
                131072     256   122517   122995   124639   124150
                131072     512   137317   139007   140530   140830
                131072    1024   146807   148923   151246   151072
                131072    2048   179656   180732   292631   292034
                131072    4096   206216   208583   543355   541951
                131072    8192   223738   224273   494201   489372
                131072   16384   229313   229840   691719   668427
      
      I would say that there is not much difference in this test.
      
      For good measure, here's the same test with sec=krb5p:
      
                    kB  reclen    write  rewrite    read    reread
                131072       1     5982     5881     6137     6218
                131072       2    10216    10252    10850    10932
                131072       4    12236    12575    15375    15526
                131072       8    15461    15462    23821    22351
                131072      16    25677    25811    27529    27640
                131072      32    31903    32354    34063    33857
                131072      64    42989    43188    45635    45561
                131072     128    52848    53210    56144    56141
                131072     256    59123    59214    62691    62933
                131072     512    63140    63277    66887    67025
                131072    1024    65255    65299    69213    69140
                131072    2048    76454    76555   133767   133862
                131072    4096    84726    84883   251925   250702
                131072    8192    89491    89482   270821   276085
                131072   16384    91572    91597   361768   336868
      
      BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=307
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  23. 15 May 2017, 2 commits