  1. 09 Mar, 2021: 1 commit
  2. 17 Feb, 2021: 3 commits
  3. 15 Feb, 2021: 1 commit
  4. 06 Feb, 2021: 6 commits
    • xprtrdma: Clean up rpcrdma_prepare_readch() · 586a0787
      Chuck Lever committed
      Since commit 9ed5af26 ("SUNRPC: Clean up the handling of page
      padding in rpc_prepare_reply_pages()") [Dec 2020] the NFS client
      passes payload data to the transport with the padding in xdr->pages
      instead of in the send buffer's tail kvec. There's no need for the
      extra logic to advance the base of the tail kvec because the upper
      layer no longer places XDR padding there.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
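XDR (RFC 4506) encodes opaque data in 4-byte units, so an odd-length payload is followed by one to three zero bytes of padding. The sketch below shows only that arithmetic; toy_xdr_align() and toy_xdr_pad() are hypothetical names chosen for illustration, not the kernel's own helpers.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_XDR_UNIT 4u /* XDR quantum: every encoded item is a multiple of 4 bytes */

/* Hypothetical helper: round a byte count up to the next XDR unit. */
static size_t toy_xdr_align(size_t n)
{
	return (n + TOY_XDR_UNIT - 1) & ~(size_t)(TOY_XDR_UNIT - 1);
}

/* Hypothetical helper: zero bytes of padding that follow n bytes of opaque data. */
static size_t toy_xdr_pad(size_t n)
{
	size_t pad = toy_xdr_align(n) - n;

	assert(pad < TOY_XDR_UNIT); /* padding is never more than 3 bytes */
	return pad;
}
```

For a 4097-byte payload, toy_xdr_pad(4097) is 3. Per the commit message, since 9ed5af26 those bytes travel in xdr->pages right after the data, not in the tail kvec.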
    • xprtrdma: Pad optimization, revisited · 2324fbed
      Chuck Lever committed
      The NetApp Linux team discovered that with NFS/RDMA servers that do
      not support RFC 8797, the Linux client is forming NFSv4.x WRITE
      requests incorrectly.
      
      In this case, the Linux NFS client disables implicit chunk round-up
      for odd-length Read and Write chunks. The goal was to support old
      servers that needed that padding to be sent explicitly by clients.
      
      In that case the Linux NFS client included the tail kvec in the Read
      chunk, since the tail contains any needed padding. That meant a
      separate memory registration was needed for the tail kvec, adding to
      the cost of forming such requests. To avoid that cost for a mere 3
      bytes of zeroes that are always ignored by receivers, we try to use
      implicit roundup when possible.
      
      For NFSv4.x, the tail kvec also sometimes contains a trailing
      GETATTR operation. The Linux NFS client unintentionally includes
      that GETATTR operation in the Read chunk as well as inline.
      
      The fix is simply to /never/ include the tail kvec when forming a
      data payload Read chunk. The padding is thus now always present.
      
      Note that since commit 9ed5af26 ("SUNRPC: Clean up the handling
      of page padding in rpc_prepare_reply_pages()") [Dec 2020] the NFS
      client passes payload data to the transport with the padding in
      xdr->pages instead of in the send buffer's tail kvec. So now the
      Linux NFS client appends XDR padding to all odd-sized Read chunks.
      This shouldn't be a problem because:
      
       - RFC 8166-compliant servers are supposed to work with or without
         that XDR padding in Read chunks.
      
       - Since the padding is now in the same memory region as the data
         payload, a separate memory registration is not needed. In
         addition, the link layer extends data in RDMA Read responses to
         4-byte boundaries anyway. Thus there is now no savings when the
         padding is not included.
      
      Because older kernels include the payload's XDR padding in the
      tail kvec, a fix there will be more complicated. Thus backporting
      this patch is not recommended.
      
      Reported-by: Olga Kornievskaia <Olga.Kornievskaia@netapp.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Tom Talpey <tom@talpey.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
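The rule this commit adopts can be modeled as a length calculation. In the hypothetical toy_xdr_buf below, the data-payload Read chunk covers only the payload and its XDR padding (both in xdr->pages), while the head and any trailing GETATTR in the tail stay inline, so nothing is sent twice. The struct, its fields, and the byte counts are illustrative assumptions, not the layout of the kernel's struct xdr_buf.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-simplified stand-in for the parts of an xdr_buf used here. */
struct toy_xdr_buf {
	size_t head_len;    /* RPC and NFS headers: sent inline in this sketch */
	size_t payload_len; /* WRITE data carried in xdr->pages */
	size_t pad_len;     /* XDR padding, also in xdr->pages (0 to 3 bytes) */
	size_t tail_len;    /* e.g. a trailing GETATTR: sent inline only */
};

/* Bytes covered by the data-payload Read chunk: the pages, never the tail. */
static size_t toy_read_chunk_len(const struct toy_xdr_buf *xdr)
{
	assert(xdr->pad_len < 4);
	return xdr->payload_len + xdr->pad_len;
}

/* Bytes that stay in the inline Send buffer. */
static size_t toy_inline_len(const struct toy_xdr_buf *xdr)
{
	return xdr->head_len + xdr->tail_len;
}
```

With a 4097-byte payload, the chunk is 4100 bytes, and a 24-byte GETATTR in the tail is counted once, inline, instead of also appearing in the Read chunk.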
    • rpcrdma: Fix comments about reverse-direction operation · 84dff5eb
      Chuck Lever committed
      During the final stages of publication of RFC 8167, reviewers
      requested that we use the term "reverse direction" rather than
      "backwards direction". Update comments to reflect this preference.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Tom Talpey <tom@talpey.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Refactor invocations of offset_in_page() · 67b16625
      Chuck Lever committed
      Clean up so that offset_in_page() is invoked less often in the
      most common case, which is mapping xdr->pages.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Tom Talpey <tom@talpey.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
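The common case named here, mapping xdr->pages, can be sketched in user space: the in-page offset is computed only for the first page, and every later page starts at offset 0. TOY_PAGE_SIZE (4 KiB), struct toy_seg, and toy_map_pages() are hypothetical; the modulo stands in for offset_in_page().

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Hypothetical descriptor for the part of one page that a segment covers. */
struct toy_seg {
	size_t page_index; /* index into the xdr->pages array */
	size_t offset;     /* byte offset within that page */
	size_t len;        /* bytes covered in that page */
};

/*
 * Split (page_base, page_len) into per-page segments. The in-page offset
 * is computed once, for the first page; every later page starts at 0.
 * Returns the number of segments filled (never more than max_segs).
 */
static size_t toy_map_pages(size_t page_base, size_t page_len,
			    struct toy_seg *segs, size_t max_segs)
{
	size_t page = page_base / TOY_PAGE_SIZE;
	size_t offset = page_base % TOY_PAGE_SIZE; /* offset_in_page() analogue, once */
	size_t n = 0;

	assert(segs != NULL);
	while (page_len > 0 && n < max_segs) {
		size_t len = TOY_PAGE_SIZE - offset;

		if (len > page_len)
			len = page_len;
		segs[n].page_index = page++;
		segs[n].offset = offset;
		segs[n].len = len;
		n++;
		page_len -= len;
		offset = 0; /* later pages are mapped from their first byte */
	}
	return n;
}
```

For page_base 100 and page_len 8192, this yields three segments: 3996 bytes at offset 100, then a full page, then 100 bytes, with only the first carrying a nonzero offset.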
    • xprtrdma: Simplify rpcrdma_convert_kvec() and frwr_map() · 54e6aec5
      Chuck Lever committed
      Clean up.
      
      Remove a conditional branch from the SGL set-up loop in frwr_map():
      Instead of using either sg_set_page() or sg_set_buf(), initialize
      the mr_page field properly when rpcrdma_convert_kvec() converts the
      kvec to an SGL entry. frwr_map() can then invoke sg_set_page()
      unconditionally.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Tom Talpey <tom@talpey.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
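A user-space sketch of the idea: record a page and an in-page offset for a kvec once, at conversion time, so the loop that fills the SGL needs no per-entry choice between a page-based and a buffer-based setter. Addresses are plain integers here, and every toy_* name is a hypothetical stand-in loosely modeled on rpcrdma_convert_kvec(), frwr_map(), and sg_set_page(), not their real signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Hypothetical stand-ins for a memory-registration segment and an SGL entry. */
struct toy_mr_seg { uintptr_t page; size_t offset; size_t len; };
struct toy_sg     { uintptr_t page; size_t offset; size_t length; };

/* Convert a kvec (base address + length) into a page-based segment up front. */
static void toy_convert_kvec(uintptr_t base, size_t len, struct toy_mr_seg *seg)
{
	seg->page = base / TOY_PAGE_SIZE;   /* virt_to_page() analogue */
	seg->offset = base % TOY_PAGE_SIZE; /* offset_in_page() analogue */
	seg->len = len;
}

/* Fill the SGL with no per-entry branch: every segment already has a page. */
static void toy_fill_sgl(const struct toy_mr_seg *segs, struct toy_sg *sgl, size_t n)
{
	assert(n == 0 || (segs != NULL && sgl != NULL));
	for (size_t i = 0; i < n; i++) {
		sgl[i].page = segs[i].page; /* sg_set_page() analogue */
		sgl[i].offset = segs[i].offset;
		sgl[i].length = segs[i].len;
	}
}
```

The design point is that the branch moves out of the hot SGL loop and into the one place where a kvec is first described.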
    • xprtrdma: Remove FMR support in rpcrdma_convert_iovs() · 9929f4ad
      Chuck Lever committed
      Support for FMR was removed by commit ba69cd12 ("xprtrdma:
      Remove support for FMR memory registration") [Dec 2018]. That means
      the buffer-splitting behavior of rpcrdma_convert_kvec(), added by
      commit 821c791a ("xprtrdma: Segment head and tail XDR buffers
      on page boundaries") [Mar 2016], is no longer necessary. FRWR
      memory registration handles this case with aplomb.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
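To see what the removed splitting cost, the sketch below counts how many segments a buffer becomes when cut on page boundaries, compared with describing a contiguous kvec as a single segment. Both functions are hypothetical and assume 4 KiB pages; they model segment counts only, not the actual registration work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/*
 * Hypothetical: number of segments a buffer would be cut into if it were
 * split on page boundaries (the behavior this commit removes).
 */
static size_t toy_page_split_count(uintptr_t base, size_t len)
{
	if (len == 0)
		return 0;
	return (size_t)((base + len - 1) / TOY_PAGE_SIZE - base / TOY_PAGE_SIZE + 1);
}

/* Hypothetical: without splitting, one contiguous kvec is one segment. */
static size_t toy_unsplit_segment_count(size_t len)
{
	assert(len <= SIZE_MAX);
	return len ? 1 : 0;
}
```

A 200-byte tail starting at offset 4000 straddles a page boundary, so splitting turns it into two segments where one suffices.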
  5. 02 Feb, 2021: 1 commit
  6. 01 Feb, 2021: 3 commits
  7. 26 Jan, 2021: 2 commits
    • SUNRPC: Handle 0 length opaque XDR object data properly · e4a7d1f7
      Dave Wysochanski committed
      When handling an auth_gss downcall, it's possible to get a 0-length
      opaque object for the acceptor. In the case of a 0-length XDR
      object, make sure simple_get_netobj() fills in dest->data = NULL
      and does not continue to kmemdup(), which would set
      dest->data = ZERO_SIZE_PTR for the acceptor.
      
      The trace event code can handle NULL but not ZERO_SIZE_PTR for a
      string, and so without this patch the rpcgss_context trace event
      will crash the kernel as follows:
      
      [  162.887992] BUG: kernel NULL pointer dereference, address: 0000000000000010
      [  162.898693] #PF: supervisor read access in kernel mode
      [  162.900830] #PF: error_code(0x0000) - not-present page
      [  162.902940] PGD 0 P4D 0
      [  162.904027] Oops: 0000 [#1] SMP PTI
      [  162.905493] CPU: 4 PID: 4321 Comm: rpc.gssd Kdump: loaded Not tainted 5.10.0 #133
      [  162.908548] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [  162.910978] RIP: 0010:strlen+0x0/0x20
      [  162.912505] Code: 48 89 f9 74 09 48 83 c1 01 80 39 00 75 f7 31 d2 44 0f b6 04 16 44 88 04 11 48 83 c2 01 45 84 c0 75 ee c3 0f 1f 80 00 00 00 00 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
      [  162.920101] RSP: 0018:ffffaec900c77d90 EFLAGS: 00010202
      [  162.922263] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000fffde697
      [  162.925158] RDX: 000000000000002f RSI: 0000000000000080 RDI: 0000000000000010
      [  162.928073] RBP: 0000000000000010 R08: 0000000000000e10 R09: 0000000000000000
      [  162.930976] R10: ffff8e698a590cb8 R11: 0000000000000001 R12: 0000000000000e10
      [  162.933883] R13: 00000000fffde697 R14: 000000010034d517 R15: 0000000000070028
      [  162.936777] FS:  00007f1e1eb93700(0000) GS:ffff8e6ab7d00000(0000) knlGS:0000000000000000
      [  162.940067] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  162.942417] CR2: 0000000000000010 CR3: 0000000104eba000 CR4: 00000000000406e0
      [  162.945300] Call Trace:
      [  162.946428]  trace_event_raw_event_rpcgss_context+0x84/0x140 [auth_rpcgss]
      [  162.949308]  ? __kmalloc_track_caller+0x35/0x5a0
      [  162.951224]  ? gss_pipe_downcall+0x3a3/0x6a0 [auth_rpcgss]
      [  162.953484]  gss_pipe_downcall+0x585/0x6a0 [auth_rpcgss]
      [  162.955953]  rpc_pipe_write+0x58/0x70 [sunrpc]
      [  162.957849]  vfs_write+0xcb/0x2c0
      [  162.959264]  ksys_write+0x68/0xe0
      [  162.960706]  do_syscall_64+0x33/0x40
      [  162.962238]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  162.964346] RIP: 0033:0x7f1e1f1e57df
      Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
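The fix can be illustrated with a user-space analogue of the parsing helper. The toy_* names are hypothetical, the length prefix is assumed to be in host byte order, and malloc() stands in for kmemdup(). Like kmalloc(0) returning ZERO_SIZE_PTR, malloc(0) may legally return a non-NULL pointer that must not be dereferenced, which is why the zero-length case is settled before any allocation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space stand-in for struct xdr_netobj. */
struct toy_netobj {
	unsigned int len;
	unsigned char *data;
};

/* Copy len bytes from [p, end) into res; return the advanced pointer, or NULL on overrun. */
static const unsigned char *toy_get_bytes(const unsigned char *p,
					  const unsigned char *end,
					  void *res, size_t len)
{
	if (len > (size_t)(end - p))
		return NULL;
	memcpy(res, p, len);
	return p + len;
}

/* Parse a length-prefixed opaque object; a 0-length object yields data == NULL. */
static const unsigned char *toy_get_netobj(const unsigned char *p,
					   const unsigned char *end,
					   struct toy_netobj *dest, int *err)
{
	unsigned int len;

	assert(dest != NULL && err != NULL);
	p = toy_get_bytes(p, end, &len, sizeof(len));
	if (p == NULL || len > (size_t)(end - p)) {
		*err = -EFAULT;
		return NULL;
	}
	if (len) {
		dest->data = malloc(len);
		if (dest->data == NULL) {
			*err = -ENOMEM;
			return NULL;
		}
		memcpy(dest->data, p, len);
	} else {
		dest->data = NULL; /* no zero-size allocation, so no ZERO_SIZE_PTR-like value */
	}
	dest->len = len;
	*err = 0;
	return p + len;
}
```

A consumer such as a string-printing trace helper can then test data for NULL, which is the one value it is prepared to handle.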
    • SUNRPC: Move simple_get_bytes and simple_get_netobj into private header · ba6dfce4
      Dave Wysochanski committed
      Remove duplicated helper functions to parse opaque XDR objects
      and place inside new file net/sunrpc/auth_gss/auth_gss_internal.h.
      In the new file carry the license and copyright from the source file
      net/sunrpc/auth_gss/auth_gss.c.  Finally, update the comment inside
      include/linux/sunrpc/xdr.h since lockd is not the only user of
      struct xdr_netobj.
      Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
  8. 25 Jan, 2021: 8 commits
  9. 13 Jan, 2021: 1 commit
    • SUNRPC: Move the svc_xdr_recvfrom tracepoint again · 5f39d271
      Chuck Lever committed
      Commit 156708ad ("SUNRPC: Move the svc_xdr_recvfrom()
      tracepoint") tried to capture the correct XID in the trace record,
      but this line in svc_recv:
      
      	rqstp->rq_xid = svc_getu32(&rqstp->rq_arg.head[0]);
      
      alters the size of rq_arg.head[0].iov_len. The tracepoint records
      the correct XID but an incorrect value for the length of the
      xdr_buf's head.
      
      To keep the trace callsites simple, I've created two trace classes.
      One assumes the xdr_buf contains a full RPC message, and the XID
      can be extracted from it. The other assumes the contents of the
      xdr_buf are arbitrary, and the XID will be provided by the caller.
      
      Currently there is only one user of each class, but I expect we will
      need a few more tracepoints using each class as time goes on.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
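The side effect described above is easy to see in a user-space model of svc_getu32(), which advances iov_base and shortens iov_len by 4 bytes. A trace record taken after that call therefore sees a head length 4 bytes short, while one taken before it must extract the XID itself, which is the split the two trace classes address. struct toy_kvec and toy_getu32() are hypothetical stand-ins written for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space stand-in for struct kvec. */
struct toy_kvec {
	void *iov_base;
	size_t iov_len;
};

/* Modeled on svc_getu32(): consume one 32-bit word from the front of a kvec. */
static uint32_t toy_getu32(struct toy_kvec *iov)
{
	uint32_t val;

	assert(iov->iov_len >= sizeof(val));
	memcpy(&val, iov->iov_base, sizeof(val)); /* left in wire byte order, like a __be32 */
	iov->iov_base = (char *)iov->iov_base + sizeof(val);
	iov->iov_len -= sizeof(val); /* the side effect that skewed the traced head length */
	return val;
}
```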
  10. 11 Jan, 2021: 1 commit
    • net: sunrpc: interpret the return value of kstrtou32 correctly · 86b53fbf
      j.nixdorf@avm.de committed
      A return value of 0 means success. This is documented in lib/kstrtox.c.
      
      This was found by trying to mount an NFS share from a link-local IPv6
      address with the interface specified by its index:
      
        mount("[fe80::1%1]:/srv/nfs", "/mnt", "nfs", 0, "nolock,addr=fe80::1%1")
      
      Before this commit this failed with EINVAL and also caused the following
      message in dmesg:
      
        [...] NFS: bad IP address specified: addr=fe80::1%1
      
      The syscall using the same address based on the interface name instead
      of its index succeeds.
      
      Credits for this patch go to my colleague Christian Speich, who traced
      the origin of this bug to this line of code.
      Signed-off-by: Johannes Nixdorf <j.nixdorf@avm.de>
      Fixes: 00cfaa94 ("replace strict_strto calls")
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
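kstrtou32() follows the kernel convention of returning 0 on success and a negative errno on failure, so its result has to be compared against 0; a check that treats a nonzero result as success inverts the outcome. A user-space sketch of the correct check: toy_kstrtou32() is a hypothetical approximation built on strtoul() (stricter than kstrtou32(), for example it rejects a leading '+'), and toy_parse_scope_index() is likewise hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical approximation of kstrtou32(s, 10, res): 0 on success, -errno on failure. */
static int toy_kstrtou32(const char *s, uint32_t *res)
{
	char *end;
	unsigned long v;

	assert(s != NULL && res != NULL);
	if (!isdigit((unsigned char)s[0]))
		return -EINVAL; /* e.g. an interface name such as "eth0" */
	errno = 0;
	v = strtoul(s, &end, 10);
	if (*end != '\0')
		return -EINVAL;
	if (errno == ERANGE || v > UINT32_MAX)
		return -ERANGE;
	*res = (uint32_t)v;
	return 0;
}

/* Parse a numeric scope ID such as the "1" in "fe80::1%1"; returns 1 if it parsed. */
static int toy_parse_scope_index(const char *s, uint32_t *scope_id)
{
	return toy_kstrtou32(s, scope_id) == 0; /* zero means success */
}
```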
  11. 19 Dec, 2020: 1 commit
    • SUNRPC: Handle TCP socket sends with kernel_sendpage() again · 4a85a6a3
      Chuck Lever committed
      Daire Byrne reports a ~50% aggregate throughput regression on his
      Linux NFS server after commit da1661b9 ("SUNRPC: Teach server to
      use xprt_sock_sendmsg for socket sends"), which replaced
      kernel_sendpage() calls in NFSD's socket send path with calls to
      sock_sendmsg() using iov_iter.
      
      Investigation showed that tcp_sendmsg() was not using zero-copy to
      send the xdr_buf's bvec pages, but instead was relying on memcpy.
      This means copying every byte of a large NFS READ payload.
      
      It looks like TLS sockets do indeed support a ->sendpage method,
      so it's really not necessary to use xprt_sock_sendmsg() to support
      TLS fully on the server. A mechanical reversion of da1661b9 is
      not possible at this point, but we can re-implement the server's
      TCP socket sendmsg path using kernel_sendpage().
      Reported-by: Daire Byrne <daire@dneg.com>
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209439
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
  12. 14 Dec, 2020: 10 commits
  13. 09 Dec, 2020: 2 commits