1. 12 Feb 2018, 1 commit
    • vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Authored by Linus Torvalds
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
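
      For illustration, a hedged sketch of the kind of change the script
      produces in a driver's ->poll method. The foo_* names are
      hypothetical; only the EPOLL* constants and the poll plumbing are
      real kernel interfaces:

          /* Hypothetical ->poll method, shown after the scripted
           * rewrite; before it, the mask used POLLIN | POLLRDNORM.
           */
          static __poll_t foo_poll(struct file *file, poll_table *wait)
          {
                  struct foo_device *dev = file->private_data;
                  __poll_t mask = 0;

                  poll_wait(file, &dev->read_wq, wait);
                  if (!kfifo_is_empty(&dev->rx_fifo))
                          mask |= EPOLLIN | EPOLLRDNORM;
                  return mask;
          }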
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But the keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 09 Feb 2018, 3 commits
    • SUNRPC: Don't call __UDPX_INC_STATS() from a preemptible context · 0afa6b44
      Authored by Trond Myklebust
      Calling __UDPX_INC_STATS() from a preemptible context leads to a
      warning of the form:
      
       BUG: using __this_cpu_add() in preemptible [00000000] code: kworker/u5:0/31
       caller is xs_udp_data_receive_workfn+0x194/0x270
       CPU: 1 PID: 31 Comm: kworker/u5:0 Not tainted 4.15.0-rc8-00076-g90ea9f1b #2
       Workqueue: xprtiod xs_udp_data_receive_workfn
       Call Trace:
        dump_stack+0x85/0xc1
        check_preemption_disabled+0xce/0xe0
        xs_udp_data_receive_workfn+0x194/0x270
        process_one_work+0x318/0x620
        worker_thread+0x20a/0x390
        ? process_one_work+0x620/0x620
        kthread+0x120/0x130
        ? __kthread_bind_mask+0x60/0x60
        ret_from_fork+0x24/0x30
      
      Since we're taking a spinlock in those functions anyway, let's fix the
      issue by moving the call so that it occurs under the spinlock.
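
      As a hedged sketch (not the literal diff), the shape of the fix,
      simplified from net/sunrpc/xprtsock.c:

          /* The per-CPU counter bump now runs while recv_lock is
           * held, so preemption is disabled as __this_cpu_add()
           * requires.
           */
          spin_lock(&xprt->recv_lock);
          /* ... look up the matching rpc_rqst, copy the datagram ... */
          __UDPX_INC_STATS(sk, UDP_MIB_INDATAGRAMS);  /* moved inside */
          spin_unlock(&xprt->recv_lock);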
      Reported-by: kernel test robot <fengguang.wu@intel.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • fix parallelism for rpc tasks · f515f86b
      Authored by Olga Kornievskaia
      Hi folks,
      
      On a multi-core machine, is it expected that parallel RPCs can be
      handled by each of the per-core workqueues?

      In testing a read workload, I observed via the "top" command that a
      single "kworker" thread was servicing all the requests (no
      parallelism). This is more prominent when doing these operations
      over a krb5p mount.

      Bruce suggested trying this, and in my testing the read workload is
      then spread among all the kworker threads.
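
      The patch itself is not quoted above; as an assumption based on
      the description, the usual mechanism is to make the workqueue
      unbound so queued work is no longer pinned to the submitting CPU.
      A minimal sketch under that assumption:

          /* Assumption: add WQ_UNBOUND when creating the rpciod
           * workqueue, letting the scheduler spread RPC work across
           * CPUs instead of one per-CPU kworker.
           */
          wq = alloc_workqueue("rpciod", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);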
      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • svcrdma: Fix Read chunk round-up · 175e0310
      Authored by Chuck Lever
      A single NFSv4 WRITE compound can often have three operations:
      PUTFH, WRITE, then GETATTR.
      
      When the WRITE payload is sent in a Read chunk, the client places
      the GETATTR in the inline part of the RPC/RDMA message, just after
      the WRITE operation (sans payload). The position value in the Read
      chunk enables the receiver to insert the Read chunk at the correct
      place in the received XDR stream; that is between the WRITE and
      GETATTR.
      
      According to RFC 8166, an NFS/RDMA client does not have to add XDR
      round-up to the Read chunk that carries the WRITE payload. The
      receiver adds XDR round-up padding if it is absent and the
      receiver's XDR decoder requires it to be present.
      
      Commit 193bcb7b ("svcrdma: Populate tail iovec when receiving")
      attempted to add support for receiving such a compound so that just
      the WRITE payload appears in rq_arg's page list, and the trailing
      GETATTR is placed in rq_arg's tail iovec. (TCP just strings the
      whole compound into the head iovec and page list, without regard
      to the alignment of the WRITE payload).
      
      The server transport logic also had to accommodate the optional XDR
      round-up of the Read chunk, which it did simply by lengthening the
      tail iovec when round-up was needed. This approach is adequate for
      the NFSv2 and NFSv3 WRITE decoders.
      
      Unfortunately it is not sufficient for nfsd4_decode_write. When the
      Read chunk length is a couple of bytes less than PAGE_SIZE, the
      computation at the end of nfsd4_decode_write allows argp->pagelen to
      go negative, which breaks the logic in read_buf that looks for the
      tail iovec.
      
      The result is that a WRITE operation whose payload length is just
      less than a multiple of a page succeeds, but the subsequent GETATTR
      in the same compound fails with NFS4ERR_OP_ILLEGAL because the XDR
      decoder can't find it. Clients ignore the error, but they must
      update their attribute cache via a separate round trip.
      
      As nfsd4_decode_write appears to expect the payload itself to always
      have appropriate XDR round-up, have svc_rdma_build_normal_read_chunk
      add the Read chunk XDR round-up to the page_len rather than
      lengthening the tail iovec.
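
      As an illustrative sketch of the arithmetic (variable names here
      are made up, not the literal svcrdma patch):

          /* XDR pads a chunk to the next 4-byte boundary. */
          u32 pad = (4 - (chunk_len & 3)) & 3;

          /* After the fix: account for the pad in the page list. */
          head->arg.page_len += pad;
          /* Before: head->arg.tail[0].iov_len += pad;
           * which let argp->pagelen go negative in nfsd4_decode_write.
           */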
      Reported-by: Olga Kornievskaia <kolga@netapp.com>
      Fixes: 193bcb7b ("svcrdma: Populate tail iovec when receiving")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Olga Kornievskaia <kolga@netapp.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  3. 08 Feb 2018, 1 commit
  4. 07 Feb 2018, 1 commit
  5. 06 Feb 2018, 2 commits
  6. 03 Feb 2018, 2 commits
    • xprtrdma: Fix BUG after a device removal · e89e8d8f
      Authored by Chuck Lever
      Michal Kalderon reports a BUG that occurs just after device removal:
      
      [  169.112490] rpcrdma: removing device qedr0 for 192.168.110.146:20049
      [  169.143909] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      [  169.181837] IP: rpcrdma_dma_unmap_regbuf+0xa/0x60 [rpcrdma]
      
      The RPC/RDMA client transport attempts to allocate some resources
      on demand. Registered buffers are one such resource. These are
      allocated (or re-allocated) by xprt_rdma_allocate to hold RPC Call
      and Reply messages. A hardware resource is associated with each of
      these buffers, as they can be used for a Send or Receive Work
      Request.
      
      If a device is removed from under an NFS/RDMA mount, the transport
      layer is responsible for releasing all hardware resources before
      the device can be finally unplugged. A BUG results when the NFS
      mount hasn't yet seen much activity: the transport tries to release
      resources that haven't yet been allocated.
      
      rpcrdma_free_regbuf() already checks for this case, so just move
      that check to cover the DEVICE_REMOVAL case as well.
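
      A minimal sketch of the described guard, using field names from
      xprt_rdma.h (the exact diff may differ):

          static void
          rpcrdma_dma_unmap_regbuf(struct rpcrdma_regbuf *rb)
          {
                  /* Bail out if the buffer was never allocated (idle
                   * mount at DEVICE_REMOVAL time) or never DMA mapped.
                   */
                  if (!rb || !rpcrdma_regbuf_is_mapped(rb))
                          return;

                  ib_dma_unmap_single(rb->rg_device, rdmab_addr(rb),
                                      rdmab_length(rb), DMA_BIDIRECTIONAL);
                  rb->rg_device = NULL;
          }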
      Reported-by: Michal Kalderon <Michal.Kalderon@cavium.com>
      Fixes: bebd0318 ("xprtrdma: Support unplugging an HCA ...")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Michal Kalderon <Michal.Kalderon@cavium.com>
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Fix calculation of ri_max_send_sges · 1179e2c2
      Authored by Chuck Lever
      Commit 16f906d6 ("xprtrdma: Reduce required number of send
      SGEs") introduced the rpcrdma_ia::ri_max_send_sges field. This fixes
      a problem where xprtrdma would not work if the device's max_sge
      capability was small (low single digits).
      
      At least RPCRDMA_MIN_SEND_SGES are needed for the inline parts of
      each RPC. ri_max_send_sges is set to this value:
      
        ia->ri_max_send_sges = max_sge - RPCRDMA_MIN_SEND_SGES;
      
      Then when marshaling each RPC, rpcrdma_args_inline uses that value
      to determine whether the device has enough Send SGEs to convey an
      NFS WRITE payload inline, or whether instead a Read chunk is
      required.
      
      More recently, commit ae72950a ("xprtrdma: Add data structure to
      manage RDMA Send arguments") used the ri_max_send_sges value to
      calculate the size of an array, but that commit erroneously assumed
      ri_max_send_sges contains a value similar to the device's max_sge,
      and not one that was reduced by the minimum SGE count.
      
      This assumption results in the calculated size of the sendctx's
      Send SGE array being too small. When the array is used to marshal
      an RPC, the code can write Send SGEs into the following sendctx
      element in that array, corrupting it. When the device's max_sge is
      large, this issue is entirely harmless; but it results in an oops
      in the provider's post_send method if dev.attrs.max_sge is small.
      
      So let's straighten this out: ri_max_send_sges will now contain a
      value with the same meaning as dev.attrs.max_sge, which makes
      the code easier to understand, and enables rpcrdma_sendctx_create
      to calculate the size of the SGE array correctly.
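
      A hedged sketch of the semantic change (simplified, not the
      literal diff):

          /* Keep the minimum-SGE requirement as a setup-time check
           * and store the device's max_sge value unmodified.
           */
          if (ia->ri_device->attrs.max_sge < RPCRDMA_MIN_SEND_SGES)
                  return -ENOMEM;
          ia->ri_max_send_sges = ia->ri_device->attrs.max_sge;
          /* was: ia->ri_max_send_sges = max_sge - RPCRDMA_MIN_SEND_SGES; */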
      Reported-by: Michal Kalderon <Michal.Kalderon@cavium.com>
      Fixes: 16f906d6 ("xprtrdma: Reduce required number of send SGEs")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Michal Kalderon <Michal.Kalderon@cavium.com>
      Cc: stable@vger.kernel.org # v4.10+
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  7. 23 Jan 2018, 16 commits
  8. 19 Jan 2018, 1 commit
  9. 17 Jan 2018, 13 commits