1. 11 2月, 2017 2 次提交
    • C
      xprtrdma: Per-connection pad optimization · b5f0afbe
      Chuck Lever 提交于
      Pad optimization is changed by echoing into
      /proc/sys/sunrpc/rdma_pad_optimize. This is a global setting,
      affecting all RPC-over-RDMA connections to all servers.
      
      The marshaling code picks up that value and uses it for decisions
      about how to construct each RPC-over-RDMA frame. Having it change
      suddenly in mid-operation can result in unexpected failures. And
      some servers a client mounts might need chunk round-up, while
      others don't.
      
      So instead, copy the pad_optimize setting into each connection's
      rpcrdma_ia when the transport is created, and use the copy, which
      can't change during the life of the connection, instead.
      
      This also removes a hack: rpcrdma_convert_iovs was using
      the remote-invalidation-expected flag to predict when it could leave
      out Write chunk padding. This is because the Linux server handles
      implicit XDR padding on Write chunks correctly, and only Linux
      servers can set the connection's remote-invalidation-expected flag.
      
      It's more sensible to use the pad optimization setting instead.
      
      Fixes: 677eb17e ("xprtrdma: Fix XDR tail buffer marshalling")
      Cc: stable@vger.kernel.org # v4.9+
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      b5f0afbe
    • C
      xprtrdma: Fix Read chunk padding · 24abdf1b
      Chuck Lever 提交于
      When pad optimization is disabled, rpcrdma_convert_iovs still
      does not add explicit XDR round-up padding to a Read chunk.
      
      Commit 677eb17e ("xprtrdma: Fix XDR tail buffer marshalling")
      incorrectly short-circuited the test for whether round-up padding
      is needed that appears later in rpcrdma_convert_iovs.
      
      However, if this is indeed a regular Read chunk (and not a
      Position-Zero Read chunk), the tail iovec _always_ contains the
      chunk's padding, and never anything else.
      
      So, it's easy to just skip the tail when padding optimization is
      enabled, and add the tail in a subsequent Read chunk segment, if
      disabled.
      
      Fixes: 677eb17e ("xprtrdma: Fix XDR tail buffer marshalling")
      Cc: stable@vger.kernel.org # v4.9+
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      24abdf1b
  2. 10 2月, 2017 4 次提交
  3. 09 2月, 2017 7 次提交
  4. 31 1月, 2017 1 次提交
    • N
      SUNRPC: two small improvements to rpcauth shrinker. · 4c3ffd05
      NeilBrown 提交于
      1/ If we find an entry that is too young to be pruned,
        return SHRINK_STOP to ensure we don't get called again.
        This is more correct, and avoids wasting a little CPU time.
        Prior to 3.12, it can prevent drop_slab() from spinning indefinitely.
      
      2/ Return a precise number from rpcauth_cache_shrink_count(), rather than
        rounding down to a multiple of 100 (of whatever sysctl_vfs_cache_pressure is).
        This ensures that when we "echo 3 > /proc/sys/vm/drop_caches", this cache is
        still purged, even if it has fewer than 100 entires.
      
      Neither of these are really important, they just make behaviour
      more predicatable, which can be helpful when debugging related issues.
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      4c3ffd05
  5. 25 1月, 2017 1 次提交
    • K
      SUNRPC: cleanup ida information when removing sunrpc module · c929ea0b
      Kinglong Mee 提交于
      After removing sunrpc module, I get many kmemleak information as,
      unreferenced object 0xffff88003316b1e0 (size 544):
        comm "gssproxy", pid 2148, jiffies 4294794465 (age 4200.081s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffffb0cfb58a>] kmemleak_alloc+0x4a/0xa0
          [<ffffffffb03507fe>] kmem_cache_alloc+0x15e/0x1f0
          [<ffffffffb0639baa>] ida_pre_get+0xaa/0x150
          [<ffffffffb0639cfd>] ida_simple_get+0xad/0x180
          [<ffffffffc06054fb>] nlmsvc_lookup_host+0x4ab/0x7f0 [lockd]
          [<ffffffffc0605e1d>] lockd+0x4d/0x270 [lockd]
          [<ffffffffc06061e5>] param_set_timeout+0x55/0x100 [lockd]
          [<ffffffffc06cba24>] svc_defer+0x114/0x3f0 [sunrpc]
          [<ffffffffc06cbbe7>] svc_defer+0x2d7/0x3f0 [sunrpc]
          [<ffffffffc06c71da>] rpc_show_info+0x8a/0x110 [sunrpc]
          [<ffffffffb044a33f>] proc_reg_write+0x7f/0xc0
          [<ffffffffb038e41f>] __vfs_write+0xdf/0x3c0
          [<ffffffffb0390f1f>] vfs_write+0xef/0x240
          [<ffffffffb0392fbd>] SyS_write+0xad/0x130
          [<ffffffffb0d06c37>] entry_SYSCALL_64_fastpath+0x1a/0xa9
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      I found, the ida information (dynamic memory) isn't cleanup.
      Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
      Fixes: 2f048db4 ("SUNRPC: Add an identifier for struct rpc_clnt")
      Cc: stable@vger.kernel.org # v3.12+
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      c929ea0b
  6. 13 1月, 2017 3 次提交
  7. 26 12月, 2016 1 次提交
    • T
      ktime: Get rid of the union · 2456e855
      Thomas Gleixner 提交于
      ktime is a union because the initial implementation stored the time in
      scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
      variant for 32bit machines. The Y2038 cleanup removed the timespec variant
      and switched everything to scalar nanoseconds. The union remained, but
      become completely pointless.
      
      Get rid of the union and just keep ktime_t as simple typedef of type s64.
      
      The conversion was done with coccinelle and some manual mopping up.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      2456e855
  8. 25 12月, 2016 1 次提交
  9. 10 12月, 2016 1 次提交
    • N
      SUNRPC: fix refcounting problems with auth_gss messages. · 1cded9d2
      NeilBrown 提交于
      There are two problems with refcounting of auth_gss messages.
      
      First, the reference on the pipe->pipe list (taken by a call
      to rpc_queue_upcall()) is not counted.  It seems to be
      assumed that a message in pipe->pipe will always also be in
      pipe->in_downcall, where it is correctly reference counted.
      
      However there is no guaranty of this.  I have a report of a
      NULL dereferences in rpc_pipe_read() which suggests a msg
      that has been freed is still on the pipe->pipe list.
      
      One way I imagine this might happen is:
      - message is queued for uid=U and auth->service=S1
      - rpc.gssd reads this message and starts processing.
        This removes the message from pipe->pipe
      - message is queued for uid=U and auth->service=S2
      - rpc.gssd replies to the first message. gss_pipe_downcall()
        calls __gss_find_upcall(pipe, U, NULL) and it finds the
        *second* message, as new messages are placed at the head
        of ->in_downcall, and the service type is not checked.
      - This second message is removed from ->in_downcall and freed
        by gss_release_msg() (even though it is still on pipe->pipe)
      - rpc.gssd tries to read another message, and dereferences a pointer
        to this message that has just been freed.
      
      I fix this by incrementing the reference count before calling
      rpc_queue_upcall(), and decrementing it if that fails, or normally in
      gss_pipe_destroy_msg().
      
      It seems strange that the reply doesn't target the message more
      precisely, but I don't know all the details.  In any case, I think the
      reference counting irregularity became a measureable bug when the
      extra arg was added to __gss_find_upcall(), hence the Fixes: line
      below.
      
      The second problem is that if rpc_queue_upcall() fails, the new
      message is not freed. gss_alloc_msg() set the ->count to 1,
      gss_add_msg() increments this to 2, gss_unhash_msg() decrements to 1,
      then the pointer is discarded so the memory never gets freed.
      
      Fixes: 9130b8db ("SUNRPC: allow for upcalls for same uid but different gss service")
      Cc: stable@vger.kernel.org
      Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1011250Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      1cded9d2
  10. 07 12月, 2016 1 次提交
  11. 02 12月, 2016 1 次提交
    • N
      sunrpc: Don't engage exponential backoff when connection attempt is rejected. · 2c2ee6d2
      NeilBrown 提交于
      xs_connect() contains an exponential backoff mechanism so the repeated
      connection attempts are delayed by longer and longer amounts.
      
      This is appropriate when the connection failed due to a timeout, but
      it not appropriate when a definitive "no" answer is received.  In such
      cases, call_connect_status() imposes a minimum 3-second back-off, so
      not having the exponetial back-off will never result in immediate
      retries.
      
      The current situation is a problem when the NFS server tries to
      register with rpcbind but rpcbind isn't running.  All connection
      attempts are made on the same "xprt" and as the connection is never
      "closed", the exponential back delays successive attempts to register,
      or de-register, different protocols.  This results in a multi-minute
      delay with no benefit.
      
      So, when call_connect_status() receives a definitive "no", use
      xprt_conditional_disconnect() to cancel the previous connection attempt.
      This will set XPRT_CLOSE_WAIT so that xprt->ops->close() calls xs_close()
      which resets the reestablish_timeout.
      
      To ensure xprt_conditional_disconnect() does the right thing, we
      ensure that rq_connect_cookie is set before a connection attempt, and
      allow xprt_conditional_disconnect() to complete even when the
      transport is not fully connected.
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      2c2ee6d2
  12. 01 12月, 2016 10 次提交
  13. 30 11月, 2016 7 次提交