1. 01 Aug, 2018: 1 commit
      NFSv4 client live hangs after live data migration recovery · 0f90be13
      Committed by Bill Baker
      After a live data migration event at the NFS server, the client may send
      I/O requests to the wrong server, causing a live hang due to repeated
      recovery events.  On the wire, this will appear as an I/O request failing
      with NFS4ERR_BADSESSION, followed by successful CREATE_SESSION, repeatedly.
      NFS4ERR_BADSESSION is returned because the session ID being used was
      issued by the other server and is not valid at the old server.
      
      The failure is caused by async worker threads having cached the transport
      (xprt) in the rpc_task structure.  After the migration recovery completes,
      the task is redispatched and resends the request to the wrong
      server based on the old value still present in tk_xprt.
      
      The solution is to recompute the tk_xprt field of the rpc_task structure
      so that the request goes to the correct server.
      Signed-off-by: Bill Baker <bill.baker@oracle.com>
      Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Helen Chao <helen.chao@oracle.com>
      Fixes: fb43d172 ("SUNRPC: Use the multipath iterator to assign a ...")
      Cc: stable@vger.kernel.org # v4.9+
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  2. 07 May, 2018: 1 commit
  3. 11 Apr, 2018: 2 commits
  4. 13 Feb, 2018: 1 commit
      net: make getname() functions return length rather than use int* parameter · 9b2c45d4
      Committed by Denys Vlasenko
      Changes since v1:
      Added changes in these files:
          drivers/infiniband/hw/usnic/usnic_transport.c
          drivers/staging/lustre/lnet/lnet/lib-socket.c
          drivers/target/iscsi/iscsi_target_login.c
          drivers/vhost/net.c
          fs/dlm/lowcomms.c
          fs/ocfs2/cluster/tcp.c
          security/tomoyo/network.c
      
      Before:
      All these functions either return a negative error indicator,
      or store the length of the sockaddr into the "int *socklen" parameter
      and return zero on success.

      The "int *socklen" parameter is awkward. For example, if the caller does not
      care, it still needs to provide on-stack storage for the value
      it does not need.

      None of the many FOO_getname() functions of various protocols
      ever used the old value of *socklen. They always just overwrite it.

      This change drops this parameter and makes all these functions, on success,
      return the length of the sockaddr. It's always >= 0 and can be differentiated
      from an error.
      
      Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.
      
      rpc_sockname() lost "int buflen" parameter, since its only use was
      to be passed to kernel_getsockname() as &buflen and subsequently
      not used in any way.
      
      Userspace API is not changed.
      
          text    data     bss      dec     hex filename
      30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
      30108109 2633612  873672 33615393 200ee21 vmlinux.o
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: linux-kernel@vger.kernel.org
      CC: netdev@vger.kernel.org
      CC: linux-bluetooth@vger.kernel.org
      CC: linux-decnet-user@lists.sourceforge.net
      CC: linux-wireless@vger.kernel.org
      CC: linux-rdma@vger.kernel.org
      CC: linux-sctp@vger.kernel.org
      CC: linux-nfs@vger.kernel.org
      CC: linux-x25@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 15 Jan, 2018: 1 commit
  6. 01 Dec, 2017: 1 commit
  7. 18 Nov, 2017: 2 commits
  8. 07 Sep, 2017: 1 commit
      SUNRPC: remove some dead code. · f1ecbc21
      Committed by NeilBrown
      RPC_TASK_NO_RETRANS_TIMEOUT is set when cl_noretranstimeo
      is set, which happens when RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT is set,
      which happens when NFS_CS_NO_RETRANS_TIMEOUT is set.
      
      This flag means "don't resend on a timeout, only resend if the
      connection gets broken for some reason".
      
      cl_discrtry is set when RPC_CLNT_CREATE_DISCRTRY is set, which
      happens when NFS_CS_DISCRTRY is set.
      
      This flag means "always disconnect before resending".
      
      NFS_CS_NO_RETRANS_TIMEOUT and NFS_CS_DISCRTRY are both only set
      in nfs4_init_client(), and it always sets both.
      
      So we will never have a situation where only one of the flags is set.
      So this code, which tests whether timeout retransmits are allowed and
      disconnection is required, will never run.
      
      So it makes sense to remove this code as it cannot be tested and
      could confuse people reading the code (like me).
      
      (Alternatively, we could leave it there with a comment saying
       it is never actually used.)
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  9. 21 Aug, 2017: 1 commit
      SUNRPC: ECONNREFUSED should cause a rebind. · fd01b259
      Committed by NeilBrown
      If you
       - mount an NFSv3 filesystem
       - do some file locking which requires the server
         to make a GRANT call back
       - unmount
       - mount again and do the same locking
      
      then the second attempt at locking suffers a 30-second delay.
      Unmounting and remounting causes lockd to stop and restart,
      which causes it to bind to a new port.
      The server still thinks the old port is valid and gets ECONNREFUSED
      when trying to contact it.
      ECONNREFUSED should be seen as a hard error that is not worth
      retrying.  Rebinding is the only reasonable response.
      
      This patch forces a rebind if that makes sense.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  10. 14 Jul, 2017: 4 commits
  11. 15 May, 2017: 4 commits
  12. 21 Apr, 2017: 1 commit
  13. 10 Feb, 2017: 3 commits
  14. 25 Jan, 2017: 1 commit
      SUNRPC: cleanup ida information when removing sunrpc module · c929ea0b
      Committed by Kinglong Mee
      After removing the sunrpc module, I get many kmemleak reports like this:
      unreferenced object 0xffff88003316b1e0 (size 544):
        comm "gssproxy", pid 2148, jiffies 4294794465 (age 4200.081s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffffb0cfb58a>] kmemleak_alloc+0x4a/0xa0
          [<ffffffffb03507fe>] kmem_cache_alloc+0x15e/0x1f0
          [<ffffffffb0639baa>] ida_pre_get+0xaa/0x150
          [<ffffffffb0639cfd>] ida_simple_get+0xad/0x180
          [<ffffffffc06054fb>] nlmsvc_lookup_host+0x4ab/0x7f0 [lockd]
          [<ffffffffc0605e1d>] lockd+0x4d/0x270 [lockd]
          [<ffffffffc06061e5>] param_set_timeout+0x55/0x100 [lockd]
          [<ffffffffc06cba24>] svc_defer+0x114/0x3f0 [sunrpc]
          [<ffffffffc06cbbe7>] svc_defer+0x2d7/0x3f0 [sunrpc]
          [<ffffffffc06c71da>] rpc_show_info+0x8a/0x110 [sunrpc]
          [<ffffffffb044a33f>] proc_reg_write+0x7f/0xc0
          [<ffffffffb038e41f>] __vfs_write+0xdf/0x3c0
          [<ffffffffb0390f1f>] vfs_write+0xef/0x240
          [<ffffffffb0392fbd>] SyS_write+0xad/0x130
          [<ffffffffb0d06c37>] entry_SYSCALL_64_fastpath+0x1a/0xa9
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      I found that the ida information (dynamic memory) isn't cleaned up.
      Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
      Fixes: 2f048db4 ("SUNRPC: Add an identifier for struct rpc_clnt")
      Cc: stable@vger.kernel.org # v3.12+
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  15. 02 Dec, 2016: 1 commit
      sunrpc: Don't engage exponential backoff when connection attempt is rejected. · 2c2ee6d2
      Committed by NeilBrown
      xs_connect() contains an exponential backoff mechanism so that repeated
      connection attempts are delayed by longer and longer amounts.

      This is appropriate when the connection failed due to a timeout, but
      it is not appropriate when a definitive "no" answer is received.  In such
      cases, call_connect_status() imposes a minimum 3-second back-off, so
      not having the exponential back-off will never result in immediate
      retries.
      
      The current situation is a problem when the NFS server tries to
      register with rpcbind but rpcbind isn't running.  All connection
      attempts are made on the same "xprt" and as the connection is never
      "closed", the exponential back-off delays successive attempts to register,
      or de-register, different protocols.  This results in a multi-minute
      delay with no benefit.
      
      So, when call_connect_status() receives a definitive "no", use
      xprt_conditional_disconnect() to cancel the previous connection attempt.
      This will set XPRT_CLOSE_WAIT so that xprt->ops->close() calls xs_close()
      which resets the reestablish_timeout.
      
      To ensure xprt_conditional_disconnect() does the right thing, we
      ensure that rq_connect_cookie is set before a connection attempt, and
      allow xprt_conditional_disconnect() to complete even when the
      transport is not fully connected.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  16. 08 Nov, 2016: 1 commit
  17. 20 Sep, 2016: 9 commits
  18. 25 Aug, 2016: 1 commit
  19. 06 Aug, 2016: 2 commits
  20. 25 Jul, 2016: 1 commit
  21. 15 Jun, 2016: 1 commit