1. 26 Nov, 2014 8 commits
  2. 25 Nov, 2014 5 commits
  3. 14 Nov, 2014 1 commit
    • sunrpc: fix sleeping under rcu_read_lock in gss_stringify_acceptor · b3ecba09
      Committed by Jeff Layton
      Bruce reported that he was seeing the following BUG pop:
      
          BUG: sleeping function called from invalid context at mm/slab.c:2846
          in_atomic(): 0, irqs_disabled(): 0, pid: 4539, name: mount.nfs
          2 locks held by mount.nfs/4539:
          #0:  (nfs_clid_init_mutex){+.+.+.}, at: [<ffffffffa01c0a9a>] nfs4_discover_server_trunking+0x4a/0x2f0 [nfsv4]
          #1:  (rcu_read_lock){......}, at: [<ffffffffa00e3185>] gss_stringify_acceptor+0x5/0xb0 [auth_rpcgss]
          Preemption disabled at:[<ffffffff81a4f082>] printk+0x4d/0x4f
      
          CPU: 3 PID: 4539 Comm: mount.nfs Not tainted 3.18.0-rc1-00013-g5b095e99 #3393
          Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
          ffff880021499390 ffff8800381476a8 ffffffff81a534cf 0000000000000001
          0000000000000000 ffff8800381476c8 ffffffff81097854 00000000000000d0
          0000000000000018 ffff880038147718 ffffffff8118e4f3 0000000020479f00
          Call Trace:
          [<ffffffff81a534cf>] dump_stack+0x4f/0x7c
          [<ffffffff81097854>] __might_sleep+0x114/0x180
          [<ffffffff8118e4f3>] __kmalloc+0x1a3/0x280
          [<ffffffffa00e31d8>] gss_stringify_acceptor+0x58/0xb0 [auth_rpcgss]
          [<ffffffffa00e3185>] ? gss_stringify_acceptor+0x5/0xb0 [auth_rpcgss]
          [<ffffffffa006b438>] rpcauth_stringify_acceptor+0x18/0x30 [sunrpc]
          [<ffffffffa01b0469>] nfs4_proc_setclientid+0x199/0x380 [nfsv4]
          [<ffffffffa01b04d0>] ? nfs4_proc_setclientid+0x200/0x380 [nfsv4]
          [<ffffffffa01bdf1a>] nfs40_discover_server_trunking+0xda/0x150 [nfsv4]
          [<ffffffffa01bde45>] ? nfs40_discover_server_trunking+0x5/0x150 [nfsv4]
          [<ffffffffa01c0acf>] nfs4_discover_server_trunking+0x7f/0x2f0 [nfsv4]
          [<ffffffffa01c8e24>] nfs4_init_client+0x104/0x2f0 [nfsv4]
          [<ffffffffa01539b4>] nfs_get_client+0x314/0x3f0 [nfs]
          [<ffffffffa0153780>] ? nfs_get_client+0xe0/0x3f0 [nfs]
          [<ffffffffa01c83aa>] nfs4_set_client+0x8a/0x110 [nfsv4]
          [<ffffffffa0069708>] ? __rpc_init_priority_wait_queue+0xa8/0xf0 [sunrpc]
          [<ffffffffa01c9b2f>] nfs4_create_server+0x12f/0x390 [nfsv4]
          [<ffffffffa01c1472>] nfs4_remote_mount+0x32/0x60 [nfsv4]
          [<ffffffff81196489>] mount_fs+0x39/0x1b0
          [<ffffffff81166145>] ? __alloc_percpu+0x15/0x20
          [<ffffffff811b276b>] vfs_kern_mount+0x6b/0x150
          [<ffffffffa01c1396>] nfs_do_root_mount+0x86/0xc0 [nfsv4]
          [<ffffffffa01c1784>] nfs4_try_mount+0x44/0xc0 [nfsv4]
          [<ffffffffa01549b7>] ? get_nfs_version+0x27/0x90 [nfs]
          [<ffffffffa0161a2d>] nfs_fs_mount+0x47d/0xd60 [nfs]
          [<ffffffff81a59c5e>] ? mutex_unlock+0xe/0x10
          [<ffffffffa01606a0>] ? nfs_remount+0x430/0x430 [nfs]
          [<ffffffffa01609c0>] ? nfs_clone_super+0x140/0x140 [nfs]
          [<ffffffff81196489>] mount_fs+0x39/0x1b0
          [<ffffffff81166145>] ? __alloc_percpu+0x15/0x20
          [<ffffffff811b276b>] vfs_kern_mount+0x6b/0x150
          [<ffffffff811b5830>] do_mount+0x210/0xbe0
          [<ffffffff811b54ca>] ? copy_mount_options+0x3a/0x160
          [<ffffffff811b651f>] SyS_mount+0x6f/0xb0
          [<ffffffff81a5c852>] system_call_fastpath+0x12/0x17
      
      Sleeping under the rcu_read_lock is bad. This patch fixes it by dropping
      the rcu_read_lock before doing the allocation and then reacquiring it
      and redoing the dereference before doing the copy. If we find that the
      string has somehow grown in the meantime, we'll reallocate and try again.
      
      Cc: <stable@vger.kernel.org> # v3.17+
      Reported-by: "J. Bruce Fields" <bfields@fieldses.org>
      Signed-off-by: Jeff Layton <jlayton@primarydata.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      b3ecba09
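The drop, allocate, reacquire, and recheck sequence described in this commit can be sketched in user space. This is a toy model rather than the kernel patch: `read_lock()`, `read_unlock()`, `sleeping_alloc()`, `struct shared_name`, and `stringify_name()` are hypothetical stand-ins, and the "lock" only tracks nesting so that the allocator can assert it is never called inside the read-side section.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for rcu_read_lock()/rcu_read_unlock(): track nesting only. */
static int read_lock_depth;
static void read_lock(void)   { read_lock_depth++; }
static void read_unlock(void) { read_lock_depth--; }

/* Stand-in for a sleeping allocation: must not run inside the read section. */
static void *sleeping_alloc(size_t n)
{
    assert(read_lock_depth == 0);
    return malloc(n);
}

/* Hypothetical shared string; data is not NUL-terminated. */
struct shared_name {
    const char *data;
    size_t len;
};

/* Returns a NUL-terminated heap copy, or NULL if empty or out of memory. */
char *stringify_name(const struct shared_name *name)
{
    size_t len;
    char *buf;

    read_lock();
    len = name->len;                    /* sample the length under the lock */
    read_unlock();
    if (len == 0)
        return NULL;

    for (;;) {
        buf = sleeping_alloc(len + 1);  /* the lock is dropped here */
        if (!buf)
            return NULL;

        read_lock();                    /* reacquire and redo the lookup */
        if (name->len == 0) {           /* the string went away meanwhile */
            read_unlock();
            free(buf);
            return NULL;
        }
        if (name->len <= len) {         /* buffer is large enough: copy */
            memcpy(buf, name->data, name->len);
            buf[name->len] = '\0';
            read_unlock();
            return buf;
        }
        len = name->len;                /* the string grew: reallocate */
        read_unlock();
        free(buf);
    }
}
```

The caller owns the returned buffer and releases it with `free()`. In this single-threaded model the retry branch never triggers; it is kept to show where a concurrent update would be handled.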
  4. 30 Sep, 2014 1 commit
    • svcrdma: advertise the correct max payload · 7e5be288
      Committed by Steve Wise
      Svcrdma currently advertises 1MB, which is too large.  The correct value
      is the minimum of RPCSVC_MAXPAYLOAD and the max scatter-gather allowed
      in an NFSRDMA IO chunk * the host page size. This bug is usually benign
      because the Linux X64 NFSRDMA client correctly limits the payload size to
      the correct value (64*4096 = 256KB).  But if the Linux client is PPC64
      with a 64KB page size, then the client will indeed use a payload size
      that will overflow the server.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      7e5be288
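The arithmetic in this commit message can be worked through with a small helper. This is a sketch under stated assumptions: `MAX_PAYLOAD_CAP` stands in for RPCSVC_MAXPAYLOAD and is assumed to be the 1MB figure quoted above, `MAX_SGE_PER_CHUNK` is the 64 from the 64*4096 example, and `rdma_max_payload()` is a hypothetical name.

```c
#include <stddef.h>

/* Assumed stand-in for RPCSVC_MAXPAYLOAD: the 1MB figure quoted above. */
#define MAX_PAYLOAD_CAP   ((size_t)1024 * 1024)
/* Scatter-gather entries per IO chunk, taken from the 64*4096 example. */
#define MAX_SGE_PER_CHUNK ((size_t)64)

/* Payload limit for a host with the given page size, in bytes. */
size_t rdma_max_payload(size_t page_size)
{
    size_t by_sge = MAX_SGE_PER_CHUNK * page_size;

    return by_sge < MAX_PAYLOAD_CAP ? by_sge : MAX_PAYLOAD_CAP;
}
```

With 4KB pages the scatter-gather term wins (64 * 4096 = 262144 bytes, or 256KB). With 64KB pages that term is 4MB, so the 1MB cap applies, which is why a client with 64KB pages can send more than a server with 4KB pages is able to accept.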
  5. 26 Sep, 2014 1 commit
    • SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT · 2aca5b86
      Committed by Trond Myklebust
      The flag RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT was introduced in
      order to allow NFSv4 clients to disable resend timeouts. Since those
      cause the RPC layer to break the connection, they mess up the duplicate
      reply caches that remain indexed on the port number in NFSv4.
      
      This patch includes the code that was missing in the original to
      set the appropriate flag in struct rpc_clnt, when the caller of
      rpc_create() sets RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT.
      
      Fixes: 8a19a0b6 (SUNRPC: Add RPC task and client level options to...)
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      2aca5b86
  6. 25 Sep, 2014 4 commits
    • NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page() · 1aff5256
      Committed by NeilBrown
      Now that nfs_release_page() doesn't block indefinitely, other deadlock
      avoidance mechanisms aren't needed.
       - it doesn't hurt for kswapd to block occasionally.  If it doesn't
         want to block it would clear __GFP_WAIT.  The current_is_kswapd()
         was only added to avoid deadlocks and we have a new approach for
         that.
       - memory allocation in the SUNRPC layer can very rarely try to
         ->releasepage() a page it is trying to handle.  The deadlock
         is removed as nfs_release_page() doesn't block indefinitely.
      
      So we don't need to set PF_FSTRANS for sunrpc network operations any
      more.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Acked-by: Jeff Layton <jlayton@primarydata.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      1aff5256
    • rpc: Add -EPERM processing for xs_udp_send_request() · 3dedbb5c
      Committed by Jason Baron
      If an iptables drop rule is added for an nfs server, the client can end up in
      a softlockup. Because of the way that xs_sendpages() is structured, the -EPERM
      is ignored since the prior bits of the packet may have been successfully queued
      and thus xs_sendpages() returns a non-zero value. Then, xs_udp_send_request()
      thinks that because some bits were queued it should return -EAGAIN. We then try
      the request again and again, resulting in cpu spinning. Reproducer:
      
      1) open a file on the nfs server '/nfs/foo' (mounted using udp)
      2) iptables -A OUTPUT -d <nfs server ip> -j DROP
      3) write to /nfs/foo
      4) close /nfs/foo
      5) iptables -D OUTPUT -d <nfs server ip> -j DROP
      
      The softlockup occurs in step 4 above.
      
      The previous patch allows xs_sendpages() to return both a sent count and
      any error values that may have occurred. Thus, if we get an -EPERM, return
      that to the higher level code.
      
      With this patch in place we can successfully abort the above sequence and
      avoid the softlockup.
      
      I also tried the above test case on an nfs mount on tcp and although the system
      does not softlockup, I still ended up with the 'hung_task' firing after 120
      seconds, due to the i/o being stuck. The tcp case appears a bit harder to fix,
      since -EPERM appears to get ignored much lower down in the stack and does not
      propagate up to xs_sendpages(). This case is not quite as insidious as the
      softlockup and it is not addressed here.
      Reported-by: Yigong Lou <ylou@akamai.com>
      Signed-off-by: Jason Baron <jbaron@akamai.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      3dedbb5c
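The caller-side decision described in this commit can be sketched as a pure function. This is an illustration and not the kernel code: `udp_send_status()` and its parameters are hypothetical, with `err` and `sent` standing for the two values a send helper reports.

```c
#include <errno.h>

/*
 * Decide what a sender should report, given the error (0 or -errno) and
 * the byte count from the last send attempt, plus the total request length.
 */
int udp_send_status(int err, int sent, int total_len)
{
    /* A firewall drop is not transient: report it instead of retrying. */
    if (err == -EPERM)
        return -EPERM;

    if (sent > 0 || err == 0) {
        if (sent >= total_len)
            return 0;        /* the whole request was queued */
        return -EAGAIN;      /* partial send: retry the remainder later */
    }

    return err;              /* nothing queued: pass the error up */
}
```

Checking for -EPERM before looking at the byte count is what breaks the retry loop: a partial send followed by -EPERM no longer turns into -EAGAIN.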
    • rpc: return sent and err from xs_sendpages() · f279cd00
      Committed by Jason Baron
      If an error is returned after the first bits of a packet have already been
      successfully queued, xs_sendpages() will return a positive 'int' value
      indicating success. Callers seem to treat this as -EAGAIN.
      
      However, there are cases where it's not a question of waiting for the write
      queue to drain. For example, when there is an iptables rule dropping packets
      to the destination, the lower level code can return -EPERM only after parts
      of the packet have been successfully queued. In this case, we can end up
      continuously retrying resulting in a kernel softlockup.
      
      This patch is intended to make no changes in behavior but is in preparation for
      subsequent patches that can make decisions based on both the number of bytes
      sent by xs_sendpages() and any errors that may have been returned.
      Signed-off-by: Jason Baron <jbaron@akamai.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      f279cd00
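Reporting the byte count and the error separately can be sketched as follows. This is a user-space model under stated assumptions: `struct frag`, `send_fn`, `send_fragments()`, and the fake transport `drop_after_first()` are all hypothetical names, not the xs_sendpages() interface.

```c
#include <errno.h>
#include <stddef.h>

/* One piece of an outgoing message. */
struct frag {
    const void *buf;
    size_t len;
};

/* Transport callback: returns the bytes queued, or -errno on failure. */
typedef int (*send_fn)(const void *buf, size_t len);

/*
 * Send fragments in order. The return value is 0 or -errno, and *sent_p
 * always holds the bytes queued so far, even when a later fragment fails.
 */
int send_fragments(send_fn send_one, const struct frag *frags,
                   size_t nfrags, size_t *sent_p)
{
    size_t sent = 0;
    int err = 0;

    for (size_t i = 0; i < nfrags; i++) {
        int ret = send_one(frags[i].buf, frags[i].len);

        if (ret < 0) {
            err = ret;              /* keep the error, keep the count */
            break;
        }
        sent += (size_t)ret;
        if ((size_t)ret != frags[i].len)
            break;                  /* short write: stop here */
    }

    *sent_p = sent;
    return err;
}

/* Fake transport: queues the first call, then reports -EPERM. */
static int calls;
static int drop_after_first(const void *buf, size_t len)
{
    (void)buf;
    return calls++ == 0 ? (int)len : -EPERM;
}
```

A single positive return value would hide the -EPERM from the second fragment. With the out-parameter, the caller sees both that 4 bytes were queued and that the send then failed.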
    • SUNRPC: Don't wake tasks during connection abort · a743419f
      Committed by Benjamin Coddington
      When aborting a connection to preserve source ports, don't wake the task in
      xs_error_report.  This allows tasks with RPC_TASK_SOFTCONN to succeed if the
      connection needs to be re-established since it preserves the task's status
      instead of setting it to the status of the aborting kernel_connect().
      
      This may also avoid a potential conflict on the socket's lock.
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Cc: stable@vger.kernel.org # 3.14+
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      a743419f
  7. 11 Sep, 2014 1 commit
    • rpc: xs_bind - do not bind when requesting a random ephemeral port · 0f7a622c
      Committed by Chris Perl
      When attempting to establish a local ephemeral endpoint for a TCP or UDP
      socket, do not explicitly call bind; instead, let it happen implicitly when the
      socket is first used.
      
      The main motivating factor for this change is when TCP runs out of unique
      ephemeral ports (i.e.  cannot find any ephemeral ports which are not a part of
      *any* TCP connection).  In this situation if you explicitly call bind, then the
      call will fail with EADDRINUSE.  However, if you allow the allocation of an
      ephemeral port to happen implicitly as part of connect (or other functions),
      then ephemeral ports can be reused, so long as the combination of (local_ip,
      local_port, remote_ip, remote_port) is unique for TCP sockets on the system.
      
      This doesn't matter for UDP sockets, but it seemed easiest to treat TCP and UDP
      sockets the same.
      
      This can allow mount.nfs(8) to continue to function successfully, even in the
      face of misbehaving applications which are creating a large number of TCP
      connections.
      Signed-off-by: Chris Perl <chris.perl@gmail.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      0f7a622c
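The implicit port selection this commit relies on can be observed from user space with the BSD sockets API. This is a minimal sketch, assuming a host with a working IPv4 loopback interface; `implicit_local_port()` is a hypothetical helper, and the listener exists only to give the client something to connect to.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the local port the kernel chose at connect() time, or -1. */
int implicit_local_port(void)
{
    struct sockaddr_in addr, local;
    socklen_t len = sizeof(addr);
    socklen_t local_len = sizeof(local);
    int port = -1;
    int lfd, cfd;

    /* Loopback listener on a kernel-chosen port (a listener must bind). */
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto out_listener;

    /* Client side: no bind() call, only connect(). */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0)
        goto out_listener;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        getsockname(cfd, (struct sockaddr *)&local, &local_len) == 0)
        port = ntohs(local.sin_port);   /* assigned implicitly by connect() */

    close(cfd);
out_listener:
    close(lfd);
    return port;
}
```

The client socket ends up with a nonzero local port even though it never called bind(), which is the behaviour the commit depends on when it skips the explicit bind for ephemeral ports.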
  8. 29 Aug, 2014 2 commits
  9. 18 Aug, 2014 6 commits
  10. 06 Aug, 2014 1 commit
    • svcrdma: remove rdma_create_qp() failure recovery logic · d1e458fe
      Committed by Steve Wise
      In svc_rdma_accept(), if rdma_create_qp() fails, there is useless
      logic to try and call rdma_create_qp() again with reduced sge depths.
      The assumption, I guess, was that perhaps the initial sge depths
      chosen were too big.  However, the initial depths are selected based
      on the rdma device attribute max_sge returned from ib_query_device().
      If rdma_create_qp() fails, it would not be because the max_send_sge and
      max_recv_sge values passed in exceed the device's max.  So just remove
      this code.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      d1e458fe
  11. 04 Aug, 2014 7 commits
  12. 01 Aug, 2014 3 commits