1. 25 Sep, 2014 5 commits
    • rpc: Add -EPERM processing for xs_udp_send_request() · 3dedbb5c
      Jason Baron committed
      If an iptables drop rule is added for an nfs server, the client can end up in
      a softlockup. Because of the way that xs_sendpages() is structured, the -EPERM
      is ignored since the prior bits of the packet may have been successfully queued
      and thus xs_sendpages() returns a non-zero value. Then, xs_udp_send_request()
      thinks that because some bits were queued it should return -EAGAIN. We then try
      the request again and again, resulting in cpu spinning. Reproducer:
      
      1) open a file on the nfs server '/nfs/foo' (mounted using udp)
      2) iptables -A OUTPUT -d <nfs server ip> -j DROP
      3) write to /nfs/foo
      4) close /nfs/foo
      5) iptables -D OUTPUT -d <nfs server ip> -j DROP
      
      The softlockup occurs in step 4 above.
      
      The previous patch allows xs_sendpages() to return both a sent count and
      any error values that may have occurred. Thus, if we get an -EPERM, return
      that to the higher level code.
      
      With this patch in place we can successfully abort the above sequence and
      avoid the softlockup.
      
      I also tried the above test case on an nfs mount on tcp and although the system
      does not softlockup, I still ended up with the 'hung_task' firing after 120
      seconds, due to the i/o being stuck. The tcp case appears a bit harder to fix,
      since -EPERM appears to get ignored much lower down in the stack and does not
      propagate up to xs_sendpages(). This case is not quite as insidious as the
      softlockup and it is not addressed here.
      Reported-by: Yigong Lou <ylou@akamai.com>
      Signed-off-by: Jason Baron <jbaron@akamai.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
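The retry decision described in this commit message can be modelled in user space. The sketch below is not the kernel code: `send_result`, `udp_status_before`, `udp_status_after`, and `attempts_until_done` are hypothetical names, and the assumption is that a firewalled send always queues part of the packet and then fails with -EPERM.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcome of one low-level send attempt. */
struct send_result {
    int err;   /* 0, or a negative errno */
    int sent;  /* bytes queued before err occurred */
};

/* Model of the old behaviour: a partial send hides the error,
 * so the caller only ever sees "some bytes went out" and asks for a retry. */
static int udp_status_before(struct send_result r, int total)
{
    int status = r.sent > 0 ? r.sent : r.err;

    if (status >= 0)
        return status == total ? 0 : -EAGAIN;
    return status;
}

/* Model of the fixed behaviour: -EPERM is checked before the byte count,
 * so it reaches the caller even after a partial send. */
static int udp_status_after(struct send_result r, int total)
{
    if (r.err == -EPERM)
        return -EPERM;
    if (r.sent > 0 || r.err == 0)
        return r.sent >= total ? 0 : -EAGAIN;
    return r.err;
}

/* Retry while the decision function says -EAGAIN; max_tries bounds the
 * loop so the "spinning" case terminates in this model. */
static int attempts_until_done(int (*decide)(struct send_result, int),
                               struct send_result r, int total,
                               int max_tries, int *final_status)
{
    int tries = 0;

    assert(max_tries > 0);
    do {
        *final_status = decide(r, total);
        tries++;
    } while (*final_status == -EAGAIN && tries < max_tries);
    return tries;
}
```

With a result of `{ -EPERM, 100 }` for a 300-byte request, the "before" model exhausts every allowed retry while the "after" model stops on the first attempt with -EPERM.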
    • rpc: return sent and err from xs_sendpages() · f279cd00
      Jason Baron committed
      If an error is returned after the first bits of a packet have already been
      successfully queued, xs_sendpages() will return a positive 'int' value
      indicating success. Callers seem to treat this as -EAGAIN.
      
      However, there are cases where it's not a question of waiting for the write
      queue to drain. For example, when there is an iptables rule dropping packets
      to the destination, the lower level code can return -EPERM only after parts
      of the packet have been successfully queued. In this case, we can end up
      continuously retrying resulting in a kernel softlockup.
      
      This patch is intended to make no changes in behavior but is in preparation for
      subsequent patches that can make decisions based on both the number of bytes
      sent by xs_sendpages() and any errors that may have been returned.
      Signed-off-by: Jason Baron <jbaron@akamai.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
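The interface change this commit describes, reporting the error and the byte count separately, can be illustrated with a small user-space model. Everything here is a sketch under stated assumptions: `fake_sock`, `fake_send`, `sendpages_old`, and `sendpages_new` are hypothetical stand-ins, not the kernel functions or their signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical socket: accepts up to `chunk` bytes per call and starts
 * failing with -EPERM once `fail_after` bytes have been queued. */
struct fake_sock {
    size_t queued;
    size_t fail_after;
    size_t chunk;
};

static int fake_send(struct fake_sock *s, size_t len)
{
    size_t n;

    assert(s->chunk > 0);
    if (s->queued >= s->fail_after)
        return -EPERM;
    n = len < s->chunk ? len : s->chunk;
    if (n > s->fail_after - s->queued)
        n = s->fail_after - s->queued;
    s->queued += n;
    return (int)n;
}

/* Old style: one return value. If anything was queued, the positive
 * count is returned and a later error is lost. */
static int sendpages_old(struct fake_sock *s, size_t len)
{
    int sent = 0;

    while (len > 0) {
        int ret = fake_send(s, len);
        if (ret <= 0)
            return sent ? sent : ret;
        sent += ret;
        len -= (size_t)ret;
    }
    return sent;
}

/* New style: the return value carries the error (0 on success) and the
 * byte count is reported through *sent_p, so neither hides the other. */
static int sendpages_new(struct fake_sock *s, size_t len, int *sent_p)
{
    int err = 0;

    *sent_p = 0;
    while (len > 0) {
        int ret = fake_send(s, len);
        if (ret <= 0) {
            err = ret;
            break;
        }
        *sent_p += ret;
        len -= (size_t)ret;
    }
    return err;
}
```

For a 300-byte request against a socket that fails after 100 bytes, the old form reports only the positive count, while the new form reports -EPERM together with the 100 bytes queued.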
    • lockd: Try to reconnect if statd has moved · 173b3afc
      Benjamin Coddington committed
      If rpc.statd is restarted, upcalls to monitor hosts can fail with
      ECONNREFUSED.  In that case force a lookup of statd's new port and retry the
      upcall.
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
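The "look up the new port and retry once" pattern from this commit message can be sketched as follows. This is a user-space illustration only; `fake_statd`, `fake_client`, `fake_upcall`, `fake_force_rebind`, and `monitor_with_retry` are hypothetical names and do not reflect the lockd code itself.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical daemon: listens on one port, which changes on restart. */
struct fake_statd {
    int actual_port;
};

/* Hypothetical client: remembers the last port it resolved. */
struct fake_client {
    int cached_port;
    int lookups;      /* how many times the port was re-resolved */
};

/* The upcall succeeds only if the cached port is still the right one. */
static int fake_upcall(const struct fake_client *c, const struct fake_statd *d)
{
    return c->cached_port == d->actual_port ? 0 : -ECONNREFUSED;
}

/* Discard the cached port and resolve the daemon's current one. */
static void fake_force_rebind(struct fake_client *c, const struct fake_statd *d)
{
    c->cached_port = d->actual_port;
    c->lookups++;
}

/* On ECONNREFUSED, force a fresh port lookup and retry exactly once. */
static int monitor_with_retry(struct fake_client *c, const struct fake_statd *d)
{
    int status;

    assert(c != 0 && d != 0);
    status = fake_upcall(c, d);
    if (status == -ECONNREFUSED) {
        fake_force_rebind(c, d);
        status = fake_upcall(c, d);
    }
    return status;
}
```

Retrying once rather than in a loop keeps the failure visible if the daemon is actually down and not merely moved.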
    • SUNRPC: Don't wake tasks during connection abort · a743419f
      Benjamin Coddington committed
      When aborting a connection to preserve source ports, don't wake the task in
      xs_error_report.  This allows tasks with RPC_TASK_SOFTCONN to succeed if the
      connection needs to be re-established since it preserves the task's status
      instead of setting it to the status of the aborting kernel_connect().
      
      This may also avoid a potential conflict on the socket's lock.
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Cc: stable@vger.kernel.org # 3.14+
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • Fixing lease renewal · 8faaa6d5
      Olga Kornievskaia committed
      Commit c9fdeb28 removed a 'continue' after checking if the lease needs
      to be renewed. However, if the client hasn't moved, the code falls through to
      starting reboot recovery erroneously (i.e., sends open reclaim and gets
      back stale_clientid error) before recovering from getting stale_clientid
      on the renew operation.
      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Fixes: c9fdeb28 (NFS: Add basic migration support to state manager thread)
      Cc: stable@vger.kernel.org # 3.13+
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
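The ordering problem this commit message describes can be illustrated with a toy state loop. This is a loose model, not the NFS state manager: the flag names, `run_manager`, and the assumption that a failed renew flags both a client ID re-establishment and a reboot reclaim are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define ST_LEASE_EXPIRED  1u  /* client ID must be re-established */
#define ST_CHECK_LEASE    2u  /* lease should be renewed */
#define ST_RECLAIM_REBOOT 4u  /* state must be reclaimed from the server */
#define ST_ALL (ST_LEASE_EXPIRED | ST_CHECK_LEASE | ST_RECLAIM_REBOOT)

/* Runs the toy loop and records one letter per action in `log`:
 * 'E' = re-establish client ID, 'C' = check lease, 'R' = reboot reclaim.
 * `use_continue` selects whether the lease check restarts the loop. */
static void run_manager(unsigned state, int use_continue, char *log, size_t cap)
{
    size_t n = 0;

    assert(cap > 0);
    state &= ST_ALL;
    while (state != 0 && n + 1 < cap) {
        if (state & ST_LEASE_EXPIRED) {
            state &= ~ST_LEASE_EXPIRED;
            log[n++] = 'E';
            continue;
        }
        if (state & ST_CHECK_LEASE) {
            state &= ~ST_CHECK_LEASE;
            log[n++] = 'C';
            /* Assumption: the renew fails with a stale client ID. */
            state |= ST_LEASE_EXPIRED | ST_RECLAIM_REBOOT;
            if (use_continue)
                continue;   /* go back and handle the expired lease first */
        }
        if (state & ST_RECLAIM_REBOOT) {
            state &= ~ST_RECLAIM_REBOOT;
            log[n++] = 'R';
            continue;
        }
    }
    log[n] = '\0';
}
```

Starting from `ST_CHECK_LEASE` alone, the model without the `continue` attempts the reclaim before the client ID is re-established, while the model with it re-establishes the client ID first.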
  2. 22 Sep, 2014 1 commit
  3. 16 Sep, 2014 1 commit
  4. 13 Sep, 2014 15 commits
  5. 11 Sep, 2014 18 commits