1. 08 Oct, 2008: 1 commit
  2. 10 Jul, 2008: 2 commits
    • SUNRPC: Ensure all transports set rq_xtime consistently · b22602a6
      Chuck Lever committed
      The RPC client uses the rq_xtime field in each RPC request to determine the
      round-trip time of the request.  Currently, the rq_xtime field is
      initialized by each transport just before it starts enqueuing a request to
      be sent.  However, transports do not handle initializing this value
      consistently; sometimes they don't initialize it at all.
      
      To make the measurement of request round-trip time consistent for all
      RPC client transport capabilities, pull rq_xtime initialization into the
      RPC client's generic transport logic.  Now all transports will get a
      standardized RTT measure automatically, from:
      
        xprt_transmit()
      
      to
      
        xprt_complete_rqst()
      
      This makes round-trip time calculation more accurate for the TCP transport.
      The socket ->sendmsg() method can return "-EAGAIN" if the socket's output
      buffer is full, so the TCP transport's ->send_request() method may call
      the ->sendmsg() method repeatedly until it gets all of the request's bytes
      queued in the socket's buffer.
      
      Currently, the TCP transport sets the rq_xtime field every time through
      that loop so the final value is the timestamp just before the *last* call
      to the underlying socket's ->sendmsg() method.  After this patch, the
      rq_xtime field contains a timestamp that reflects the time just before the
      *first* call to ->sendmsg().
      
      This is consequential under heavy workloads because large requests often
      take multiple ->sendmsg() calls to get all the bytes of a request queued.
      The TCP transport causes the request to sleep until the remote end of the
      socket has received enough bytes to clear space in the socket's local
      output buffer.  This delay can be quite significant.
      
      The method introduced by this patch is a more accurate measure of RTT
      for stream transports, since the server can cause enough back pressure
      to delay (i.e., increase the latency of) requests from the client.
      
      Additionally, this patch corrects the behavior of the RDMA transport, which
      entirely neglected to initialize the rq_xtime field.  RPC performance
      metrics for RDMA transports now display correct RPC request round trip
      times.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Acked-by: Tom Talpey <thomas.talpey@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
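      A minimal user-space sketch of the timing rule described above, assuming simplified stand-ins for the kernel structures: the timestamp is taken once in generic code, before the first send attempt, and the round trip is measured at completion. The struct and the sketch_transmit()/sketch_complete() helpers are hypothetical, not the actual xprt_transmit()/xprt_complete_rqst() code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct rpc_rqst. */
struct sketch_rqst {
	unsigned long rq_xtime;      /* time of the first transmit attempt */
	unsigned int  rq_bytes_sent; /* bytes queued so far */
	unsigned int  rq_len;        /* total request length */
	bool          rq_xtime_set;
};

/*
 * Hypothetical generic transmit step. "now" is the current time and
 * "queued" is how many bytes the socket accepted on this attempt
 * (a short write models ->sendmsg() returning -EAGAIN part way).
 * Returns true once the whole request has been queued.
 */
static bool sketch_transmit(struct sketch_rqst *req, unsigned long now,
			    unsigned int queued)
{
	/* Stamp only once, in generic code, before the first send. */
	if (!req->rq_xtime_set) {
		req->rq_xtime = now;
		req->rq_xtime_set = true;
	}
	req->rq_bytes_sent += queued;
	assert(req->rq_bytes_sent <= req->rq_len);
	return req->rq_bytes_sent == req->rq_len;
}

/* Hypothetical completion step: RTT spans first send to reply. */
static unsigned long sketch_complete(const struct sketch_rqst *req,
				     unsigned long now)
{
	return now - req->rq_xtime;
}
```

      With three partial sends at times 10, 25 and 40 and a reply at 70, this measures an RTT of 60; stamping on every pass through the send loop, as the TCP transport did before, would have reported 30.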
    • SUNRPC: Remove obsolete messages during transport connect · cd983ef8
      Chuck Lever committed
      Recent changes to the RPC client's transport connect logic make connect
      status values ECONNREFUSED and ECONNRESET impossible.
      
      Clean up xprt_connect_status() to account for these changes.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
  3. 28 Apr, 2008: 1 commit
  4. 20 Apr, 2008: 4 commits
    • SUNRPC: Don't disconnect more than once if retransmitting NFSv4 requests · 7c1d71cf
      Trond Myklebust committed
      NFSv4 requires us to ensure that we break the TCP connection before we're
      allowed to retransmit a request. However in the case where we're
      retransmitting several requests that have been sent on the same
      connection, we need to ensure that we don't interfere with the attempt to
      reconnect and/or break the connection again once it has been established.
      
      We therefore introduce a 'connection' cookie that is bumped every time a
      connection is broken. This allows requests to track if they need to force a
      disconnection.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
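      A sketch of the cookie check, assuming simplified structures. The field names connect_cookie and rq_connect_cookie and all helpers below are illustrative and may not match the patch exactly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the transport and request structures. */
struct sketch_xprt {
	unsigned int connect_cookie; /* bumped on every broken connection */
};

struct sketch_req {
	unsigned int rq_connect_cookie; /* cookie seen when last sent */
};

/* Record which connection the request went out on. */
static void sketch_send(const struct sketch_xprt *xprt, struct sketch_req *req)
{
	req->rq_connect_cookie = xprt->connect_cookie;
}

/* Breaking the connection invalidates every outstanding cookie. */
static void sketch_disconnect(struct sketch_xprt *xprt)
{
	xprt->connect_cookie++;
}

/*
 * Before retransmitting, force a disconnect only if the request was
 * sent on the current connection. If the cookie has already moved on,
 * that connection was broken by someone else, and the new one must
 * not be broken again.
 */
static bool sketch_retransmit_needs_disconnect(const struct sketch_xprt *xprt,
					       const struct sketch_req *req)
{
	return req->rq_connect_cookie == xprt->connect_cookie;
}
```

      Two requests sent on the same connection carry the same cookie; once the first retransmission breaks the connection, the second request sees a stale cookie and leaves the reconnect alone.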
    • SUNRPC: Fix read ordering problems with req->rq_private_buf.len · 1e799b67
      Trond Myklebust committed
      We want to ensure that req->rq_private_buf.len is updated before
      req->rq_received, so that call_decode() doesn't use an old value for
      req->rq_rcv_buf.len.
      
      In 'call_decode()' itself, instead of using task->tk_status (which is set
      using req->rq_received), we must use the actual value of
      req->rq_private_buf.len when deciding whether or not the received RPC reply
      is too short.
      
      Finally, ensure that we set req->rq_rcv_buf.len to zero when retrying a
      request. A typo meant that we were resetting req->rq_private_buf.len in
      call_decode(), and then clobbering that value with the old rq_rcv_buf.len
      again in xprt_transmit().
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
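      The ordering requirement above is the usual publish pattern: store the length, then set the "received" indication with release semantics, so that a reader observing it with acquire semantics also sees the new length. A user-space sketch with C11 atomics; the kernel uses its own barrier primitives (smp_wmb()/smp_rmb()) instead, and the structure here is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the reply-related request fields. */
struct sketch_req {
	size_t priv_len;     /* models req->rq_private_buf.len */
	atomic_int received; /* models req->rq_received (0 = no reply yet) */
};

/* Receive side: publish the length before signalling completion. */
static void sketch_complete(struct sketch_req *req, size_t copied)
{
	req->priv_len = copied;
	/* Release: the store to priv_len cannot move after this store. */
	atomic_store_explicit(&req->received, (int)copied, memory_order_release);
}

/*
 * Decode side: returns the reply length, or 0 if no reply yet.
 * The acquire load pairs with the release store above, so once a
 * non-zero value is seen, priv_len is guaranteed to be current.
 */
static size_t sketch_decode_len(struct sketch_req *req)
{
	if (atomic_load_explicit(&req->received, memory_order_acquire) == 0)
		return 0;
	return req->priv_len;
}
```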
    • SUNRPC: Fix up xprt_write_space() · b6ddf64f
      Trond Myklebust committed
      The rest of the networking layer uses SOCK_ASYNC_NOSPACE to signal whether
      or not we have someone waiting for buffer memory. Convert the SUNRPC layer
      to use the same idiom.
      
      Remove the unlikely()s in xs_udp_write_space and xs_tcp_write_space. In
      fact, the most common case will be that there is nobody waiting for buffer
      space.
      
      SOCK_NOSPACE is there to tell the TCP layer whether or not the cwnd was
      limited by the application window. Ensure that we follow the same idiom as
      the rest of the networking layer here too.
      
      Finally, ensure that we clear SOCK_ASYNC_NOSPACE once we wake up, so that
      write_space() doesn't keep waking things up on xprt->pending.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
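      A sketch of the flag protocol, assuming an atomic boolean in place of the SOCK_ASYNC_NOSPACE bit and a counter in place of waking xprt->pending; the structure and helper names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the socket state the transport looks at. */
struct sketch_sock {
	atomic_bool async_nospace; /* models SOCK_ASYNC_NOSPACE */
	int wakeups;               /* models wakeups on xprt->pending */
};

/* Sender found the buffer full: advertise that someone is waiting. */
static void sketch_wait_for_space(struct sketch_sock *sk)
{
	atomic_store(&sk->async_nospace, true);
}

/*
 * write_space callback: wake waiters only if the flag was set, and
 * clear it in the same atomic step so later callbacks do not keep
 * waking the queue. The common case (nobody waiting) just returns.
 */
static void sketch_write_space(struct sketch_sock *sk)
{
	if (!atomic_exchange(&sk->async_nospace, false))
		return;
	sk->wakeups++;
}
```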
  5. 29 Feb, 2008: 1 commit
  6. 26 Feb, 2008: 3 commits
  7. 14 Feb, 2008: 1 commit
    • docbook: sunrpc filenames and notation fixes · 65b6e42c
      Randy Dunlap committed
      Use updated file list for docbook files and
      fix kernel-doc warnings in sunrpc:
      Warning(linux-2.6.24-git12//net/sunrpc/rpc_pipe.c:689): No description found for parameter 'rpc_client'
      Warning(linux-2.6.24-git12//net/sunrpc/rpc_pipe.c:765): No description found for parameter 'flags'
      Warning(linux-2.6.24-git12//net/sunrpc/clnt.c:584): No description found for parameter 'tk_ops'
      Warning(linux-2.6.24-git12//net/sunrpc/clnt.c:618): No description found for parameter 'bufsize'
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
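      kernel-doc reports "No description found for parameter" when a function's header comment lacks an @name: line for that parameter. The expected comment shape, shown on a hypothetical function rather than on the functions named in the warnings:

```c
#include <assert.h>
#include <stddef.h>

/**
 * sketch_buf_fits - check whether a payload fits in an RPC buffer
 * @bufsize: size of the buffer in bytes
 * @len: number of payload bytes to store
 *
 * Hypothetical example: every parameter has its own @name: line,
 * which is what keeps kernel-doc from warning about it.
 *
 * Return: non-zero if @len bytes fit in a buffer of @bufsize bytes.
 */
static int sketch_buf_fits(size_t bufsize, size_t len)
{
	return len <= bufsize;
}
```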
  8. 30 Jan, 2008: 7 commits
  9. 29 Jan, 2008: 1 commit
  10. 22 Nov, 2007: 1 commit
  11. 10 Oct, 2007: 5 commits
  12. 11 Jul, 2007: 2 commits
    • SUNRPC: cleanup transport creation argument passing · 96802a09
      Frank van Maarseveen committed
      Clean up argument passing to functions for creating an RPC transport.
      Signed-off-by: Frank van Maarseveen <frankvm@frankvm.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • SUNRPC: fix hang due to eventd deadlock... · c1384c9c
      Trond Myklebust committed
      Brian Behlendorf writes:
      
      The root cause of the NFS hang we were observing appears to be a rare
      deadlock between the kernel-provided usermodehelper API and the Linux NFS
      client.  The deadlock can arise because both of these services use the
      generic Linux work queues.  The usermodehelper API runs the specified user
      application in the context of the work queue.  And NFS submits both cleanup
      and reconnect work to the generic work queue for handling.  Normally this
      is fine, but a deadlock can result in the following situation:
      
        - NFS client is in a disconnected state
        - [events/0] runs a usermodehelper app with an NFS-dependent operation;
          this triggers an NFS reconnect.
        - The NFS reconnect happens to be submitted to the [events/0] work queue.
        - Deadlock: the [events/0] work queue will never process the
          reconnect because it is blocked on the previous NFS-dependent
          operation, which will not complete.
      
      The solution is simply to run reconnect requests on rpciod.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
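      A toy, single-threaded model of the scenario, assuming each work queue is a single FIFO worker that cannot start its next item until the current one returns. It is not kernel code; the queue structure and item names are hypothetical and only show why moving the reconnect to a separate queue (rpciod) breaks the cycle.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical work items: an NFS-dependent helper and the reconnect. */
enum { HELPER, RECONNECT, NITEMS };

/* Hypothetical FIFO queue served by a single worker. */
struct sketch_queue {
	int items[NITEMS];
	int head, tail;
};

static void sketch_enqueue(struct sketch_queue *q, int item)
{
	q->items[q->tail++] = item;
}

/*
 * Run both queues until no progress is possible. The helper item can
 * only finish after the reconnect has run, and while it waits it holds
 * its worker. Returns true if every item completed, false on deadlock.
 */
static bool sketch_run(bool reconnect_on_rpciod)
{
	struct sketch_queue q[2] = { 0 }; /* q[0] = events/0, q[1] = rpciod */
	bool done[NITEMS] = { false };
	bool progress = true;

	sketch_enqueue(&q[0], HELPER);
	sketch_enqueue(&q[reconnect_on_rpciod ? 1 : 0], RECONNECT);

	while (progress) {
		progress = false;
		for (int i = 0; i < 2; i++) {
			if (q[i].head == q[i].tail)
				continue; /* queue empty */
			int item = q[i].items[q[i].head];
			/* The helper blocks this worker until reconnect is done. */
			if (item == HELPER && !done[RECONNECT])
				continue;
			done[item] = true;
			q[i].head++;
			progress = true;
		}
	}
	return done[HELPER] && done[RECONNECT];
}
```

      With both items on events/0 the helper sits at the head of the queue forever and sketch_run(false) reports a deadlock; with the reconnect on its own queue, sketch_run(true) completes both.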
  13. 01 May, 2007: 3 commits
    • SUNRPC: introduce rpcbind: replacement for in-kernel portmapper · a509050b
      Chuck Lever committed
      Introduce a replacement for the in-kernel portmapper client that supports
      all 3 versions of the rpcbind protocol.  This code is not used yet.
      
      Original code by Groupe Bull updated for the latest kernel, with multiple
      bug fixes.
      
      Note that rpcb_clnt.c does not yet support registering via versions 3 and
      4 of the rpcbind protocol.  That is planned for a later patch.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
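      One visible difference between the protocol versions: a version 2 (portmapper) GETPORT reply is a bare port number, while versions 3 and 4 GETADDR return a universal address string. For IPv4, RFC 1833 defines that string as "h1.h2.h3.h4.p1.p2", where port = p1 * 256 + p2. A small parser as a sketch; the function name is hypothetical and not taken from rpcb_clnt.c.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: extract the port from an IPv4 universal address
 * such as "192.168.1.10.8.1" (RFC 1833). Returns the port, or -1 if the
 * string is malformed.
 */
static int sketch_uaddr_to_port(const char *uaddr)
{
	unsigned int h1, h2, h3, h4, p1, p2;
	char extra;

	/* Exactly six dotted fields; anything trailing makes this return 7. */
	if (sscanf(uaddr, "%u.%u.%u.%u.%u.%u%c",
		   &h1, &h2, &h3, &h4, &p1, &p2, &extra) != 6)
		return -1;
	if (h1 > 255 || h2 > 255 || h3 > 255 || h4 > 255 ||
	    p1 > 255 || p2 > 255)
		return -1;
	return (int)(p1 * 256 + p2);
}
```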
    • SUNRPC: Eliminate side effects from rpc_malloc · c5a4dd8b
      Chuck Lever committed
      Currently rpc_malloc sets req->rq_buffer internally.  Make this a more
      generic interface:  return a pointer to the new buffer (or NULL) and
      make the caller set req->rq_buffer and req->rq_bufsize.  This looks much
      more like kmalloc and eliminates the side effects.
      
      To fix a potential deadlock, this patch also replaces GFP_NOFS with
      GFP_NOWAIT in rpc_malloc.  This prevents async RPCs from sleeping outside
      the RPC's task scheduler while allocating their buffer.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
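      The interface change in user-space terms: the allocator just returns memory or NULL, like kmalloc, and the caller records it. A sketch assuming plain malloc() instead of the kernel's mempool/kmalloc path (so the GFP_NOWAIT detail is not modelled); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the relevant struct rpc_rqst fields. */
struct sketch_rqst {
	void   *rq_buffer;
	size_t  rq_bufsize;
};

/* New style: no side effects, just return the memory or NULL. */
static void *sketch_rpc_malloc(size_t size)
{
	return malloc(size);
}

/* The caller now owns the bookkeeping, as it would with kmalloc(). */
static int sketch_allocate_buffer(struct sketch_rqst *req, size_t size)
{
	void *buf = sketch_rpc_malloc(size);

	if (buf == NULL)
		return -1; /* caller decides how to handle failure, e.g. retry later */
	req->rq_buffer = buf;
	req->rq_bufsize = size;
	return 0;
}
```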
    • SUNRPC: RPC buffer size estimates are too large · 2bea90d4
      Chuck Lever committed
      The RPC buffer size estimation logic in net/sunrpc/clnt.c always
      significantly overestimates the requirements for the buffer size.
      A little instrumentation demonstrated that in fact rpc_malloc was never
      allocating the buffer from the mempool, but almost always called kmalloc.
      
      To compute the size of the RPC buffer more precisely, split p_bufsiz into
      two fields: one for the argument size, and one for the result size.
      
      Then, compute the sum of the exact call and reply header sizes, and split
      the RPC buffer precisely between the two.  That should keep almost all RPC
      buffers within the 2KiB buffer mempool limit.
      
      And, we can finally be rid of RPC_SLACK_SPACE!
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
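      A sketch of the sizing arithmetic, assuming separate argument and result sizes per procedure and a 2 KiB mempool buffer as described above. The struct, the helper names and the header sizes in the example values are placeholders, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MEMPOOL_LIMIT 2048u /* the 2KiB mempool buffer size */

/* Hypothetical per-procedure sizes, split as the patch describes. */
struct sketch_procinfo {
	size_t arglen; /* encoded argument size */
	size_t replen; /* decoded result size */
};

/* Exact total: call header + arguments + reply header + results. */
static size_t sketch_bufsize(const struct sketch_procinfo *p,
			     size_t callhdr, size_t replyhdr)
{
	return callhdr + p->arglen + replyhdr + p->replen;
}

/* True if the buffer can come from the mempool rather than kmalloc. */
static bool sketch_fits_mempool(size_t bufsize)
{
	return bufsize <= SKETCH_MEMPOOL_LIMIT;
}
```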
  14. 21 Apr, 2007: 1 commit
    • RPC: Fix the TCP resend semantics for NFSv4 · 241c39b9
      Trond Myklebust committed
      Fix a regression due to the patch "NFS: disconnect before retrying NFSv4
      requests over TCP"
      
      The assumption made in xprt_transmit() that the condition
      	"req->rq_bytes_sent == 0 and request is on the receive list"
      should imply that we're dealing with a retransmission is false.
      Firstly, it may simply happen that the socket send queue was full
      at the time the request was initially sent through xprt_transmit().
      Secondly, doing this for each request that was retransmitted implies
      that we disconnect and reconnect for _every_ request that happened to
      be retransmitted irrespective of whether or not a disconnection has
      already occurred.
      
      Fix is to move this logic into the call_status request timeout handler.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 13 Feb, 2007: 1 commit
  16. 11 Feb, 2007: 1 commit
  17. 04 Feb, 2007: 1 commit
  18. 06 Dec, 2006: 2 commits
  19. 22 Nov, 2006: 1 commit
    • WorkStruct: Pass the work_struct pointer instead of context data · 65f27f38
      David Howells committed
      Pass the work_struct pointer to the work function rather than context data.
      The work function can use container_of() to work out the data.
      
      For the cases where the container of the work_struct may go away the moment the
      pending bit is cleared, it is made possible to defer the release of the
      structure by deferring the clearing of the pending bit.
      
      To make this work, an extra flag is introduced into the management side of the
      work_struct.  This governs auto-release of the structure upon execution.
      
      Ordinarily, the work queue executor would release the work_struct for further
      scheduling or deallocation by clearing the pending bit prior to jumping to the
      work function.  This means that, unless the driver makes some guarantee itself
      that the work_struct won't go away, the work function may not access anything
      else in the work_struct or its container lest they be deallocated.  This is a
      problem if the auxiliary data is taken away (as done by the last patch).
      
      However, if the pending bit is *not* cleared before jumping to the work
      function, then the work function *may* access the work_struct and its container
      with no problems.  But then the work function must itself release the
      work_struct by calling work_release().
      
      In most cases, automatic release is fine, so this is the default.  Special
      initiators exist for the non-auto-release case (ending in _NAR).
      Signed-Off-By: David Howells <dhowells@redhat.com>
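      The container_of() idiom mentioned above as a self-contained user-space sketch: the work function receives only the embedded work_struct pointer and recovers the enclosing object. The macro is a simplified version of the kernel's (it omits the type check), and struct sketch_xprt and its work function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(): step back from a member to its container. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Minimal stand-in for the kernel's work_struct. */
struct work_struct {
	void (*func)(struct work_struct *work);
};

/* Hypothetical object that embeds its work item. */
struct sketch_xprt {
	int cleanups;
	struct work_struct task_cleanup;
};

/* Work function: only the work_struct pointer is passed in. */
static void sketch_cleanup_fn(struct work_struct *work)
{
	struct sketch_xprt *xprt =
		container_of(work, struct sketch_xprt, task_cleanup);

	xprt->cleanups++;
}
```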
  20. 29 Sep, 2006: 1 commit