1. 18 10月, 2017 1 次提交
    • D
      rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals · bc5e3a54
      David Howells 提交于
      Make AF_RXRPC accept MSG_WAITALL as a flag to sendmsg() to tell it to
      ignore signals whilst loading up the message queue, provided progress is
      being made in emptying the queue at the other side.
      
      Progress is defined as the base of the transmit window having being
      advanced within 2 RTT periods.  If the period is exceeded with no progress,
      sendmsg() will return anyway, indicating how much data has been copied, if
      any.
      
      Once the supplied buffer is entirely decanted, the sendmsg() will return.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      bc5e3a54
  2. 29 8月, 2017 3 次提交
    • D
      rxrpc: Allow failed client calls to be retried · c038a58c
      David Howells 提交于
      Allow a client call that failed on network error to be retried, provided
      that the Tx queue still holds DATA packet 1.  This allows an operation to
      be submitted to another server or another address for the same server
      without having to repackage and re-encrypt the data so far processed.
      
      Two new functions are provided:
      
       (1) rxrpc_kernel_check_call() - This is used to find out the completion
           state of a call to guess whether it can be retried and whether it
           should be retried.
      
       (2) rxrpc_kernel_retry_call() - Disconnect the call from its current
           connection, reset the state and submit it as a new client call to a
           new address.  The new address need not match the previous address.
      
      A call may be retried even if all the data hasn't been loaded into it yet;
      a partially constructed will be retained at the same point it was at when
      an error condition was detected.  msg_data_left() can be used to find out
      how much data was packaged before the error occurred.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      c038a58c
    • D
      rxrpc: Add notification of end-of-Tx phase · e833251a
      David Howells 提交于
      Add a callback to rxrpc_kernel_send_data() so that a kernel service can get
      a notification that the AF_RXRPC call has transitioned out the Tx phase and
      is now waiting for a reply or a final ACK.
      
      This is called from AF_RXRPC with the call state lock held so the
      notification is guaranteed to come before any reply is passed back.
      
      Further, modify the AFS filesystem to make use of this so that we don't have
      to change the afs_call state before sending the last bit of data.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      e833251a
    • D
      rxrpc: Don't negate call->error before returning it · bd2db2d2
      David Howells 提交于
      call->error is stored as 0 or a negative error code.  Don't negate this
      value (ie. make it positive) before returning it from a kernel function
      (though it should still be negated before passing to userspace through a
      control message).
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      bd2db2d2
  3. 16 6月, 2017 1 次提交
    • J
      networking: convert many more places to skb_put_zero() · b080db58
      Johannes Berg 提交于
      There were many places that my previous spatch didn't find,
      as pointed out by yuan linyu in various patches.
      
      The following spatch found many more and also removes the
      now unnecessary casts:
      
          @@
          identifier p, p2;
          expression len;
          expression skb;
          type t, t2;
          @@
          (
          -p = skb_put(skb, len);
          +p = skb_put_zero(skb, len);
          |
          -p = (t)skb_put(skb, len);
          +p = skb_put_zero(skb, len);
          )
          ... when != p
          (
          p2 = (t2)p;
          -memset(p2, 0, len);
          |
          -memset(p, 0, len);
          )
      
          @@
          type t, t2;
          identifier p, p2;
          expression skb;
          @@
          t *p;
          ...
          (
          -p = skb_put(skb, sizeof(t));
          +p = skb_put_zero(skb, sizeof(t));
          |
          -p = (t *)skb_put(skb, sizeof(t));
          +p = skb_put_zero(skb, sizeof(t));
          )
          ... when != p
          (
          p2 = (t2)p;
          -memset(p2, 0, sizeof(*p));
          |
          -memset(p, 0, sizeof(*p));
          )
      
          @@
          expression skb, len;
          @@
          -memset(skb_put(skb, len), 0, len);
          +skb_put_zero(skb, len);
      
      Apply it to the tree (with one manual fixup to keep the
      comment in vxlan.c, which spatch removed.)
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b080db58
  4. 08 6月, 2017 2 次提交
    • D
      rxrpc: Provide a cmsg to specify the amount of Tx data for a call · e754eba6
      David Howells 提交于
      Provide a control message that can be specified on the first sendmsg() of a
      client call or the first sendmsg() of a service response to indicate the
      total length of the data to be transmitted for that call.
      
      Currently, because the length of the payload of an encrypted DATA packet is
      encrypted in front of the data, the packet cannot be encrypted until we
      know how much data it will hold.
      
      By specifying the length at the beginning of the transmit phase, each DATA
      packet length can be set before we start loading data from userspace (where
      several sendmsg() calls may contribute to a particular packet).
      
      An error will be returned if too little or too much data is presented in
      the Tx phase.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      e754eba6
    • D
      rxrpc: Consolidate sendmsg parameters · 3ab26a6f
      David Howells 提交于
      Consolidate the sendmsg control message parameters into a struct rather
      than passing them individually through the argument list of
      rxrpc_sendmsg_cmsg().  This makes it easier to add more parameters.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      3ab26a6f
  5. 05 6月, 2017 1 次提交
    • D
      rxrpc: Add service upgrade support for client connections · 4e255721
      David Howells 提交于
      Make it possible for a client to use AuriStor's service upgrade facility.
      
      The client does this by adding an RXRPC_UPGRADE_SERVICE control message to
      the first sendmsg() of a call.  This takes no parameters.
      
      When recvmsg() starts returning data from the call, the service ID field in
      the returned msg_name will reflect the result of the upgrade attempt.  If
      the upgrade was ignored, srx_service will match what was set in the
      sendmsg(); if the upgrade happened the srx_service will be altered to
      indicate the service the server upgraded to.
      
      Note that:
      
       (1) The choice of upgrade service is up to the server
      
       (2) Further client calls to the same server that would share a connection
           are blocked if an upgrade probe is in progress.
      
       (3) This should only be used to probe the service.  Clients should then
           use the returned service ID in all subsequent communications with that
           server (and not set the upgrade).  Note that the kernel will not
           retain this information should the connection expire from its cache.
      
       (4) If a server that supports upgrading is replaced by one that doesn't,
           whilst a connection is live, and if the replacement is running, say,
           OpenAFS 1.6.4 or older or an older IBM AFS, then the replacement
           server will not respond to packets sent to the upgraded connection.
      
           At this point, calls will time out and the server must be reprobed.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      4e255721
  6. 06 4月, 2017 3 次提交
    • D
      rxrpc: Trace protocol errors in received packets · fb46f6ee
      David Howells 提交于
      Add a tracepoint (rxrpc_rx_proto) to record protocol errors in received
      packets.  The following changes are made:
      
       (1) Add a function, __rxrpc_abort_eproto(), to note a protocol error on a
           call and mark the call aborted.  This is wrapped by
           rxrpc_abort_eproto() that makes the why string usable in trace.
      
       (2) Add trace_rxrpc_rx_proto() or rxrpc_abort_eproto() to protocol error
           generation points, replacing rxrpc_abort_call() with the latter.
      
       (3) Only send an abort packet in rxkad_verify_packet*() if we actually
           managed to abort the call.
      
      Note that a trace event is also emitted if a kernel user (e.g. afs) tries
      to send data through a call when it's not in the transmission phase, though
      it's not technically a receive event.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      fb46f6ee
    • D
      rxrpc: Note a successfully aborted kernel operation · 84a4c09c
      David Howells 提交于
      Make rxrpc_kernel_abort_call() return an indication as to whether it
      actually aborted the operation or not so that kafs can trace the failure of
      the operation.  Note that 'success' in this context means changing the
      state of the call, not necessarily successfully transmitting an ABORT
      packet.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      84a4c09c
    • D
      rxrpc: Use negative error codes in rxrpc_call struct · 3a92789a
      David Howells 提交于
      Use negative error codes in struct rxrpc_call::error because that's what
      the kernel normally deals with and to make the code consistent.  We only
      turn them positive when transcribing into a cmsg for userspace recvmsg.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      3a92789a
  7. 10 3月, 2017 1 次提交
  8. 08 3月, 2017 1 次提交
  9. 04 3月, 2017 1 次提交
  10. 02 3月, 2017 2 次提交
    • I
      sched/headers: Prepare to move signal wakeup & sigpending methods from... · 174cd4b1
      Ingo Molnar 提交于
      sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>
      
      Fix up affected files that include this signal functionality via sched.h.
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      174cd4b1
    • D
      rxrpc: Fix deadlock between call creation and sendmsg/recvmsg · 540b1c48
      David Howells 提交于
      All the routines by which rxrpc is accessed from the outside are serialised
      by means of the socket lock (sendmsg, recvmsg, bind,
      rxrpc_kernel_begin_call(), ...) and this presents a problem:
      
       (1) If a number of calls on the same socket are in the process of
           connection to the same peer, a maximum of four concurrent live calls
           are permitted before further calls need to wait for a slot.
      
       (2) If a call is waiting for a slot, it is deep inside sendmsg() or
           rxrpc_kernel_begin_call() and the entry function is holding the socket
           lock.
      
       (3) sendmsg() and recvmsg() or the in-kernel equivalents are prevented
           from servicing the other calls as they need to take the socket lock to
           do so.
      
       (4) The socket is stuck until a call is aborted and makes its slot
           available to the waiter.
      
      Fix this by:
      
       (1) Provide each call with a mutex ('user_mutex') that arbitrates access
           by the users of rxrpc separately for each specific call.
      
       (2) Make rxrpc_sendmsg() and rxrpc_recvmsg() unlock the socket as soon as
           they've got a call and taken its mutex.
      
           Note that I'm returning EWOULDBLOCK from recvmsg() if MSG_DONTWAIT is
           set but someone else has the lock.  Should I instead only return
           EWOULDBLOCK if there's nothing currently to be done on a socket, and
           sleep in this particular instance because there is something to be
           done, but we appear to be blocked by the interrupt handler doing its
           ping?
      
       (3) Make rxrpc_new_client_call() unlock the socket after allocating a new
           call, locking its user mutex and adding it to the socket's call tree.
           The call is returned locked so that sendmsg() can add data to it
           immediately.
      
           From the moment the call is in the socket tree, it is subject to
           access by sendmsg() and recvmsg() - even if it isn't connected yet.
      
       (4) Lock new service calls in the UDP data_ready handler (in
           rxrpc_new_incoming_call()) because they may already be in the socket's
           tree and the data_ready handler makes them live immediately if a user
           ID has already been preassigned.
      
           Note that the new call is locked before any notifications are sent
           that it is live, so doing mutex_trylock() *ought* to always succeed.
           Userspace is prevented from doing sendmsg() on calls that are in a
           too-early state in rxrpc_do_sendmsg().
      
       (5) Make rxrpc_new_incoming_call() return the call with the user mutex
           held so that a ping can be scheduled immediately under it.
      
           Note that it might be worth moving the ping call into
           rxrpc_new_incoming_call() and then we can drop the mutex there.
      
       (6) Make rxrpc_accept_call() take the lock on the call it is accepting and
           release the socket after adding the call to the socket's tree.  This
           is slightly tricky as we've dequeued the call by that point and have
           to requeue it.
      
           Note that requeuing emits a trace event.
      
       (7) Make rxrpc_kernel_send_data() and rxrpc_kernel_recv_data() take the
           new mutex immediately and don't bother with the socket mutex at all.
      
      This patch has the nice bonus that calls on the same socket are now to some
      extent parallelisable.
      
      Note that we might want to move rxrpc_service_prealloc() calls out from the
      socket lock and give it its own lock, so that we don't hang progress in
      other calls because we're waiting for the allocator.
      
      We probably also want to avoid calling rxrpc_notify_socket() from within
      the socket lock (rxrpc_accept_call()).
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-by: NMarc Dionne <marc.c.dionne@auristor.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      540b1c48
  11. 05 1月, 2017 1 次提交
  12. 06 10月, 2016 3 次提交
    • D
      rxrpc: Need to produce an ACK for service op if op takes a long time · 9749fd2b
      David Howells 提交于
      We need to generate a DELAY ACK from the service end of an operation if we
      start doing the actual operation work and it takes longer than expected.
      This will hard-ACK the request data and allow the client to release its
      resources.
      
      To make this work:
      
       (1) We have to set the ack timer and propose an ACK when the call moves to
           the RXRPC_CALL_SERVER_ACK_REQUEST and clear the pending ACK and cancel
           the timer when we start transmitting the reply (the first DATA packet
           of the reply implicitly ACKs the request phase).
      
       (2) It must be possible to set the timer when the caller is holding
           call->state_lock, so split the lock-getting part of the timer function
           out.
      
       (3) Add trace notes for the ACK we're requesting and the timer we clear.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      9749fd2b
    • D
      rxrpc: Fix loss of PING RESPONSE ACK production due to PING ACKs · a5af7e1f
      David Howells 提交于
      Separate the output of PING ACKs from the output of other sorts of ACK so
      that if we receive a PING ACK and schedule transmission of a PING RESPONSE
      ACK, the response doesn't get cancelled by a PING ACK we happen to be
      scheduling transmission of at the same time.
      
      If a PING RESPONSE gets lost, the other side might just sit there waiting
      for it and refuse to proceed otherwise.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      a5af7e1f
    • D
      rxrpc: Fix warning by splitting rxrpc_send_call_packet() · 26cb02aa
      David Howells 提交于
      Split rxrpc_send_data_packet() to separate ACK generation (which is more
      complicated) from ABORT generation.  This simplifies the code a bit and
      fixes the following warning:
      
      In file included from ../net/rxrpc/output.c:20:0:
      net/rxrpc/output.c: In function 'rxrpc_send_call_packet':
      net/rxrpc/ar-internal.h:1187:27: error: 'top' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      net/rxrpc/output.c:103:24: note: 'top' was declared here
      net/rxrpc/output.c:225:25: error: 'hard_ack' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      Reported-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      26cb02aa
  13. 30 9月, 2016 2 次提交
    • D
      rxrpc: Keep the call timeouts as ktimes rather than jiffies · df0adc78
      David Howells 提交于
      Keep that call timeouts as ktimes rather than jiffies so that they can be
      expressed as functions of RTT.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      df0adc78
    • D
      rxrpc: Make Tx loss-injection go through normal return and adjust tracing · a1767077
      David Howells 提交于
      In rxrpc_send_data_packet() make the loss-injection path return through the
      same code as the transmission path so that the RTT determination is
      initiated and any future timer shuffling will be done, despite the packet
      having been binned.
      
      Whilst we're at it:
      
       (1) Add to the tx_data tracepoint an indication of whether or not we're
           retransmitting a data packet.
      
       (2) When we're deciding whether or not to request an ACK, rather than
           checking if we're in fast-retransmit mode check instead if we're
           retransmitting.
      
       (3) Don't invoke the lose_skb tracepoint when losing a Tx packet as we're
           not altering the sk_buff refcount nor are we just seeing it after
           getting it off the Tx list.
      
       (4) The rxrpc_skb_tx_lost note is then no longer used so remove it.
      
       (5) rxrpc_lose_skb() no longer needs to deal with rxrpc_skb_tx_lost.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      a1767077
  14. 25 9月, 2016 1 次提交
    • D
      rxrpc: Implement slow-start · 57494343
      David Howells 提交于
      Implement RxRPC slow-start, which is similar to RFC 5681 for TCP.  A
      tracepoint is added to log the state of the congestion management algorithm
      and the decisions it makes.
      
      Notes:
      
       (1) Since we send fixed-size DATA packets (apart from the final packet in
           each phase), counters and calculations are in terms of packets rather
           than bytes.
      
       (2) The ACK packet carries the equivalent of TCP SACK.
      
       (3) The FLIGHT_SIZE calculation in RFC 5681 doesn't seem particularly
           suited to SACK of a small number of packets.  It seems that, almost
           inevitably, by the time three 'duplicate' ACKs have been seen, we have
           narrowed the loss down to one or two missing packets, and the
           FLIGHT_SIZE calculation ends up as 2.
      
       (4) In rxrpc_resend(), if there was no data that apparently needed
           retransmission, we transmit a PING ACK to ask the peer to tell us what
           its Rx window state is.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      57494343
  15. 23 9月, 2016 4 次提交
    • D
      rxrpc: Add a tracepoint for the call timer · fc7ab6d2
      David Howells 提交于
      Add a tracepoint to log call timer initiation, setting and expiry.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      fc7ab6d2
    • D
      rxrpc: Pass the last Tx packet marker in the annotation buffer · 70790dbe
      David Howells 提交于
      When the last packet of data to be transmitted on a call is queued, tx_top
      is set and then the RXRPC_CALL_TX_LAST flag is set.  Unfortunately, this
      leaves a race in the ACK processing side of things because the flag affects
      the interpretation of tx_top and also allows us to start receiving reply
      data before we've finished transmitting.
      
      To fix this, make the following changes:
      
       (1) rxrpc_queue_packet() now sets a marker in the annotation buffer
           instead of setting the RXRPC_CALL_TX_LAST flag.
      
       (2) rxrpc_rotate_tx_window() detects the marker and sets the flag in the
           same context as the routines that use it.
      
       (3) rxrpc_end_tx_phase() is simplified to just shift the call state.
           The Tx window must have been rotated before calling to discard the
           last packet.
      
       (4) rxrpc_receiving_reply() is added to handle the arrival of the first
           DATA packet of a reply to a client call (which is an implicit ACK of
           the Tx phase).
      
       (5) The last part of rxrpc_input_ack() is reordered to perform Tx
           rotation, then soft-ACK application and then to end the phase if we've
           rotated the last packet.  In the event of a terminal ACK, the soft-ACK
           application will be skipped as nAcks should be 0.
      
       (6) rxrpc_input_ackall() now has to rotate as well as ending the phase.
      
      In addition:
      
       (7) Alter the transmit tracepoint to log the rotation of the last packet.
      
       (8) Remove the no-longer relevant queue_reqack tracepoint note.  The
           ACK-REQUESTED packet header flag is now set as needed when we actually
           transmit the packet and may vary by retransmission.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      70790dbe
    • D
      rxrpc: Need to start the resend timer on initial transmission · dfc3da44
      David Howells 提交于
      When a DATA packet has its initial transmission, we may need to start or
      adjust the resend timer.  Without this we end up relying on being sent a
      NACK to initiate the resend.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      dfc3da44
    • D
      rxrpc: Preset timestamp on Tx sk_buffs · b24d2891
      David Howells 提交于
      Set the timestamp on sk_buffs holding packets to be transmitted before
      queueing them because the moment the packet is on the queue it can be seen
      by the retransmission algorithm - which may see a completely random
      timestamp.
      
      If the retransmission algorithm sees such a timestamp, it may retransmit
      the packet and, in future, tell the congestion management algorithm that
      the retransmit timer expired.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      b24d2891
  16. 22 9月, 2016 4 次提交
    • D
      rxrpc: Reduce the number of ACK-Requests sent · 0d4b103c
      David Howells 提交于
      Reduce the number of ACK-Requests we set on DATA packets that we're sending
      to reduce network traffic.  We set the flag on odd-numbered DATA packets to
      start off the RTT cache until we have at least three entries in it and then
      probe once per second thereafter to keep it topped up.
      
      This could be made tunable in future.
      
      Note that from this point, the RXRPC_REQUEST_ACK flag is set on DATA
      packets as we transmit them and not stored statically in the sk_buff.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      0d4b103c
    • D
      rxrpc: Obtain RTT data by requesting ACKs on DATA packets · 50235c4b
      David Howells 提交于
      In addition to sending a PING ACK to gain RTT data, we can set the
      RXRPC_REQUEST_ACK flag on a DATA packet and get a REQUESTED-ACK ACK.  The
      ACK packet contains the serial number of the packet it is in response to,
      so we can look through the Tx buffer for a matching DATA packet.
      
      This requires that the data packets be stamped with the time of
      transmission as a ktime rather than having the resend_at time in jiffies.
      
      This further requires the resend code to do the resend determination in
      ktimes and convert to jiffies to set the timer.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      50235c4b
    • D
      rxrpc: Expedite ping response transmission · 7aa51da7
      David Howells 提交于
      Expedite the transmission of a response to a PING ACK by sending it from
      sendmsg if one is pending.  We're most likely to see a PING ACK during the
      client call Tx phase as the other side may use it to determine a number of
      parameters, such as the client's receive window size, the RTT and whether
      the client is doing slow start (similar to RFC5681).
      
      If we don't expedite it, it's left to the background processing thread to
      transmit.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      7aa51da7
    • D
      rxrpc: Don't store the rxrpc header in the Tx queue sk_buffs · 5a924b89
      David Howells 提交于
      Don't store the rxrpc protocol header in sk_buffs on the transmit queue,
      but rather generate it on the fly and pass it to kernel_sendmsg() as a
      separate iov.  This reduces the amount of storage required.
      
      Note that the security header is still stored in the sk_buff as it may get
      encrypted along with the data (and doesn't change with each transmission).
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      5a924b89
  17. 17 9月, 2016 3 次提交
    • D
      rxrpc: Improve skb tracing · 71f3ca40
      David Howells 提交于
      Improve sk_buff tracing within AF_RXRPC by the following means:
      
       (1) Use an enum to note the event type rather than plain integers and use
           an array of event names rather than a big multi ?: list.
      
       (2) Distinguish Rx from Tx packets and account them separately.  This
           requires the call phase to be tracked so that we know what we might
           find in rxtx_buffer[].
      
       (3) Add a parameter to rxrpc_{new,see,get,free}_skb() to indicate the
           event type.
      
       (4) A pair of 'rotate' events are added to indicate packets that are about
           to be rotated out of the Rx and Tx windows.
      
       (5) A pair of 'lost' events are added, along with rxrpc_lose_skb() for
           packet loss injection recording.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
       
      71f3ca40
    • D
      rxrpc: Add a tracepoint to follow the life of a packet in the Tx buffer · a124fe3e
      David Howells 提交于
      Add a tracepoint to follow the insertion of a packet into the transmit
      buffer, its transmission and its rotation out of the buffer.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      a124fe3e
    • D
      rxrpc: Fix the basic transmit DATA packet content size at 1412 bytes · 182f5056
      David Howells 提交于
      Fix the basic transmit DATA packet content size at 1412 bytes so that they
      can be arbitrarily assembled into jumbo packets.
      
      In the future, I'm thinking of moving to keeping a jumbo packet header at
      the beginning of each packet in the Tx queue and creating the packet header
      on the spot when kernel_sendmsg() is invoked.  That way, jumbo packets can
      be assembled on the spur of the moment for (re-)transmission.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      182f5056
  18. 08 9月, 2016 1 次提交
    • D
      rxrpc: Rewrite the data and ack handling code · 248f219c
      David Howells 提交于
      Rewrite the data and ack handling code such that:
      
       (1) Parsing of received ACK and ABORT packets and the distribution and the
           filing of DATA packets happens entirely within the data_ready context
           called from the UDP socket.  This allows us to process and discard ACK
           and ABORT packets much more quickly (they're no longer stashed on a
           queue for a background thread to process).
      
       (2) We avoid calling skb_clone(), pskb_pull() and pskb_trim().  We instead
           keep track of the offset and length of the content of each packet in
           the sk_buff metadata.  This means we don't do any allocation in the
           receive path.
      
       (3) Jumbo DATA packet parsing is now done in data_ready context.  Rather
           than cloning the packet once for each subpacket and pulling/trimming
           it, we file the packet multiple times with an annotation for each
           indicating which subpacket is there.  From that we can directly
           calculate the offset and length.
      
       (4) A call's receive queue can be accessed without taking locks (memory
           barriers do have to be used, though).
      
       (5) Incoming calls are set up from preallocated resources and immediately
           made live.  They can than have packets queued upon them and ACKs
           generated.  If insufficient resources exist, DATA packet #1 is given a
           BUSY reply and other DATA packets are discarded).
      
       (6) sk_buffs no longer take a ref on their parent call.
      
      To make this work, the following changes are made:
      
       (1) Each call's receive buffer is now a circular buffer of sk_buff
           pointers (rxtx_buffer) rather than a number of sk_buff_heads spread
           between the call and the socket.  This permits each sk_buff to be in
           the buffer multiple times.  The receive buffer is reused for the
           transmit buffer.
      
       (2) A circular buffer of annotations (rxtx_annotations) is kept parallel
           to the data buffer.  Transmission phase annotations indicate whether a
           buffered packet has been ACK'd or not and whether it needs
           retransmission.
      
           Receive phase annotations indicate whether a slot holds a whole packet
           or a jumbo subpacket and, if the latter, which subpacket.  They also
           note whether the packet has been decrypted in place.
      
       (3) DATA packet window tracking is much simplified.  Each phase has just
           two numbers representing the window (rx_hard_ack/rx_top and
           tx_hard_ack/tx_top).
      
           The hard_ack number is the sequence number before base of the window,
           representing the last packet the other side says it has consumed.
           hard_ack starts from 0 and the first packet is sequence number 1.
      
           The top number is the sequence number of the highest-numbered packet
           residing in the buffer.  Packets between hard_ack+1 and top are
           soft-ACK'd to indicate they've been received, but not yet consumed.
      
           Four macros, before(), before_eq(), after() and after_eq() are added
           to compare sequence numbers within the window.  This allows for the
           top of the window to wrap when the hard-ack sequence number gets close
           to the limit.
      
           Two flags, RXRPC_CALL_RX_LAST and RXRPC_CALL_TX_LAST, are added also
           to indicate when rx_top and tx_top point at the packets with the
           LAST_PACKET bit set, indicating the end of the phase.
      
       (4) Calls are queued on the socket 'receive queue' rather than packets.
           This means that we don't need have to invent dummy packets to queue to
           indicate abnormal/terminal states and we don't have to keep metadata
           packets (such as ABORTs) around
      
       (5) The offset and length of a (sub)packet's content are now passed to
           the verify_packet security op.  This is currently expected to decrypt
           the packet in place and validate it.
      
           However, there's now nowhere to store the revised offset and length of
           the actual data within the decrypted blob (there may be a header and
           padding to skip) because an sk_buff may represent multiple packets, so
           a locate_data security op is added to retrieve these details from the
           sk_buff content when needed.
      
       (6) recvmsg() now has to handle jumbo subpackets, where each subpacket is
           individually secured and needs to be individually decrypted.  The code
           to do this is broken out into rxrpc_recvmsg_data() and shared with the
           kernel API.  It now iterates over the call's receive buffer rather
           than walking the socket receive queue.
      
      Additional changes:
      
       (1) The timers are condensed to a single timer that is set for the soonest
           of three timeouts (delayed ACK generation, DATA retransmission and
           call lifespan).
      
       (2) Transmission of ACK and ABORT packets is effected immediately from
           process-context socket ops/kernel API calls that cause them instead of
           them being punted off to a background work item.  The data_ready
           handler still has to defer to the background, though.
      
       (3) A shutdown op is added to the AF_RXRPC socket so that the AFS
           filesystem can shut down the socket and flush its own work items
           before closing the socket to deal with any in-progress service calls.
      
      Future additional changes that will need to be considered:
      
       (1) Make sure that a call doesn't hog the front of the queue by receiving
           data from the network as fast as userspace is consuming it to the
           exclusion of other calls.
      
       (2) Transmit delayed ACKs from within recvmsg() when we've consumed
           sufficiently more packets to avoid the background work item needing to
           run.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      248f219c
  19. 07 9月, 2016 3 次提交
    • D
      rxrpc: Add tracepoint for working out where aborts happen · 5a42976d
      David Howells 提交于
      Add a tracepoint for working out where local aborts happen.  Each
      tracepoint call is labelled with a 3-letter code so that they can be
      distinguished - and the DATA sequence number is added too where available.
      
      rxrpc_kernel_abort_call() also takes a 3-letter code so that AFS can
      indicate the circumstances when it aborts a call.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      5a42976d
    • D
      rxrpc: Cache the security index in the rxrpc_call struct · 278ac0cd
      David Howells 提交于
      Cache the security index in the rxrpc_call struct so that we can get at it
      even when the call has been disconnected and the connection pointer
      cleared.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      278ac0cd
    • D
      rxrpc: Improve the call tracking tracepoint · fff72429
      David Howells 提交于
      Improve the call tracking tracepoint by showing more differentiation
      between some of the put and get events, including:
      
        (1) Getting and putting refs for the socket call user ID tree.
      
        (2) Getting and putting refs for queueing and failing to queue the call
            processor work item.
      
      Note that these aren't necessarily used in this patch, but will be taken
      advantage of in future patches.
      
      An enum is added for the event subtype numbers rather than coding them
      directly as decimal numbers and a table of 3-letter strings is provided
      rather than a sequence of ?: operators.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      fff72429
  20. 05 9月, 2016 2 次提交