1. 17 Sep, 2016: 24 commits
  2. 14 Sep, 2016: 14 commits
    • rxrpc: Add IPv6 support · 75b54cb5
      David Howells committed
      Add IPv6 support to AF_RXRPC.  With this, AF_RXRPC sockets can be created:
      
      	service = socket(AF_RXRPC, SOCK_DGRAM, PF_INET6);
      
      instead of:
      
      	service = socket(AF_RXRPC, SOCK_DGRAM, PF_INET);
      
      The AFS filesystem doesn't support IPv6 at the moment, though, since that
      requires upgrades to some of the RPC calls.
      
      Note that a good portion of this patch is replacing "%pI4:%u" in print
      statements with "%pISpc" which is able to handle both protocols and print
      the port.
      Signed-off-by: David Howells <dhowells@redhat.com>
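The "%pISpc" specifier exists only in the kernel's printk. As a userspace illustration of the same idea (printing either address family plus its port from one generic sockaddr), here is a minimal sketch using POSIX inet_ntop(). The helper name format_sockaddr() is hypothetical; the output shape ("a.b.c.d:port" or "[v6addr]:port") is modelled on the kernel's documented "%pISpc" format, not taken from this patch.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: render an IPv4 or IPv6 socket address plus port,
 * roughly mirroring "%pISpc" ("a.b.c.d:port" or "[v6addr]:port").
 * Returns 0 on success, -1 on an unsupported family or short buffer. */
static int format_sockaddr(const struct sockaddr *sa, char *buf, size_t len)
{
	char host[INET6_ADDRSTRLEN];
	int n;

	switch (sa->sa_family) {
	case AF_INET: {
		const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;

		if (!inet_ntop(AF_INET, &sin->sin_addr, host, sizeof(host)))
			return -1;
		n = snprintf(buf, len, "%s:%u", host,
			     (unsigned int)ntohs(sin->sin_port));
		break;
	}
	case AF_INET6: {
		const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;

		/* inet_ntop() emits the compressed IPv6 form ("::") */
		if (!inet_ntop(AF_INET6, &sin6->sin6_addr, host, sizeof(host)))
			return -1;
		n = snprintf(buf, len, "[%s]:%u", host,
			     (unsigned int)ntohs(sin6->sin6_port));
		break;
	}
	default:
		return -1;
	}
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, 2001:db8::1 with port 7001 renders as "[2001:db8::1]:7001"; the brackets keep the port separable from the colons of the IPv6 address.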
    • rxrpc: Use rxrpc_extract_addr_from_skb() rather than doing this manually · 1c2bc7b9
      David Howells committed
      There are two places that want to transmit a packet in response to one just
      received and manually pick the address to reply to out of the sk_buff.
      Make them use rxrpc_extract_addr_from_skb() instead so that IPv6 is handled
      automatically.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Don't specify protocol when creating transport socket · aaa31cbc
      David Howells committed
      Pass 0 as the protocol argument when creating the transport socket rather
      than IPPROTO_UDP.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Create an address for sendmsg() to bind unbound socket with · cd5892c7
      David Howells committed
      Create an address for sendmsg() to bind an unbound socket with, rather
      than using a completely blank address; otherwise, transport socket
      creation will fail because it will try to use address family 0.
      
      By default, we use the address family specified in the protocol argument
      when the AF_RXRPC socket was created, and SOCK_DGRAM as the transport
      type.  For anything else, bind() must be used.
      Signed-off-by: David Howells <dhowells@redhat.com>
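A minimal sketch of choosing that default address, assuming a simplified stand-in for the transport half of an AF_RXRPC address. The struct demo_rxrpc_addr and the function make_default_bind_addr() are hypothetical; they are not the kernel's struct sockaddr_rxrpc or its code. Only the rule itself (family taken from the socket() protocol argument, SOCK_DGRAM as the transport type, anything else needs bind()) comes from the commit.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical, simplified stand-in for the transport part of an
 * AF_RXRPC socket address (not the real struct sockaddr_rxrpc). */
struct demo_rxrpc_addr {
	unsigned short transport_type;	/* e.g. SOCK_DGRAM */
	unsigned short transport_len;	/* bytes of transport address in use */
	union {
		sa_family_t family;
		struct sockaddr_in sin;
		struct sockaddr_in6 sin6;
	} transport;
};

/* Build a wildcard local address (any address, port 0) for an unbound
 * socket from the protocol family passed to socket().  Families other
 * than PF_INET/PF_INET6 are rejected: the caller must bind() instead.
 * Returns 0 or -EAFNOSUPPORT. */
static int make_default_bind_addr(int proto_family, struct demo_rxrpc_addr *addr)
{
	memset(addr, 0, sizeof(*addr));
	addr->transport_type = SOCK_DGRAM;

	switch (proto_family) {
	case PF_INET:
		addr->transport.sin.sin_family = AF_INET;
		addr->transport_len = sizeof(struct sockaddr_in);
		return 0;
	case PF_INET6:
		addr->transport.sin6.sin6_family = AF_INET6;
		addr->transport_len = sizeof(struct sockaddr_in6);
		return 0;
	default:
		return -EAFNOSUPPORT;
	}
}
```

The point of the fix is visible in the default case: a fully zeroed address would leave the family at 0 (AF_UNSPEC), which transport socket creation cannot use.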
    • rxrpc: Correctly initialise, limit and transmit call->rx_winsize · 75e42126
      David Howells committed
      call->rx_winsize should be initialised to the sysctl setting and the sysctl
      setting should be limited to the maximum we want to permit.  Further, we
      need to place this in the ACK info instead of the sysctl setting.
      
      Furthermore, discard the idea of accepting the subpackets of a jumbo packet
      that lie beyond the receive window when the first packet of the jumbo is
      within the window.  Just discard the excess subpackets instead.  This
      allows the receive window to be opened up right to the buffer size less one
      for the dead slot.
      Signed-off-by: David Howells <dhowells@redhat.com>
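A sketch of both rules, assuming a 64-slot ring so that the largest usable window is 63 (the buffer size less the dead slot). DEMO_RXTX_BUFF_SIZE, clamp_rx_winsize() and jumbo_subpackets_to_keep() are hypothetical names, and the 64-slot size is an assumption for illustration only. The second function keeps the in-window subpackets of a jumbo packet and reports how many to retain, so the excess beyond the window is simply dropped.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed ring size; the window may use every slot but one. */
#define DEMO_RXTX_BUFF_SIZE	64u
#define DEMO_RX_WINSIZE_MAX	(DEMO_RXTX_BUFF_SIZE - 1)

/* Limit a requested (sysctl-style) window to what the buffer permits. */
static uint32_t clamp_rx_winsize(uint32_t requested)
{
	if (requested < 1)
		return 1;
	if (requested > DEMO_RX_WINSIZE_MAX)
		return DEMO_RX_WINSIZE_MAX;
	return requested;
}

/* A jumbo packet's first subpacket has sequence number `seq` and it
 * carries `nr_subpackets`.  Return how many leading subpackets fall in
 * the window (hard_ack, hard_ack + winsize]; the rest are discarded. */
static unsigned int jumbo_subpackets_to_keep(uint32_t hard_ack, uint32_t winsize,
					     uint32_t seq, unsigned int nr_subpackets)
{
	uint32_t window_top = hard_ack + winsize;	/* last seq admitted */
	uint32_t room;

	if ((int32_t)(seq - hard_ack) <= 0)
		return 0;	/* already consumed: a duplicate */
	if ((int32_t)(seq - window_top) > 0)
		return 0;	/* first subpacket lies beyond the window */
	room = window_top - seq + 1;
	return nr_subpackets < room ? nr_subpackets : room;
}
```

With hard_ack = 10 and a window of 5 (sequence numbers 11 to 15), a four-subpacket jumbo starting at 14 keeps only 14 and 15.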
    • rxrpc: Fix prealloc refcounting · 3432a757
      David Howells committed
      The preallocated call buffer holds a ref on the calls within that buffer.
      The ref was being released in the wrong place - it worked okay for incoming
      calls to the AFS cache manager service, but doesn't work right for incoming
      calls to a userspace service.
      
      Instead of releasing an extra ref on service calls in rxrpc_release_call(),
      the ref needs to be released during the acceptance/rejection process.  To
      this end:
      
       (1) The prealloc ref is now normally released during
           rxrpc_new_incoming_call().
      
       (2) For preallocated kernel API calls, the kernel API's ref needs to be
           released when the call is discarded on socket close.
      
       (3) We shouldn't take a second ref in rxrpc_accept_call().
      
       (4) rxrpc_recvmsg_new_call() needs to get a ref of its own when it adds
           the call to the to_be_accepted socket queue.
      
      In doing (4) above, we would prefer not to drop the call's refcount to 0,
      as that entails doing cleanup in softirq context.  That is unlikely,
      however, as several refs are held elsewhere, at least one of which must
      be put by someone in process context calling rxrpc_release_call().  Even
      so, it's not a problem if we do have to do that.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Adjust the call ref tracepoint to show kernel API refs · cbd00891
      David Howells committed
      Adjust the call ref tracepoint to show references held on a call by the
      kernel API separately as much as possible, and add an additional trace at
      the allocation point from the preallocation buffer for an incoming call.
      
      Note that this doesn't show the allocation of a client call for the kernel
      separately at the moment.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Allow tx_winsize to grow in response to an ACK · 01fd0742
      David Howells committed
      Allow tx_winsize to grow when the ACK info packet shows a larger receive
      window at the other end rather than only permitting it to shrink.
      Signed-off-by: David Howells <dhowells@redhat.com>
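A sketch contrasting the two behaviours. It assumes the peer's advertised receive window (rwind) has already been extracted from the ACK info and that our own limit is a hypothetical DEMO_TX_WINSIZE_MAX of 63; both function names are also hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_TX_WINSIZE_MAX 63u	/* assumed local cap, for illustration */

/* Old behaviour: the window could only ever shrink. */
static uint32_t tx_winsize_shrink_only(uint32_t cur, uint32_t rwind)
{
	return rwind < cur ? rwind : cur;
}

/* New behaviour: follow the peer's advertised window in either
 * direction, capped at what we are willing to keep in flight. */
static uint32_t tx_winsize_follow(uint32_t rwind)
{
	return rwind > DEMO_TX_WINSIZE_MAX ? DEMO_TX_WINSIZE_MAX : rwind;
}
```

Once a peer advertised a small window, the shrink-only rule pinned tx_winsize there for the rest of the call; following rwind lets the sender reopen the window when the peer does.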
    • rxrpc: Use skb->len not skb->data_len · 89a80ed4
      David Howells committed
      skb->len should be used rather than skb->data_len when referring to the
      amount of data in a packet.  This will only cause a malfunction in the
      following cases:
      
       (1) We receive a jumbo packet (validation and splitting both are wrong).
      
       (2) We see if there's extra ACK info in an ACK packet (we think it's not
           there and just ignore it).
      Signed-off-by: David Howells <dhowells@redhat.com>
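In the kernel's struct sk_buff, len is the total number of data bytes, while data_len counts only the bytes held in paged fragments (so the linear head length, skb_headlen(), is len - data_len). For a fully linear buffer data_len is 0, which is why length checks based on it go wrong. The sketch below models just those two fields in a hypothetical struct demo_skb; it is not the real sk_buff, and the helper names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two sk_buff length fields. */
struct demo_skb {
	unsigned int len;	/* total bytes: linear head + paged fragments */
	unsigned int data_len;	/* bytes held in paged fragments only */
};

/* Bytes in the linear head (what skb_headlen() computes). */
static unsigned int demo_headlen(const struct demo_skb *skb)
{
	return skb->len - skb->data_len;
}

/* Correct: are there at least `need` bytes starting at `offset`? */
static bool demo_has_bytes(const struct demo_skb *skb,
			   unsigned int offset, unsigned int need)
{
	return skb->len >= offset && skb->len - offset >= need;
}

/* Buggy variant that uses data_len, as the commit describes. */
static bool demo_has_bytes_buggy(const struct demo_skb *skb,
				 unsigned int offset, unsigned int need)
{
	return skb->data_len >= offset && skb->data_len - offset >= need;
}
```

For a 100-byte linear packet, the buggy check reports that 16 bytes at offset 80 are absent, which mirrors case (2) above: trailing ACK info is present but ignored.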
    • rxrpc: Add missing unlock in rxrpc_call_accept() · b25de360
      David Howells committed
      Add a missing unlock in rxrpc_call_accept() in the path taken if there's no
      call to wake up.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Requeue call for recvmsg if more data · 33b603fd
      David Howells committed
      rxrpc_recvmsg() needs to make sure that the call it has just been
      processing gets requeued for further attention if the buffer has been
      filled and there's more data to be consumed.  The softirq producer only
      queues the call and wakes the socket if it fills the first slot in the
      window, so userspace might end up sleeping forever otherwise, despite there
      being data available.
      
      This is not a problem provided the userspace buffer is big enough or it
      empties the buffer completely before more data comes in.
      Signed-off-by: David Howells <dhowells@redhat.com>
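A single-threaded model of the problem, with hypothetical names (struct demo_call, demo_deliver(), demo_recvmsg()). It assumes in-order delivery and counts packets rather than bytes. The point is only that the producer queues the call on the empty-to-non-empty transition, so the consumer must requeue the call itself when it leaves data behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-call receive state. */
struct demo_call {
	uint32_t hard_ack;	/* last packet consumed */
	uint32_t top;		/* highest packet buffered */
	bool queued;		/* on the socket's recvmsg queue? */
};

/* Producer (softirq side): buffer one in-order packet.  The call is only
 * queued when this fills the first slot, i.e. the buffer was empty. */
static void demo_deliver(struct demo_call *call)
{
	bool was_empty = call->top == call->hard_ack;

	call->top++;
	if (was_empty)
		call->queued = true;
}

/* Consumer (recvmsg side): dequeue the call, consume up to `space`
 * packets and requeue the call if data remains.  Returns packets read. */
static unsigned int demo_recvmsg(struct demo_call *call, unsigned int space)
{
	unsigned int n = 0;

	call->queued = false;
	while (n < space && call->hard_ack != call->top) {
		call->hard_ack++;
		n++;
	}
	/* The fix: without this, leftover data never re-triggers queueing,
	 * because the producer only queues on an empty buffer. */
	if (call->hard_ack != call->top)
		call->queued = true;
	return n;
}
```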
    • rxrpc: The IDLE ACK packet should use rxrpc_idle_ack_delay · 91c2c7b6
      David Howells committed
      The IDLE ACK packet should use the rxrpc_idle_ack_delay setting when the
      timer is set for it.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Add missing wakeup on Tx window rotation · bc4abfcf
      David Howells committed
      We need to wake up the sender when Tx window rotation due to an incoming
      ACK makes space in the buffer; otherwise, the sender is liable to hang
      indefinitely.
      
      This problem isn't noticeable if the Tx phase transfers no more than will
      fit in a single window or the Tx window rotates fast enough that it doesn't
      get full.
      Signed-off-by: David Howells <dhowells@redhat.com>
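A comparable single-threaded model for the transmit side, using hypothetical names throughout. The sender "sleeps" (sets sender_waiting) once top - hard_ack reaches the window; the ACK handler advances hard_ack and, if that frees a slot, performs the wakeup this commit adds. Real code would use a wait queue rather than a flag; the flag and counter here are assumptions that make the behaviour checkable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-call transmit state. */
struct demo_tx {
	uint32_t hard_ack;	/* last packet the peer has hard-ACK'd */
	uint32_t top;		/* last packet queued for transmission */
	uint32_t winsize;	/* packets permitted in flight */
	bool sender_waiting;	/* sender would be asleep */
	unsigned int wakeups;	/* times the sender has been woken */
};

static bool demo_tx_has_space(const struct demo_tx *tx)
{
	return tx->top - tx->hard_ack < tx->winsize;
}

/* Sender: queue a packet if the window allows, else wait. */
static bool demo_tx_send(struct demo_tx *tx)
{
	if (!demo_tx_has_space(tx)) {
		tx->sender_waiting = true;	/* would sleep here */
		return false;
	}
	tx->top++;
	return true;
}

/* ACK handler: rotate the window and wake the sender if space appeared. */
static void demo_tx_rotate(struct demo_tx *tx, uint32_t acked)
{
	if ((int32_t)(acked - tx->hard_ack) <= 0)
		return;		/* nothing newly acknowledged */
	tx->hard_ack = acked;
	if (tx->sender_waiting && demo_tx_has_space(tx)) {
		tx->sender_waiting = false;
		tx->wakeups++;	/* the previously missing wakeup */
	}
}
```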
    • rxrpc: Make sure we initialise the peer hash key · 08a39685
      David Howells committed
      Peer records created for incoming connections weren't getting their hash
      key set.  This meant that incoming calls wouldn't see more than one DATA
      packet, which is not a problem for AFS CM calls with small request data
      blobs.
      Signed-off-by: David Howells <dhowells@redhat.com>
  3. 08 Sep, 2016: 2 commits
    • rxrpc: Rewrite the data and ack handling code · 248f219c
      David Howells committed
      Rewrite the data and ack handling code such that:
      
       (1) Parsing of received ACK and ABORT packets and the distribution and
           filing of DATA packets happen entirely within the data_ready context
           called from the UDP socket.  This allows us to process and discard ACK
           and ABORT packets much more quickly (they're no longer stashed on a
           queue for a background thread to process).
      
       (2) We avoid calling skb_clone(), pskb_pull() and pskb_trim().  We instead
           keep track of the offset and length of the content of each packet in
           the sk_buff metadata.  This means we don't do any allocation in the
           receive path.
      
       (3) Jumbo DATA packet parsing is now done in data_ready context.  Rather
           than cloning the packet once for each subpacket and pulling/trimming
           it, we file the packet multiple times with an annotation for each
           indicating which subpacket is there.  From that we can directly
           calculate the offset and length.
      
       (4) A call's receive queue can be accessed without taking locks (memory
           barriers do have to be used, though).
      
       (5) Incoming calls are set up from preallocated resources and immediately
           made live.  They can then have packets queued upon them and ACKs
           generated.  If insufficient resources exist, DATA packet #1 is given a
           BUSY reply and other DATA packets are discarded.
      
       (6) sk_buffs no longer take a ref on their parent call.
      
      To make this work, the following changes are made:
      
       (1) Each call's receive buffer is now a circular buffer of sk_buff
           pointers (rxtx_buffer) rather than a number of sk_buff_heads spread
           between the call and the socket.  This permits each sk_buff to be in
           the buffer multiple times.  The receive buffer is reused for the
           transmit buffer.
      
       (2) A circular buffer of annotations (rxtx_annotations) is kept parallel
           to the data buffer.  Transmission phase annotations indicate whether a
           buffered packet has been ACK'd or not and whether it needs
           retransmission.
      
           Receive phase annotations indicate whether a slot holds a whole packet
           or a jumbo subpacket and, if the latter, which subpacket.  They also
           note whether the packet has been decrypted in place.
      
       (3) DATA packet window tracking is much simplified.  Each phase has just
           two numbers representing the window (rx_hard_ack/rx_top and
           tx_hard_ack/tx_top).
      
           The hard_ack number is the sequence number just before the base of
           the window, representing the last packet the other side says it has
           consumed.  hard_ack starts from 0 and the first packet is sequence
           number 1.
      
           The top number is the sequence number of the highest-numbered packet
           residing in the buffer.  Packets between hard_ack+1 and top are
           soft-ACK'd to indicate they've been received, but not yet consumed.
      
           Four macros, before(), before_eq(), after() and after_eq(), are added
           to compare sequence numbers within the window.  This allows the top
           of the window to wrap when the hard-ack sequence number gets close
           to the limit.
      
           Two flags, RXRPC_CALL_RX_LAST and RXRPC_CALL_TX_LAST, are also added
           to indicate when rx_top and tx_top point at the packets with the
           LAST_PACKET bit set, indicating the end of the phase.
      
       (4) Calls are queued on the socket 'receive queue' rather than packets.
           This means that we don't have to invent dummy packets to queue to
           indicate abnormal/terminal states and we don't have to keep metadata
           packets (such as ABORTs) around.
      
       (5) The offset and length of a (sub)packet's content are now passed to
           the verify_packet security op.  This is currently expected to decrypt
           the packet in place and validate it.
      
           However, there's now nowhere to store the revised offset and length of
           the actual data within the decrypted blob (there may be a header and
           padding to skip) because an sk_buff may represent multiple packets, so
           a locate_data security op is added to retrieve these details from the
           sk_buff content when needed.
      
       (6) recvmsg() now has to handle jumbo subpackets, where each subpacket is
           individually secured and needs to be individually decrypted.  The code
           to do this is broken out into rxrpc_recvmsg_data() and shared with the
           kernel API.  It now iterates over the call's receive buffer rather
           than walking the socket receive queue.
      
      Additional changes:
      
       (1) The timers are condensed to a single timer that is set for the soonest
           of three timeouts (delayed ACK generation, DATA retransmission and
           call lifespan).
      
       (2) Transmission of ACK and ABORT packets is effected immediately from
           process-context socket ops/kernel API calls that cause them instead of
           them being punted off to a background work item.  The data_ready
           handler still has to defer to the background, though.
      
       (3) A shutdown op is added to the AF_RXRPC socket so that the AFS
           filesystem can shut down the socket and flush its own work items
           before closing the socket to deal with any in-progress service calls.
      
      Future additional changes that will need to be considered:
      
       (1) Make sure that a call doesn't hog the front of the queue by receiving
           data from the network as fast as userspace is consuming it to the
           exclusion of other calls.
      
       (2) Transmit delayed ACKs from within recvmsg() when we've consumed
           sufficiently more packets to avoid the background work item needing to
           run.
      Signed-off-by: David Howells <dhowells@redhat.com>
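A sketch of the window arithmetic described in change (3), plus the jumbo subpacket offset calculation mentioned in item (3) of the first list. The four comparison helpers use the usual serial-number idiom (casting the unsigned difference to a signed 32-bit value); that body is an assumption, not a copy of the kernel's. The 64-slot ring, the 1412-byte jumbo subpacket data length and the 4-byte jumbo header are likewise assumptions, and every DEMO_/demo_ name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe sequence comparisons (serial-number arithmetic). */
static inline bool before(uint32_t seq1, uint32_t seq2)    { return (int32_t)(seq1 - seq2) < 0; }
static inline bool before_eq(uint32_t seq1, uint32_t seq2) { return (int32_t)(seq1 - seq2) <= 0; }
static inline bool after(uint32_t seq1, uint32_t seq2)     { return (int32_t)(seq1 - seq2) > 0; }
static inline bool after_eq(uint32_t seq1, uint32_t seq2)  { return (int32_t)(seq1 - seq2) >= 0; }

#define DEMO_BUFF_SIZE	64u			/* assumed power-of-two ring */
#define DEMO_BUFF_MASK	(DEMO_BUFF_SIZE - 1)

/* Ring slot that holds a given sequence number. */
static unsigned int demo_slot(uint32_t seq)
{
	return seq & DEMO_BUFF_MASK;
}

/* Packets received but not yet consumed: hard_ack+1 .. top. */
static uint32_t demo_soft_acked(uint32_t hard_ack, uint32_t top)
{
	return top - hard_ack;
}

#define DEMO_JUMBO_DATALEN	1412u		/* assumed subpacket data size */
#define DEMO_JUMBO_HDRLEN	4u		/* assumed jumbo header size */
#define DEMO_JUMBO_SUBPKTLEN	(DEMO_JUMBO_DATALEN + DEMO_JUMBO_HDRLEN)

/* Locate subpacket `idx` (0-based) in a jumbo payload of `total` bytes
 * starting at `base`: every subpacket but the last is a fixed size. */
static void demo_jumbo_locate(uint32_t base, uint32_t total, unsigned int idx,
			      uint32_t *offset, uint32_t *len)
{
	uint32_t remain;

	*offset = base + idx * DEMO_JUMBO_SUBPKTLEN;
	assert(*offset < base + total);
	remain = base + total - *offset;
	*len = remain < DEMO_JUMBO_DATALEN ? remain : DEMO_JUMBO_DATALEN;
}
```

With hard_ack = 0xfffffffe and top = 3, the window wraps through zero and still holds five packets, which is the case before() and after() exist to handle. Because offset and length are derived from the annotation index, one sk_buff can be filed in several ring slots without cloning or trimming it.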
    • rxrpc: Preallocate peers, conns and calls for incoming service requests · 00e90712
      David Howells committed
      Make it possible for the data_ready handler called from the UDP transport
      socket to completely instantiate an rxrpc_call structure and make it
      immediately live by preallocating all the memory it might need.  The idea
      is to cut out the background thread usage as much as possible.
      
      [Note that the preallocated structs are not actually used in this patch -
       that will be done in a future patch.]
      
      If insufficient resources are available in the preallocation buffers, it
      will be possible to discard the DATA packet in the data_ready handler or
      schedule a BUSY packet without the need to schedule an attempt at
      allocation in a background thread.
      
      To this end:
      
       (1) Preallocate rxrpc_peer, rxrpc_connection and rxrpc_call structs, up to
           a maximum of the listen backlog size of each.  The backlog size is
           limited to a maximum of 32.  Only this many of each can be in the
           preallocation buffer.
      
       (2) For userspace sockets, the preallocation is charged initially by
           listen() and will be recharged by accepting or rejecting pending
           new incoming calls.
      
       (3) For kernel services, charging, recharging and discharging of the
           preallocation buffers are handled manually.  Two notifier callbacks
           have to be provided before kernel_listen() is invoked:
      
           (a) An indication that a new call has been instantiated.  This can be
               used to trigger background recharging.
      
           (b) An indication that a call is being discarded.  This is used when
               the socket is being released.
      
           A function, rxrpc_kernel_charge_accept(), is called by the kernel
           service to preallocate a single call.  It should be passed the user ID
           to be used for that call and a callback to associate the rxrpc call
           with the kernel service's side of the ID.
      
       (4) Discard the preallocation when the socket is closed.
      
       (5) Temporarily bump the refcount on the call allocated in
           rxrpc_incoming_call() so that rxrpc_release_call() can ditch the
           preallocation ref on service calls unconditionally.  This will no
           longer be necessary once the preallocation is used.
      
      Note that this does not yet control the number of active service calls on a
      client - that will come in a later patch.
      
      A future development would be to provide a setsockopt() call that allows a
      userspace server to manually charge the preallocation buffer.  This would
      allow user call IDs to be provided in advance and the awkward manual accept
      stage to be bypassed.
      Signed-off-by: David Howells <dhowells@redhat.com>
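A minimal sketch of one preallocation ring, assuming a hypothetical struct demo_prealloc that holds opaque object pointers. Charging adds objects until the capped backlog is reached. The data_ready path takes one without allocating, and a NULL return stands for the "insufficient resources" case that leads to a BUSY reply or a discarded DATA packet. The ring size and all names are assumptions for illustration; only the cap of 32 comes from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BACKLOG_MAX 32u	/* backlog cap stated in the commit */
#define DEMO_RING_SIZE	 32u	/* assumed: power of two >= DEMO_BACKLOG_MAX */

/* Hypothetical preallocation ring for one object type (peer, conn or call). */
struct demo_prealloc {
	void *slots[DEMO_RING_SIZE];
	unsigned int head;	/* total objects ever charged */
	unsigned int tail;	/* total objects ever taken */
	unsigned int backlog;	/* listen() backlog, capped */
};

static void demo_prealloc_init(struct demo_prealloc *p, unsigned int backlog)
{
	memset(p, 0, sizeof(*p));
	p->backlog = backlog > DEMO_BACKLOG_MAX ? DEMO_BACKLOG_MAX : backlog;
}

static unsigned int demo_prealloc_count(const struct demo_prealloc *p)
{
	return p->head - p->tail;	/* correct across unsigned wraparound */
}

/* Charge one preallocated object; fails once the backlog is full. */
static bool demo_charge(struct demo_prealloc *p, void *obj)
{
	if (demo_prealloc_count(p) >= p->backlog)
		return false;
	p->slots[p->head % DEMO_RING_SIZE] = obj;
	p->head++;
	return true;
}

/* Take one object in the data_ready path without allocating.
 * NULL means the ring is empty: reply BUSY or discard instead. */
static void *demo_take(struct demo_prealloc *p)
{
	void *obj;

	if (demo_prealloc_count(p) == 0)
		return NULL;
	obj = p->slots[p->tail % DEMO_RING_SIZE];
	p->tail++;
	return obj;
}
```

Recharging (item (2) for userspace, item (3) for kernel services) would simply call demo_charge() again from process context after an accept or reject.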