1. 08 Nov, 2016 1 commit
    • udp: do fwd memory scheduling on dequeue · 7c13f97f
      Paolo Abeni committed
      A new argument is added to __skb_recv_datagram() to provide
      an explicit skb destructor, invoked under the receive queue
      lock.
      The UDP protocol uses this argument to perform memory
      reclaiming on dequeue, so that the UDP protocol no longer
      sets skb->destructor.
      Instead, explicit memory reclaiming is performed at close() time and
      when skbs are removed from the receive queue.
      In-kernel users of the UDP protocol now need to call a
      skb_recv_udp() variant instead of skb_recv_datagram() to
      properly perform memory accounting on dequeue.
      
      Overall, this allows the receive queue lock to be acquired
      only once on dequeue.
      
      Tested using pktgen with random src port, 64-byte packets,
      wire-speed on a 10G link as sender and udp_sink as the receiver,
      using an l4 tuple rxhash to stress the contention, and one or more
      udp_sink instances with reuseport.
      
      nr sinks	vanilla		patched
      1		440		560
      3		2150		2300
      6		3650		3800
      9		4450		4600
      12		6250		6450
      
      v1 -> v2:
       - do rmem and allocated memory scheduling under the receive lock
       - do bulk scheduling in first_packet_length() and in udp_destruct_sock()
       - avoid the typedef for the dequeue callback
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
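As a rough illustration of the dequeue-time callback described in this commit, the Python sketch below models a receive queue whose dequeue operation runs a caller-supplied destructor while the queue lock is still held. It is not kernel code: `ReceiveQueue`, `reclaim_on_dequeue` and the `rmem_alloc` counter are hypothetical stand-ins, and the real UDP memory accounting is more involved.

```python
import threading
from collections import deque


class ReceiveQueue:
    """Toy receive queue; rmem_alloc counts bytes charged to queued packets."""

    def __init__(self):
        self.lock = threading.Lock()
        self.packets = deque()
        self.rmem_alloc = 0
        self.lock_acquisitions = 0  # counts lock round-trips made by dequeue()

    def enqueue(self, payload: bytes) -> None:
        with self.lock:
            self.packets.append(payload)
            self.rmem_alloc += len(payload)

    def dequeue(self, destructor=None):
        """Pop one packet and run `destructor` before the lock is released."""
        with self.lock:
            self.lock_acquisitions += 1
            if not self.packets:
                return None
            payload = self.packets.popleft()
            if destructor is not None:
                destructor(self, payload)
            return payload


def reclaim_on_dequeue(queue: ReceiveQueue, payload: bytes) -> None:
    # Runs with queue.lock already held, so the accounting update does not
    # need a second lock acquisition.
    queue.rmem_alloc -= len(payload)
```

Passing the accounting step in as a callback is what lets the removal and the reclaim share one critical section, rather than taking the lock once to dequeue and again to release the memory.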
  2. 13 Oct, 2016 2 commits
  3. 06 Oct, 2016 12 commits
    • rxrpc: Don't request an ACK on the last DATA packet of a call's Tx phase · bf7d620a
      David Howells committed
      Don't request an ACK on the last DATA packet of a call's Tx phase as for a
      client there will be a reply packet or some sort of ACK to shift phase.  If
      the ACK is requested, OpenAFS sends a REQUESTED-ACK ACK with soft-ACKs in
      it and doesn't follow up with a hard-ACK.
      
      If we don't set the flag, OpenAFS will send a DELAY ACK that hard-ACKs the
      reply data, thereby allowing the call to terminate cleanly.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Need to produce an ACK for service op if op takes a long time · 9749fd2b
      David Howells committed
      We need to generate a DELAY ACK from the service end of an operation if we
      start doing the actual operation work and it takes longer than expected.
      This will hard-ACK the request data and allow the client to release its
      resources.
      
      To make this work:
      
       (1) We have to set the ack timer and propose an ACK when the call moves to
           the RXRPC_CALL_SERVER_ACK_REQUEST state, and clear the pending ACK and
           cancel the timer when we start transmitting the reply (the first DATA
           packet of the reply implicitly ACKs the request phase).
      
       (2) It must be possible to set the timer when the caller is holding
           call->state_lock, so split the lock-getting part of the timer function
           out.
      
       (3) Add trace notes for the ACK we're requesting and the timer we clear.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Return negative error code to kernel service · cf69207a
      David Howells committed
      In rxrpc_kernel_recv_data(), when we return the error number incurred by a
      failed call, we must negate it before returning it as it's stored as
      positive (that's what we have to pass back to userspace).
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Add missing notification · 94bc669e
      David Howells committed
      The call's background processor work item needs to notify the socket when
      it completes a call so that recvmsg() or the AFS fs can deal with it.
      Without this, call expiry isn't handled.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Queue the call on expiry · d7833d00
      David Howells committed
      When a call expires, it must be queued for the background processor to deal
      with; otherwise a service call that is improperly terminated will just sit
      there awaiting an ACK and won't expire.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Partially handle OpenAFS's improper termination of calls · b3156274
      David Howells committed
      OpenAFS doesn't always correctly terminate client calls that it makes -
      this includes calls the OpenAFS servers make to the cache manager service.
      It should end the client call with either:
      
       (1) An ACK that has firstPacket set to one greater than the seq number of
           the reply DATA packet with the LAST_PACKET flag set (thereby
           hard-ACK'ing all packets).  nAcks should be 0 and acks[] should be
           empty (ie. no soft-ACKs).
      
       (2) An ACKALL packet.
      
      OpenAFS, though, may send an ACK packet with firstPacket set to the last
      seq number or less and soft-ACKs listed for all packets up to and including
      the last DATA packet.
      
      The transmitter, however, is obliged to keep the call live and the
      soft-ACK'd DATA packets around until they're hard-ACK'd as the receiver is
      permitted to drop any merely soft-ACK'd packet and request retransmission
      by sending an ACK packet with a NACK in it.
      
      Further, OpenAFS will also terminate a client call by beginning the next
      client call on the same connection channel.  This implicitly completes the
      previous call.
      
      This patch handles implicit ACK of a call on a channel by the reception of
      the first packet of the next call on that channel.
      
      If another call doesn't come along to implicitly ACK a call, then we have
      to time the call out.  There are some bugs there that will be addressed in
      subsequent patches.
      Signed-off-by: David Howells <dhowells@redhat.com>
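The difference between a terminating ACK and the ACK that OpenAFS may send instead can be expressed as a small predicate. This is only a sketch of rule (1) from the commit message above; the `Ack` class and its field names are hypothetical and do not mirror the kernel's structures.

```python
from dataclasses import dataclass, field


@dataclass
class Ack:
    first_packet: int                          # lowest seq number not yet hard-ACK'd
    acks: list = field(default_factory=list)   # soft-ACK entries, one per packet


def hard_acks_everything(ack: Ack, last_seq: int) -> bool:
    """True if the ACK hard-ACKs every packet up to and including last_seq.

    last_seq is the sequence number of the DATA packet carrying LAST_PACKET.
    """
    return ack.first_packet == last_seq + 1 and len(ack.acks) == 0
```

With a five-packet reply, `Ack(first_packet=6)` terminates the call, whereas `Ack(first_packet=3, acks=[True, True, True])` only soft-ACKs packets 3 to 5, so the transmitter still has to keep them around.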
    • rxrpc: Fix loss of PING RESPONSE ACK production due to PING ACKs · a5af7e1f
      David Howells committed
      Separate the output of PING ACKs from the output of other sorts of ACK so
      that if we receive a PING ACK and schedule transmission of a PING RESPONSE
      ACK, the response doesn't get cancelled by a PING ACK we happen to be
      scheduling transmission of at the same time.
      
      If a PING RESPONSE gets lost, the other side might just sit there waiting
      for it and refuse to proceed otherwise.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix warning by splitting rxrpc_send_call_packet() · 26cb02aa
      David Howells committed
      Split rxrpc_send_call_packet() to separate ACK generation (which is more
      complicated) from ABORT generation.  This simplifies the code a bit and
      fixes the following warning:
      
      In file included from ../net/rxrpc/output.c:20:0:
      net/rxrpc/output.c: In function 'rxrpc_send_call_packet':
      net/rxrpc/ar-internal.h:1187:27: error: 'top' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      net/rxrpc/output.c:103:24: note: 'top' was declared here
      net/rxrpc/output.c:225:25: error: 'hard_ack' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Only ping for lost reply in client call · a9f312d9
      David Howells committed
      When a reply is deemed lost, we send a ping to find out whether the other
      end received all the request data packets we sent.  This should be limited
      to client calls and we shouldn't do this on service calls.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix oops on incoming call to serviceless endpoint · 7212a57e
      David Howells committed
      If a call comes in to a local endpoint that isn't listening for any
      incoming calls at the moment, an oops will happen.  We need to check that
      the local endpoint's service pointer isn't NULL before we dereference it.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix duplicate const · 19c0dbd5
      David Howells committed
      Remove a duplicate const keyword.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Accesses of rxrpc_local::service need to be RCU managed · b63452c1
      David Howells committed
      struct rxrpc_local->service is marked __rcu - this means that accesses of
      it need to be managed using RCU wrappers.  There are two such places in
      rxrpc_release_sock() where the value is checked and cleared.  Fix this by
      using the appropriate wrappers.
      Signed-off-by: David Howells <dhowells@redhat.com>
  4. 30 Sep, 2016 12 commits
    • rxrpc: Fix the call timer handling · 405dea1d
      David Howells committed
      The call timer treats a call timeout (of which there are three) as
      inactive when it has the same expiration time as the call expiration
      timeout (the expiration timer is never inactive).  However, I'm not
      resetting the timeouts when they expire, leading to repeated processing
      of expired timeouts when other timeout events occur.
      
      Fix this by:
      
       (1) Move the timer expiry detection into rxrpc_set_timer() inside the
           locked section.  This means that if a timeout is set that will expire
           immediately, we deal with it immediately.
      
       (2) If a timeout is at or before now then it has expired.  When an expiry
           is detected, an event is raised, the timeout is automatically
           inactivated and the event processor is queued.
      
       (3) If a timeout is at or after the expiry timeout then it is inactive.
           Inactive timeouts do not contribute to the timer setting.
      
       (4) The call timer callback can now just call rxrpc_set_timer() to handle
           things.
      
       (5) The call processor work function now checks the event flags rather
           than checking the timeouts directly.
      Signed-off-by: David Howells <dhowells@redhat.com>
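Rules (1) to (3) above can be sketched as a single function. This is a simplified model, not the kernel's rxrpc_set_timer(): times are plain integers, the timeout names are made up for the example, and locking and work-item queueing are left out.

```python
def set_timer(now, expire_at, timeouts):
    """Return (next_timer, events) and inactivate any expired timeouts.

    timeouts maps a hypothetical timeout name to its expiry time.  A timeout
    at or after expire_at is inactive; one at or before now has expired.
    """
    events = set()
    if expire_at <= now:
        events.add("expired")          # the call itself has run out of time
    next_timer = expire_at
    for name, when in timeouts.items():
        if when >= expire_at:
            continue                   # inactive: does not affect the timer
        if when <= now:
            events.add(name)           # raise the event for the processor
            timeouts[name] = expire_at # inactivate so it is not seen again
        else:
            next_timer = min(next_timer, when)
    return next_timer, events
```

Because an expired timeout is moved to `expire_at` as soon as its event is raised, a second call with the same `now` reports no events, which is the repeated-processing problem the commit describes.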
    • rxrpc: Keep the call timeouts as ktimes rather than jiffies · df0adc78
      David Howells committed
      Keep the call timeouts as ktimes rather than jiffies so that they can be
      expressed as functions of RTT.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Remove error from struct rxrpc_skb_priv as it is unused · c31410ea
      David Howells committed
      Remove error from struct rxrpc_skb_priv as it is no longer used.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: The offset field in struct rxrpc_skb_priv is unnecessary · 775e5b71
      David Howells committed
      The offset field in struct rxrpc_skb_priv is unnecessary as the value can
      always be calculated.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Reduce ssthresh to peer's receive window · 08511150
      David Howells committed
      When we receive an ACK from the peer that tells us what the peer's receive
      window (rwind) is, we should reduce ssthresh to rwind if rwind is smaller
      than ssthresh.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Switch to Congestion Avoidance mode at cwnd==ssthresh · 8782def2
      David Howells committed
      Switch to Congestion Avoidance mode at cwnd == ssthresh rather than relying
      on cwnd getting incremented beyond ssthresh and the window size, the mode
      being shifted and then cwnd being corrected.
      
      We need to make sure we switch into CA mode so that we stop marking every
      packet for ACK.
      Signed-off-by: David Howells <dhowells@redhat.com>
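The two window rules in this entry and the preceding one (08511150) come down to a clamp and a comparison. The sketch below is an illustration only; the function names are invented for the example and the values are counted in packets.

```python
def clamp_ssthresh(ssthresh: int, rwind: int) -> int:
    """Reduce ssthresh to the peer's receive window if that is smaller."""
    return min(ssthresh, rwind)


def in_congestion_avoidance(cwnd: int, ssthresh: int) -> bool:
    """Switch modes as soon as cwnd reaches ssthresh, not only once it passes it."""
    return cwnd >= ssthresh


def request_ack_on_every_packet(cwnd: int, ssthresh: int) -> bool:
    # Per the commit text, every packet is marked for ACK until CA mode starts.
    return not in_congestion_avoidance(cwnd, ssthresh)
```

If the comparison were `cwnd > ssthresh` and cwnd could never grow beyond a window equal to ssthresh, the sender would stay in slow-start mode and keep requesting an ACK on every packet.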
    • rxrpc: Note serial number being ACK'd in the congestion management trace · ed1e8679
      David Howells committed
      Note the serial number of the packet being ACK'd in the congestion
      management trace rather than the serial number of the ACK packet.  Whilst
      the serial number of the ACK packet is useful for matching ACK packets in
      the output of Wireshark, the serial number that the ACK is in response to
      is of more use in working out how different trace lines relate.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Request more ACKs in slow-start mode · b112a670
      David Howells committed
      Set the request-ACK flag on more DATA packets whilst we're in slow-start
      mode so that we get sufficient ACKs back to supply information to configure
      the window.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Reduce the rxrpc_local::services list to a pointer · 1e9e5c95
      David Howells committed
      Reduce the rxrpc_local::services list to just a pointer as we don't permit
      multiple service endpoints to bind to a single transport endpoint (this is
      excluded by rxrpc_lookup_local()).
      
      The reason we don't allow this is that if you send a request to an AFS
      filesystem service, it will try to talk back to your cache manager on the
      port you sent from (this is how file change notifications are handled).  To
      prevent someone from stealing your CM callbacks, we don't let AF_RXRPC
      sockets share a UDP socket if at least one of them has a service bound.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: When activating client conn channels, do state check inside lock · 2629c7fa
      David Howells committed
      In rxrpc_activate_channels(), the connection cache state is checked outside
      of the lock, which means it can change whilst we're waking calls up,
      thereby changing whether or not we're allowed to wake calls up.
      
      Fix this by moving the check inside the locked region.  The check to see if
      all the channels are currently busy can stay outside of the locked region.
      
      Whilst we're at it:
      
       (1) Split the locked section out into its own function so that we can call
           it from other places in a later patch.
      
       (2) Determine the mask of channels dependent on the state as we're going
           to add another state in a later patch that will restrict the number of
           simultaneous calls to 1 on a connection.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Make Tx loss-injection go through normal return and adjust tracing · a1767077
      David Howells committed
      In rxrpc_send_data_packet() make the loss-injection path return through the
      same code as the transmission path so that the RTT determination is
      initiated and any future timer shuffling will be done, despite the packet
      having been binned.
      
      Whilst we're at it:
      
       (1) Add to the tx_data tracepoint an indication of whether or not we're
           retransmitting a data packet.
      
       (2) When we're deciding whether or not to request an ACK, rather than
           checking if we're in fast-retransmit mode check instead if we're
           retransmitting.
      
       (3) Don't invoke the lose_skb tracepoint when losing a Tx packet as we're
           not altering the sk_buff refcount nor are we just seeing it after
           getting it off the Tx list.
      
       (4) The rxrpc_skb_tx_lost note is then no longer used so remove it.
      
       (5) rxrpc_lose_skb() no longer needs to deal with rxrpc_skb_tx_lost.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix exclusive client connections · 8732db67
      David Howells committed
      Exclusive connections are currently reusable (which they shouldn't be)
      because rxrpc_alloc_client_connection() checks the exclusive flag in the
      rxrpc_connection struct before it's initialised from the function
      parameters.  This means that the DONT_REUSE flag doesn't get set.
      
      Fix this by checking the function parameters for the exclusive flag.
      Signed-off-by: David Howells <dhowells@redhat.com>
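The ordering bug described here is easy to reproduce in miniature. The sketch below is a toy model rather than the kernel function: `Connection`, `alloc_checking_struct` and `alloc_checking_param` are hypothetical names, with `dont_reuse` standing in for the DONT_REUSE flag.

```python
class Connection:
    def __init__(self):
        self.exclusive = False    # default until copied from the parameters
        self.dont_reuse = False


def alloc_checking_struct(exclusive: bool) -> Connection:
    """Buggy order: tests the struct field before it has been initialised."""
    conn = Connection()
    if conn.exclusive:            # always the default value at this point
        conn.dont_reuse = True
    conn.exclusive = exclusive
    return conn


def alloc_checking_param(exclusive: bool) -> Connection:
    """Fixed order: tests the caller's parameter directly."""
    conn = Connection()
    if exclusive:
        conn.dont_reuse = True
    conn.exclusive = exclusive
    return conn
```

With the first variant an exclusive connection is created with `dont_reuse` still false, so it remains eligible for reuse.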
  5. 25 Sep, 2016 8 commits
    • rxrpc: Implement slow-start · 57494343
      David Howells committed
      Implement RxRPC slow-start, which is similar to RFC 5681 for TCP.  A
      tracepoint is added to log the state of the congestion management algorithm
      and the decisions it makes.
      
      Notes:
      
       (1) Since we send fixed-size DATA packets (apart from the final packet in
           each phase), counters and calculations are in terms of packets rather
           than bytes.
      
       (2) The ACK packet carries the equivalent of TCP SACK.
      
       (3) The FLIGHT_SIZE calculation in RFC 5681 doesn't seem particularly
           suited to SACK of a small number of packets.  It seems that, almost
           inevitably, by the time three 'duplicate' ACKs have been seen, we have
           narrowed the loss down to one or two missing packets, and the
           FLIGHT_SIZE calculation ends up as 2.
      
       (4) In rxrpc_resend(), if there was no data that apparently needed
           retransmission, we transmit a PING ACK to ask the peer to tell us what
           its Rx window state is.
      Signed-off-by: David Howells <dhowells@redhat.com>
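Note (1) says the counters are kept in packets rather than bytes. As a generic illustration of what RFC 5681-style window growth looks like in those units, here is a minimal sketch. It is not the rxrpc implementation: the `Congestion` class and its fields are hypothetical, and fast retransmit, SACK handling and the FLIGHT_SIZE subtleties from note (3) are left out.

```python
from dataclasses import dataclass


@dataclass
class Congestion:
    cwnd: int = 1        # congestion window, in packets
    ssthresh: int = 8    # slow-start threshold, in packets
    ca_acked: int = 0    # packets ACK'd since cwnd last grew in CA mode

    def on_ack(self, newly_acked: int) -> None:
        for _ in range(newly_acked):
            if self.cwnd < self.ssthresh:
                self.cwnd += 1                  # slow start: +1 per ACK'd packet
            else:
                self.ca_acked += 1
                if self.ca_acked >= self.cwnd:  # about one window's worth of ACKs
                    self.ca_acked = 0
                    self.cwnd += 1              # congestion avoidance: +1 per window

    def on_timeout(self, flight_size: int) -> None:
        # RFC 5681: ssthresh = max(FlightSize / 2, 2 segments), cwnd = 1 segment.
        self.ssthresh = max(flight_size // 2, 2)
        self.cwnd = 1
        self.ca_acked = 0
```

Counting in packets means the segment-size terms of RFC 5681 reduce to the constants 1 and 2.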
    • rxrpc: Schedule an ACK if the reply to a client call appears overdue · 0d967960
      David Howells committed
      If we've sent all the request data in a client call but haven't seen any
      sign of the reply data yet, schedule an ACK to be sent to the server to
      find out if the reply data got lost.
      
      If the server hasn't yet hard-ACK'd the request data, we send a PING ACK to
      demand a response to find out whether we need to retransmit.
      
      If the server says it has received all of the data, we send an IDLE ACK to
      tell the server that we haven't received anything in the receive phase as
      yet.
      
      To make this work, a non-immediate PING ACK must carry a delay.  I've chosen
      the same as the IDLE ACK for the moment.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Generate a summary of the ACK state for later use · 31a1b989
      David Howells committed
      Generate a summary of the Tx buffer packet state when an ACK is received
      for use in a later patch that does congestion management.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Delay the resend timer to allow for nsec->jiffies conv error · df0562a7
      David Howells committed
      When determining the resend timer value, we have a value in nsec but the
      timer is in jiffies which may be a million or more times more coarse.
      nsecs_to_jiffies() rounds down - which means that the resend timeout
      expressed as jiffies is very likely earlier than the one expressed as
      nanoseconds from which it was derived.
      
      The problem is that rxrpc_resend() gets triggered by the timer, but can't
      then find anything to resend yet.  It sets the timer again - but gets
      kicked off immediately again and again until the nanosecond-based expiry
      time is reached and we actually retransmit.
      
      Fix this by adding 1 to the jiffies-based resend_at value to counteract the
      rounding and make sure that the timer happens after the nanosecond-based
      expiry is passed.
      
      Alternatives would be to adjust the timestamp on the packets to align
      with the jiffy scale or to switch back to using jiffy timestamps.
      Signed-off-by: David Howells <dhowells@redhat.com>
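A short worked example makes the rounding problem concrete. The sketch assumes a tick rate of HZ = 250 (4 ms per jiffy) purely for illustration, and `nsecs_to_jiffies_floor` and `resend_delay_jiffies` are stand-ins written for this example rather than the kernel helpers.

```python
HZ = 250                                 # assumed tick rate for the example
NSEC_PER_SEC = 1_000_000_000
NSEC_PER_JIFFY = NSEC_PER_SEC // HZ      # 4,000,000 ns per jiffy


def nsecs_to_jiffies_floor(nsec: int) -> int:
    """Convert nanoseconds to whole jiffies, rounding down."""
    return nsec // NSEC_PER_JIFFY


def resend_delay_jiffies(delay_nsec: int) -> int:
    """Add one jiffy so the timer cannot fire before the nanosecond expiry."""
    return nsecs_to_jiffies_floor(delay_nsec) + 1
```

A 5 ms resend delay rounds down to 1 jiffy, which is only 4 ms, so the timer would fire before anything is due for retransmission. Adding 1 gives 2 jiffies (8 ms), which is safely past the nanosecond-based expiry.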
    • rxrpc: Reinitialise the call ACK and timer state for client reply phase · dd7c1ee5
      David Howells committed
      Clear the ACK reason, ACK timer and resend timer when entering the client
      reply phase when the first DATA packet is received.  New ACKs will be
      proposed once the data is queued.
      
      The resend timer is no longer relevant and we need to cancel ACKs scheduled
      to probe for a lost reply.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Include the last reply DATA serial number in the final ACK · b69d94d7
      David Howells committed
      In a client call, include the serial number of the last DATA packet of the
      reply in the final ACK.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Send an immediate ACK if we fill in a hole · a7056c5b
      David Howells committed
      Send an immediate ACK if we fill in a hole in the buffer left by an
      out-of-sequence packet.  This may allow the congestion management in the peer
      to avoid a retransmission if packets got reordered on the wire.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Send an ACK after every few DATA packets we receive · 805b21b9
      David Howells committed
      Send an ACK if we haven't sent one for the last two packets we've received.
      This keeps the other end apprised of where we've got to - which is
      important if they're doing slow-start.
      
      We do this in recvmsg so that we can dispatch a packet directly without the
      need to wake up the background thread.
      
      This should possibly be made configurable in future.
      Signed-off-by: David Howells <dhowells@redhat.com>
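The "every two packets" rule amounts to a small counter. The sketch below is illustrative only: `AckPacer` is a hypothetical name, and the configurable `threshold` reflects the commit's remark that the value might be made configurable, not an existing option.

```python
class AckPacer:
    """Decide when to send an ACK based on DATA packets seen since the last one."""

    def __init__(self, threshold: int = 2):
        self.threshold = threshold
        self.unacked = 0          # DATA packets received since the last ACK

    def on_data_received(self) -> bool:
        """Return True if an ACK should be sent for this packet."""
        self.unacked += 1
        if self.unacked >= self.threshold:
            self.unacked = 0
            return True
        return False
```

With the default threshold, a run of five DATA packets produces an ACK after the second and the fourth.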
  6. 23 Sep, 2016 5 commits