1. 08 Jun, 2013  1 commit
  2. 05 Jun, 2013  4 commits
  3. 03 Jun, 2013  5 commits
  4. 01 Jun, 2013  4 commits
  5. 31 May, 2013  4 commits
  6. 29 May, 2013  5 commits
  7. 28 May, 2013  2 commits
    • ipv4: fix redirect handling for TCP packets · f96ef988
      Committed by Michal Kubecek
      Unlike ipv4_redirect() and ipv4_sk_redirect(), ip_do_redirect()
      doesn't call __build_flow_key() directly but via the
      ip_rt_build_flow_key() wrapper. This leads to __build_flow_key()
      getting a pointer to the IPv4 header of the ICMP redirect packet
      rather than a pointer to the embedded IPv4 header of the packet
      initiating the redirect.
      
      As a result, handling of ICMP redirects initiated by TCP packets
      is broken. The issue was introduced by
      
      	4895c771 ("ipv4: Add FIB nexthop exceptions.")
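      
      A minimal sketch of the kind of fix this implies, assuming the
      3.10-era __build_flow_key()/__ip_do_redirect() helpers: build the
      flow key directly from the embedded IPv4 header at skb->data,
      bypassing the ip_rt_build_flow_key() wrapper. Illustrative, not the
      verbatim patch:
      
      	static void ip_do_redirect(struct dst_entry *dst, struct sock *sk,
      				   struct sk_buff *skb)
      	{
      		struct rtable *rt = (struct rtable *) dst;
      		struct flowi4 fl4;
      		/* skb->data points at the IPv4 header embedded in the ICMP
      		 * payload, i.e. the packet that initiated the redirect, not
      		 * the outer header of the redirect message itself.
      		 */
      		const struct iphdr *iph = (const struct iphdr *) skb->data;
      
      		__build_flow_key(&fl4, sk, iph, skb->dev->ifindex,
      				 RT_TOS(iph->tos), iph->protocol, skb->mark, 0);
      		__ip_do_redirect(rt, skb, &fl4, true);
      	}
      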
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • MPLS: Add limited GSO support · 0d89d203
      Committed by Simon Horman
      In the case where a non-MPLS packet is received and an MPLS stack is
      added it may well be the case that the original skb is GSO but the
      NIC used for transmit does not support GSO of MPLS packets.
      
      The aim of this code is to provide GSO in software for MPLS packets
      whose skbs are GSO.
      
      SKB Usage:
      
      When an implementation adds an MPLS stack to a non-MPLS packet it
      should do the following to skb metadata (see the sketch after this
      list):
      
      * Set skb->inner_protocol to the old non-MPLS ethertype of the packet.
        skb->inner_protocol is added by this patch.
      
      * Set skb->protocol to the new MPLS ethertype of the packet.
      
      * Set skb->network_header to correspond to the
        end of the L3 header, including the MPLS label stack.
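      
      A hedged sketch of those three steps as a hypothetical push helper
      might perform them; example_push_mpls(), MPLS_HLEN and the lse
      argument are illustrative, and the real requirements are only the
      three bullets above:
      
      	/* Push one MPLS label stack entry (LSE) between the L2 and L3
      	 * headers and fix up skb metadata per the rules above.
      	 */
      	static void example_push_mpls(struct sk_buff *skb, __be32 lse)
      	{
      		skb->inner_protocol = skb->protocol;    /* old ethertype  */
      		skb->protocol = htons(ETH_P_MPLS_UC);   /* MPLS ethertype */
      
      		__skb_push(skb, MPLS_HLEN);             /* room for the LSE */
      		memmove(skb->data, skb->data + MPLS_HLEN,
      			skb->mac_len);                  /* slide L2 down */
      		skb_reset_mac_header(skb);
      		*(__be32 *)(skb->data + skb->mac_len) = lse;
      		/* Per the third rule above, the label stack is accounted
      		 * to the network header (exact offset semantics assumed).
      		 */
      		skb_set_network_header(skb, skb->mac_len);
      	}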
      
      I have posted a patch, "[PATCH v3.29] datapath: Add basic MPLS support to
      kernel" which adds MPLS support to the kernel datapath of Open vSwitch.
      That patch sets the above requirements in datapath/actions.c:push_mpls()
      and was used to exercise this code.  The datapath patch is against the Open
      vSwitch tree but it is intended that it be added to the Open vSwitch code
      present in the mainline Linux kernel at some point.
      
      Features:
      
      I believe that the approach that I have taken is at least partially
      consistent with the handling of other protocols.  Jesse, I understand that
      you have some ideas here.  I am more than happy to change my implementation.
      
      This patch adds dev->mpls_features which may be used by devices
      to advertise features supported for MPLS packets.
      
      A new NETIF_F_MPLS_GSO feature is added for devices which support
      hardware MPLS GSO offload.  Currently no devices support this
      and MPLS GSO always falls back to software.
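      
      For illustration only, a hypothetical driver that could segment MPLS
      in hardware might advertise it roughly like this at probe time (no
      real driver does, per the paragraph above; using the NETIF_F_MPLS_GSO
      name from this message):
      
      	dev->mpls_features = NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_MPLS_GSO;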
      
      Alternate Implementation:
      
      One possible alternate implementation is to teach netif_skb_features()
      and skb_network_protocol() about MPLS, in a similar way to their
      understanding of VLANs. I believe this would avoid the need
      for net/mpls/mpls_gso.c and in particular the calls to
      __skb_push() and __skb_pull() in mpls_gso_segment().
      
      I have decided on the implementation in this patch as it should
      not introduce any overhead in the case where mpls_gso is not compiled
      into the kernel or inserted as a module.
      
      MPLS GSO suggested by Jesse Gross.
      Based in part on "v4 GRE: Add TCP segmentation offload for GRE"
      by Pravin B Shelar.
      
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 26 May, 2013  5 commits
  9. 24 May, 2013  1 commit
    • tcp: xps: fix reordering issues · 547669d4
      Committed by Eric Dumazet
      commit 3853b584 ("xps: Improvements in TX queue selection")
      introduced the ooo_okay flag, but the condition used to set it is
      slightly wrong.
      
      In our traces, we have seen ACK packets being received out of order,
      and RST packets sent in response.
      
      We should test whether we have any packets still in the host queue.
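      
      A sketch of the corrected test, under the assumption that
      tcp_transmit_skb() can be called from the TSQ handler, which itself
      holds one reference to sk_wmem_alloc (illustrative of the
      description above):
      
      	/* Allow XPS to pick another queue only if nothing beyond this
      	 * skb is still charged to the socket, i.e. no packets remain
      	 * in the qdisc or device queues.
      	 */
      	skb->ooo_okay = sk_wmem_alloc_get(sk) < skb->truesize;
      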
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 23 May, 2013  3 commits
  11. 21 May, 2013  1 commit
    • tcp: md5: remove spinlock usage in fast path · 71cea17e
      Committed by Eric Dumazet
      TCP md5 code uses per cpu variables but protects access to them with
      a shared spinlock, which is a contention point.
      
      [ tcp_md5sig_pool_lock is locked twice per incoming packet ]
      
      This patch makes things much simpler by allocating the crypto
      structures once, the first time a socket needs MD5 keys, and never
      deallocating them, as they are really small.
      
      The next step would be to allow crypto allocations to be done in a
      NUMA-aware way.
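      
      A hedged sketch of the allocate-once pattern this describes; the
      mutex, flag, and __tcp_alloc_md5sig_pool() fill helper are
      illustrative names, and note that no spinlock appears in the fast
      path:
      
      	static DEFINE_MUTEX(tcp_md5sig_mutex);
      	static bool tcp_md5sig_pool_populated;
      
      	bool tcp_alloc_md5sig_pool(void)
      	{
      		/* Slow path runs at most once (plus racing callers that
      		 * re-check under the mutex); nothing is ever freed.
      		 */
      		if (unlikely(!tcp_md5sig_pool_populated)) {
      			mutex_lock(&tcp_md5sig_mutex);
      			if (!tcp_md5sig_pool_populated)
      				__tcp_alloc_md5sig_pool();
      			mutex_unlock(&tcp_md5sig_mutex);
      		}
      		return tcp_md5sig_pool_populated;
      	}
      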
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 20 May, 2013  2 commits
    • ip_gre: fix a possible crash in ipgre_err() · 96f5a846
      Committed by Eric Dumazet
      Another fix is needed in ipgre_err(), as parse_gre_header() might
      change skb->head.
      
      Bug added in commit c5441932 ("GRE: Refactor GRE tunneling code.")
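      
      The underlying rule, as a short hedged sketch: any helper that may
      pskb_may_pull() can reallocate skb->head, so pointers into the
      packet must be (re)computed after it returns, never cached across
      the call:
      
      	const struct iphdr *iph;
      
      	if (parse_gre_header(skb, &tpi, &csum_err) < 0)
      		return;
      	/* Only now derive the header pointer: parse_gre_header() may
      	 * have changed skb->head, invalidating earlier pointers.
      	 */
      	iph = (const struct iphdr *)skb->data;
      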
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: remove bad timeout logic in fast recovery · 3e59cb0d
      Committed by Yuchung Cheng
      tcp_timeout_skb() was intended to trigger fast recovery on timeout;
      unfortunately, in reality it often causes spurious retransmission
      storms during fast recovery. The particular sign is a fast retransmit
      over the highest sacked sequence (SND.FACK).
      
      Currently the RTO timer re-arming (as in RFC6298) offers a nice cushion
      to avoid spurious timeout: when SND.UNA advances the sender re-arms
      RTO and extends the timeout by icsk_rto. The sender does not offset
      the time elapsed since the packet at SND.UNA was sent.
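      
      For reference, a sketch of that re-arming: on an ACK advancing
      SND.UNA the timer restarts at a full icsk_rto from "now", with no
      credit for the time the packet at SND.UNA has already spent in
      flight:
      
      	inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS,
      				  inet_csk(sk)->icsk_rto, TCP_RTO_MAX);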
      
      But if the next (DUP)ACK arrives later than ~RTTVAR and triggers
      tcp_fastretrans_alert(), then tcp_timeout_skb() will mark as lost any
      packet sent before the icsk_rto interval, including ones above the
      highest sacked sequence. Most likely a large part of the scoreboard
      will be marked.
      
      If most packets are not lost then the subsequent DUPACKs with new SACK
      blocks will cause the sender to continue to retransmit packets beyond
      SND.FACK spuriously. Even if only one packet is lost the sender may
      falsely retransmit almost the entire window.
      
      The situation becomes common in the world of bufferbloat: the RTT
      continues to grow as the queue builds up but RTTVAR remains small and
      close to the minimum 200ms. If a data packet is lost and the DUPACK
      triggered by the next data packet is slightly delayed, then a spurious
      retransmission storm forms.
      
      As the original comment on tcp_timeout_skb() suggests: the usefulness
      of this feature is questionable. It also wastes cycles walking the
      sack scoreboard and is actually harmful because of false recovery.
      
      It's time to remove this.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Nandita Dukkipati <nanditad@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 17 May, 2013  2 commits
    • tcp: speedup tcp_fixup_rcvbuf() · d2cf4367
      Committed by Eric Dumazet
      tcp_fixup_rcvbuf() contains a loop to estimate the initial socket
      rcv space needed for a given mss. With a large MTU (like 64K on lo),
      we can loop ~500 times and consume a lot of cpu cycles.
      
      perf top of 200 concurrent netperf -t TCP_CRR
      
      5.62%  netperf  [kernel.kallsyms]  [k] tcp_init_buffer_space
      1.71%  netperf  [kernel.kallsyms]  [k] _raw_spin_lock
      1.55%  netperf  [kernel.kallsyms]  [k] kmem_cache_free
      1.51%  netperf  [kernel.kallsyms]  [k] tcp_transmit_skb
      1.50%  netperf  [kernel.kallsyms]  [k] tcp_ack
      
      Let's use a 100% factor and remove the loop.
      
      100% is needed anyway for the default tcp_adv_win_scale=1
      value, and is also the maximum factor.
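      
      Roughly, this replaces a 128-bytes-at-a-time search with one
      closed-form expression (an illustrative before/after, not the
      verbatim patch):
      
      	/* Before: grow rcvmem until an mss-sized segment fits in the
      	 * window; with a 64K mss this loops ~500 times.
      	 */
      	rcvmem = SKB_TRUESIZE(mss + MAX_TCP_HEADER);
      	while (tcp_win_from_space(rcvmem) < mss)
      		rcvmem += 128;
      
      	/* After: assume a flat 100% overhead factor (what
      	 * tcp_adv_win_scale=1 implies: the window is half the buffer
      	 * space), so a single multiplication suffices.
      	 */
      	rcvmem = 2 * SKB_TRUESIZE(mss + MAX_TCP_HEADER);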
      
      Refs: commit b49960a0
            ("tcp: change tcp_adv_win_scale and tcp_rmem[2]")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: gso: do not generate out of order packets · 6ff50cd5
      Committed by Eric Dumazet
      The TCP GSO handler has the following issues:
      
      1) ooo_okay from the original GSO packet is duplicated to all segments
      2) all segments but the last are orphaned, so the transmit path cannot
      get the transmit queue number from the socket. This happens if GSO
      segmentation is done before a stacked device, for example.
      
      The result is that we can send packets from a given TCP flow to
      different TX queues (when using multiqueue NICs). This generates OOO
      problems and spurious SACKs & retransmits.
      
      Fix this by keeping the socket pointer set for all segments.
      
      This means that every segment must also have a destructor, and the
      original gso skb truesize must be split across all segments, to keep
      precise sk->sk_wmem_alloc accounting.
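      
      A hedged sketch of that accounting in a segmentation loop; segs,
      gso_skb, hdr_len and mss are assumed from context, and the real code
      differs in detail:
      
      	unsigned int share = hdr_len + mss;   /* per-segment charge */
      	struct sk_buff *seg;
      
      	for (seg = segs; seg->next; seg = seg->next) {
      		seg->sk = gso_skb->sk;                  /* keep queue info */
      		seg->destructor = gso_skb->destructor;  /* e.g. tcp_wfree  */
      		seg->truesize = share;
      		gso_skb->truesize -= share;
      	}
      	/* The last segment absorbs the remainder so that the sum of
      	 * all segment truesizes is exactly the original
      	 * gso_skb->truesize, keeping sk->sk_wmem_alloc precise.
      	 */
      	seg->sk = gso_skb->sk;
      	seg->destructor = gso_skb->destructor;
      	seg->truesize = gso_skb->truesize;
      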
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 15 May, 2013  1 commit