1. 15 Nov, 2012 1 commit
  2. 14 Nov, 2012 1 commit
    • tcp: tcp_replace_ts_recent() should not be called from tcp_validate_incoming() · bd090dfc
      Committed by Eric Dumazet
      We added support for RFC 5961 in recent kernels, but TCP fails
      to perform an exhaustive check of the ACK sequence.

      We can update our view of the peer's tsval from a frame that is
      later discarded by tcp_ack().

      This makes timestamp-enabled sessions vulnerable to injection of
      a high tsval: peers start an ACK storm, since the victim
      sends a dupack each time it receives an ACK from the other peer.

      As tcp_validate_incoming() is called before tcp_ack(), we should
      not perform tcp_replace_ts_recent() from it, and should let callers
      do it at the right time.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Cc: H.K. Jerry Chu <hkchu@google.com>
      Cc: Romain Francoise <romain@orebokech.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
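
The ordering problem described in this commit can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `conn_model`, `seg_model`, `ack_acceptable()` and `process_segment()` are hypothetical names, not kernel code, and the timestamp check is much simpler than the kernel's PAWS logic. The point it shows is that the peer's tsval is recorded only after the ACK has been accepted, so a forged segment with an out-of-window ACK cannot poison `ts_recent`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified connection state; not the kernel's struct tcp_sock. */
struct conn_model {
    uint32_t snd_una;   /* oldest unacknowledged sequence number */
    uint32_t snd_nxt;   /* next sequence number we will send */
    uint32_t ts_recent; /* last peer timestamp value we accepted */
};

/* Hypothetical view of an incoming segment. */
struct seg_model {
    uint32_t ack_seq;   /* acknowledgment number carried by the segment */
    uint32_t tsval;     /* timestamp value carried by the segment */
};

/* Wraparound-safe "a is before b" in 32-bit sequence space. */
static bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* An ACK is acceptable here if snd_una <= ack <= snd_nxt. */
static bool ack_acceptable(const struct conn_model *c, uint32_t ack)
{
    return !seq_before(ack, c->snd_una) && !seq_before(c->snd_nxt, ack);
}

/* Validate the ACK first; only then record the peer's timestamp. */
static bool process_segment(struct conn_model *c, const struct seg_model *s)
{
    assert(c != NULL && s != NULL);

    if (!ack_acceptable(c, s->ack_seq))
        return false;               /* discarded: ts_recent stays untouched */

    if (!seq_before(s->tsval, c->ts_recent))
        c->ts_recent = s->tsval;    /* safe: the segment passed the ACK check */

    c->snd_una = s->ack_seq;
    return true;
}
```

With `snd_una = 1000` and `snd_nxt = 2000`, a forged segment carrying `ack_seq = 900000` and a very high `tsval` is rejected and leaves `ts_recent` unchanged, while a legitimate segment with `ack_seq = 1500` updates it.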
  3. 12 Nov, 2012 1 commit
  4. 04 Nov, 2012 1 commit
  5. 03 Nov, 2012 1 commit
  6. 01 Nov, 2012 2 commits
  7. 29 Oct, 2012 1 commit
  8. 23 Oct, 2012 2 commits
  9. 19 Oct, 2012 2 commits
  10. 13 Oct, 2012 2 commits
    • vti: fix sparse bit endian warnings · 8437e761
      Committed by Stephen Hemminger
      Use be32_to_cpu instead of htonl to keep sparse happy.
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
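
Why the swap from htonl to be32_to_cpu changes nothing at run time can be seen in a user-space sketch. The assumption here is that `ntohl()` stands in for the kernel's `be32_to_cpu()`; the helper names are hypothetical. Both directions perform the same byte operation on any given platform, so the only difference is which type annotation the static checker sees.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Correctly labelled: a big-endian wire value converted to host order.
 * ntohl() is used here as the user-space analogue of be32_to_cpu(). */
static uint32_t field_from_wire(uint32_t wire_value)
{
    return ntohl(wire_value);
}

/* Mislabelled: htonl() claims to convert host order to wire order,
 * yet it performs the same byte operation on any given platform. */
static uint32_t field_from_wire_mislabelled(uint32_t wire_value)
{
    return htonl(wire_value);
}

/* The two helpers always return identical bits; only the intent differs. */
static bool conversions_agree(uint32_t v)
{
    return field_from_wire(v) == field_from_wire_mislabelled(v);
}
```

Because the results are identical, a change of this kind is about making the declared direction match the data's real byte order, which is what a tool such as sparse checks.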
    • tcp: resets are misrouted · 4c675258
      Committed by Alexey Kuznetsov
      After commit e2446eaa ("tcp_v4_send_reset: binding oif to iif in no
      sock case"), TCP resets are always lost when routing is asymmetric.
      Yes, backing out that patch will result in misrouting of resets for
      dead connections that used interface binding while they were alive, but
      we cannot actually do anything about that.  What is dead is dead, and
      correctly handling normal unbound connections is obviously the priority.
      
      Comment to comment:
      > This has few benefits:
      >   1. tcp_v6_send_reset already did that.
      
      It was done to route resets for IPv6 link local addresses. It was a
      mistake to do so for global addresses. The patch fixes this as well.
      
      Actually, the problem appears to be even more serious than guaranteed
      loss of resets.  As reported by Sergey Soloviev <sol@eqv.ru>, those
      misrouted resets create a lot of ARP traffic and a huge number of
      unresolved ARP entries, bringing NAT firewalls that use asymmetric
      routing to their knees.
      Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
  11. 12 Oct, 2012 1 commit
  12. 11 Oct, 2012 1 commit
  13. 09 Oct, 2012 8 commits
  14. 06 Oct, 2012 1 commit
  15. 05 Oct, 2012 1 commit
    • ipv4: add a fib_type to fib_info · f4ef85bb
      Committed by Eric Dumazet
      commit d2d68ba9 (ipv4: Cache input routes in fib_info nexthops.)
      introduced a regression for forwarding.
      
      This was hard to reproduce but the symptom was that packets were
      delivered to local host instead of being forwarded.
      
      David suggested adding fib_type to fib_info so that we don't
      inadvertently share the same fib_info for different purposes.
      
      With help from Julian Anastasov who provided very helpful
      hints, reproduced here :
      
      <quote>
              Can it be a problem related to fib_info reuse
      from different routes. For example, when local IP address
      is created for subnet we have:
      
      broadcast 192.168.0.255 dev DEV  proto kernel  scope link  src 192.168.0.1
      192.168.0.0/24 dev DEV  proto kernel  scope link  src 192.168.0.1
      local 192.168.0.1 dev DEV  proto kernel  scope host  src 192.168.0.1
      
              The "dev DEV  proto kernel  scope link  src 192.168.0.1" is
      a reused fib_info structure where we put cached routes.
      The result can be same fib_info for 192.168.0.255 and
      192.168.0.0/24. RTN_BROADCAST is cached only for input
      routes. Incoming broadcast to 192.168.0.255 can be cached
      and can cause problems for traffic forwarded to 192.168.0.0/24.
      So, this patch should solve the problem because it
      separates the broadcast from unicast traffic.
      
              And the ip_route_input_slow caching will work for
      local and broadcast input routes (above routes 1 and 3) just
      because they differ in scope and use different fib_info.
      
      </quote>
      
      Many thanks to Chris Clayton for his patience and help.
      Reported-by: Chris Clayton <chris2553@googlemail.com>
      Bisected-by: Chris Clayton <chris2553@googlemail.com>
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Tested-by: Chris Clayton <chris2553@googlemail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
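
The sharing problem in the quoted explanation can be modelled with a small comparison function. This is a sketch under assumptions: `fib_info_model`, its fields, and the constant values are hypothetical and far simpler than the kernel's `struct fib_info`. It shows that the broadcast route and the /24 unicast route look identical until the route type becomes part of the comparison key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constants for the model; values are illustrative only. */
enum { TYPE_UNICAST_MODEL = 1, TYPE_LOCAL_MODEL = 2, TYPE_BROADCAST_MODEL = 3 };
enum { SCOPE_LINK_MODEL = 1, SCOPE_HOST_MODEL = 2 };
enum { PROTO_KERNEL_MODEL = 2 };

/* Hypothetical, reduced stand-in for the shared route information. */
struct fib_info_model {
    char     dev[16];   /* output device name */
    uint32_t prefsrc;   /* preferred source address */
    uint8_t  scope;
    uint8_t  protocol;
    uint8_t  type;      /* the field this commit adds to the key */
};

/* Old behaviour in this model: type is not part of the key. */
static bool same_info_without_type(const struct fib_info_model *a,
                                   const struct fib_info_model *b)
{
    return strcmp(a->dev, b->dev) == 0 &&
           a->prefsrc == b->prefsrc &&
           a->scope == b->scope &&
           a->protocol == b->protocol;
}

/* New behaviour in this model: entries of different type are never shared. */
static bool same_info_with_type(const struct fib_info_model *a,
                                const struct fib_info_model *b)
{
    return same_info_without_type(a, b) && a->type == b->type;
}
```

In this model the broadcast and unicast entries from the quote ("dev DEV proto kernel scope link src 192.168.0.1") compare equal without the type and unequal with it, so a cached broadcast input route can no longer be picked up for forwarded unicast traffic.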
  16. 02 Oct, 2012 4 commits
  17. 28 Sep, 2012 6 commits
  18. 26 Sep, 2012 1 commit
  19. 25 Sep, 2012 2 commits
    • net: raw: revert unrelated change · 8489c1d9
      Committed by Eric Dumazet
      Commit 5640f768 ("net: use a per task frag allocator")
      accidentally contained an unrelated change to net/ipv4/raw.c,
      later committed (without the pr_err() debugging bits) in
      net tree as commit ab43ed8b (ipv4: raw: fix icmp_filter())
      
      This patch reverts this glitch, noticed by Stephen Rothwell.
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: use a per task frag allocator · 5640f768
      Committed by Eric Dumazet
      We currently use a per socket order-0 page cache for tcp_sendmsg()
      operations.
      
      This page is used to build fragments for skbs.
      
      It's done to increase the probability of coalescing small write() calls
      into single segments in skbs still in the write queue (not yet sent).

      But it wastes a lot of memory for applications handling many mostly
      idle sockets, since each socket holds one page in sk->sk_sndmsg_page.

      It's also quite inefficient for building TSO 64KB packets, because we
      need about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit
      the page allocator more than wanted.

      This patch adds a per task frag allocator and uses bigger pages,
      if available. An automatic fallback is done in case of memory pressure.

      (up to 32768 bytes per frag, that's order-3 pages on x86)

      This increases TCP stream performance by 20% on the loopback device,
      and also benefits other network devices, since 8x fewer frags are
      mapped on transmit and unmapped on tx completion. Alexander Duyck
      mentioned a probable performance win on systems with IOMMU enabled.

      It's possible some SG enabled hardware can't cope with bigger fragments,
      but their ndo_start_xmit() should already handle this, splitting a
      fragment into sub fragments, since some arches have PAGE_SIZE=65536.
      
      Successfully tested on various ethernet devices.
      (ixgbe, igb, bnx2x, tg3, mellanox mlx4)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
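
The figures in this commit message (16 pages per 64KB packet, 32768 bytes at order 3, 8x fewer frags) and the fallback under memory pressure can be checked with a short user-space sketch. The names are assumptions made for illustration: `frags_needed()`, `bytes_for_order()`, `alloc_frag()` and `pressured_alloc()` are hypothetical, and `malloc()` merely stands in for the page allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of fragments needed to hold `payload` bytes (rounds up). */
static size_t frags_needed(size_t payload, size_t frag_bytes)
{
    assert(frag_bytes > 0);
    return (payload + frag_bytes - 1) / frag_bytes;
}

/* Size in bytes of a block of 2^order contiguous pages. */
static size_t bytes_for_order(size_t page_size, unsigned order)
{
    return page_size << order;
}

/* Hypothetical allocator callback: returns NULL on failure. */
typedef void *(*try_alloc_fn)(size_t bytes);

/* Stand-in for memory pressure: refuses anything larger than 8 KB. */
static void *pressured_alloc(size_t bytes)
{
    return bytes > 8192 ? NULL : malloc(bytes);
}

/* Try the largest block first, then fall back to smaller orders.
 * On success, *got_bytes holds the size actually obtained. */
static void *alloc_frag(try_alloc_fn try_alloc, size_t page_size,
                        unsigned max_order, size_t *got_bytes)
{
    for (unsigned order = max_order + 1; order-- > 0; ) {
        size_t bytes = bytes_for_order(page_size, order);
        void *p = try_alloc(bytes);
        if (p != NULL) {
            *got_bytes = bytes;
            return p;
        }
    }
    *got_bytes = 0;
    return NULL;
}
```

With 4096-byte pages, `frags_needed(65536, 4096)` is 16 and `frags_needed(65536, bytes_for_order(4096, 3))` is 2, which is the 8x reduction the message mentions. Passing `pressured_alloc` to `alloc_frag()` yields an 8192-byte block instead of 32768, illustrating the automatic fallback.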
  20. 23 Sep, 2012 1 commit
    • tcp: TCP Fast Open Server - record retransmits after 3WHS · 30099b2e
      Committed by Neal Cardwell
      When recording the number of SYNACK retransmits for servers using TCP
      Fast Open, fix the code to ensure that we copy over the retransmit
      count from the request_sock after we receive the ACK that completes
      the 3-way handshake.
      
      The story here is similar to that of SYNACK RTT
      measurements. Previously we were always doing this in
      tcp_v4_syn_recv_sock(). However, for TCP Fast Open connections
      tcp_v4_conn_req_fastopen() calls tcp_v4_syn_recv_sock() at the time we
      receive the SYN. So for TFO we must copy the final SYNACK retransmit
      count in tcp_rcv_state_process().
      
      Note that copying over the SYNACK retransmit count will give us the
      correct count since, as is mentioned in a comment in
      tcp_retransmit_timer(), before we receive an ACK for our SYN-ACK a TFO
      passive connection does not retransmit anything else (e.g., data or
      FIN segments).
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
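
The timing issue described in this commit can be modelled in a few lines. This is a sketch under assumptions, with hypothetical types and function names rather than kernel structures: for a Fast Open server the child socket exists from the moment the SYN arrives, so a retransmit count copied at creation time would miss any later SYNACK retransmits; copying when the handshake-completing ACK arrives captures the final value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the pending request that owns the SYNACK timer. */
struct request_model {
    unsigned num_retrans;      /* SYNACK retransmits so far */
};

/* Hypothetical stand-in for the child socket created for the connection. */
struct child_model {
    unsigned total_retrans;    /* retransmit count recorded on the socket */
    bool     handshake_done;
};

/* Fast Open: the child is created when the SYN arrives, before any
 * SYNACK retransmit can have happened. */
static struct child_model create_child_on_syn(void)
{
    struct child_model c = { 0, false };
    return c;
}

/* The SYNACK timer fired and the SYNACK was sent again. */
static void synack_retransmitted(struct request_model *r)
{
    assert(r != NULL);
    r->num_retrans++;
}

/* The ACK completing the three-way handshake arrived: copy the final count. */
static void handshake_ack_received(struct child_model *c,
                                   const struct request_model *r)
{
    assert(c != NULL && r != NULL);
    c->total_retrans = r->num_retrans;
    c->handshake_done = true;
}
```

In this model, a child created on SYN followed by two SYNACK retransmits records a count of 2 only because the copy happens in `handshake_ack_received()`; a copy made inside `create_child_on_syn()` would have recorded 0.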