1. 10 Nov, 2012 1 commit
  2. 09 Nov, 2012 1 commit
  3. 08 Nov, 2012 1 commit
  4. 04 Nov, 2012 3 commits
    • ipv6: introduce ip6_rt_put() · 94e187c0
      Amerigo Wang committed
      As suggested by Eric, we could introduce a helper function
      for ipv6 too, to avoid checking if rt is NULL before
      dst_release().
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
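A minimal userspace sketch of the idea behind such a helper, assuming trimmed-down stand-ins for `struct dst_entry` and `struct rt6_info` (the real kernel structures have many more fields, and the in-tree helper may differ in detail). It relies on `dst` being the first member, so a NULL route converts to a NULL `dst` pointer, which the release function ignores:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the layout matters. */
struct dst_entry {
	int refcnt;
};

struct rt6_info {
	struct dst_entry dst;	/* must stay the first member */
	int rt6i_metric;
};

/* Mock of dst_release(): a NULL pointer is accepted and ignored. */
static void dst_release(struct dst_entry *dst)
{
	if (dst)
		dst->refcnt--;
}

/* NULL-safe put: callers no longer need "if (rt)" before releasing. */
static void ip6_rt_put(struct rt6_info *rt)
{
	_Static_assert(offsetof(struct rt6_info, dst) == 0,
		       "dst must be the first member of rt6_info");
	/* A pointer to a struct converts to a pointer to its first member;
	 * a NULL rt therefore becomes a NULL dst. */
	dst_release((struct dst_entry *)rt);
}
```

With this in place, a caller can write `ip6_rt_put(rt);` on every exit path, whether or not the route lookup succeeded.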
    • ipv6: remove a useless NULL check · 1a940835
      Amerigo Wang committed
      In dev_forward_change(), it is useless to check if idev->dev
      is NULL; it is always non-NULL here.
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: better retrans tracking for defer-accept · e6c022a4
      Eric Dumazet committed
      For passive TCP connections using the TCP_DEFER_ACCEPT facility,
      we incorrectly increment req->retrans each time the timeout triggers,
      even though no SYNACK is sent.
      
      SYNACKs are not sent for TCP_DEFER_ACCEPT requests that were established
      (for which we received the ACK from the client). Only the last SYNACK is
      sent, so that we can again receive an ACK from the client and move the req
      into the accept queue. We plan to change this later to avoid the useless
      retransmit (and a potential problem, as this SYNACK could be lost).
      
      TCP_INFO later gives wrong information to the user, claiming imaginary
      retransmits.
      
      Split the req->retrans field into two independent fields:
      
      num_retrans : number of retransmits
      num_timeout : number of timeouts
      
      num_timeout is the counter that is incremented at each timeout,
      regardless of actual SYNACK being sent or not, and used to
      compute the exponential timeout.
      
      Introduce inet_rtx_syn_ack() helper to increment num_retrans
      only if ->rtx_syn_ack() succeeded.
      
      Use inet_rtx_syn_ack() from tcp_check_req() to increment num_retrans
      when we re-send a SYNACK in answer to a (retransmitted) SYN.
      Prior to this patch, we were not counting these retransmits.
      
      Change tcp_v[46]_rtx_synack() to increment TCP_MIB_RETRANSSEGS
      only if a synack packet was successfully queued.
      Reported-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
      Cc: Elliott Hughes <enh@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
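The split between the two counters can be sketched as follows. This is a simplified userspace model, not the kernel code: the `request_sock` layout, the single-argument `rtx_syn_ack` callback, the `on_synack_timer()` function and the two mock senders are assumptions made for illustration.

```c
#include <stdint.h>

struct request_sock;

/* Hypothetical, trimmed-down ops table: returns 0 when a SYNACK was queued. */
struct request_sock_ops {
	int (*rtx_syn_ack)(struct request_sock *req);
};

struct request_sock {
	const struct request_sock_ops *rsk_ops;
	uint8_t num_retrans;	/* SYNACKs actually retransmitted */
	uint8_t num_timeout;	/* timer expirations, SYNACK sent or not */
};

/* Count a retransmit only if the SYNACK was really sent. */
static int inet_rtx_syn_ack(struct request_sock *req)
{
	int err = req->rsk_ops->rtx_syn_ack(req);

	if (!err)
		req->num_retrans++;
	return err;
}

/* Timer path (hypothetical name): always count the timeout,
 * retransmit only when the caller decides a SYNACK is due. */
static void on_synack_timer(struct request_sock *req, int send_synack)
{
	if (send_synack)
		inet_rtx_syn_ack(req);
	req->num_timeout++;
}

/* Mock senders used to exercise the sketch. */
static int rtx_ok(struct request_sock *req)   { (void)req; return 0; }
static int rtx_fail(struct request_sock *req) { (void)req; return -1; }
```

A deferred request whose timer fires without a SYNACK being sent ends up with `num_timeout` incremented and `num_retrans` unchanged, which is what keeps TCP_INFO from reporting retransmits that never happened.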
  5. 03 Nov, 2012 1 commit
  6. 02 Nov, 2012 1 commit
  7. 29 Oct, 2012 2 commits
  8. 24 Oct, 2012 2 commits
  9. 23 Oct, 2012 1 commit
    • ipv6: add support of equal cost multipath (ECMP) · 51ebd318
      Nicolas Dichtel committed
      Each nexthop is added like a single route in the routing table. All routes
      that have the same metric/weight and destination but not the same gateway
      are considered ECMP routes. They are linked together through a list called
      rt6i_siblings.
      
      ECMP routes can be added in one shot, with the RTA_MULTIPATH attribute, or one
      after the other (in both cases, the flag NLM_F_EXCL should not be set).
      
      The patch is based on previous work by
      Luc Saillard <luc.saillard@6wind.com>.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
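The sibling rule described above (same destination and metric, different gateway) can be written as a small predicate. The `struct route6` layout and the `is_ecmp_sibling()` name are hypothetical and only illustrate the comparison; the kernel works on `struct rt6_info` and links matching routes through `rt6i_siblings`.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flat view of an IPv6 route, for illustration only. */
struct route6 {
	uint8_t  dst[16];	/* destination prefix */
	uint8_t  dst_len;	/* prefix length in bits */
	uint32_t metric;
	uint8_t  gateway[16];	/* nexthop address */
};

/* Two routes are ECMP siblings when they reach the same destination
 * with the same metric but through different gateways. */
static bool is_ecmp_sibling(const struct route6 *a, const struct route6 *b)
{
	return a->dst_len == b->dst_len &&
	       memcmp(a->dst, b->dst, sizeof(a->dst)) == 0 &&
	       a->metric == b->metric &&
	       memcmp(a->gateway, b->gateway, sizeof(a->gateway)) != 0;
}
```

Under this rule, adding a second route to the same prefix with the same metric and a new gateway extends the multipath group, while one with a different metric remains an ordinary separate route.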
  10. 17 Oct, 2012 1 commit
  11. 13 Oct, 2012 1 commit
    • tcp: resets are misrouted · 4c675258
      Alexey Kuznetsov committed
      After commit e2446eaa ("tcp_v4_send_reset: binding oif to iif in no
      sock case"), tcp resets are always lost when routing is asymmetric.
      Yes, backing out that patch will result in misrouting of resets for
      dead connections which used interface binding when they were alive, but
      we actually cannot do anything here.  What is dead is dead, and correctly
      handling normal unbound connections is obviously a priority.
      
      Comment to comment:
      > This has few benefits:
      >   1. tcp_v6_send_reset already did that.
      
      It was done to route resets for IPv6 link local addresses. It was a
      mistake to do so for global addresses. The patch fixes this as well.
      
      Actually, the problem appears to be even more serious than guaranteed
      loss of resets.  As reported by Sergey Soloviev <sol@eqv.ru>, those
      misrouted resets create a lot of ARP traffic and a huge number of
      unresolved ARP entries, bringing NAT firewalls that use asymmetric
      routing to their knees.
      Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
  12. 09 Oct, 2012 1 commit
    • ipv6: gro: fix PV6_GRO_CB(skb)->proto problem · 86347245
      Eric Dumazet committed
      It seems IPV6_GRO_CB(skb)->proto can be destroyed in skb_gro_receive()
      if a new skb is allocated (to serve as an anchor for frag_list)
      
      We copy NAPI_GRO_CB() only (not the IPV6 specific part) in :
      
      *NAPI_GRO_CB(nskb) = *NAPI_GRO_CB(p);
      
      So we leave IPV6_GRO_CB(nskb)->proto at 0 (fresh skb allocation) instead
      of IPPROTO_TCP (6).
      
      ipv6_gro_complete() is then not able to call ops->gro_complete()
      [ tcp6_gro_complete() ]
      
      Fix this by moving proto into NAPI_GRO_CB() and getting rid of
      IPV6_GRO_CB.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
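The failure mode can be reproduced in miniature. The sketch below assumes drastically reduced control-block structures (the real `NAPI_GRO_CB()` area has many more fields, and the `_fixed` struct and both copy functions are invented names): copying only the generic part leaves the protocol-specific field at its freshly allocated value, while moving `proto` into the generic part makes the same copy carry it along.

```c
/* Before the fix: proto lives in an IPv6-specific extension of the
 * generic GRO control block (both structs are simplified stand-ins). */
struct napi_gro_cb {
	int count;
};

struct ipv6_gro_cb {
	struct napi_gro_cb napi;
	int proto;
};

/* Mimics "*NAPI_GRO_CB(nskb) = *NAPI_GRO_CB(p)": only the generic
 * part is copied, so proto keeps whatever the new skb started with. */
static void copy_generic_cb(struct ipv6_gro_cb *nskb, const struct ipv6_gro_cb *p)
{
	nskb->napi = p->napi;
}

/* After the fix: proto is part of the generic control block. */
struct napi_gro_cb_fixed {
	int count;
	int proto;
};

static void copy_fixed_cb(struct napi_gro_cb_fixed *nskb,
			  const struct napi_gro_cb_fixed *p)
{
	*nskb = *p;	/* proto travels with the rest of the block */
}
```

In the first layout a zero-initialised anchor ends up with `proto == 0` after the copy, so the completion path has no protocol handler to dispatch to; in the second layout the value 6 (IPPROTO_TCP) survives.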
  13. 08 Oct, 2012 1 commit
  14. 06 Oct, 2012 1 commit
  15. 05 Oct, 2012 1 commit
  16. 03 Oct, 2012 1 commit
    • ipv6: don't add link local route when there is no link local address · 62b54dd9
      Nicolas Dichtel committed
      When an address is added on loopback (ip -6 a a 2002::1/128 dev lo), a route
      to fe80::/64 is added in the main table:
        unreachable fe80::/64 dev lo  proto kernel  metric 256  error -101
      
      This route does not match any prefix (no fe80:: address on lo). In fact,
      addrconf_dev_config() will not add link local address because this function
      filters interfaces by type. If the link local address is added manually, the
      route to the link local prefix will be automatically added by
      addrconf_add_linklocal().
      Note also, that this route is not deleted when the address is removed.
      
      After looking at the code, it seems that addrconf_add_lroute() is redundant with
      addrconf_add_linklocal(), because this function will add the link local route
      when the link local address is configured.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 02 Oct, 2012 2 commits
  18. 29 Sep, 2012 1 commit
  19. 28 Sep, 2012 5 commits
  20. 26 Sep, 2012 2 commits
  21. 25 Sep, 2012 1 commit
    • net: use a per task frag allocator · 5640f768
      Eric Dumazet committed
      We currently use a per socket order-0 page cache for tcp_sendmsg()
      operations.
      
      This page is used to build fragments for skbs.
      
      It's done to increase the probability of coalescing small write() calls
      into single segments in skbs still in the write queue (not yet sent).
      
      But it wastes a lot of memory for applications handling many mostly
      idle sockets, since each socket holds one page in sk->sk_sndmsg_page.
      
      It's also quite inefficient to build TSO 64KB packets, because we need
      about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit
      the page allocator more than wanted.
      
      This patch adds a per task frag allocator and uses bigger pages,
      if available. An automatic fallback is done in case of memory pressure.
      
      (up to 32768 bytes per frag, that's order-3 pages on x86)
      
      This increases TCP stream performance by 20% on the loopback device,
      but also benefits other network devices, since 8x fewer frags are
      mapped on transmit and unmapped on tx completion. Alexander Duyck
      mentioned a probable performance win on systems with IOMMU enabled.
      
      It's possible some SG enabled hardware can't cope with bigger fragments,
      but their ndo_start_xmit() should already handle this, splitting a
      fragment into sub fragments, since some arches have PAGE_SIZE=65536.
      
      Successfully tested on various ethernet devices.
      (ixgbe, igb, bnx2x, tg3, mellanox mlx4)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
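The arithmetic in this message, and the fallback under memory pressure, can be sketched as below. Everything here is an illustrative model: `SKETCH_PAGE_SIZE`, `frags_needed()`, `refill_order()` and the two availability callbacks are hypothetical names, and the real allocator deals with page references and GFP flags that are omitted.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed PAGE_SIZE, as on x86 */

/* Number of fragments needed to hold len bytes (rounded up). */
static size_t frags_needed(size_t len, size_t frag_size)
{
	return (len + frag_size - 1) / frag_size;
}

/* Try the largest order first and fall back to smaller ones.
 * try_alloc is a hypothetical callback returning nonzero when a block
 * of 2^order pages is available. Returns the order used, or -1. */
static int refill_order(int max_order, int (*try_alloc)(int order))
{
	for (int order = max_order; order >= 0; order--)
		if (try_alloc(order))
			return order;
	return -1;
}

/* Mock availability checks used to exercise the sketch. */
static int any_order(int order)    { (void)order; return 1; }
static int only_order_0(int order) { return order == 0; }
```

For a 64KB TSO packet, `frags_needed(65536, SKETCH_PAGE_SIZE)` gives 16 fragments, while order-3 blocks of `SKETCH_PAGE_SIZE << 3` (32768) bytes need only 2, which is the 8x reduction mentioned above. Under memory pressure, modelled by `only_order_0`, the refill degrades to single pages rather than failing.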
  22. 23 Sep, 2012 2 commits
  23. 22 Sep, 2012 1 commit
  24. 21 Sep, 2012 3 commits
  25. 20 Sep, 2012 3 commits
    • ipv6: unify fragment thresh handling code · 6b102865
      Amerigo Wang committed
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Michal Kubeček <mkubecek@suse.cz>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: make ip6_frag_nqueues() and ip6_frag_mem() static inline · d4915c08
      Amerigo Wang committed
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Michal Kubeček <mkubecek@suse.cz>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: unify conntrack reassembly expire code with standard one · b836c99f
      Amerigo Wang committed
      Two years ago, Shan Wei tried to fix this:
      http://patchwork.ozlabs.org/patch/43905/
      
      The problem is that RFC2460 says an ICMP Time
      Exceeded -- Fragment Reassembly Time Exceeded message should be
      sent to the source of that fragment if the defragmentation
      times out.
      
      "
         If insufficient fragments are received to complete reassembly of a
         packet within 60 seconds of the reception of the first-arriving
         fragment of that packet, reassembly of that packet must be
         abandoned and all the fragments that have been received for that
         packet must be discarded.  If the first fragment (i.e., the one
         with a Fragment Offset of zero) has been received, an ICMP Time
         Exceeded -- Fragment Reassembly Time Exceeded message should be
         sent to the source of that fragment.
      "
      
      As Herbert suggested, we could actually use the standard IPv6
      reassembly code which follows RFC2460.
      
      With this patch applied, I can see ICMP Time Exceeded sent
      from the receiver when the sender sent out 3/4 of a fragmented
      IPv6 UDP packet.
      
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Michal Kubeček <mkubecek@suse.cz>
      Cc: David Miller <davem@davemloft.net>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: netfilter-devel@vger.kernel.org
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
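The RFC2460 rule quoted in this message reduces to a small decision on timer expiry. The following is a sketch of that rule only, with a hypothetical `frag_queue` layout and `frag_expire()` function and time kept in whole seconds; it is not the kernel's reassembly code.

```c
#include <stdbool.h>

#define FRAG_TIMEOUT_SECS 60u	/* reassembly timeout from RFC2460 */

/* Hypothetical per-packet reassembly state. */
struct frag_queue {
	unsigned int first_arrival;	/* arrival time of the first-arriving fragment */
	bool have_first_frag;		/* fragment with offset zero was received */
};

enum expire_action {
	KEEP_WAITING,
	DROP_SILENTLY,
	DROP_AND_SEND_TIME_EXCEEDED,
};

/* Decide what to do with an incomplete packet at time "now". */
static enum expire_action frag_expire(const struct frag_queue *q, unsigned int now)
{
	if (now - q->first_arrival < FRAG_TIMEOUT_SECS)
		return KEEP_WAITING;

	/* Timed out: all fragments are discarded; the ICMP error is sent
	 * only if the offset-zero fragment was among those received. */
	return q->have_first_frag ? DROP_AND_SEND_TIME_EXCEEDED : DROP_SILENTLY;
}
```

The offset-zero condition matters because, without that fragment, the receiver lacks the start of the original packet that the ICMP error is supposed to quote back to the source.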