1. 11 3月, 2014 1 次提交
  2. 07 3月, 2014 1 次提交
  3. 04 3月, 2014 1 次提交
  4. 28 2月, 2014 2 次提交
  5. 27 2月, 2014 2 次提交
  6. 26 2月, 2014 1 次提交
    • H
      ipv4: ipv6: better estimate tunnel header cut for correct ufo handling · 91a48a2e
      Hannes Frederic Sowa 提交于
      Currently the UFO fragmentation process does not correctly handle inner
      UDP frames.
      
      (The following tcpdumps are captured on the parent interface with ufo
      disabled while tunnel has ufo enabled, 2000 bytes payload, mtu 1280,
      both sit device):
      
      IPv6:
      16:39:10.031613 IP (tos 0x0, ttl 64, id 3208, offset 0, flags [DF], proto IPv6 (41), length 1300)
          192.168.122.151 > 1.1.1.1: IP6 (hlim 64, next-header Fragment (44) payload length: 1240) 2001::1 > 2001::8: frag (0x00000001:0|1232) 44883 > distinct: UDP, length 2000
      16:39:10.031709 IP (tos 0x0, ttl 64, id 3209, offset 0, flags [DF], proto IPv6 (41), length 844)
          192.168.122.151 > 1.1.1.1: IP6 (hlim 64, next-header Fragment (44) payload length: 784) 2001::1 > 2001::8: frag (0x00000001:0|776) 58979 > 46366: UDP, length 5471
      
      We can see that fragmentation header offset is not correctly updated.
      (fragmentation id handling is corrected by 916e4cf4 ("ipv6: reuse
      ip6_frag_id from ip6_ufo_append_data")).
      
      IPv4:
      16:39:57.737761 IP (tos 0x0, ttl 64, id 3209, offset 0, flags [DF], proto IPIP (4), length 1296)
          192.168.122.151 > 1.1.1.1: IP (tos 0x0, ttl 64, id 57034, offset 0, flags [none], proto UDP (17), length 1276)
          192.168.99.1.35961 > 192.168.99.2.distinct: UDP, length 2000
      16:39:57.738028 IP (tos 0x0, ttl 64, id 3210, offset 0, flags [DF], proto IPIP (4), length 792)
          192.168.122.151 > 1.1.1.1: IP (tos 0x0, ttl 64, id 57035, offset 0, flags [none], proto UDP (17), length 772)
          192.168.99.1.13531 > 192.168.99.2.20653: UDP, length 51109
      
      In this case fragmentation id is incremented and offset is not updated.
      
      First, I aligned inet_gso_segment and ipv6_gso_segment:
      * align naming of flags
      * ipv6_gso_segment: setting skb->encapsulation is unnecessary, as we
        always ensure that the state of this flag is left untouched when
        returning from upper gso segmenation function
      * ipv6_gso_segment: move skb_reset_inner_headers below updating the
        fragmentation header data, we don't care for updating fragmentation
        header data
      * remove currently unneeded comment indicating skb->encapsulation might
        get changed by upper gso_segment callback (gre and udp-tunnel reset
        encapsulation after segmentation on each fragment)
      
      If we encounter an IPIP or SIT gso skb we now check for the protocol ==
      IPPROTO_UDP and that we at least have already traversed another ip(6)
      protocol header.
      
      The reason why we have to special case GSO_IPIP and GSO_SIT is that
      we reset skb->encapsulation to 0 while skb_mac_gso_segment the inner
      protocol of GSO_UDP_TUNNEL or GSO_GRE packets.
      Reported-by: NWolfgang Walter <linux@stwm.de>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      91a48a2e
  7. 22 2月, 2014 1 次提交
  8. 21 2月, 2014 1 次提交
  9. 20 2月, 2014 1 次提交
  10. 19 2月, 2014 1 次提交
  11. 18 2月, 2014 4 次提交
  12. 17 2月, 2014 1 次提交
  13. 15 2月, 2014 1 次提交
  14. 14 2月, 2014 1 次提交
    • F
      net: ip, ipv6: handle gso skbs in forwarding path · fe6cc55f
      Florian Westphal 提交于
      Marcelo Ricardo Leitner reported problems when the forwarding link path
      has a lower mtu than the incoming one if the inbound interface supports GRO.
      
      Given:
      Host <mtu1500> R1 <mtu1200> R2
      
      Host sends tcp stream which is routed via R1 and R2.  R1 performs GRO.
      
      In this case, the kernel will fail to send ICMP fragmentation needed
      messages (or pkt too big for ipv6), as GSO packets currently bypass dstmtu
      checks in forward path. Instead, Linux tries to send out packets exceeding
      the mtu.
      
      When locking route MTU on Host (i.e., no ipv4 DF bit set), R1 does
      not fragment the packets when forwarding, and again tries to send out
      packets exceeding R1-R2 link mtu.
      
      This alters the forwarding dstmtu checks to take the individual gso
      segment lengths into account.
      
      For ipv6, we send out pkt too big error for gso if the individual
      segments are too big.
      
      For ipv4, we either send icmp fragmentation needed, or, if the DF bit
      is not set, perform software segmentation and let the output path
      create fragments when the packet is leaving the machine.
      It is not 100% correct as the error message will contain the headers of
      the GRO skb instead of the original/segmented one, but it seems to
      work fine in my (limited) tests.
      
      Eric Dumazet suggested to simply shrink mss via ->gso_size to avoid
      sofware segmentation.
      
      However it turns out that skb_segment() assumes skb nr_frags is related
      to mss size so we would BUG there.  I don't want to mess with it considering
      Herbert and Eric disagree on what the correct behavior should be.
      
      Hannes Frederic Sowa notes that when we would shrink gso_size
      skb_segment would then also need to deal with the case where
      SKB_MAX_FRAGS would be exceeded.
      
      This uses sofware segmentation in the forward path when we hit ipv4
      non-DF packets and the outgoing link mtu is too small.  Its not perfect,
      but given the lack of bug reports wrt. GRO fwd being broken this is a
      rare case anyway.  Also its not like this could not be improved later
      once the dust settles.
      Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Reported-by: NMarcelo Ricardo Leitner <mleitner@redhat.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fe6cc55f
  15. 12 2月, 2014 2 次提交
  16. 10 2月, 2014 1 次提交
  17. 06 2月, 2014 2 次提交
  18. 28 1月, 2014 1 次提交
    • H
      net: Fix memory leak if TPROXY used with TCP early demux · a452ce34
      Holger Eitzenberger 提交于
      I see a memory leak when using a transparent HTTP proxy using TPROXY
      together with TCP early demux and Kernel v3.8.13.15 (Ubuntu stable):
      
      unreferenced object 0xffff88008cba4a40 (size 1696):
        comm "softirq", pid 0, jiffies 4294944115 (age 8907.520s)
        hex dump (first 32 bytes):
          0a e0 20 6a 40 04 1b 37 92 be 32 e2 e8 b4 00 00  .. j@..7..2.....
          02 00 07 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff810b710a>] kmem_cache_alloc+0xad/0xb9
          [<ffffffff81270185>] sk_prot_alloc+0x29/0xc5
          [<ffffffff812702cf>] sk_clone_lock+0x14/0x283
          [<ffffffff812aaf3a>] inet_csk_clone_lock+0xf/0x7b
          [<ffffffff8129a893>] netlink_broadcast+0x14/0x16
          [<ffffffff812c1573>] tcp_create_openreq_child+0x1b/0x4c3
          [<ffffffff812c033e>] tcp_v4_syn_recv_sock+0x38/0x25d
          [<ffffffff812c13e4>] tcp_check_req+0x25c/0x3d0
          [<ffffffff812bf87a>] tcp_v4_do_rcv+0x287/0x40e
          [<ffffffff812a08a7>] ip_route_input_noref+0x843/0xa55
          [<ffffffff812bfeca>] tcp_v4_rcv+0x4c9/0x725
          [<ffffffff812a26f4>] ip_local_deliver_finish+0xe9/0x154
          [<ffffffff8127a927>] __netif_receive_skb+0x4b2/0x514
          [<ffffffff8127aa77>] process_backlog+0xee/0x1c5
          [<ffffffff8127c949>] net_rx_action+0xa7/0x200
          [<ffffffff81209d86>] add_interrupt_randomness+0x39/0x157
      
      But there are many more, resulting in the machine going OOM after some
      days.
      
      From looking at the TPROXY code, and with help from Florian, I see
      that the memory leak is introduced in tcp_v4_early_demux():
      
        void tcp_v4_early_demux(struct sk_buff *skb)
        {
          /* ... */
      
          iph = ip_hdr(skb);
          th = tcp_hdr(skb);
      
          if (th->doff < sizeof(struct tcphdr) / 4)
              return;
      
          sk = __inet_lookup_established(dev_net(skb->dev), &tcp_hashinfo,
                             iph->saddr, th->source,
                             iph->daddr, ntohs(th->dest),
                             skb->skb_iif);
          if (sk) {
              skb->sk = sk;
      
      where the socket is assigned unconditionally to skb->sk, also bumping
      the refcnt on it.  This is problematic, because in our case the skb
      has already a socket assigned in the TPROXY target.  This then results
      in the leak I see.
      
      The very same issue seems to be with IPv6, but haven't tested.
      Reviewed-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NHolger Eitzenberger <holger@eitzenberger.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a452ce34
  19. 25 1月, 2014 1 次提交
  20. 23 1月, 2014 1 次提交
  21. 22 1月, 2014 3 次提交
  22. 20 1月, 2014 5 次提交
  23. 19 1月, 2014 1 次提交
  24. 18 1月, 2014 3 次提交
  25. 16 1月, 2014 1 次提交
    • T
      ipv6 addrconf: don't cleanup prefix route for IFA_F_NOPREFIXROUTE · 5b84efec
      Thomas Haller 提交于
      Refactor the deletion/update of prefix routes when removing an
      address. Now also consider IFA_F_NOPREFIXROUTE and if there is an address
      present with this flag, to not cleanup the route. Instead, assume
      that userspace is taking care of this route.
      
      Also perform the same cleanup, when userspace changes an existing address
      to add NOPREFIXROUTE (to an address that didn't have this flag). This is
      done because when the address was added, a prefix route was created for it.
      Since the user now wants to handle this route by himself, we cleanup this
      route.
      
      This cleanup of the route is not totally robust. There is no guarantee,
      that the route we are about to delete was really the one added by the
      kernel. This behavior does not change by the patch, and in practice it
      should work just fine.
      Signed-off-by: NThomas Haller <thaller@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5b84efec