1. 18 1月, 2014 1 次提交
  2. 15 1月, 2014 1 次提交
  3. 14 1月, 2014 1 次提交
    • N
      inet_diag: fix inet_diag_dump_icsk() to use correct state for timewait sockets · 70315d22
      Neal Cardwell 提交于
      Fix inet_diag_dump_icsk() to reflect the fact that both TCP_TIME_WAIT
      and TCP_FIN_WAIT2 connections are represented by inet_timewait_sock
      (not just TIME_WAIT), and for such sockets the tw_substate field holds
      the real state, which can be either TCP_TIME_WAIT or TCP_FIN_WAIT2.
      
      This brings the inet_diag state-matching code in line with the field
      it uses to populate idiag_state. This is also analogous to the info
      exported in /proc/net/tcp, where get_tcp4_sock() exports sk->sk_state
      and get_timewait4_sock() exports tw->tw_substate.
      
      Before fixing this, (a) neither "ss -nemoi" nor "ss -nemoi state
      fin-wait-2" would return a socket in TCP_FIN_WAIT2; and (b) "ss -nemoi
      state time-wait" would also return sockets in state TCP_FIN_WAIT2.
      
      This is an old bug that predates 05dbc7b5 ("tcp/dccp: remove twchain").
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      70315d22
  4. 03 1月, 2014 1 次提交
    • W
      ipv4: fix tunneled VM traffic over hw VXLAN/GRE GSO NIC · 7a7ffbab
      Wei-Chun Chao 提交于
      VM to VM GSO traffic is broken if it goes through VXLAN or GRE
      tunnel and the physical NIC on the host supports hardware VXLAN/GRE
      GSO offload (e.g. bnx2x and next-gen mlx4).
      
      Two issues -
      (VXLAN) VM traffic has SKB_GSO_DODGY and SKB_GSO_UDP_TUNNEL with
      SKB_GSO_TCP/UDP set depending on the inner protocol. GSO header
      integrity check fails in udp4_ufo_fragment if inner protocol is
      TCP. Also gso_segs is calculated incorrectly using skb->len that
      includes tunnel header. Fix: robust check should only be applied
      to the inner packet.
      
      (VXLAN & GRE) Once GSO header integrity check passes, NULL segs
      is returned and the original skb is sent to hardware. However the
      tunnel header is already pulled. Fix: tunnel header needs to be
      restored so that hardware can perform GSO properly on the original
      packet.
      Signed-off-by: NWei-Chun Chao <weichunc@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7a7ffbab
  5. 23 12月, 2013 1 次提交
  6. 20 12月, 2013 1 次提交
    • D
      net: inet_diag: zero out uninitialized idiag_{src,dst} fields · b1aac815
      Daniel Borkmann 提交于
      Jakub reported while working with nlmon netlink sniffer that parts of
      the inet_diag_sockid are not initialized when r->idiag_family != AF_INET6.
      That is, fields of r->id.idiag_src[1 ... 3], r->id.idiag_dst[1 ... 3].
      
      In fact, it seems that we can leak 6 * sizeof(u32) byte of kernel [slab]
      memory through this. At least, in udp_dump_one(), we allocate a skb in ...
      
        rep = nlmsg_new(sizeof(struct inet_diag_msg) + ..., GFP_KERNEL);
      
      ... and then pass that to inet_sk_diag_fill() that puts the whole struct
      inet_diag_msg into the skb, where we only fill out r->id.idiag_src[0],
      r->id.idiag_dst[0] and leave the rest untouched:
      
        r->id.idiag_src[0] = inet->inet_rcv_saddr;
        r->id.idiag_dst[0] = inet->inet_daddr;
      
      struct inet_diag_msg embeds struct inet_diag_sockid that is correctly /
      fully filled out in IPv6 case, but for IPv4 not.
      
      So just zero them out by using plain memset (for this little amount of
      bytes it's probably not worth the extra check for idiag_family == AF_INET).
      
      Similarly, fix also other places where we fill that out.
      Reported-by: NJakub Zawadzki <darkjames-ws@darkjames.pl>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b1aac815
  7. 19 12月, 2013 1 次提交
  8. 18 12月, 2013 1 次提交
  9. 12 12月, 2013 3 次提交
  10. 11 12月, 2013 3 次提交
    • P
      netfilter: SYNPROXY target: restrict to INPUT/FORWARD · f01b3926
      Patrick McHardy 提交于
      Fix a crash in synproxy_send_tcp() when using the SYNPROXY target in the
      PREROUTING chain caused by missing routing information.
      Reported-by: NNicki P. <xastx@gmx.de>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      f01b3926
    • E
      udp: ipv4: fix an use after free in __udp4_lib_rcv() · 8afdd99a
      Eric Dumazet 提交于
      Dave Jones reported a use after free in UDP stack :
      
      [ 5059.434216] =========================
      [ 5059.434314] [ BUG: held lock freed! ]
      [ 5059.434420] 3.13.0-rc3+ #9 Not tainted
      [ 5059.434520] -------------------------
      [ 5059.434620] named/863 is freeing memory ffff88005e960000-ffff88005e96061f, with a lock still held there!
      [ 5059.434815]  (slock-AF_INET){+.-...}, at: [<ffffffff8149bd21>] udp_queue_rcv_skb+0xd1/0x4b0
      [ 5059.435012] 3 locks held by named/863:
      [ 5059.435086]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff8143054d>] __netif_receive_skb_core+0x11d/0x940
      [ 5059.435295]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff81467a5e>] ip_local_deliver_finish+0x3e/0x410
      [ 5059.435500]  #2:  (slock-AF_INET){+.-...}, at: [<ffffffff8149bd21>] udp_queue_rcv_skb+0xd1/0x4b0
      [ 5059.435734]
      stack backtrace:
      [ 5059.435858] CPU: 0 PID: 863 Comm: named Not tainted 3.13.0-rc3+ #9 [loadavg: 0.21 0.06 0.06 1/115 1365]
      [ 5059.436052] Hardware name:                  /D510MO, BIOS MOPNV10J.86A.0175.2010.0308.0620 03/08/2010
      [ 5059.436223]  0000000000000002 ffff88007e203ad8 ffffffff8153a372 ffff8800677130e0
      [ 5059.436390]  ffff88007e203b10 ffffffff8108cafa ffff88005e960000 ffff88007b00cfc0
      [ 5059.436554]  ffffea00017a5800 ffffffff8141c490 0000000000000246 ffff88007e203b48
      [ 5059.436718] Call Trace:
      [ 5059.436769]  <IRQ>  [<ffffffff8153a372>] dump_stack+0x4d/0x66
      [ 5059.436904]  [<ffffffff8108cafa>] debug_check_no_locks_freed+0x15a/0x160
      [ 5059.437037]  [<ffffffff8141c490>] ? __sk_free+0x110/0x230
      [ 5059.437147]  [<ffffffff8112da2a>] kmem_cache_free+0x6a/0x150
      [ 5059.437260]  [<ffffffff8141c490>] __sk_free+0x110/0x230
      [ 5059.437364]  [<ffffffff8141c5c9>] sk_free+0x19/0x20
      [ 5059.437463]  [<ffffffff8141cb25>] sock_edemux+0x25/0x40
      [ 5059.437567]  [<ffffffff8141c181>] sock_queue_rcv_skb+0x81/0x280
      [ 5059.437685]  [<ffffffff8149bd21>] ? udp_queue_rcv_skb+0xd1/0x4b0
      [ 5059.437805]  [<ffffffff81499c82>] __udp_queue_rcv_skb+0x42/0x240
      [ 5059.437925]  [<ffffffff81541d25>] ? _raw_spin_lock+0x65/0x70
      [ 5059.438038]  [<ffffffff8149bebb>] udp_queue_rcv_skb+0x26b/0x4b0
      [ 5059.438155]  [<ffffffff8149c712>] __udp4_lib_rcv+0x152/0xb00
      [ 5059.438269]  [<ffffffff8149d7f5>] udp_rcv+0x15/0x20
      [ 5059.438367]  [<ffffffff81467b2f>] ip_local_deliver_finish+0x10f/0x410
      [ 5059.438492]  [<ffffffff81467a5e>] ? ip_local_deliver_finish+0x3e/0x410
      [ 5059.438621]  [<ffffffff81468653>] ip_local_deliver+0x43/0x80
      [ 5059.438733]  [<ffffffff81467f70>] ip_rcv_finish+0x140/0x5a0
      [ 5059.438843]  [<ffffffff81468926>] ip_rcv+0x296/0x3f0
      [ 5059.438945]  [<ffffffff81430b72>] __netif_receive_skb_core+0x742/0x940
      [ 5059.439074]  [<ffffffff8143054d>] ? __netif_receive_skb_core+0x11d/0x940
      [ 5059.442231]  [<ffffffff8108c81d>] ? trace_hardirqs_on+0xd/0x10
      [ 5059.442231]  [<ffffffff81430d83>] __netif_receive_skb+0x13/0x60
      [ 5059.442231]  [<ffffffff81431c1e>] netif_receive_skb+0x1e/0x1f0
      [ 5059.442231]  [<ffffffff814334e0>] napi_gro_receive+0x70/0xa0
      [ 5059.442231]  [<ffffffffa01de426>] rtl8169_poll+0x166/0x700 [r8169]
      [ 5059.442231]  [<ffffffff81432bc9>] net_rx_action+0x129/0x1e0
      [ 5059.442231]  [<ffffffff810478cd>] __do_softirq+0xed/0x240
      [ 5059.442231]  [<ffffffff81047e25>] irq_exit+0x125/0x140
      [ 5059.442231]  [<ffffffff81004241>] do_IRQ+0x51/0xc0
      [ 5059.442231]  [<ffffffff81542bef>] common_interrupt+0x6f/0x6f
      
      We need to keep a reference on the socket, by using skb_steal_sock()
      at the right place.
      
      Note that another patch is needed to fix a race in
      udp_sk_rx_dst_set(), as we hold no lock protecting the dst.
      
      Fixes: 421b3885 ("udp: ipv4: Add udp early demux")
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8afdd99a
    • S
      inet: fix NULL pointer Oops in fib(6)_rule_suppress · 673498b8
      Stefan Tomanek 提交于
      This changes ensures that the routing entry investigated by the suppress
      function actually does point to a device struct before following that pointer,
      fixing a possible kernel oops situation when verifying the interface group
      associated with a routing table entry.
      
      According to Daniel Golle, this Oops can be triggered by a user process trying
      to establish an outgoing IPv6 connection while having no real IPv6 connectivity
      set up (only autoassigned link-local addresses).
      
      Fixes: 6ef94cfa ("fib_rules: add route suppression based on ifgroup")
      Reported-by: NDaniel Golle <daniel.golle@gmail.com>
      Tested-by: NDaniel Golle <daniel.golle@gmail.com>
      Signed-off-by: NStefan Tomanek <stefan.tomanek@wertarbyte.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      673498b8
  11. 06 12月, 2013 1 次提交
    • E
      tcp_memcontrol: Cleanup/fix cg_proto->memory_pressure handling. · 7f2cbdc2
      Eric W. Biederman 提交于
      kill memcg_tcp_enter_memory_pressure.  The only function of
      memcg_tcp_enter_memory_pressure was to reduce deal with the
      unnecessary abstraction that was tcp_memcontrol.  Now that struct
      tcp_memcontrol is gone remove this unnecessary function, the
      unnecessary function pointer, and modify sk_enter_memory_pressure to
      set this field directly, just as sk_leave_memory_pressure cleas this
      field directly.
      
      This fixes a small bug I intruduced when killing struct tcp_memcontrol
      that caused memcg_tcp_enter_memory_pressure to never be called and
      thus failed to ever set cg_proto->memory_pressure.
      
      Remove the cg_proto enter_memory_pressure function as it now serves
      no useful purpose.
      
      Don't test cg_proto->memory_presser in sk_leave_memory_pressure before
      clearing it.  The test was originally there to ensure that the pointer
      was non-NULL.  Now that cg_proto is not a pointer the pointer does not
      matter.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7f2cbdc2
  12. 30 11月, 2013 2 次提交
  13. 29 11月, 2013 1 次提交
  14. 24 11月, 2013 4 次提交
  15. 21 11月, 2013 1 次提交
    • A
      ipv4: fix race in concurrent ip_route_input_slow() · dcdfdf56
      Alexei Starovoitov 提交于
      CPUs can ask for local route via ip_route_input_noref() concurrently.
      if nh_rth_input is not cached yet, CPUs will proceed to allocate
      equivalent DSTs on 'lo' and then will try to cache them in nh_rth_input
      via rt_cache_route()
      Most of the time they succeed, but on occasion the following two lines:
      	orig = *p;
      	prev = cmpxchg(p, orig, rt);
      in rt_cache_route() do race and one of the cpus fails to complete cmpxchg.
      But ip_route_input_slow() doesn't check the return code of rt_cache_route(),
      so dst is leaking. dst_destroy() is never called and 'lo' device
      refcnt doesn't go to zero, which can be seen in the logs as:
      	unregister_netdevice: waiting for lo to become free. Usage count = 1
      Adding mdelay() between above two lines makes it easily reproducible.
      Fix it similar to nh_pcpu_rth_output case.
      
      Fixes: d2d68ba9 ("ipv4: Cache input routes in fib_info nexthops.")
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dcdfdf56
  16. 20 11月, 2013 3 次提交
    • J
      genetlink: only pass array to genl_register_family_with_ops() · c53ed742
      Johannes Berg 提交于
      As suggested by David Miller, make genl_register_family_with_ops()
      a macro and pass only the array, evaluating ARRAY_SIZE() in the
      macro, this is a little safer.
      
      The openvswitch has some indirection, assing ops/n_ops directly in
      that code. This might ultimately just assign the pointers in the
      family initializations, saving the struct genl_family_and_ops and
      code (once mcast groups are handled differently.)
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c53ed742
    • A
      tcp: don't update snd_nxt, when a socket is switched from repair mode · dbde4979
      Andrey Vagin 提交于
      snd_nxt must be updated synchronously with sk_send_head.  Otherwise
      tp->packets_out may be updated incorrectly, what may bring a kernel panic.
      
      Here is a kernel panic from my host.
      [  103.043194] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
      [  103.044025] IP: [<ffffffff815aaaaf>] tcp_rearm_rto+0xcf/0x150
      ...
      [  146.301158] Call Trace:
      [  146.301158]  [<ffffffff815ab7f0>] tcp_ack+0xcc0/0x12c0
      
      Before this panic a tcp socket was restored. This socket had sent and
      unsent data in the write queue. Sent data was restored in repair mode,
      then the socket was switched from reapair mode and unsent data was
      restored. After that the socket was switched back into repair mode.
      
      In that moment we had a socket where write queue looks like this:
      snd_una    snd_nxt   write_seq
         |_________|________|
                   |
      	  sk_send_head
      
      After a second switching from repair mode the state of socket was
      changed:
      
      snd_una          snd_nxt, write_seq
         |_________ ________|
                   |
      	  sk_send_head
      
      This state is inconsistent, because snd_nxt and sk_send_head are not
      synchronized.
      
      Bellow you can find a call trace, how packets_out can be incremented
      twice for one skb, if snd_nxt and sk_send_head are not synchronized.
      In this case packets_out will be always positive, even when
      sk_write_queue is empty.
      
      tcp_write_wakeup
      	skb = tcp_send_head(sk);
      	tcp_fragment
      		if (!before(tp->snd_nxt, TCP_SKB_CB(buff)->end_seq))
      			tcp_adjust_pcount(sk, skb, diff);
      	tcp_event_new_data_sent
      		tp->packets_out += tcp_skb_pcount(skb);
      
      I think update of snd_nxt isn't required, when a socket is switched from
      repair mode.  Because it's initialized in tcp_connect_init. Then when a
      write queue is restored, snd_nxt is incremented in tcp_event_new_data_sent,
      so it's always is in consistent state.
      
      I have checked, that the bug is not reproduced with this patch and
      all tests about restoring tcp connections work fine.
      
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: James Morris <jmorris@namei.org>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Signed-off-by: NAndrey Vagin <avagin@openvz.org>
      Acked-by: NPavel Emelyanov <xemul@parallels.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dbde4979
    • F
      xfrm: Release dst if this dst is improper for vti tunnel · 236c9f84
      fan.du 提交于
      After searching rt by the vti tunnel dst/src parameter,
      if this rt has neither attached to any transformation
      nor the transformation is not tunnel oriented, this rt
      should be released back to ip layer.
      
      otherwise causing dst memory leakage.
      Signed-off-by: NFan Du <fan.du@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      236c9f84
  17. 19 11月, 2013 2 次提交
  18. 18 11月, 2013 1 次提交
  19. 15 11月, 2013 5 次提交
  20. 14 11月, 2013 1 次提交
    • A
      core/dev: do not ignore dmac in dev_forward_skb() · 81b9eab5
      Alexei Starovoitov 提交于
      commit 06a23fe3
      ("core/dev: set pkt_type after eth_type_trans() in dev_forward_skb()")
      and refactoring 64261f23
      ("dev: move skb_scrub_packet() after eth_type_trans()")
      
      are forcing pkt_type to be PACKET_HOST when skb traverses veth.
      
      which means that ip forwarding will kick in inside netns
      even if skb->eth->h_dest != dev->dev_addr
      
      Fix order of eth_type_trans() and skb_scrub_packet() in dev_forward_skb()
      and in ip_tunnel_rcv()
      
      Fixes: 06a23fe3 ("core/dev: set pkt_type after eth_type_trans() in dev_forward_skb()")
      CC: Isaku Yamahata <yamahatanetdev@gmail.com>
      CC: Maciej Zenczykowski <zenczykowski@gmail.com>
      CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      81b9eab5
  21. 08 11月, 2013 1 次提交
    • E
      inet: fix a UFO regression · dcd60771
      Eric Dumazet 提交于
      While testing virtio_net and skb_segment() changes, Hannes reported
      that UFO was sending wrong frames.
      
      It appears this was introduced by a recent commit :
      8c3a897b ("inet: restore gso for vxlan")
      
      The old condition to perform IP frag was :
      
      tunnel = !!skb->encapsulation;
      ...
              if (!tunnel && proto == IPPROTO_UDP) {
      
      So the new one should be :
      
      udpfrag = !skb->encapsulation && proto == IPPROTO_UDP;
      ...
              if (udpfrag) {
      
      Initialization of udpfrag must be done before call
      to ops->callbacks.gso_segment(skb, features), as
      skb_udp_tunnel_segment() clears skb->encapsulation
      
      (We want udpfrag to be true for UFO, false for VXLAN)
      
      With help from Alexei Starovoitov
      Reported-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dcd60771
  22. 06 11月, 2013 2 次提交
    • J
      net: Explicitly initialize u64_stats_sync structures for lockdep · 827da44c
      John Stultz 提交于
      In order to enable lockdep on seqcount/seqlock structures, we
      must explicitly initialize any locks.
      
      The u64_stats_sync structure, uses a seqcount, and thus we need
      to introduce a u64_stats_init() function and use it to initialize
      the structure.
      
      This unfortunately adds a lot of fairly trivial initialization code
      to a number of drivers. But the benefit of ensuring correctness makes
      this worth while.
      
      Because these changes are required for lockdep to be enabled, and the
      changes are quite trivial, I've not yet split this patch out into 30-some
      separate patches, as I figured it would be better to get the various
      maintainers thoughts on how to best merge this change along with
      the seqcount lockdep enablement.
      
      Feedback would be appreciated!
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      Acked-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: James Morris <jmorris@namei.org>
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Mirko Lindner <mlindner@marvell.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Roger Luethi <rl@hellgate.ch>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Simon Horman <horms@verge.net.au>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Wensong Zhang <wensong@linux-vs.org>
      Cc: netdev@vger.kernel.org
      Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      827da44c
    • H
      ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE · 482fc609
      Hannes Frederic Sowa 提交于
      Sockets marked with IP_PMTUDISC_INTERFACE won't do path mtu discovery,
      their sockets won't accept and install new path mtu information and they
      will always use the interface mtu for outgoing packets. It is guaranteed
      that the packet is not fragmented locally. But we won't set the DF-Flag
      on the outgoing frames.
      
      Florian Weimer had the idea to use this flag to ensure DNS servers are
      never generating outgoing fragments. They may well be fragmented on the
      path, but the server never stores or usees path mtu values, which could
      well be forged in an attack.
      
      (The root of the problem with path MTU discovery is that there is
      no reliable way to authenticate ICMP Fragmentation Needed But DF Set
      messages because they are sent from intermediate routers with their
      source addresses, and the IMCP payload will not always contain sufficient
      information to identify a flow.)
      
      Recent research in the DNS community showed that it is possible to
      implement an attack where DNS cache poisoning is feasible by spoofing
      fragments. This work was done by Amir Herzberg and Haya Shulman:
      <https://sites.google.com/site/hayashulman/files/fragmentation-poisoning.pdf>
      
      This issue was previously discussed among the DNS community, e.g.
      <http://www.ietf.org/mail-archive/web/dnsext/current/msg01204.html>,
      without leading to fixes.
      
      This patch depends on the patch "ipv4: fix DO and PROBE pmtu mode
      regarding local fragmentation with UFO/CORK" for the enforcement of the
      non-fragmentable checks. If other users than ip_append_page/data should
      use this semantic too, we have to add a new flag to IPCB(skb)->flags to
      suppress local fragmentation and check for this in ip_finish_output.
      
      Many thanks to Florian Weimer for the idea and feedback while implementing
      this patch.
      
      Cc: David S. Miller <davem@davemloft.net>
      Suggested-by: NFlorian Weimer <fweimer@redhat.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      482fc609
  23. 05 11月, 2013 2 次提交
    • Y
      tcp: properly handle stretch acks in slow start · 9f9843a7
      Yuchung Cheng 提交于
      Slow start now increases cwnd by 1 if an ACK acknowledges some packets,
      regardless the number of packets. Consequently slow start performance
      is highly dependent on the degree of the stretch ACKs caused by
      receiver or network ACK compression mechanisms (e.g., delayed-ACK,
      GRO, etc).  But slow start algorithm is to send twice the amount of
      packets of packets left so it should process a stretch ACK of degree
      N as if N ACKs of degree 1, then exits when cwnd exceeds ssthresh. A
      follow up patch will use the remainder of the N (if greater than 1)
      to adjust cwnd in the congestion avoidance phase.
      
      In addition this patch retires the experimental limited slow start
      (LSS) feature. LSS has multiple drawbacks but questionable benefit. The
      fractional cwnd increase in LSS requires a loop in slow start even
      though it's rarely used. Configuring such an increase step via a global
      sysctl on different BDPS seems hard. Finally and most importantly the
      slow start overshoot concern is now better covered by the Hybrid slow
      start (hystart) enabled by default.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9f9843a7
    • Y
      tcp: enable sockets to use MSG_FASTOPEN by default · 0d41cca4
      Yuchung Cheng 提交于
      Applications have started to use Fast Open (e.g., Chrome browser has
      such an optional flag) and the feature has gone through several
      generations of kernels since 3.7 with many real network tests. It's
      time to enable this flag by default for applications to test more
      conveniently and extensively.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0d41cca4