1. 05 6月, 2014 2 次提交
  2. 03 6月, 2014 2 次提交
    • E
      net: fix inet_getid() and ipv6_select_ident() bugs · 39c36094
      Eric Dumazet 提交于
      I noticed we were sending wrong IPv4 ID in TCP flows when MTU discovery
      is disabled.
      Note how GSO/TSO packets do not have monotonically incrementing ID.
      
      06:37:41.575531 IP (id 14227, proto: TCP (6), length: 4396)
      06:37:41.575534 IP (id 14272, proto: TCP (6), length: 65212)
      06:37:41.575544 IP (id 14312, proto: TCP (6), length: 57972)
      06:37:41.575678 IP (id 14317, proto: TCP (6), length: 7292)
      06:37:41.575683 IP (id 14361, proto: TCP (6), length: 63764)
      
      It appears I introduced this bug in linux-3.1.
      
      inet_getid() must return the old value of peer->ip_id_count,
      not the new one.
      
      Lets revert this part, and remove the prevention of
      a null identification field in IPv6 Fragment Extension Header,
      which is dubious and not even done properly.
      
      Fixes: 87c48fa3 ("ipv6: make fragment identifications less predictable")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      39c36094
    • E
      inetpeer: get rid of ip_id_count · 73f156a6
      Eric Dumazet 提交于
      Ideally, we would need to generate IP ID using a per destination IP
      generator.
      
      linux kernels used inet_peer cache for this purpose, but this had a huge
      cost on servers disabling MTU discovery.
      
      1) each inet_peer struct consumes 192 bytes
      
      2) inetpeer cache uses a binary tree of inet_peer structs,
         with a nominal size of ~66000 elements under load.
      
      3) lookups in this tree are hitting a lot of cache lines, as tree depth
         is about 20.
      
      4) If server deals with many tcp flows, we have a high probability of
         not finding the inet_peer, allocating a fresh one, inserting it in
         the tree with same initial ip_id_count, (cf secure_ip_id())
      
      5) We garbage collect inet_peer aggressively.
      
      IP ID generation do not have to be 'perfect'
      
      Goal is trying to avoid duplicates in a short period of time,
      so that reassembly units have a chance to complete reassembly of
      fragments belonging to one message before receiving other fragments
      with a recycled ID.
      
      We simply use an array of generators, and a Jenkin hash using the dst IP
      as a key.
      
      ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
      belongs (it is only used from this file)
      
      secure_ip_id() and secure_ipv6_id() no longer are needed.
      
      Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
      unnecessary decrement/increment of the number of segments.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      73f156a6
  3. 24 5月, 2014 3 次提交
  4. 22 5月, 2014 3 次提交
  5. 17 5月, 2014 1 次提交
  6. 16 5月, 2014 2 次提交
  7. 15 5月, 2014 1 次提交
  8. 14 5月, 2014 6 次提交
    • H
      ipv6: fix calculation of option len in ip6_append_data · 3a1cebe7
      Hannes Frederic Sowa 提交于
      tot_len does specify the size of struct ipv6_txoptions. We need opt_flen +
      opt_nflen to calculate the overall length of additional ipv6 extensions.
      
      I found this while auditing the ipv6 output path for a memory corruption
      reported by Alexey Preobrazhensky while he fuzzed an instrumented
      AddressSanitizer kernel with trinity. This may or may not be the cause
      of the original bug.
      
      Fixes: 4df98e76 ("ipv6: pmtudisc setting not respected with UFO/CORK")
      Reported-by: NAlexey Preobrazhensky <preobr@google.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3a1cebe7
    • L
      net: support marking accepting TCP sockets · 84f39b08
      Lorenzo Colitti 提交于
      When using mark-based routing, sockets returned from accept()
      may need to be marked differently depending on the incoming
      connection request.
      
      This is the case, for example, if different socket marks identify
      different networks: a listening socket may want to accept
      connections from all networks, but each connection should be
      marked with the network that the request came in on, so that
      subsequent packets are sent on the correct network.
      
      This patch adds a sysctl to mark TCP sockets based on the fwmark
      of the incoming SYN packet. If enabled, and an unmarked socket
      receives a SYN, then the SYN packet's fwmark is written to the
      connection's inet_request_sock, and later written back to the
      accepted socket when the connection is established.  If the
      socket already has a nonzero mark, then the behaviour is the same
      as it is today, i.e., the listening socket's fwmark is used.
      
      Black-box tested using user-mode linux:
      
      - IPv4/IPv6 SYN+ACK, FIN, etc. packets are routed based on the
        mark of the incoming SYN packet.
      - The socket returned by accept() is marked with the mark of the
        incoming SYN packet.
      - Tested with syncookies=1 and syncookies=2.
      Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      84f39b08
    • L
      net: Use fwmark reflection in PMTU discovery. · 1b3c61dc
      Lorenzo Colitti 提交于
      Currently, routing lookups used for Path PMTU Discovery in
      absence of a socket or on unmarked sockets use a mark of 0.
      This causes PMTUD not to work when using routing based on
      netfilter fwmark mangling and fwmark ip rules, such as:
      
        iptables -j MARK --set-mark 17
        ip rule add fwmark 17 lookup 100
      
      This patch causes these route lookups to use the fwmark from the
      received ICMP error when the fwmark_reflect sysctl is enabled.
      This allows the administrator to make PMTUD work by configuring
      appropriate fwmark rules to mark the inbound ICMP packets.
      
      Black-box tested using user-mode linux by pointing different
      fwmarks at routing tables egressing on different interfaces, and
      using iptables mangling to mark packets inbound on each interface
      with the interface's fwmark. ICMPv4 and ICMPv6 PMTU discovery
      work as expected when mark reflection is enabled and fail when
      it is disabled.
      Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1b3c61dc
    • L
      net: add a sysctl to reflect the fwmark on replies · e110861f
      Lorenzo Colitti 提交于
      Kernel-originated IP packets that have no user socket associated
      with them (e.g., ICMP errors and echo replies, TCP RSTs, etc.)
      are emitted with a mark of zero. Add a sysctl to make them have
      the same mark as the packet they are replying to.
      
      This allows an administrator that wishes to do so to use
      mark-based routing, firewalling, etc. for these replies by
      marking the original packets inbound.
      
      Tested using user-mode linux:
       - ICMP/ICMPv6 echo replies and errors.
       - TCP RST packets (IPv4 and IPv6).
      Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e110861f
    • D
      tcp: IPv6 support for fastopen server · 3a19ce0e
      Daniel Lee 提交于
      After all the preparatory works, supporting IPv6 in Fast Open is now easy.
      We pretty much just mirror v4 code. The only difference is how we
      generate the Fast Open cookie for IPv6 sockets. Since Fast Open cookie
      is 128 bits and we use AES 128, we use CBC-MAC to encrypt both the
      source and destination IPv6 addresses since the cookie is a MAC tag.
      Signed-off-by: NDaniel Lee <longinus00@gmail.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NJerry Chu <hkchu@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3a19ce0e
    • Y
      tcp: improve fastopen icmp handling · 0a672f74
      Yuchung Cheng 提交于
      If a fast open socket is already accepted by the user, it should
      be treated like a connected socket to record the ICMP error in
      sk_softerr, so the user can fetch it. Do that in both tcp_v4_err
      and tcp_v6_err.
      
      Also refactor the sequence window check to improve readability
      (e.g., there were two local variables named 'req').
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDaniel Lee <longinus00@gmail.com>
      Signed-off-by: NJerry Chu <hkchu@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a672f74
  9. 13 5月, 2014 2 次提交
  10. 12 5月, 2014 3 次提交
  11. 09 5月, 2014 4 次提交
  12. 08 5月, 2014 2 次提交
    • W
      net: clean up snmp stats code · 698365fa
      WANG Cong 提交于
      commit 8f0ea0fe (snmp: reduce percpu needs by 50%)
      reduced snmp array size to 1, so technically it doesn't have to be
      an array any more. What's more, after the following commit:
      
      	commit 933393f5
      	Date:   Thu Dec 22 11:58:51 2011 -0600
      
      	    percpu: Remove irqsafe_cpu_xxx variants
      
      	    We simply say that regular this_cpu use must be safe regardless of
      	    preemption and interrupt state.  That has no material change for x86
      	    and s390 implementations of this_cpu operations.  However, arches that
      	    do not provide their own implementation for this_cpu operations will
      	    now get code generated that disables interrupts instead of preemption.
      
      probably no arch wants to have SNMP_ARRAY_SZ == 2. At least after
      almost 3 years, no one complains.
      
      So, just convert the array to a single pointer and remove snmp_mib_init()
      and snmp_mib_free() as well.
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      698365fa
    • F
      net: ipv6: send pkttoobig immediately if orig frag size > mtu · 418a3156
      Florian Westphal 提交于
      If conntrack defragments incoming ipv6 frags it stores largest original
      frag size in ip6cb and sets ->local_df.
      
      We must thus first test the largest original frag size vs. mtu, and not
      vice versa.
      
      Without this patch PKTTOOBIG is still generated in ip6_fragment() later
      in the stack, but
      
      1) IPSTATS_MIB_INTOOBIGERRORS won't increment
      2) packet did (needlessly) traverse netfilter postrouting hook.
      
      Fixes: fe6cc55f ("net: ip, ipv6: handle gso skbs in forwarding path")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      418a3156
  13. 06 5月, 2014 5 次提交
  14. 01 5月, 2014 1 次提交
  15. 30 4月, 2014 1 次提交
  16. 29 4月, 2014 1 次提交
  17. 25 4月, 2014 1 次提交