1. 28 May 2015, 1 commit
    • ip_fragment: don't forward defragmented DF packet · d6b915e2
      Committed by Florian Westphal
      We currently always send fragments without the DF bit set.
      
      Thus, given following setup:
      
      mtu1500 - mtu1500:1400 - mtu1400:1280 - mtu1280
         A           R1              R2         B
      
      Where R1 and R2 run Linux with netfilter defragmentation/conntrack
      enabled, if Host A sends a fragmented packet _with_ DF set to B, R1
      will respond with an ICMP too-big error if one of these fragments
      exceeds 1400 bytes.
      
      However, if R1 receives fragments of sizes 1200 and 100, it will
      forward the reassembled packet without refragmenting, i.e.
      R2 will send an ICMP error in response to a packet that was never sent,
      citing an MTU that the original sender never exceeded.
      
      The other minor issue is that refragmentation on R1 conceals the
      MTU of the R2-B link, since refragmentation does not set the DF bit
      on the fragments.
      
      This modifies ip_fragment so that we track the largest fragment size
      seen for both DF and non-DF packets, and set frag_max_size to the
      largest value seen.
      
      If the largest DF fragment is larger than or equal to the largest
      non-DF one, we consider the packet a path MTU probe:
      we set the DF bit on the reassembled skb and also tag it with a new
      IPCB flag to force refragmentation even if the skb fits the outdev MTU.
      
      We also set the DF bit on each fragment in this case.
      
      Joint work with Hannes Frederic Sowa.
      Reported-by: Jesse Gross <jesse@nicira.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d6b915e2
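      A rough sketch of the mechanism described above; the struct and function
      names here are approximations for illustration, not quotes from the
      commit.  The queue remembers the largest fragment seen overall and the
      largest seen with DF, and reassembly compares the two to decide whether
      the datagram is a path MTU probe that must keep DF and be refragmented
      on output.
      
        struct frag_size_track {
                unsigned int max_size;    /* largest fragment seen, DF or not */
                unsigned int max_df_size; /* largest fragment seen with DF set */
        };
        
        /* Called for every fragment as it is queued. */
        static void track_fragment(struct frag_size_track *t,
                                   unsigned int frag_size, int df_set)
        {
                if (frag_size > t->max_size)
                        t->max_size = frag_size;
                if (df_set && frag_size > t->max_df_size)
                        t->max_df_size = frag_size;
        }
        
        /* At reassembly: if the largest DF fragment is at least as large as
         * the largest fragment overall, treat the datagram as a path MTU
         * probe, keep DF on the reassembled skb, and force refragmentation
         * on output even when the skb would fit the outgoing device MTU.
         */
        static int reassembly_is_pmtu_probe(const struct frag_size_track *t)
        {
                return t->max_df_size >= t->max_size;
        }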
  2. 19 May 2015, 2 commits
  3. 04 Apr 2015, 2 commits
  4. 06 Mar 2015, 1 commit
  5. 21 Feb 2015, 1 commit
  6. 12 Nov 2014, 1 commit
    • net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited · ba7a46f1
      Committed by Joe Perches
      Use the more common, dynamic_debug-capable net_dbg_ratelimited
      and remove the LIMIT_NETDEBUG macro.
      
      All messages are still ratelimited.
      
      Some KERN_<LEVEL> uses are changed to KERN_DEBUG.
      
      This may have some negative impact on messages that were emitted at
      KERN_INFO, which are now not enabled at all unless DEBUG is defined
      or dynamic_debug is enabled.  Even so, these messages are now _not_
      emitted by default.
      
      This also eliminates the use of the net_msg_warn sysctl
      "/proc/sys/net/core/warnings".  For backward compatibility,
      the sysctl is not removed, but it no longer has any effect.  The extern
      declaration of net_msg_warn is removed from sock.h and made
      static in net/core/sysctl_net_core.c.
      
      Miscellanea:
      
      o Update the sysctl documentation
      o Remove the embedded uses of pr_fmt
      o Coalesce format fragments
      o Realign arguments
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ba7a46f1
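      An illustrative before/after for a typical call site; the message shown
      is a plausible ip_fragment.c example, not a quote of the diff.
      
        /* Before: explicit level plus pr_fmt(), ratelimited via LIMIT_NETDEBUG */
        LIMIT_NETDEBUG(KERN_INFO pr_fmt("Oversized IP packet from %pI4\n"),
                       &qp->saddr);
        
        /* After: dynamic_debug capable, always KERN_DEBUG, still ratelimited */
        net_dbg_ratelimited("Oversized IP packet from %pI4\n", &qp->saddr);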
  7. 05 Nov 2014, 1 commit
  8. 02 Oct 2014, 1 commit
  9. 03 Aug 2014, 3 commits
  10. 28 Jul 2014, 9 commits
  11. 05 May 2014, 1 commit
  12. 18 Dec 2013, 1 commit
    • net: Add utility functions to clear rxhash · 7539fadc
      Committed by Tom Herbert
      In several places 'skb->rxhash = 0' is done to clear the
      rxhash value in an skb.  This does not clear l4_rxhash, which could
      still be set, so that the rxhash would not be recalculated on a
      subsequent call to skb_get_rxhash.  This patch adds an explicit
      function to properly clear all the rxhash-related information in the skb.
      
      skb_clear_hash_if_not_l4 clears the rxhash only if it is not marked as
      l4_rxhash.
      
      Fixed up the places where 'skb->rxhash = 0' was being done.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7539fadc
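      The helpers end up roughly as sketched below, assuming the skb fields of
      that era (rxhash, l4_rxhash); treat this as an approximation of the
      patch rather than the exact hunk.
      
        static inline void skb_clear_hash(struct sk_buff *skb)
        {
                /* Clear both the cached hash and the "computed over L4
                 * headers" mark, so the hash is recalculated on the next
                 * skb_get_rxhash() call.
                 */
                skb->rxhash = 0;
                skb->l4_rxhash = 0;
        }
        
        static inline void skb_clear_hash_if_not_l4(struct sk_buff *skb)
        {
                /* An L4 hash stays valid across changes below L4, so keep it. */
                if (!skb->l4_rxhash)
                        skb_clear_hash(skb);
        }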
  13. 24 Oct 2013, 1 commit
    • ipv4: initialize ip4_frags hash secret as late as possible · e7b519ba
      Committed by Hannes Frederic Sowa
      Defer the generation of the first hash secret for the ipv4 fragmentation
      cache as late as possible.
      
      ip4_frags.rnd gets initially seeded by inet_frags_init and regularly
      reseeded by inet_frag_secret_rebuild.  Either ipqhashfn is called
      directly from ip_fragment.c first, in which case we initialize the
      secret there, or inet_frag_secret_rebuild runs first and installs a
      new secret via a manual call to get_random_bytes.
      
      That secret will be overwritten as soon as the first call to ipqhashfn
      happens.  This is safe because we won't race with anyone else while
      publishing the new secrets.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e7b519ba
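      A sketch of the lazy-seeding pattern this describes, using the
      get-random-once style helper from the same timeframe; the exact call is
      an assumption, not a quote of the patch.
      
        static unsigned int ipqhashfn(__be16 id, __be32 saddr, __be32 daddr,
                                      u8 prot)
        {
                /* Seed ip4_frags.rnd on the first real hash computation
                 * instead of in inet_frags_init(); a secret installed earlier
                 * by inet_frag_secret_rebuild() is simply overwritten here,
                 * which is safe because nothing races on publishing it yet.
                 */
                net_get_random_once(&ip4_frags.rnd, sizeof(ip4_frags.rnd));
                return jhash_3words((__force u32)id << 16 | prot,
                                    (__force u32)saddr, (__force u32)daddr,
                                    ip4_frags.rnd);
        }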
  14. 17 Apr 2013, 1 commit
    • net: drop dst before queueing fragments · 97599dc7
      Committed by Eric Dumazet
      Commit 4a94445c ("net: Use ip_route_input_noref() in input path")
      introduced a bug in IP defragmentation handling, as a non-refcounted
      dst could escape an RCU-protected section.
      
      Commit 64f3b9e2 ("net: ip_expire() must revalidate route") fixed
      the timeout case, but not the general problem.
      
      Tom Parkin noticed crashes in the UDP stack and provided a patch,
      but further analysis allowed us to pinpoint the root cause.
      
      Before queueing a packet into a frag list, we must drop its dst,
      as this dst has a limited, RCU-protected lifetime.
      
      When/if a packet is finally reassembled, we use the dst of the very
      last skb, still protected by RCU and valid, as the dst of the
      reassembled packet.
      
      Use the same logic in IPv6, as there is no need to hold dst references.
      Reported-by: Tom Parkin <tparkin@katalix.com>
      Tested-by: Tom Parkin <tparkin@katalix.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      97599dc7
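      In practice the fix boils down to one call before the fragment goes onto
      the queue; a minimal sketch, with a hypothetical queue structure but the
      real skb_dst_drop() helper:
      
        static void frag_enqueue_sketch(struct sk_buff_head *frags,
                                        struct sk_buff *skb)
        {
                /* The input dst is only guaranteed valid inside the current
                 * RCU read-side section; it must not sit in a fragment queue
                 * that can outlive it, so drop it before queueing.
                 */
                skb_dst_drop(skb);
                __skb_queue_tail(frags, skb);
        }
      
      The fragment that completes the datagram is still being processed under
      RCU when reassembly runs, so its dst is still valid and can become the
      dst of the reassembled packet.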
  15. 25 Mar 2013, 1 commit
  16. 19 Mar 2013, 1 commit
    • inet: limit length of fragment queue hash table bucket lists · 5a3da1fe
      Committed by Hannes Frederic Sowa
      This patch introduces a constant limit on the fragment queue hash
      table bucket list lengths.  The current limit of 128 is chosen somewhat
      arbitrarily and just ensures that we can fill up the fragment cache with
      empty packets up to the default ip_frag_high_thresh limits.  It should
      just protect against list iteration eating considerable amounts of CPU.
      
      If we reach the maximum length in one hash bucket, a warning is printed.
      This is implemented on the caller side of inet_frag_find to distinguish
      between the different users of inet_fragment.c.
      
      I dropped the out-of-memory warning in the ipv4 fragment lookup path,
      because we already get a warning from the slab allocator.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5a3da1fe
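      On the ipv4 caller side this looks roughly like the simplified sketch
      below (locking around the lookup elided; constant and helper names are
      my recollection of this series, so treat them as approximate):
      
        #define INETFRAGS_MAXDEPTH 128  /* cap on one hash bucket's list length */
        
        static struct ipq *ip_find(struct net *net, struct iphdr *iph, u32 user)
        {
                struct inet_frag_queue *q;
                struct ip4_create_arg arg;
                unsigned int hash;
        
                arg.iph = iph;
                arg.user = user;
                hash = ipqhashfn(iph->id, iph->saddr, iph->daddr, iph->protocol);
        
                q = inet_frag_find(&net->ipv4.frags, &ip4_frags, &arg, hash);
                if (IS_ERR_OR_NULL(q)) {
                        /* The lookup refused to walk a bucket deeper than
                         * INETFRAGS_MAXDEPTH; the warning is issued here, once
                         * per user of inet_fragment.c.
                         */
                        inet_frag_maybe_warn_overflow(q, pr_fmt());
                        return NULL;
                }
                return container_of(q, struct ipq, q);
        }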
  17. 16 Feb 2013, 1 commit
  18. 30 Jan 2013, 2 commits
  19. 18 Jan 2013, 1 commit
    • net: increase fragment memory usage limits · c2a93660
      Committed by Jesper Dangaard Brouer
      Increase the memory usage limits for incomplete
      IP fragments.
      
      Arguing for new thresh high/low values:
      
       High threshold = 4 MBytes
       Low  threshold = 3 MBytes
      
      The fragmentation memory accounting code tries to account for the
      real memory usage by measuring both the size of the frag queue struct
      (inet_frag_queue (ipv4:ipq/ipv6:frag_queue)) and the SKBs' truesize.
      
      We want to be able to handle/hold on to enough fragments to ensure
      good performance, without letting incomplete fragments hurt
      scalability by causing the number of inet_frag_queue structs to grow
      too much (resulting in longer searches for frag queues).
      
      For IPv4, how much memory does the largest fragmented datagram consume?
      
      The maximum-size datagram is 64K, which is approx 44 fragments with
      MTU (1500) sized packets.  sizeof(struct ipq) is 200.  A 1500-byte
      packet results in a truesize of 2944 (not 2048 as I first assumed):
      
        (44*2944)+200 = 129736 bytes
      
      The current default high thresh of 262144 bytes is obviously
      problematic, as only two 64K fragments can fit in the queue at the
      same time.
      
      How many 64K fragments can we fit into 4 MBytes:
      
        4*2^20/((44*2944)+200) = 32.34 full-size (64K) datagrams in the queues
      
      An attacker could send separate/distinct fake fragment packets, one per
      queue, causing us to allocate one inet_frag_queue per packet, and thus
      attacking the hash table and its lists.
      
      How many frag queues do we need to store, and given a current hash size
      of 64, what is the average list length?
      
      Using one MTU-sized fragment per inet_frag_queue, each consuming
      (2944+200) = 3144 bytes:
      
        4*2^20/(2944+200) = 1334 frag queues -> 21 avg list length
      
      An attacker could send small fragments; the smallest packet I could
      send resulted in a truesize of 896 bytes (I'm a little surprised by this).
      
        4*2^20/(896+200)  = 3827 frag queues -> 59 avg list length
      
      When increasing these numbers, we also need to follow up with
      improvements that help scalability.  Simply increasing
      the hash size is not enough, as the current implementation does not
      have per-hash-bucket locking.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c2a93660
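      In code, the change amounts to bumping the per-namespace defaults,
      roughly as sketched here; the sysctls net.ipv4.ipfrag_high_thresh and
      net.ipv4.ipfrag_low_thresh keep exposing these values.
      
        static int __net_init ipv4_frags_init_net(struct net *net)
        {
                /* 4 MB high / 3 MB low: about 32 full 64K datagrams, or
                 * roughly 1334 single-MTU-fragment queues, per the
                 * arithmetic above.
                 */
                net->ipv4.frags.high_thresh = 4 * 1024 * 1024;
                net->ipv4.frags.low_thresh  = 3 * 1024 * 1024;
        
                /* Timeout and sysctl registration stay as before. */
                net->ipv4.frags.timeout = IP_FRAG_TIME;
                return ip4_frags_ns_ctl_register(net);
        }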
  20. 11 Dec 2012, 1 commit
  21. 19 Nov 2012, 1 commit
  22. 20 Sep 2012, 1 commit
  23. 27 Aug 2012, 1 commit
  24. 27 Jul 2012, 1 commit
  25. 21 Jul 2012, 1 commit
  26. 28 Jun 2012, 2 commits
    • Revert "ipv4: tcp: dont cache unconfirmed intput dst" · c10237e0
      Committed by David S. Miller
      This reverts commit c074da28.
      
      This change has several unwanted side effects:
      
      1) Sockets will cache the DST_NOCACHE route in sk->sk_rx_dst and we'll
         thus never create a real cached route.
      
      2) All TCP traffic will use DST_NOCACHE and never use the routing
         cache at all.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c10237e0
    • ipv4: tcp: dont cache unconfirmed intput dst · c074da28
      Committed by Eric Dumazet
      DDoS SYN-flood attacks hit the IP route cache badly.
      
      On typical machines, this cache is allowed to hold up to 8 million dst
      entries, 256 bytes each, for a total of 2GB of memory.
      
      rt_garbage_collect() triggers and tries to cleanup things.
      
      Eventually the route cache is disabled, but the machine is under fire
      and might OOM and crash.
      
      This patch exploits the new TCP early demux to set a nocache
      boolean when the incoming TCP frame is not for an already ESTABLISHED
      or TIMEWAIT socket.
      
      This 'nocache' boolean is then used, in case the dst entry is not found
      in the route cache, to create an unhashed dst entry (DST_NOCACHE).
      
      SYN-cookie ACKs sent use a similar mechanism ("ipv4: tcp: dont cache
      output dst for syncookies"), so after this patch, a machine is able to
      absorb a DDoS SYN-flood attack without polluting its IP route cache.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Hans Schillstrom <hans.schillstrom@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c074da28
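      Conceptually, and keeping the revert above in mind, the decision reduces
      to a predicate like the hypothetical helper below; the name and the
      plumbing into dst allocation are illustrative, not the patch itself.
      
        /* True when the early-demuxed socket (if any) does not justify
         * caching a route: no socket at all, or one that is neither
         * ESTABLISHED nor TIMEWAIT.  Such frames get an unhashed
         * DST_NOCACHE entry instead of polluting the route cache.
         */
        static bool rt_input_wants_nocache(const struct sock *sk)
        {
                if (!sk)
                        return true;
                return sk->sk_state != TCP_ESTABLISHED &&
                       sk->sk_state != TCP_TIME_WAIT;
        }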