1. 22 7月, 2015 1 次提交
  2. 04 4月, 2015 1 次提交
  3. 01 2月, 2015 1 次提交
  4. 19 11月, 2014 1 次提交
  5. 12 11月, 2014 1 次提交
    • J
      net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited · ba7a46f1
      Joe Perches 提交于
      Use the more common dynamic_debug capable net_dbg_ratelimited
      and remove the LIMIT_NETDEBUG macro.
      
      All messages are still ratelimited.
      
      Some KERN_<LEVEL> uses are changed to KERN_DEBUG.
      
      This may have some negative impact on messages that were
      emitted at KERN_INFO that are not not enabled at all unless
      DEBUG is defined or dynamic_debug is enabled.  Even so,
      these messages are now _not_ emitted by default.
      
      This also eliminates the use of the net_msg_warn sysctl
      "/proc/sys/net/core/warnings".  For backward compatibility,
      the sysctl is not removed, but it has no function.  The extern
      declaration of net_msg_warn is removed from sock.h and made
      static in net/core/sysctl_net_core.c
      
      Miscellanea:
      
      o Update the sysctl documentation
      o Remove the embedded uses of pr_fmt
      o Coalesce format fragments
      o Realign arguments
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ba7a46f1
  6. 24 9月, 2014 1 次提交
    • E
      icmp: add a global rate limitation · 4cdf507d
      Eric Dumazet 提交于
      Current ICMP rate limiting uses inetpeer cache, which is an RBL tree
      protected by a lock, meaning that hosts can be stuck hard if all cpus
      want to check ICMP limits.
      
      When say a DNS or NTP server process is restarted, inetpeer tree grows
      quick and machine comes to its knees.
      
      iptables can not help because the bottleneck happens before ICMP
      messages are even cooked and sent.
      
      This patch adds a new global limitation, using a token bucket filter,
      controlled by two new sysctl :
      
      icmp_msgs_per_sec - INTEGER
          Limit maximal number of ICMP packets sent per second from this host.
          Only messages whose type matches icmp_ratemask are
          controlled by this limit.
          Default: 1000
      
      icmp_msgs_burst - INTEGER
          icmp_msgs_per_sec controls number of ICMP packets sent per second,
          while icmp_msgs_burst controls the burst size of these packets.
          Default: 50
      
      Note that if we really want to send millions of ICMP messages per
      second, we might extend idea and infra added in commit 04ca6973
      ("ip: make IP identifiers less predictable") :
      add a token bucket in the ip_idents hash and no longer rely on inetpeer.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4cdf507d
  7. 03 8月, 2014 1 次提交
  8. 01 8月, 2014 1 次提交
  9. 08 7月, 2014 1 次提交
    • E
      ipv4: icmp: Fix pMTU handling for rare case · 68b7107b
      Edward Allcutt 提交于
      Some older router implementations still send Fragmentation Needed
      errors with the Next-Hop MTU field set to zero. This is explicitly
      described as an eventuality that hosts must deal with by the
      standard (RFC 1191) since older standards specified that those
      bits must be zero.
      
      Linux had a generic (for all of IPv4) implementation of the algorithm
      described in the RFC for searching a list of MTU plateaus for a good
      value. Commit 46517008 ("ipv4: Kill ip_rt_frag_needed().")
      removed this as part of the changes to remove the routing cache.
      Subsequently any Fragmentation Needed packet with a zero Next-Hop
      MTU has been discarded without being passed to the per-protocol
      handlers or notifying userspace for raw sockets.
      
      When there is a router which does not implement RFC 1191 on an
      MTU limited path then this results in stalled connections since
      large packets are discarded and the local protocols are not
      notified so they never attempt to lower the pMTU.
      
      One example I have seen is an OpenBSD router terminating IPSec
      tunnels. It's worth pointing out that this case is distinct from
      the BSD 4.2 bug which incorrectly calculated the Next-Hop MTU
      since the commit in question dismissed that as a valid concern.
      
      All of the per-protocols handlers implement the simple approach from
      RFC 1191 of immediately falling back to the minimum value. Although
      this is sub-optimal it is vastly preferable to connections hanging
      indefinitely.
      
      Remove the Next-Hop MTU != 0 check and allow such packets
      to follow the normal path.
      
      Fixes: 46517008 ("ipv4: Kill ip_rt_frag_needed().")
      Signed-off-by: NEdward Allcutt <edward.allcutt@openmarket.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      68b7107b
  10. 14 5月, 2014 1 次提交
    • L
      net: add a sysctl to reflect the fwmark on replies · e110861f
      Lorenzo Colitti 提交于
      Kernel-originated IP packets that have no user socket associated
      with them (e.g., ICMP errors and echo replies, TCP RSTs, etc.)
      are emitted with a mark of zero. Add a sysctl to make them have
      the same mark as the packet they are replying to.
      
      This allows an administrator that wishes to do so to use
      mark-based routing, firewalling, etc. for these replies by
      marking the original packets inbound.
      
      Tested using user-mode linux:
       - ICMP/ICMPv6 echo replies and errors.
       - TCP RST packets (IPv4 and IPv6).
      Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e110861f
  11. 09 5月, 2014 1 次提交
  12. 14 1月, 2014 1 次提交
    • H
      ipv4: introduce hardened ip_no_pmtu_disc mode · 8ed1dc44
      Hannes Frederic Sowa 提交于
      This new ip_no_pmtu_disc mode only allowes fragmentation-needed errors
      to be honored by protocols which do more stringent validation on the
      ICMP's packet payload. This knob is useful for people who e.g. want to
      run an unmodified DNS server in a namespace where they need to use pmtu
      for TCP connections (as they are used for zone transfers or fallback
      for requests) but don't want to use possibly spoofed UDP pmtu information.
      
      Currently the whitelisted protocols are TCP, SCTP and DCCP as they check
      if the returned packet is in the window or if the association is valid.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: John Heffner <johnwheffner@gmail.com>
      Suggested-by: NFlorian Weimer <fweimer@redhat.com>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8ed1dc44
  13. 19 12月, 2013 2 次提交
  14. 29 9月, 2013 1 次提交
    • F
      ipv4: processing ancillary IP_TOS or IP_TTL · aa661581
      Francesco Fusco 提交于
      If IP_TOS or IP_TTL are specified as ancillary data, then sendmsg() sends out
      packets with the specified TTL or TOS overriding the socket values specified
      with the traditional setsockopt().
      
      The struct inet_cork stores the values of TOS, TTL and priority that are
      passed through the struct ipcm_cookie. If there are user-specified TOS
      (tos != -1) or TTL (ttl != 0) in the struct ipcm_cookie, these values are
      used to override the per-socket values. In case of TOS also the priority
      is changed accordingly.
      
      Two helper functions get_rttos and get_rtconn_flags are defined to take
      into account the presence of a user specified TOS value when computing
      RT_TOS and RT_CONN_FLAGS.
      Signed-off-by: NFrancesco Fusco <ffusco@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aa661581
  15. 03 6月, 2013 2 次提交
  16. 29 5月, 2013 1 次提交
  17. 26 5月, 2013 1 次提交
    • L
      net: ipv6: Add IPv6 support to the ping socket. · 6d0bfe22
      Lorenzo Colitti 提交于
      This adds the ability to send ICMPv6 echo requests without a
      raw socket. The equivalent ability for ICMPv4 was added in
      2011.
      
      Instead of having separate code paths for IPv4 and IPv6, make
      most of the code in net/ipv4/ping.c dual-stack and only add a
      few IPv6-specific bits (like the protocol definition) to a new
      net/ipv6/ping.c. Hopefully this will reduce divergence and/or
      duplication of bugs in the future.
      
      Caveats:
      
      - Setting options via ancillary data (e.g., using IPV6_PKTINFO
        to specify the outgoing interface) is not yet supported.
      - There are no separate security settings for IPv4 and IPv6;
        everything is controlled by /proc/net/ipv4/ping_group_range.
      - The proc interface does not yet display IPv6 ping sockets
        properly.
      
      Tested with a patched copy of ping6 and using raw socket calls.
      Compiles and works with all of CONFIG_IPV6={n,m,y}.
      Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6d0bfe22
  18. 30 4月, 2013 1 次提交
  19. 23 2月, 2013 1 次提交
  20. 27 11月, 2012 1 次提交
  21. 24 7月, 2012 2 次提交
    • D
      ipv4: Prepare for change of rt->rt_iif encoding. · 92101b3b
      David S. Miller 提交于
      Use inet_iif() consistently, and for TCP record the input interface of
      cached RX dst in inet sock.
      
      rt->rt_iif is going to be encoded differently, so that we can
      legitimately cache input routes in the FIB info more aggressively.
      
      When the input interface is "use SKB device index" the rt->rt_iif will
      be set to zero.
      
      This forces us to move the TCP RX dst cache installation into the ipv4
      specific code, and as well it should since doing the route caching for
      ipv6 is pointless at the moment since it is not inspected in the ipv6
      input paths yet.
      
      Also, remove the unlikely on dst->obsolete, all ipv4 dsts have
      obsolete set to a non-zero value to force invocation of the check
      callback.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      92101b3b
    • D
      ipv4: Really ignore ICMP address requests/replies. · 838942a5
      David S. Miller 提交于
      Alexey removed kernel side support for requests, and the
      only thing we do for replies is log a message if something
      doesn't look right.
      
      As Alexey's comment indicates, this belongs in userspace (if
      anywhere), and thus we can safely just get rid of this code.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      838942a5
  22. 12 7月, 2012 5 次提交
  23. 11 7月, 2012 1 次提交
  24. 28 6月, 2012 1 次提交
    • D
      ipv4: Create and use fib_compute_spec_dst() helper. · 35ebf65e
      David S. Miller 提交于
      The specific destination is the host we direct unicast replies to.
      Usually this is the original packet source address, but if we are
      responding to a multicast or broadcast packet we have to use something
      different.
      
      Specifically we must use the source address we would use if we were to
      send a packet to the unicast source of the original packet.
      
      The routing cache precomputes this value, but we want to remove that
      precomputation because it creates a hard dependency on the expensive
      rpfilter source address validation which we'd like to make cheaper.
      
      There are only three places where this matters:
      
      1) ICMP replies.
      
      2) pktinfo CMSG
      
      3) IP options
      
      Now there will be no real users of rt->rt_spec_dst and we can simply
      remove it altogether.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35ebf65e
  25. 20 6月, 2012 1 次提交
  26. 11 6月, 2012 1 次提交
    • D
      ipv4: Kill ip_rt_frag_needed(). · 46517008
      David S. Miller 提交于
      There is zero point to this function.
      
      It's only real substance is to perform an extremely outdated BSD4.2
      ICMP check, which we can safely remove.  If you really have a MTU
      limited link being routed by a BSD4.2 derived system, here's a nickel
      go buy yourself a real router.
      
      The other actions of ip_rt_frag_needed(), checking and conditionally
      updating the peer, are done by the per-protocol handlers of the ICMP
      event.
      
      TCP, UDP, et al. have a handler which will receive this event and
      transmit it back into the associated route via dst_ops->update_pmtu().
      
      This simplification is important, because it eliminates the one place
      where we do not have a proper route context in which to make an
      inetpeer lookup.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      46517008
  27. 09 6月, 2012 1 次提交
  28. 16 5月, 2012 1 次提交
  29. 29 3月, 2012 1 次提交
  30. 13 3月, 2012 1 次提交
  31. 12 3月, 2012 1 次提交
    • J
      net: Convert printks to pr_<level> · 058bd4d2
      Joe Perches 提交于
      Use a more current kernel messaging style.
      
      Convert a printk block to print_hex_dump.
      Coalesce formats, align arguments.
      Use %s, __func__ instead of embedding function names.
      
      Some messages that were prefixed with <foo>_close are
      now prefixed with <foo>_fini.  Some ah4 and esp messages
      are now not prefixed with "ip ".
      
      The intent of this patch is to later add something like
        #define pr_fmt(fmt) "IPv4: " fmt.
      to standardize the output messages.
      
      Text size is trivially reduced. (x86-32 allyesconfig)
      
      $ size net/ipv4/built-in.o*
         text	   data	    bss	    dec	    hex	filename
       887888	  31558	 249696	1169142	 11d6f6	net/ipv4/built-in.o.new
       887934	  31558	 249800	1169292	 11d78c	net/ipv4/built-in.o.old
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      058bd4d2
  32. 14 10月, 2011 1 次提交
    • E
      net: more accurate skb truesize · 87fb4b7b
      Eric Dumazet 提交于
      skb truesize currently accounts for sk_buff struct and part of skb head.
      kmalloc() roundings are also ignored.
      
      Considering that skb_shared_info is larger than sk_buff, its time to
      take it into account for better memory accounting.
      
      This patch introduces SKB_TRUESIZE(X) macro to centralize various
      assumptions into a single place.
      
      At skb alloc phase, we put skb_shared_info struct at the exact end of
      skb head, to allow a better use of memory (lowering number of
      reallocations), since kmalloc() gives us power-of-two memory blocks.
      
      Unless SLUB/SLUB debug is active, both skb->head and skb_shared_info are
      aligned to cache lines, as before.
      
      Note: This patch might trigger performance regressions because of
      misconfigured protocol stacks, hitting per socket or global memory
      limits that were previously not reached. But its a necessary step for a
      more accurate memory accounting.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: Andi Kleen <ak@linux.intel.com>
      CC: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      87fb4b7b
  33. 22 7月, 2011 1 次提交