1. 29 Jun, 2012 1 commit
  2. 28 Jun, 2012 3 commits
    • D
      ipv4: Kill rt->rt_spec_dst, no longer used. · 41347dcd
      David S. Miller committed
      Signed-off-by: David S. Miller <davem@davemloft.net>
      41347dcd
    • D
      Revert "ipv4: tcp: dont cache unconfirmed intput dst" · c10237e0
      David S. Miller committed
      This reverts commit c074da28.
      
      This change has several unwanted side effects:
      
      1) Sockets will cache the DST_NOCACHE route in sk->sk_rx_dst and we'll
         thus never create a real cached route.
      
      2) All TCP traffic will use DST_NOCACHE and never use the routing
         cache at all.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c10237e0
    • E
      ipv4: tcp: dont cache unconfirmed intput dst · c074da28
      Eric Dumazet committed
      DDoS synflood attacks hit the IP route cache badly.
      
      On typical machines, this cache is allowed to hold up to 8 million dst
      entries, 256 bytes each, for a total of 2GB of memory.
      
      rt_garbage_collect() triggers and tries to clean things up.
      
      Eventually the route cache is disabled, but the machine is under fire
      and might OOM and crash.
      
      This patch exploits the new TCP early demux, to set a nocache
      boolean in case incoming TCP frame is for a not yet ESTABLISHED or
      TIMEWAIT socket.
      
      This 'nocache' boolean is then used in case dst entry is not found in
      route cache, to create an unhashed dst entry (DST_NOCACHE).
      
      SYN-cookie ACKs are sent using a similar mechanism (ipv4: tcp: dont cache
      output dst for syncookies), so after this patch, a machine is able to
      absorb a DDoS synflood attack without polluting its IP route cache.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Hans Schillstrom <hans.schillstrom@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c074da28
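The decision described in c074da28 (later reverted by c10237e0) can be sketched in userspace C. This is only an illustration of the rule "do not cache the route unless the frame matches an ESTABLISHED or TIMEWAIT socket"; the enum and function names are hypothetical and are not the kernel's.

```c
#include <stdbool.h>

/* Hypothetical, simplified socket states; the kernel's real state set is larger. */
enum demo_tcp_state {
    DEMO_TCP_NONE,         /* early demux found no matching socket */
    DEMO_TCP_ESTABLISHED,
    DEMO_TCP_SYN_RECV,
    DEMO_TCP_TIME_WAIT,
    DEMO_TCP_LISTEN
};

/*
 * Returns true when a newly created input route should stay unhashed
 * (the DST_NOCACHE idea from the commit message): that is, whenever the
 * frame is not for an ESTABLISHED or TIMEWAIT socket.
 */
static bool demo_want_nocache(enum demo_tcp_state st)
{
    return st != DEMO_TCP_ESTABLISHED && st != DEMO_TCP_TIME_WAIT;
}
```

Under this rule a flood of SYNs (no socket, or a listener only) never adds entries to the cache, which is the effect the commit message aims for.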
  3. 27 Jun, 2012 1 commit
  4. 26 Jun, 2012 1 commit
  5. 23 Jun, 2012 1 commit
  6. 18 Jun, 2012 1 commit
  7. 15 Jun, 2012 1 commit
    • D
      ipv4: Handle PMTU in all ICMP error handlers. · 36393395
      David S. Miller committed
      With ip_rt_frag_needed() removed, we have to explicitly update PMTU
      information in every ICMP error handler.
      
      Create two helper functions to facilitate this.
      
      1) ipv4_sk_update_pmtu()
      
         This updates the PMTU when we have a socket context to
         work with.
      
      2) ipv4_update_pmtu()
      
         Raw version, used when no socket context is available.  For this
         interface, we essentially just pass in explicit arguments for
         the flow identity information we would have extracted from the
         socket.
      
         Note that ipv4_sk_update_pmtu() is simply implemented
         in terms of ipv4_update_pmtu().
      
      Note that __ip_route_output_key() is used, rather than something like
      ip_route_output_flow() or ip_route_output_key().  This is because we
      absolutely do not want to end up with a route that does IPSEC
      encapsulation and the like.  Instead, we only want the route that
      would get us to the node described by the outermost IP header.
      Reported-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      36393395
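The relationship between the two helpers in 36393395 can be illustrated with a small userspace sketch: the socket-aware variant only extracts the flow identity from the socket and forwards it to the raw variant. All struct layouts, field names, and function signatures below are assumptions made for illustration; they do not reproduce the kernel's actual prototypes.

```c
#include <stdint.h>

/* Hypothetical socket carrying the flow identity fields. */
struct demo_sock {
    int      bound_dev_if;
    uint32_t mark;
    uint8_t  protocol;
    uint32_t daddr;
    uint32_t saddr;
};

/* Hypothetical record of the last PMTU update, so the effect is observable. */
struct demo_pmtu_update {
    int      oif;
    uint32_t mark;
    uint8_t  protocol;
    uint32_t daddr;
    uint32_t saddr;
    uint32_t mtu;
};

/* "Raw" variant: no socket context, the caller passes the flow identity explicitly. */
static void demo_update_pmtu(struct demo_pmtu_update *out, uint32_t mtu,
                             int oif, uint32_t mark, uint8_t protocol,
                             uint32_t daddr, uint32_t saddr)
{
    out->oif = oif;
    out->mark = mark;
    out->protocol = protocol;
    out->daddr = daddr;
    out->saddr = saddr;
    out->mtu = mtu;
}

/* Socket variant: a thin wrapper that pulls the same arguments out of the socket. */
static void demo_sk_update_pmtu(struct demo_pmtu_update *out,
                                const struct demo_sock *sk, uint32_t mtu)
{
    demo_update_pmtu(out, mtu, sk->bound_dev_if, sk->mark, sk->protocol,
                     sk->daddr, sk->saddr);
}
```

Keeping the raw variant as the single implementation means ICMP handlers without a socket and those with one share the same update path.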
  8. 13 Jun, 2012 1 commit
    • T
      ipv4: Add interface option to enable routing of 127.0.0.0/8 · d0daebc3
      Thomas Graf committed
      Routing of 127/8 is traditionally forbidden: we consider
      packets from that address block martian when routing and do
      not process corresponding ARP requests.
      
      This is a sane default but renders a huge address space
      practically unusable.
      
      The RFC states that no address within the 127/8 block should
      ever appear on any network anywhere, but it does not forbid
      the use of such addresses outside of the loopback device in
      particular, for example to address a pool of virtual guests
      behind a load balancer.
      
      This patch adds a new interface option 'route_localnet'
      enabling routing of the 127/8 address block and processing
      of ARP requests on a specific interface.
      
      Note that for the feature to work, the default local route
      covering 127/8 dev lo needs to be removed.
      
      Example:
        $ sysctl -w net.ipv4.conf.eth0.route_localnet=1
        $ ip route del 127.0.0.0/8 dev lo table local
        $ ip addr add 127.1.0.1/16 dev eth0
        $ ip route flush cache
      
      V2: Fix invalid check to auto flush cache (thanks davem)
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d0daebc3
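The per-interface rule introduced by d0daebc3 can be sketched as a predicate: a 127/8 address is treated as martian unless `route_localnet` is enabled on the interface. This is a simplified userspace illustration; the function names are hypothetical and addresses are assumed to be in host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the host-byte-order IPv4 address lies in 127.0.0.0/8. */
static bool demo_is_loopback(uint32_t addr)
{
    return (addr & 0xff000000u) == 0x7f000000u;
}

/*
 * A loopback-range address is martian by default; with route_localnet
 * enabled on the receiving interface it is accepted for routing.
 */
static bool demo_is_martian_loopback(uint32_t addr, bool route_localnet)
{
    return demo_is_loopback(addr) && !route_localnet;
}
```

For instance, 127.1.0.1 (0x7f010001) is rejected when the option is off and accepted when it is on, while an address outside 127/8 is unaffected by the option.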
  9. 11 Jun, 2012 5 commits
    • D
      inet: Avoid potential NULL peer dereference. · 7b34ca2a
      David S. Miller committed
      We handle NULL in rt{,6}_set_peer but then our caller will try to pass
      that NULL pointer into inet_putpeer() which isn't ready for it.
      
      Fix this by moving the NULL check one level up, and then remove the
      now unnecessary NULL check from inetpeer_ptr_set_peer().
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7b34ca2a
    • D
      inet: Use FIB table peer roots in routes. · 8b96d22d
      David S. Miller committed
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8b96d22d
    • D
      inet: Add family scope inetpeer flushes. · b48c80ec
      David S. Miller committed
      This implementation can deal with having many inetpeer roots, which is
      a necessary prerequisite for per-FIB table rooted peer tables.
      
      Each family (AF_INET, AF_INET6) has a sequence number which we bump
      when we get a family invalidation request.
      
      Each peer lookup cheaply checks whether the flush sequence of the
      root we are using is out of date, and if so flushes it and updates
      the sequence number.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b48c80ec
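The flush scheme in b48c80ec is a form of lazy invalidation by sequence number: a flush request only bumps a per-family counter, and each root catches up the next time it is used. A minimal sketch follows, with hypothetical names and a plain peer counter standing in for the real peer tree.

```c
#include <stdbool.h>

/* Hypothetical family indices standing in for AF_INET / AF_INET6. */
enum { DEMO_AF_INET, DEMO_AF_INET6, DEMO_AF_COUNT };

/* One flush sequence number per address family. */
static unsigned int demo_family_seq[DEMO_AF_COUNT];

/* Hypothetical peer root; nr_peers stands in for the tree contents. */
struct demo_peer_root {
    int          family;
    unsigned int flush_seq;  /* family sequence seen at the last flush */
    int          nr_peers;
};

/* Family invalidation request: O(1), touches no root. */
static void demo_invalidate_family(int family)
{
    demo_family_seq[family]++;
}

/*
 * Called on every lookup. Cheap comparison in the common case; if the
 * root is out of date, flush it and record the new sequence number.
 * Returns true when a flush happened.
 */
static bool demo_flush_check(struct demo_peer_root *root)
{
    unsigned int cur = demo_family_seq[root->family];

    if (root->flush_seq == cur)
        return false;
    root->nr_peers = 0;
    root->flush_seq = cur;
    return true;
}
```

Because the invalidation never walks the roots, the number of roots can grow (for example one per FIB table) without making a flush request more expensive.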
    • D
      ipv4: Kill ip_rt_frag_needed(). · 46517008
      David S. Miller committed
      There is zero point to this function.
      
      Its only real substance is to perform an extremely outdated BSD4.2
      ICMP check, which we can safely remove.  If you really have a MTU
      limited link being routed by a BSD4.2 derived system, here's a nickel,
      go buy yourself a real router.
      
      The other actions of ip_rt_frag_needed(), checking and conditionally
      updating the peer, are done by the per-protocol handlers of the ICMP
      event.
      
      TCP, UDP, et al. have a handler which will receive this event and
      transmit it back into the associated route via dst_ops->update_pmtu().
      
      This simplification is important, because it eliminates the one place
      where we do not have a proper route context in which to make an
      inetpeer lookup.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      46517008
    • D
      inet: Hide route peer accesses behind helpers. · 97bab73f
      David S. Miller committed
      We encode the pointer(s) into an unsigned long with one state bit.
      
      The state bit is used so we can store the inetpeer tree root to use
      when resolving the peer later.
      
      Later the peer roots will be per-FIB table, and this change works to
      facilitate that.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      97bab73f
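The encoding described in 97bab73f, one word holding either a peer pointer or a tree-root pointer plus a state bit, is a tagged-pointer technique. The sketch below is a userspace approximation: it uses `uintptr_t` rather than `unsigned long`, assumes that a set low bit means "this word holds the root", and all type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: low bit set means the word holds a tree root, clear means a peer. */
#define DEMO_BASE_BIT ((uintptr_t)1)

struct demo_peer { int refcnt; };   /* int alignment keeps the low bit free */
struct demo_base { int total; };

/* The single word stored in the route. */
struct demo_peer_ptr { uintptr_t v; };

static void demo_ptr_set_base(struct demo_peer_ptr *pp, struct demo_base *base)
{
    pp->v = (uintptr_t)base | DEMO_BASE_BIT;
}

static void demo_ptr_set_peer(struct demo_peer_ptr *pp, struct demo_peer *peer)
{
    pp->v = (uintptr_t)peer;
}

static bool demo_ptr_is_peer(const struct demo_peer_ptr *pp)
{
    return (pp->v & DEMO_BASE_BIT) == 0;
}

/* Returns the peer, or NULL while only the root is stored. */
static struct demo_peer *demo_ptr_peer(const struct demo_peer_ptr *pp)
{
    return demo_ptr_is_peer(pp) ? (struct demo_peer *)pp->v : NULL;
}

/* Returns the root to use for a later lookup, or NULL once a peer is stored. */
static struct demo_base *demo_ptr_base(const struct demo_peer_ptr *pp)
{
    return demo_ptr_is_peer(pp)
        ? NULL
        : (struct demo_base *)(pp->v & ~DEMO_BASE_BIT);
}
```

Hiding the bit manipulation behind accessors like these is what lets callers stay unchanged when the stored root later becomes per-FIB table.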
  10. 10 Jun, 2012 3 commits
  11. 09 Jun, 2012 3 commits
  12. 24 May, 2012 1 commit
  13. 20 May, 2012 1 commit
  14. 16 May, 2012 2 commits
  15. 21 Apr, 2012 3 commits
  16. 19 Apr, 2012 1 commit
  17. 18 Apr, 2012 1 commit
  18. 16 Apr, 2012 2 commits
  19. 05 Apr, 2012 1 commit
  20. 02 Apr, 2012 1 commit
  21. 29 Mar, 2012 1 commit
  22. 28 Mar, 2012 1 commit
    • B
      net/ipv4: fix IPv4 multicast over network namespaces · 4e7b2f14
      Benjamin LaHaise committed
      When using multicast over a local bridge feeding a number of LXC guests
      using veth, the LXC guests are unable to get a response from other guests
      when pinging 224.0.0.1.  Multicast packets did not appear to be getting
      delivered to the network namespaces of the guest hosts, and further
      inspection showed that the incoming route was pointing to the loopback
      device of the host, not the guest.  This led to the wrong network namespace
      being picked up by sockets (like ICMP).  Fix this by using the correct
      network namespace when creating the inbound route entry.
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4e7b2f14
  23. 13 Mar, 2012 1 commit
  24. 12 Mar, 2012 1 commit
    • J
      net: Convert printks to pr_<level> · 058bd4d2
      Joe Perches committed
      Use a more current kernel messaging style.
      
      Convert a printk block to print_hex_dump.
      Coalesce formats, align arguments.
      Use %s, __func__ instead of embedding function names.
      
      Some messages that were prefixed with <foo>_close are
      now prefixed with <foo>_fini.  Some ah4 and esp messages
      are now not prefixed with "ip ".
      
      The intent of this patch is to later add something like
        #define pr_fmt(fmt) "IPv4: " fmt.
      to standardize the output messages.
      
      Text size is trivially reduced. (x86-32 allyesconfig)
      
      $ size net/ipv4/built-in.o*
         text	   data	    bss	    dec	    hex	filename
       887888	  31558	 249696	1169142	 11d6f6	net/ipv4/built-in.o.new
       887934	  31558	 249800	1169292	 11d78c	net/ipv4/built-in.o.old
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      058bd4d2
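The `pr_fmt` prefixing that 058bd4d2 prepares for relies on compile-time string literal concatenation. The sketch below imitates it in userspace: `demo_pr_info` is a hypothetical stand-in for `pr_info()` that writes into a buffer through `snprintf`, and `##__VA_ARGS__` is a GCC/Clang extension.

```c
#include <stdio.h>

/* Prefix applied to every message in this file, as the commit message proposes. */
#define pr_fmt(fmt) "IPv4: " fmt

/*
 * Hypothetical userspace stand-in for pr_info(): formats into a buffer.
 * pr_fmt(fmt) expands to two adjacent string literals, which the
 * compiler joins into a single format string.
 */
#define demo_pr_info(buf, len, fmt, ...) \
    snprintf((buf), (len), pr_fmt(fmt), ##__VA_ARGS__)

/* Uses "%s", __func__ instead of embedding the function name in the format. */
static int demo_report(char *buf, size_t len, int code)
{
    return demo_pr_info(buf, len, "%s: bad code %d\n", __func__, code);
}
```

Calling `demo_report(buf, sizeof(buf), 7)` leaves `IPv4: demo_report: bad code 7` (plus a newline) in `buf`; renaming the function changes the message automatically, which is the point of the `__func__` conversion.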
  25. 08 Mar, 2012 2 commits