1. 21 7月, 2012 1 次提交
    • D
      ipv4: Delete routing cache. · 89aef892
      David S. Miller 提交于
      The ipv4 routing cache is non-deterministic, performance wise, and is
      subject to reasonably easy to launch denial of service attacks.
      
      The routing cache works great for well behaved traffic, and the world
      was a much friendlier place when the tradeoffs that led to the routing
      cache's design were considered.
      
      What it boils down to is that the performance of the routing cache is
      a product of the traffic patterns seen by a system rather than being a
      product of the contents of the routing tables.  The former of which is
      controllable by external entitites.
      
      Even for "well behaved" legitimate traffic, high volume sites can see
      hit rates in the routing cache of only ~%10.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      89aef892
  2. 19 7月, 2012 1 次提交
  3. 13 7月, 2012 1 次提交
  4. 06 7月, 2012 1 次提交
  5. 30 6月, 2012 1 次提交
    • P
      netlink: add netlink_kernel_cfg parameter to netlink_kernel_create · a31f2d17
      Pablo Neira Ayuso 提交于
      This patch adds the following structure:
      
      struct netlink_kernel_cfg {
              unsigned int    groups;
              void            (*input)(struct sk_buff *skb);
              struct mutex    *cb_mutex;
      };
      
      That can be passed to netlink_kernel_create to set optional configurations
      for netlink kernel sockets.
      
      I've populated this structure by looking for NULL and zero parameters at the
      existing code. The remaining parameters that always need to be set are still
      left in the original interface.
      
      That includes optional parameters for the netlink socket creation. This allows
      easy extensibility of this interface in the future.
      
      This patch also adapts all callers to use this new interface.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a31f2d17
  6. 29 6月, 2012 3 次提交
    • D
      ipv4: Elide fib_validate_source() completely when possible. · 7a9bc9b8
      David S. Miller 提交于
      If rpfilter is off (or the SKB has an IPSEC path) and there are not
      tclassid users, we don't have to do anything at all when
      fib_validate_source() is invoked besides setting the itag to zero.
      
      We monitor tclassid uses with a counter (modified only under RTNL and
      marked __read_mostly) and we protect the fib_validate_source() real
      work with a test against this counter and whether rpfilter is to be
      done.
      
      Having a way to know whether we need no tclassid processing or not
      also opens the door for future optimized rpfilter algorithms that do
      not perform full FIB lookups.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7a9bc9b8
    • D
      ipv4: Adjust in_dev handling in fib_validate_source() · 9e56e380
      David S. Miller 提交于
      Checking for in_dev being NULL is pointless.
      
      In fact, all of our callers have in_dev precomputed already,
      so just pass it in and remove the NULL checking.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9e56e380
    • D
      ipv4: Fix bugs in fib_compute_spec_dst(). · a207a4b2
      David S. Miller 提交于
      Based upon feedback from Julian Anastasov.
      
      1) Use route flags to determine multicast/broadcast, not the
         packet flags.
      
      2) Leave saddr unspecified in flow key.
      
      3) Adjust how we invoke inet_select_addr().  Pass ip_hdr(skb)->saddr as
         second arg, and if it was zeronet use link scope.
      
      4) Use loopback as input interface in flow key.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a207a4b2
  7. 28 6月, 2012 2 次提交
    • D
      ipv4: Kill rt->rt_spec_dst, no longer used. · 41347dcd
      David S. Miller 提交于
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      41347dcd
    • D
      ipv4: Create and use fib_compute_spec_dst() helper. · 35ebf65e
      David S. Miller 提交于
      The specific destination is the host we direct unicast replies to.
      Usually this is the original packet source address, but if we are
      responding to a multicast or broadcast packet we have to use something
      different.
      
      Specifically we must use the source address we would use if we were to
      send a packet to the unicast source of the original packet.
      
      The routing cache precomputes this value, but we want to remove that
      precomputation because it creates a hard dependency on the expensive
      rpfilter source address validation which we'd like to make cheaper.
      
      There are only three places where this matters:
      
      1) ICMP replies.
      
      2) pktinfo CMSG
      
      3) IP options
      
      Now there will be no real users of rt->rt_spec_dst and we can simply
      remove it altogether.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35ebf65e
  8. 16 4月, 2012 1 次提交
  9. 29 3月, 2012 1 次提交
  10. 12 3月, 2012 1 次提交
    • J
      net: Convert printks to pr_<level> · 058bd4d2
      Joe Perches 提交于
      Use a more current kernel messaging style.
      
      Convert a printk block to print_hex_dump.
      Coalesce formats, align arguments.
      Use %s, __func__ instead of embedding function names.
      
      Some messages that were prefixed with <foo>_close are
      now prefixed with <foo>_fini.  Some ah4 and esp messages
      are now not prefixed with "ip ".
      
      The intent of this patch is to later add something like
        #define pr_fmt(fmt) "IPv4: " fmt.
      to standardize the output messages.
      
      Text size is trivially reduced. (x86-32 allyesconfig)
      
      $ size net/ipv4/built-in.o*
         text	   data	    bss	    dec	    hex	filename
       887888	  31558	 249696	1169142	 11d6f6	net/ipv4/built-in.o.new
       887934	  31558	 249800	1169292	 11d78c	net/ipv4/built-in.o.old
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      058bd4d2
  11. 10 6月, 2011 1 次提交
    • G
      rtnetlink: Compute and store minimum ifinfo dump size · c7ac8679
      Greg Rose 提交于
      The message size allocated for rtnl ifinfo dumps was limited to
      a single page.  This is not enough for additional interface info
      available with devices that support SR-IOV and caused a bug in
      which VF info would not be displayed if more than approximately
      40 VFs were created per interface.
      
      Implement a new function pointer for the rtnl_register service that will
      calculate the amount of data required for the ifinfo dump and allocate
      enough data to satisfy the request.
      Signed-off-by: NGreg Rose <gregory.v.rose@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      c7ac8679
  12. 11 4月, 2011 2 次提交
  13. 31 3月, 2011 1 次提交
  14. 25 3月, 2011 1 次提交
  15. 22 3月, 2011 1 次提交
    • J
      ipv4: fix route deletion for IPs on many subnets · e6abbaa2
      Julian Anastasov 提交于
      Alex Sidorenko reported for problems with local
      routes left after IP addresses are deleted. It happens
      when same IPs are used in more than one subnet for the
      device.
      
      	Fix fib_del_ifaddr to restrict the checks for duplicate
      local and broadcast addresses only to the IFAs that use
      our primary IFA or another primary IFA with same address.
      And we expect the prefsrc to be matched when the routes
      are deleted because it is possible they to differ only by
      prefsrc. This patch prevents local and broadcast routes
      to be leaked until their primary IP is deleted finally
      from the box.
      
      	As the secondary address promotion needs to delete
      the routes for all secondaries that used the old primary IFA,
      add option to ignore these secondaries from the checks and
      to assume they are already deleted, so that we can safely
      delete the route while these IFAs are still on the device list.
      Reported-by: NAlex Sidorenko <alexandre.sidorenko@hp.com>
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e6abbaa2
  16. 13 3月, 2011 3 次提交
  17. 10 3月, 2011 1 次提交
  18. 08 3月, 2011 1 次提交
    • D
      ipv4: Cache source address in nexthop entries. · 1fc050a1
      David S. Miller 提交于
      When doing output route lookups, we have to select the source address
      if the user has not specified an explicit one.
      
      First, if the route has an explicit preferred source address
      specified, then we use that.
      
      Otherwise we search the route's outgoing interface for a suitable
      address.
      
      This search can be precomputed and cached at route insertion time.
      
      The only missing part is that we have to refresh this precomputed
      value any time addresses are added or removed from the interface, and
      this is accomplished by fib_update_nh_saddrs().
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1fc050a1
  19. 19 2月, 2011 1 次提交
  20. 02 2月, 2011 1 次提交
  21. 01 2月, 2011 1 次提交
    • D
      ipv4: Consolidate all default route selection implementations. · 0c838ff1
      David S. Miller 提交于
      Both fib_trie and fib_hash have a local implementation of
      fib_table_select_default().  This is completely unnecessary
      code duplication.
      
      Since we now remember the fib_table and the head of the fib
      alias list of the default route, we can implement one single
      generic version of this routine.
      
      Looking at the fib_hash implementation you may get the impression
      that it's possible for there to be multiple top-level routes in
      the table for the default route.  The truth is, it isn't, the
      insert code will only allow one entry to exist in the zero
      prefix hash table, because all keys evaluate to zero and all
      keys in a hash table must be unique.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c838ff1
  22. 24 12月, 2010 1 次提交
    • D
      Revert "ipv4: Allow configuring subnets as local addresses" · e0584649
      David S. Miller 提交于
      This reverts commit 4465b469.
      
      Conflicts:
      
      	net/ipv4/fib_frontend.c
      
      As reported by Ben Greear, this causes regressions:
      
      > Change 4465b469 caused rules
      > to stop matching the input device properly because the
      > FLOWI_FLAG_MATCH_ANY_IIF is always defined in ip_dev_find().
      >
      > This breaks rules such as:
      >
      > ip rule add pref 512 lookup local
      > ip rule del pref 0 lookup local
      > ip link set eth2 up
      > ip -4 addr add 172.16.0.102/24 broadcast 172.16.0.255 dev eth2
      > ip rule add to 172.16.0.102 iif eth2 lookup local pref 10
      > ip rule add iif eth2 lookup 10001 pref 20
      > ip route add 172.16.0.0/24 dev eth2 table 10001
      > ip route add unreachable 0/0 table 10001
      >
      > If you had a second interface 'eth0' that was on a different
      > subnet, pinging a system on that interface would fail:
      >
      >   [root@ct503-60 ~]# ping 192.168.100.1
      >   connect: Invalid argument
      Reported-by: NBen Greear <greearb@candelatech.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e0584649
  23. 21 12月, 2010 1 次提交
  24. 18 11月, 2010 1 次提交
  25. 29 10月, 2010 1 次提交
  26. 19 10月, 2010 1 次提交
  27. 17 10月, 2010 1 次提交
  28. 06 10月, 2010 1 次提交
    • E
      fib: RCU conversion of fib_lookup() · ebc0ffae
      Eric Dumazet 提交于
      fib_lookup() converted to be called in RCU protected context, no
      reference taken and released on a contended cache line (fib_clntref)
      
      fib_table_lookup() and fib_semantic_match() get an additional parameter.
      
      struct fib_info gets an rcu_head field, and is freed after an rcu grace
      period.
      
      Stress test :
      (Sending 160.000.000 UDP frames on same neighbour,
      IP route cache disabled, dual E5540 @2.53GHz,
      32bit kernel, FIB_HASH) (about same results for FIB_TRIE)
      
      Before patch :
      
      real	1m31.199s
      user	0m13.761s
      sys	23m24.780s
      
      After patch:
      
      real	1m5.375s
      user	0m14.997s
      sys	15m50.115s
      
      Before patch Profile :
      
      13044.00 15.4% __ip_route_output_key vmlinux
       8438.00 10.0% dst_destroy           vmlinux
       5983.00  7.1% fib_semantic_match    vmlinux
       5410.00  6.4% fib_rules_lookup      vmlinux
       4803.00  5.7% neigh_lookup          vmlinux
       4420.00  5.2% _raw_spin_lock        vmlinux
       3883.00  4.6% rt_set_nexthop        vmlinux
       3261.00  3.9% _raw_read_lock        vmlinux
       2794.00  3.3% fib_table_lookup      vmlinux
       2374.00  2.8% neigh_resolve_output  vmlinux
       2153.00  2.5% dst_alloc             vmlinux
       1502.00  1.8% _raw_read_lock_bh     vmlinux
       1484.00  1.8% kmem_cache_alloc      vmlinux
       1407.00  1.7% eth_header            vmlinux
       1406.00  1.7% ipv4_dst_destroy      vmlinux
       1298.00  1.5% __copy_from_user_ll   vmlinux
       1174.00  1.4% dev_queue_xmit        vmlinux
       1000.00  1.2% ip_output             vmlinux
      
      After patch Profile :
      
      13712.00 15.8% dst_destroy             vmlinux
       8548.00  9.9% __ip_route_output_key   vmlinux
       7017.00  8.1% neigh_lookup            vmlinux
       4554.00  5.3% fib_semantic_match      vmlinux
       4067.00  4.7% _raw_read_lock          vmlinux
       3491.00  4.0% dst_alloc               vmlinux
       3186.00  3.7% neigh_resolve_output    vmlinux
       3103.00  3.6% fib_table_lookup        vmlinux
       2098.00  2.4% _raw_read_lock_bh       vmlinux
       2081.00  2.4% kmem_cache_alloc        vmlinux
       2013.00  2.3% _raw_spin_lock          vmlinux
       1763.00  2.0% __copy_from_user_ll     vmlinux
       1763.00  2.0% ip_output               vmlinux
       1761.00  2.0% ipv4_dst_destroy        vmlinux
       1631.00  1.9% eth_header              vmlinux
       1440.00  1.7% _raw_read_unlock_bh     vmlinux
      
      Reference results, if IP route cache is enabled :
      
      real	0m29.718s
      user	0m10.845s
      sys	7m37.341s
      
      25213.00 29.5% __ip_route_output_key   vmlinux
       9011.00 10.5% dst_release             vmlinux
       4817.00  5.6% ip_push_pending_frames  vmlinux
       4232.00  5.0% ip_finish_output        vmlinux
       3940.00  4.6% udp_sendmsg             vmlinux
       3730.00  4.4% __copy_from_user_ll     vmlinux
       3716.00  4.4% ip_route_output_flow    vmlinux
       2451.00  2.9% __xfrm_lookup           vmlinux
       2221.00  2.6% ip_append_data          vmlinux
       1718.00  2.0% _raw_spin_lock_bh       vmlinux
       1655.00  1.9% __alloc_skb             vmlinux
       1572.00  1.8% sock_wfree              vmlinux
       1345.00  1.6% kfree                   vmlinux
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ebc0ffae
  29. 05 10月, 2010 1 次提交
  30. 01 10月, 2010 1 次提交
  31. 29 9月, 2010 1 次提交
    • T
      ipv4: Allow configuring subnets as local addresses · 4465b469
      Tom Herbert 提交于
      This patch allows a host to be configured to respond to any address in
      a specified range as if it were local, without actually needing to
      configure the address on an interface.  This is done through routing
      table configuration.  For instance, to configure a host to respond
      to any address in 10.1/16 received on eth0 as a local address we can do:
      
      ip rule add from all iif eth0 lookup 200
      ip route add local 10.1/16 dev lo proto kernel scope host src 127.0.0.1 table 200
      
      This host is now reachable by any 10.1/16 address (route lookup on
      input for packets received on eth0 can find the route).  On output, the
      rule will not be matched so that this host can still send packets to
      10.1/16 (not sent on loopback).  Presumably, external routing can be
      configured to make sense out of this.
      
      To make this work, we needed to modify the logic in finding the
      interface which is assigned a given source address for output
      (dev_ip_find).  We perform a normal fib_lookup instead of just a
      lookup on the local table, and in the lookup we ignore the input
      interface for matching.
      
      This patch is useful to implement IP-anycast for subnets of virtual
      addresses.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4465b469
  32. 08 9月, 2010 1 次提交
  33. 13 7月, 2010 1 次提交
  34. 03 6月, 2010 1 次提交