1. 24 Aug, 2012 1 commit
  2. 23 Aug, 2012 1 commit
    • net: remove delay at device dismantle · 0115e8e3
      Committed by Eric Dumazet
      I noticed an extra one-second delay in device dismantle, tracked down to
      a call to dst_dev_event() while some call_rcu() are still in RCU queues.
      
      These call_rcu() were posted by rt_free(struct rtable *rt) calls.
      
      We then wait a little (but one second) in netdev_wait_allrefs() before
      kicking again NETDEV_UNREGISTER.
      
      As the call_rcu() are now completed, dst_dev_event() can do the needed
      device swap on busy dst.
      
      To solve this problem, add a new NETDEV_UNREGISTER_FINAL, called
      after a rcu_barrier(), but outside of RTNL lock.
      
      Use NETDEV_UNREGISTER_FINAL with care !
      
      Change dst_dev_event() handler to react to NETDEV_UNREGISTER_FINAL
      
      Also remove NETDEV_UNREGISTER_BATCH, as it's not used anymore after
      IP cache removal.
      
      With help from Gao feng
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 15 Aug, 2012 1 commit
    • ipv6: addrconf: Avoid calling netdevice notifiers with RCU read-side lock · 4acd4945
      Committed by Ben Hutchings
      Cong Wang reports that lockdep detected suspicious RCU usage while
      enabling IPV6 forwarding:
      
       [ 1123.310275] ===============================
       [ 1123.442202] [ INFO: suspicious RCU usage. ]
       [ 1123.558207] 3.6.0-rc1+ #109 Not tainted
       [ 1123.665204] -------------------------------
       [ 1123.768254] include/linux/rcupdate.h:430 Illegal context switch in RCU read-side critical section!
       [ 1123.992320]
       [ 1123.992320] other info that might help us debug this:
       [ 1123.992320]
       [ 1124.307382]
       [ 1124.307382] rcu_scheduler_active = 1, debug_locks = 0
       [ 1124.522220] 2 locks held by sysctl/5710:
       [ 1124.648364]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81768498>] rtnl_trylock+0x15/0x17
       [ 1124.882211]  #1:  (rcu_read_lock){.+.+.+}, at: [<ffffffff81871df8>] rcu_lock_acquire+0x0/0x29
       [ 1125.085209]
       [ 1125.085209] stack backtrace:
       [ 1125.332213] Pid: 5710, comm: sysctl Not tainted 3.6.0-rc1+ #109
       [ 1125.441291] Call Trace:
       [ 1125.545281]  [<ffffffff8109d915>] lockdep_rcu_suspicious+0x109/0x112
       [ 1125.667212]  [<ffffffff8107c240>] rcu_preempt_sleep_check+0x45/0x47
       [ 1125.781838]  [<ffffffff8107c260>] __might_sleep+0x1e/0x19b
      [...]
       [ 1127.445223]  [<ffffffff81757ac5>] call_netdevice_notifiers+0x4a/0x4f
      [...]
       [ 1127.772188]  [<ffffffff8175e125>] dev_disable_lro+0x32/0x6b
       [ 1127.885174]  [<ffffffff81872d26>] dev_forward_change+0x30/0xcb
       [ 1128.013214]  [<ffffffff818738c4>] addrconf_forward_change+0x85/0xc5
      [...]
      
      addrconf_forward_change() uses RCU iteration over the netdev list,
      which is unnecessary since it already holds the RTNL lock.  We also
      cannot reasonably require netdevice notifier functions not to sleep.
      Reported-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 19 Jul, 2012 1 commit
    • ipv6: add ipv6_addr_hash() helper · ddbe5032
      Committed by Eric Dumazet
      Introduce ipv6_addr_hash() helper doing a XOR on all bits
      of an IPv6 address, with an optimized x86_64 version.
      
      Use it in flow dissector, as suggested by Andrew McGregor,
      to reduce hash collision probabilities in fq_codel (and other
      users of flow dissector)
      
      Use it in ip6_tunnel.c and use more bit shuffling, as suggested
      by David Laight, as existing hash was ignoring most of them.
      
      Use it in sunrpc and use more bit shuffling, using hash_32().
      
      Use it in net/ipv6/addrconf.c, using hash_32() as well.
      
      As a cleanup, use it in net/ipv4/tcp_metrics.c
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Andrew McGregor <andrewmcgr@gmail.com>
      Cc: Dave Taht <dave.taht@gmail.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
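The XOR fold described in the ipv6_addr_hash() commit can be sketched in portable userspace C. This is an illustration under stated assumptions, not the kernel helper: the names addr_hash32_sketch and addr_hash64_sketch are hypothetical, the address is assumed to be a plain 16-byte array, and the 64-bit variant only stands in for the idea of an optimized version on 64-bit machines.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: fold a 16-byte IPv6 address into 32 bits by
 * XOR-ing its four 32-bit words together. */
static uint32_t addr_hash32_sketch(const uint8_t addr[16])
{
	uint32_t w[4];

	assert(addr != NULL);
	memcpy(w, addr, sizeof(w));	/* copy instead of casting: no alignment issues */
	return w[0] ^ w[1] ^ w[2] ^ w[3];
}

/* Hypothetical sketch of the same fold using two 64-bit words:
 * XOR the halves of the address, then XOR the halves of the result. */
static uint32_t addr_hash64_sketch(const uint8_t addr[16])
{
	uint64_t q[2];
	uint64_t x;

	assert(addr != NULL);
	memcpy(q, addr, sizeof(q));
	x = q[0] ^ q[1];
	return (uint32_t)(x ^ (x >> 32));
}
```

Both functions return the same value for the same input, because XOR is commutative and the 64-bit version only regroups the same four words. The result is a plain fold; callers that need well-spread bucket indexes would still mix the bits further, which is what the commit does with hash_32() in some call sites.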
  5. 16 May, 2012 4 commits
  6. 11 May, 2012 1 commit
    • 6lowpan: IPv6 link local address · 06a4c1c5
      Committed by alex.bluesman.smirnov@gmail.com
      According to the RFC4944 (Transmission of IPv6 Packets over
      IEEE 802.15.4 Networks), chapter 7:
      
      The IPv6 link-local address [RFC4291] for an IEEE 802.15.4 interface
      is formed by appending the Interface Identifier, as defined above, to
      the prefix FE80::/64.
      
        10 bits            54 bits                  64 bits
      +----------+-----------------------+----------------------------+
      |1111111010|         (zeros)       |    Interface Identifier    |
      +----------+-----------------------+----------------------------+
      
      This patch adds IPv6 address generation support for the 6lowpan
      interfaces.
      Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
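The address layout quoted from RFC 4944 in the 6lowpan commit can be sketched as follows. This is a minimal userspace illustration, not the code from the patch: make_link_local_sketch is a hypothetical name, and the 64-bit Interface Identifier is assumed to have been derived already and is passed in as 8 bytes in network order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: build an IPv6 link-local address by appending a
 * 64-bit Interface Identifier to the prefix FE80::/64. */
static void make_link_local_sketch(uint8_t out[16], const uint8_t iid[8])
{
	assert(out != NULL && iid != NULL);

	memset(out, 0, 16);	/* the 54 zero bits, plus everything else for now */
	out[0] = 0xfe;		/* 1111 1110 */
	out[1] = 0x80;		/* 10 then zeros: completes the 10-bit 1111111010 */
	memcpy(out + 8, iid, 8);	/* low 64 bits: the Interface Identifier */
}
```

For an identifier of 02:11:22:ff:fe:33:44:55 this yields fe80::211:22ff:fe33:4455.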
  7. 21 Apr, 2012 1 commit
  8. 15 Apr, 2012 1 commit
  9. 14 Apr, 2012 1 commit
    • ipv6: fix problem with expired dst cache · 1716a961
      Committed by Gao feng
      When an IPv6 dst cache entry is copied from a dst generated by an ICMPv6
      RA packet, the copy is never checked for expiry because it has no
      RTF_EXPIRES flag. The copy therefore stays in use until the dst gc runs.

      Change struct dst_entry: add a union containing a new pointer, from,
      alongside expires. When rt6_info.rt6i_flags has no RTF_EXPIRES flag,
      dst.expires is unused, so the field can instead point to the dst the
      cache entry was copied from. dst.from is only used in IPv6.

      rt6_check_expired() checks whether rt6_info.dst.from is expired.

      ip6_rt_copy() only sets dst.from when the ort has the RTF_ADDRCONF and
      RTF_DEFAULT flags, and then holds the ort.

      ip6_dst_destroy() releases the ort.

      Add helper functions that operate on the RTF_EXPIRES flag and
      expires (from) together, and change the code to use them.

      Changes from v5:
      Modify ip6_route_add() and ndisc_router_discovery() to use the new helpers.

      Only set dst.from when the ort has the RTF_ADDRCONF and RTF_DEFAULT
      flags, and then hold the ort.
      Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
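The union of expires and from described in the dst cache commit can be sketched in userspace C. Everything here is a hypothetical stand-in: the struct, the flag value and the function names are invented for illustration, reference counting of the original route is left out, and time is a plain counter compared without wraparound handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_RTF_EXPIRES 0x1u	/* hypothetical flag value */

struct route_sketch {
	unsigned int flags;
	union {
		unsigned long expires;		/* valid only with SKETCH_RTF_EXPIRES */
		struct route_sketch *from;	/* otherwise: origin of this copy, or NULL */
	} u;
};

/* Set the expiry time and the flag together, so they cannot disagree. */
static void set_expires_sketch(struct route_sketch *rt, unsigned long when)
{
	assert(rt != NULL);
	rt->u.expires = when;
	rt->flags |= SKETCH_RTF_EXPIRES;
}

/* Record where a copy came from; the copy has no expiry of its own. */
static void set_from_sketch(struct route_sketch *copy, struct route_sketch *orig)
{
	assert(copy != NULL);
	copy->flags &= ~SKETCH_RTF_EXPIRES;
	copy->u.from = orig;
}

/* A route with its own expiry is checked directly; a copy follows
 * its origin; anything else never expires. */
static bool check_expired_sketch(const struct route_sketch *rt, unsigned long now)
{
	assert(rt != NULL);
	if (rt->flags & SKETCH_RTF_EXPIRES)
		return now > rt->u.expires;
	if (rt->u.from != NULL)
		return check_expired_sketch(rt->u.from, now);
	return false;
}
```

The point of the paired setters is that the flag always says which union member is live, which is why the commit adds helpers instead of letting callers touch the flag and the field separately.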
  10. 02 Apr, 2012 2 commits
    • net/ipv6/addrconf.c: Checkpatch cleanups · 8e5e8f30
      Committed by Eldad Zack
      net/ipv6/addrconf.c:340: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
      net/ipv6/addrconf.c:342: ERROR: "foo * bar" should be "foo *bar"
      net/ipv6/addrconf.c:444: ERROR: "foo * bar" should be "foo *bar"
      net/ipv6/addrconf.c:1337: WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
      net/ipv6/addrconf.c:1526: ERROR: "(foo*)" should be "(foo *)"
      net/ipv6/addrconf.c:1671: ERROR: open brace '{' following function declarations go on the next line
      net/ipv6/addrconf.c:1914: ERROR: "foo * bar" should be "foo *bar"
      net/ipv6/addrconf.c:2368: ERROR: "foo * bar" should be "foo *bar"
      net/ipv6/addrconf.c:2370: ERROR: "foo * bar" should be "foo *bar"
      net/ipv6/addrconf.c:2416: ERROR: "foo * bar" should be "foo *bar"
      net/ipv6/addrconf.c:2437: ERROR: "foo    * bar" should be "foo    *bar"
      net/ipv6/addrconf.c:2573: ERROR: "foo * bar" should be "foo *bar"
      net/ipv6/addrconf.c:3797: ERROR: "foo* bar" should be "foo *bar"
      Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Stop using NLA_PUT*(). · c78679e8
      Committed by David S. Miller
      These macros contain a hidden goto, and are thus extremely error
      prone and make code hard to audit.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 12 Mar, 2012 1 commit
    • ipv6: Fix Smatch warning. · 8b2aaede
      Committed by Li Wei
      Since commit d6ddef9e (IPv6: Fix not join all-router mcast group
      when forwarding set.), 'dev' is checked after it has been dereferenced,
      which leads to a Smatch complaint:
      
      net/ipv6/addrconf.c:438 ipv6_add_dev()
      	 warn: variable dereferenced before check 'dev' (see line 432)
      
      net/ipv6/addrconf.c
         431		/* protected by rtnl_lock */
         432		rcu_assign_pointer(dev->ip6_ptr, ndev);
                                         ^^^^^^^^^^^^
      Old dereference.
      
         433
         434		/* Join all-node multicast group */
         435		ipv6_dev_mc_inc(dev, &in6addr_linklocal_allnodes);
         436
         437		/* Join all-router multicast group if forwarding is set */
         438		if (ndev->cnf.forwarding && dev && (dev->flags & IFF_MULTICAST))
                                                  ^^^
      
      Remove the check to avoid the complaint as 'dev' can't be NULL.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Li Wei <lw@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 07 Mar, 2012 1 commit
  13. 19 Jan, 2012 1 commit
    • net: race condition in ipv6 forwarding and disable_ipv6 parameters · 013d97e9
      Committed by Francesco Ruggeri
      There is a race condition in addrconf_sysctl_forward() and
      addrconf_sysctl_disable().
      These functions change idev->cnf.forwarding (resp. idev->cnf.disable_ipv6)
      and then try to grab the rtnl lock before performing any actions.
      If that fails they restore the original value and restart the syscall.
      This creates race conditions if ipv6 code tries to access
      these parameters, or if multiple instances try to do the same operation.
      As an example of the former, if __ipv6_ifa_notify() finds a 0 in
      idev->cnf.forwarding when invoked by addrconf_ifdown() it may not free
      anycast addresses, ultimately resulting in the net_device not being freed.
      This patch reads the user parameters into a temporary location and only
      writes the actual parameters when the rtnl lock is acquired.
      Tested in 2.6.38.8.
      Signed-off-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
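The fix described in the race condition commit (parse into a temporary, publish only once the lock is held) can be sketched in userspace C. This is only an illustration of the pattern: a C11 atomic_flag stands in for the rtnl lock, forwarding_sketch stands in for idev->cnf.forwarding, and all names and return codes are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

static atomic_flag cfg_lock_sketch = ATOMIC_FLAG_INIT;	/* stand-in for the rtnl lock */
static int forwarding_sketch;				/* stand-in for the shared setting */

/* Returns 0 on success, -1 if the lock is busy (the caller would restart
 * the operation), -2 if the input is not a number. */
static int write_forwarding_sketch(const char *user_text)
{
	char *end;
	long tmp;

	assert(user_text != NULL);

	/* Step 1: read the user value into a temporary, not the shared field. */
	tmp = strtol(user_text, &end, 10);
	if (end == user_text)
		return -2;

	/* Step 2: try to take the lock. On failure nothing shared has been
	 * modified, so there is nothing to restore and no window in which
	 * other code can observe a value that is about to be rolled back. */
	if (atomic_flag_test_and_set(&cfg_lock_sketch))
		return -1;

	/* Step 3: publish the new value only while holding the lock. */
	forwarding_sketch = (int)tmp;
	atomic_flag_clear(&cfg_lock_sketch);
	return 0;
}
```

The older pattern wrote the shared field first and restored it when the trylock failed; between those two steps a reader could see a value that was never really in effect, which is the window the commit closes.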
  14. 13 Jan, 2012 1 commit
  15. 05 Jan, 2012 2 commits
    • ipv6/addrconf: speedup /proc/net/if_inet6 filling · 1d578303
      Committed by Mihai Maruseac
      This ensures a linear behaviour when filling /proc/net/if_inet6 thus making
      ifconfig run really fast on IPv6 only addresses. In fact, with this patch and
      the IPv4 one sent a while ago, ifconfig will run in linear time regardless of
      address type.
      
      IPv4 related patch: f04565dd
      	 dev: use name hash for dev_seq_ops
      	 ...
      
      Some statistics (running ifconfig > /dev/null on a different setup):
      
      iface count | IPv6 no-patch time | IPv6 patched time | IPv4 time
      ------------+--------------------+-------------------+----------
            6250  |       0.23 s       |      0.13 s       |  0.11 s
           12500  |       0.62 s       |      0.28 s       |  0.22 s
           25000  |       2.91 s       |      0.57 s       |  0.46 s
           50000  |      11.37 s       |      1.21 s       |  0.94 s
          128000  |      86.78 s       |      3.05 s       |  2.54 s
      Signed-off-by: Mihai Maruseac <mmaruseac@ixiacom.com>
      Cc: Daniel Baluta <dbaluta@ixiacom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Check RA for sllao when configuring optimistic ipv6 address (v2) · e6bff995
      Committed by Neil Horman
      Recently Dave noticed that a test we did in ipv6_add_addr to see if we had a
      next hop route for the interface we're adding an address to was wrong (see
      commit 7ffbcecb).  For one, it never triggers, and two,
      it was completely wrong to begin with.  This test was meant to cover this
      section of RFC 4429:
      
      3.3 Modifications to RFC 2462 Stateless Address Autoconfiguration
      
         * (modifies section 5.5) A host MAY choose to configure a new address
              as an Optimistic Address.  A host that does not know the SLLAO
              of its router SHOULD NOT configure a new address as Optimistic.
              A router SHOULD NOT configure an Optimistic Address.
      
      This patch should bring us into proper compliance with the above clause.  Since
      we only add a SLAAC address after we've received a RA which may or may not
      contain a source link layer address option, we can pass a pointer to that option
      to addrconf_prefix_rcv (which may be null if the option is not present), and
      only set the optimistic flag if the option was found in the RA.
      
      Change notes:
      (v2) modified the new parameter to addrconf_prefix_rcv to be a bool rather than
      a pointer to make its use more clear as per request from davem.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 29 Dec, 2011 2 commits
  17. 13 Dec, 2011 1 commit
    • ipv6: Fix for adding multicast route for loopback device automatically. · 4af04aba
      Committed by Li Wei
      There is no obvious reason to add a default multicast route for loopback
      devices; otherwise there would be a route entry with dst.error set to
      -ENETUNREACH that would block all multicast packets.
      
      ====================
      
      [ more detailed explanation ]
      
      The problem is that the resulting routing table depends on the order in
      which interfaces are initialized, and in some situations that blocks all
      multicast packets. Suppose there are two interfaces on my computer
      (lo and eth0). If we initialize 'lo' before 'eth0', the resulting routing
      table (for multicast) would be
      
      # ip -6 route show | grep ff00::
      unreachable ff00::/8 dev lo metric 256 error -101
      ff00::/8 dev eth0 metric 256
      
      When sending multicast packets, the routing subsystem returns the first
      route entry, which has its error set to -101 (ENETUNREACH).
      
      I know the kernel will set the default ipv6 address for 'lo' when it is up
      and won't set the default multicast route for it, but there is no reason to
      stop 'init' program from setting address for 'lo', and that is exactly what
      systemd did.
      
      I am sure there is something wrong with either the kernel or systemd;
      for now I think the kernel caused this problem.
      
      ====================
      Signed-off-by: Li Wei <lw@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 07 Dec, 2011 1 commit
  19. 06 Dec, 2011 1 commit
  20. 23 Nov, 2011 1 commit
  21. 01 Nov, 2011 1 commit
  22. 30 Oct, 2011 1 commit
    • ipv6: fix route lookup in addrconf_prefix_rcv() · 14ef37b6
      Committed by Andreas Hofmeister
      The route lookup to find a previously auto-configured route for a prefix used
      to use rt6_lookup(), with the prefix from the RA used as an address. However,
      that kind of lookup ignores routing tables, the prefix length and route flags,
      so when there were other matching routes, even in different tables and/or with
      a different prefix length, the wrong route would be manipulated.
      
      Now, a new function "addrconf_get_prefix_route()" is used for the route lookup,
      which searches in RT6_TABLE_PREFIX and takes the prefix-length and route flags
      into account.
      Signed-off-by: Andreas Hofmeister <andi@collax.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
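The difference between an address-style lookup and the exact prefix lookup described in the addrconf_prefix_rcv() commit can be sketched as below. This is a userspace illustration only: the flat array, the struct layout and the name find_prefix_route_sketch are hypothetical and do not reflect how the kernel stores routes or the real signature of addrconf_get_prefix_route().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct prefix_route_sketch {
	uint8_t  prefix[16];
	int      plen;
	int      table;		/* routing table id */
	uint32_t flags;
};

/* Hypothetical sketch: return the first route in `table` whose prefix and
 * prefix length match exactly and that has every bit of `required_flags`
 * set, or NULL if there is none. */
static const struct prefix_route_sketch *
find_prefix_route_sketch(const struct prefix_route_sketch *routes, size_t n,
			 const uint8_t prefix[16], int plen, int table,
			 uint32_t required_flags)
{
	size_t i;

	assert(routes != NULL && prefix != NULL);
	for (i = 0; i < n; i++) {
		const struct prefix_route_sketch *rt = &routes[i];

		if (rt->table != table)
			continue;	/* other tables are never considered */
		if (rt->plen != plen)
			continue;	/* a /32 must not match a /64 request */
		if ((rt->flags & required_flags) != required_flags)
			continue;
		if (memcmp(rt->prefix, prefix, 16) == 0)
			return rt;
	}
	return NULL;
}
```

A lookup that treats the prefix as a destination address would happily return a shorter covering route from any table; the three extra comparisons are what keep the wrong route from being modified.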
  23. 21 Sep, 2011 1 commit
  24. 17 Sep, 2011 1 commit
    • ipv6: Send ICMPv6 RSes only when RAs are accepted · 026359bc
      Committed by Tore Anderson
      This patch improves the logic determining when to send ICMPv6 Router
      Solicitations, so that they are 1) always sent when the kernel is
      accepting Router Advertisements, and 2) never sent when the kernel is
      not accepting RAs. In other words, the operational setting of the
      "accept_ra" sysctl is used.
      
      The change also makes the special "Hybrid Router" forwarding mode
      ("forwarding" sysctl set to 2) operate exactly the same as the standard
      Router mode (forwarding=1). The only difference between the two was
      that RSes were being sent in the Hybrid Router mode only. The sysctl
      documentation describing the special Hybrid Router mode has therefore
      been removed.
      
      Rationale for the change:
      
      Currently, the value of forwarding sysctl is the only thing determining
      whether or not to send RSes. If it has the value 0 or 2, they are sent,
      otherwise they are not. This leads to inconsistent behaviour in the
      following cases:
      
      * accept_ra=0, forwarding=0
      * accept_ra=0, forwarding=2
      * accept_ra=1, forwarding=2
      * accept_ra=2, forwarding=1
      
      In the first three cases, the kernel will send RSes, even though it will
      not accept any RAs received in reply. In the last case, it will not send
      any RSes, even though it will accept and process any RAs received. (Most
      routers will send unsolicited RAs periodically, so suppressing RSes in
      the last case will merely delay auto-configuration, not prevent it.)
      
      Also, it is my opinion that having the forwarding sysctl control RS
      sending behaviour (completely independent of whether RAs are being
      accepted or not) is simply not what most users would intuitively expect
      to be the case.
      Signed-off-by: Tore Anderson <tore@fud.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
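The rule this commit introduces, that Router Solicitations follow the operational accept_ra setting, can be sketched as a small predicate. The function names are hypothetical, and the sketch assumes the documented accept_ra meanings (0: never accept, 1: accept unless forwarding is enabled, 2: accept even when forwarding is enabled). Other conditions the kernel may apply before sending an RS are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: is the interface operationally accepting RAs? */
static bool accepts_ra_sketch(int accept_ra, int forwarding)
{
	assert(accept_ra >= 0 && accept_ra <= 2);

	if (forwarding)
		return accept_ra == 2;	/* routers accept RAs only when forced */
	return accept_ra != 0;
}

/* After the change, RSes are sent exactly when RAs would be accepted;
 * the forwarding value no longer decides this on its own. */
static bool should_send_rs_sketch(int accept_ra, int forwarding)
{
	return accepts_ra_sketch(accept_ra, forwarding);
}
```

Applied to the four inconsistent cases listed in the commit message, the predicate gives no RS for accept_ra=0/forwarding=0, accept_ra=0/forwarding=2 and accept_ra=1/forwarding=2, and an RS for accept_ra=2/forwarding=1.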
  25. 03 Aug, 2011 1 commit
  26. 02 Aug, 2011 2 commits
  27. 26 Jul, 2011 1 commit
    • ipv6: Do not leave router anycast address for /127 prefixes. · 32019e65
      Committed by YOSHIFUJI Hideaki
      Original commit 2bda8a0c... "Disable router anycast
      address for /127 prefixes" says:
      
      |   No need for matching code in addrconf_leave_anycast() as it
      |   will silently ignore any attempt to leave an unknown anycast
      |   address.
      
      After analysis: because 1) two or more prefixes may be added on the
      same interface, or 2) the user may have manually joined that anycast
      address, we may end up holding an anycast address that looks as if it
      had been generated from a /127 prefix, so we should not leave the
      subnet-router anycast address unconditionally.
      
      CC: Bjørn Mork <bjorn@mork.no>
      CC: Brian Haley <brian.haley@hp.com>
      Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
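The /127 guard discussed in this commit can be sketched in userspace C. The function name and interface are hypothetical; the sketch only shows deriving a subnet-router anycast address (prefix bits kept, remaining bits cleared) and refusing to do so for a /127 prefix. Using one such helper on both the join and the leave path means an address that was never joined on behalf of a /127 prefix is never left on its behalf either.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: compute the subnet-router anycast address for
 * addr/plen into out. Returns false and leaves out untouched for /127.
 * out and addr must not point to the same buffer. */
static bool subnet_router_anycast_sketch(uint8_t out[16],
					 const uint8_t addr[16], int plen)
{
	int full_bytes, rem_bits;

	assert(out != NULL && addr != NULL && out != addr);
	assert(plen >= 0 && plen <= 128);

	if (plen == 127)
		return false;	/* no subnet-router anycast on /127 links */

	full_bytes = plen / 8;
	rem_bits = plen % 8;

	memset(out, 0, 16);
	memcpy(out, addr, (size_t)full_bytes);	/* whole prefix bytes */
	if (rem_bits)	/* keep only the leading rem_bits of the next byte */
		out[full_bytes] = addr[full_bytes] & (uint8_t)(0xff << (8 - rem_bits));
	return true;
}
```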
  28. 18 Jul, 2011 2 commits
  29. 07 Jul, 2011 1 commit
  30. 10 Jun, 2011 1 commit
    • rtnetlink: Compute and store minimum ifinfo dump size · c7ac8679
      Committed by Greg Rose
      The message size allocated for rtnl ifinfo dumps was limited to
      a single page.  This is not enough for additional interface info
      available with devices that support SR-IOV and caused a bug in
      which VF info would not be displayed if more than approximately
      40 VFs were created per interface.
      
      Implement a new function pointer for the rtnl_register service that will
      calculate the amount of data required for the ifinfo dump and allocate
      enough data to satisfy the request.
      Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
  31. 09 Jun, 2011 1 commit
  32. 20 May, 2011 1 commit
    • ipv6: reduce per device ICMP mib sizes · be281e55
      Committed by Eric Dumazet
      ipv6 has per device ICMP SNMP counters, taking too much space because
      they use percpu storage.
      
      needed size per device is :
      (512+4)*sizeof(long)*number_of_possible_cpus*2
      
      On a 32bit kernel, 16 possible cpus, this wastes more than 64kbytes of
      memory per ipv6 enabled network device, taken in vmalloc pool.
      
      Since ICMP messages are rare, just use shared counters (atomic_long_t)
      
      Per network space ICMP counters are still using percpu memory, we might
      also convert them to shared counters in a future patch.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Denys Fedoryshchenko <denys@visp.net.lb>
      Signed-off-by: David S. Miller <davem@davemloft.net>
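The size estimate in the ICMP mib commit can be checked with a few lines of C. The function name is hypothetical; the formula is the one quoted in the commit message, with sizeof(long) passed in as a parameter so that the 32-bit case can be evaluated on any build machine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the formula from the commit message:
 * (512 + 4) * sizeof(long) * number_of_possible_cpus * 2 */
static size_t percpu_icmp_mib_bytes_sketch(size_t sizeof_long, size_t possible_cpus)
{
	assert(sizeof_long > 0 && possible_cpus > 0);
	return (512 + 4) * sizeof_long * possible_cpus * 2;
}
```

For a 32-bit kernel (sizeof(long) = 4) with 16 possible CPUs this is 516 * 4 * 16 * 2 = 66048 bytes, just above 64 KiB (65536 bytes), which matches the "more than 64kbytes" figure in the message.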