1. 20 4月, 2016 1 次提交
  2. 14 4月, 2016 1 次提交
  3. 14 3月, 2016 1 次提交
  4. 04 3月, 2016 1 次提交
    • D
      net: ipv6: Fix refcnt on host routes · 799977d9
      David Ahern 提交于
      Andrew and Ying Huang's test robot both reported usage count problems that
      trace back to the 'keep address on ifdown' patch.
      
      >From Andrew:
      We execute CRIU test on linux-next. On the current linux-next kernel
      they hangs on creating a network namespace.
      
      The kernel log contains many massages like this:
      [ 1036.122108] unregister_netdevice: waiting for lo to become free.
      Usage count = 2
      [ 1046.165156] unregister_netdevice: waiting for lo to become free.
      Usage count = 2
      [ 1056.210287] unregister_netdevice: waiting for lo to become free.
      Usage count = 2
      
      I tried to revert this patch and the bug disappeared.
      
      Here is a set of commands to reproduce this bug:
      
      [root@linux-next-test linux-next]# uname -a
      Linux linux-next-test 4.5.0-rc6-next-20160301+ #3 SMP Wed Mar 2
      17:32:18 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
      
      [root@linux-next-test ~]# unshare -n
      [root@linux-next-test ~]# ip link set up dev lo
      [root@linux-next-test ~]# ip a
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
      group default qlen 1
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
          inet 127.0.0.1/8 scope host lo
             valid_lft forever preferred_lft forever
          inet6 ::1/128 scope host
             valid_lft forever preferred_lft forever
      [root@linux-next-test ~]# logout
      [root@linux-next-test ~]# unshare -n
      
       -----
      
      The problem is a change made to RTM_DELADDR case in __ipv6_ifa_notify that
      was added in an early version of the offending patch and is no longer
      needed.
      
      Fixes: f1705ec1 ("net: ipv6: Make address flushing on ifdown optional")
      Cc: Andrey Wagin <avagin@gmail.com>
      Cc: Ying Huang <ying.huang@linux.intel.com>
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Tested-by: NJeremiah Mahler <jmmahler@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      799977d9
  5. 02 3月, 2016 1 次提交
  6. 26 2月, 2016 1 次提交
    • D
      net: ipv6: Make address flushing on ifdown optional · f1705ec1
      David Ahern 提交于
      Currently, all ipv6 addresses are flushed when the interface is configured
      down, including global, static addresses:
      
          $ ip -6 addr show dev eth1
          3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
              inet6 2100:1::2/120 scope global
                 valid_lft forever preferred_lft forever
              inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
                 valid_lft forever preferred_lft forever
          $ ip link set dev eth1 down
          $ ip -6 addr show dev eth1
          << nothing; all addresses have been flushed>>
      
      Add a new sysctl to make this behavior optional. The new setting defaults to
      flush all addresses to maintain backwards compatibility. When the set global
      addresses with no expire times are not flushed on an admin down. The sysctl
      is per-interface or system-wide for all interfaces
      
          $ sysctl -w net.ipv6.conf.eth1.keep_addr_on_down=1
      or
          $ sysctl -w net.ipv6.conf.all.keep_addr_on_down=1
      
      Will keep addresses on eth1 on an admin down.
      
          $ ip -6 addr show dev eth1
          3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
              inet6 2100:1::2/120 scope global
                 valid_lft forever preferred_lft forever
              inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
                 valid_lft forever preferred_lft forever
          $ ip link set dev eth1 down
          $ ip -6 addr show dev eth1
          3: eth1: <BROADCAST,MULTICAST> mtu 1500 state DOWN qlen 1000
              inet6 2100:1::2/120 scope global tentative
                 valid_lft forever preferred_lft forever
              inet6 fe80::e0:f9ff:fe79:34bd/64 scope link tentative
                 valid_lft forever preferred_lft forever
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f1705ec1
  7. 20 2月, 2016 1 次提交
  8. 11 2月, 2016 2 次提交
  9. 06 2月, 2016 1 次提交
    • S
      ipv6: addrconf: Fix recursive spin lock call · 16186a82
      subashab@codeaurora.org 提交于
      A rcu stall with the following backtrace was seen on a system with
      forwarding, optimistic_dad and use_optimistic set. To reproduce,
      set these flags and allow ipv6 autoconf.
      
      This occurs because the device write_lock is acquired while already
      holding the read_lock. Back trace below -
      
      INFO: rcu_preempt self-detected stall on CPU { 1}  (t=2100 jiffies
       g=3992 c=3991 q=4471)
      <6> Task dump for CPU 1:
      <2> kworker/1:0     R  running task    12168    15   2 0x00000002
      <2> Workqueue: ipv6_addrconf addrconf_dad_work
      <6> Call trace:
      <2> [<ffffffc000084da8>] el1_irq+0x68/0xdc
      <2> [<ffffffc000cc4e0c>] _raw_write_lock_bh+0x20/0x30
      <2> [<ffffffc000bc5dd8>] __ipv6_dev_ac_inc+0x64/0x1b4
      <2> [<ffffffc000bcbd2c>] addrconf_join_anycast+0x9c/0xc4
      <2> [<ffffffc000bcf9f0>] __ipv6_ifa_notify+0x160/0x29c
      <2> [<ffffffc000bcfb7c>] ipv6_ifa_notify+0x50/0x70
      <2> [<ffffffc000bd035c>] addrconf_dad_work+0x314/0x334
      <2> [<ffffffc0000b64c8>] process_one_work+0x244/0x3fc
      <2> [<ffffffc0000b7324>] worker_thread+0x2f8/0x418
      <2> [<ffffffc0000bb40c>] kthread+0xe0/0xec
      
      v2: do addrconf_dad_kick inside read lock and then acquire write
      lock for ipv6_ifa_notify as suggested by Eric
      
      Fixes: 7fd2561e ("net: ipv6: Add a sysctl to make optimistic
      addresses useful candidates")
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Erik Kline <ek@google.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      16186a82
  10. 11 1月, 2016 1 次提交
  11. 23 12月, 2015 1 次提交
  12. 19 12月, 2015 1 次提交
    • B
      ipv6: addrconf: use stable address generator for ARPHRD_NONE · cc9da6cc
      Bjørn Mork 提交于
      Add a new address generator mode, using the stable address generator
      with an automatically generated secret. This is intended as a default
      address generator mode for device types with no EUI64 implementation.
      The new generator is used for ARPHRD_NONE interfaces initially, adding
      default IPv6 autoconf support to e.g. tun interfaces.
      
      If the addrgenmode is set to 'random', either by default or manually,
      and no stable secret is available, then a random secret is used as
      input for the stable-privacy address generator.  The secret can be
      read and modified like manually configured secrets, using the proc
      interface.  Modifying the secret will change the addrgen mode to
      'stable-privacy' to indicate that it operates on a known secret.
      
      Existing behaviour of the 'stable-privacy' mode is kept unchanged. If
      a known secret is available when the device is created, then the mode
      will default to 'stable-privacy' as before.  The mode can be manually
      set to 'random' but it will behave exactly like 'stable-privacy' in
      this case. The secret will not change.
      
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: 吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
      Signed-off-by: NBjørn Mork <bjorn@mork.no>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cc9da6cc
  13. 16 12月, 2015 1 次提交
  14. 15 12月, 2015 1 次提交
    • A
      ipv6: addrconf: drop ieee802154 specific things · 5241c2d7
      Alexander Aring 提交于
      This patch removes ARPHRD_IEEE802154 from addrconf handling. In the
      earlier days of 802.15.4 6LoWPAN, the interface type was ARPHRD_IEEE802154
      which introduced several issues, because 802.15.4 interfaces used the
      same type.
      
      Since commit 965e613d ("ieee802154: 6lowpan: fix ARPHRD to
      ARPHRD_6LOWPAN") we use ARPHRD_6LOWPAN for 6LoWPAN interfaces. This
      patch will remove ARPHRD_IEEE802154 which is currently deadcode, because
      ARPHRD_IEEE802154 doesn't reach the minimum 1280 MTU of IPv6.
      
      Also we use 6LoWPAN EUI64 specific defines instead using link-layer
      constanst from 802.15.4 link-layer header.
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: James Morris <jmorris@namei.org>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Signed-off-by: NAlexander Aring <alex.aring@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5241c2d7
  15. 06 12月, 2015 2 次提交
  16. 04 12月, 2015 1 次提交
    • P
      net: ipv6: restrict hop_limit sysctl setting to range [1; 255] · d6df198d
      Phil Sutter 提交于
      Setting a value bigger than 255 resulted in using only the lower eight
      bits of that value as it is assigned to the u8 header field. To avoid
      this unexpected result, reject such values.
      
      Setting a value of zero is technically possible, but hosts receiving
      such a packet have to treat it like hop_limit was set to one, according
      to RFC2460. Therefore I don't see a use-case for that.
      
      Setting a route's hop_limit to zero in iproute2 means to use the sysctl
      default, which is not the case here: Setting e.g.
      net.conf.eth0.hop_limit=0 will not make the kernel use
      net.conf.all.hop_limit for outgoing packets on eth0. To avoid these
      kinds of confusion, reject zero.
      Signed-off-by: NPhil Sutter <phil@nwl.cc>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d6df198d
  17. 02 12月, 2015 1 次提交
  18. 05 11月, 2015 1 次提交
  19. 30 10月, 2015 1 次提交
  20. 22 10月, 2015 1 次提交
    • A
      netlink: Rightsize IFLA_AF_SPEC size calculation · b1974ed0
      Arad, Ronen 提交于
      if_nlmsg_size() overestimates the minimum allocation size of netlink
      dump request (when called from rtnl_calcit()) or the size of the
      message (when called from rtnl_getlink()). This is because
      ext_filter_mask is not supported by rtnl_link_get_af_size() and
      rtnl_link_get_size().
      
      The over-estimation is significant when at least one netdev has many
      VLANs configured (8 bytes for each configured VLAN).
      
      This patch-set "rightsizes" the protocol specific attribute size
      calculation by propagating ext_filter_mask to rtnl_link_get_af_size()
      and adding this a argument to get_link_af_size op in rtnl_af_ops.
      
      Bridge module already used filtering aware sizing for notifications.
      br_get_link_af_size_filtered() is consistent with the modified
      get_link_af_size op so it replaces br_get_link_af_size() in br_af_ops.
      br_get_link_af_size() becomes unused and thus removed.
      Signed-off-by: NRonen Arad <ronen.arad@intel.com>
      Acked-by: NSridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b1974ed0
  21. 13 10月, 2015 1 次提交
  22. 11 10月, 2015 1 次提交
  23. 25 9月, 2015 1 次提交
  24. 16 9月, 2015 2 次提交
    • S
      rtnetlink: RTEXT_FILTER_SKIP_STATS support to avoid dumping inet/inet6 stats · d5566fd7
      Sowmini Varadhan 提交于
      Many commonly used functions like getifaddrs() invoke RTM_GETLINK
      to dump the interface information, and do not need the
      the AF_INET6 statististics that are always returned by default
      from rtnl_fill_ifinfo().
      
      Computing the statistics can be an expensive operation that impacts
      scaling, so it is desirable to avoid this if the information is
      not needed.
      
      This patch adds a the RTEXT_FILTER_SKIP_STATS extended info flag that
      can be passed with netlink_request() to avoid statistics computation
      for the ifinfo path.
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d5566fd7
    • M
      ipv6: Avoid double dst_free · 8e3d5be7
      Martin KaFai Lau 提交于
      It is a prep work to get dst freeing from fib tree undergo
      a rcu grace period.
      
      The following is a common paradigm:
      if (ip6_del_rt(rt))
      	dst_free(rt)
      
      which means, if rt cannot be deleted from the fib tree, dst_free(rt) now.
      1. We don't know the ip6_del_rt(rt) failure is because it
         was not managed by fib tree (e.g. DST_NOCACHE) or it had already been
         removed from the fib tree.
      2. If rt had been managed by the fib tree, ip6_del_rt(rt) failure means
         dst_free(rt) has been called already.  A second
         dst_free(rt) is not always obviously safe.  The rt may have
         been destroyed already.
      3. If rt is a DST_NOCACHE, dst_free(rt) should not be called.
      4. It is a stopper to make dst freeing from fib tree undergo a
         rcu grace period.
      
      This patch is to use a DST_NOCACHE flag to indicate a rt is
      not managed by the fib tree.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8e3d5be7
  25. 31 8月, 2015 2 次提交
    • R
      net: Optimize snmp stat aggregation by walking all the percpu data at once · a3a77372
      Raghavendra K T 提交于
      Docker container creation linearly increased from around 1.6 sec to 7.5 sec
      (at 1000 containers) and perf data showed 50% ovehead in snmp_fold_field.
      
      reason: currently __snmp6_fill_stats64 calls snmp_fold_field that walks
      through per cpu data of an item (iteratively for around 36 items).
      
      idea: This patch tries to aggregate the statistics by going through
      all the items of each cpu sequentially which is reducing cache
      misses.
      
      Docker creation got faster by more than 2x after the patch.
      
      Result:
                             Before           After
      Docker creation time   6.836s           3.25s
      cache miss             2.7%             1.41%
      
      perf before:
          50.73%  docker           [kernel.kallsyms]       [k] snmp_fold_field
           9.07%  swapper          [kernel.kallsyms]       [k] snooze_loop
           3.49%  docker           [kernel.kallsyms]       [k] veth_stats_one
           2.85%  swapper          [kernel.kallsyms]       [k] _raw_spin_lock
      
      perf after:
          10.57%  docker           docker                [.] scanblock
           8.37%  swapper          [kernel.kallsyms]     [k] snooze_loop
           6.91%  docker           [kernel.kallsyms]     [k] snmp_get_cpu_field
           6.67%  docker           [kernel.kallsyms]     [k] veth_stats_one
      
      changes/ideas suggested:
      Using buffer in stack (Eric), Usage of memset (David), Using memcpy in
      place of unaligned_put (Joe).
      Signed-off-by: NRaghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a3a77372
    • M
      net/ipv6: Export addrconf_ifid_eui48 · 399e6f95
      Matan Barak 提交于
      For loopback purposes, RoCE devices should have a default GID in the
      port GID table, even when the interface is down. In order to do so,
      we use the IPv6 link local address which would have been genenrated
      for the related Ethernet netdevice when it goes up as a default GID.
      
      addrconf_ifid_eui48 is used to gernerate this address, export it.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      399e6f95
  26. 21 8月, 2015 1 次提交
  27. 14 8月, 2015 2 次提交
    • A
      net: addr IFLA_OPERSTATE to netlink message for ipv6 ifinfo · 0344338b
      Andy Gospodarek 提交于
      This is useful information to include in ipv6 netlink messages that
      report interface information.  IFLA_OPERSTATE is already included in
      ipv4 messages, but missing for ipv6.  This closes that gap.
      Signed-off-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0344338b
    • A
      net: ipv6 sysctl option to ignore routes when nexthop link is down · 35103d11
      Andy Gospodarek 提交于
      Like the ipv4 patch with a similar title, this adds a sysctl to allow
      the user to change routing behavior based on whether or not the
      interface associated with the nexthop was an up or down link.  The
      default setting preserves the current behavior, but anyone that enables
      it will notice that nexthops on down interfaces will no longer be
      selected:
      
      net.ipv6.conf.all.ignore_routes_with_linkdown = 0
      net.ipv6.conf.default.ignore_routes_with_linkdown = 0
      net.ipv6.conf.lo.ignore_routes_with_linkdown = 0
      ...
      
      When the above sysctls are set, not only will link status be reported to
      userspace, but an indication that a nexthop is dead and will not be used
      is also reported.
      
      1000::/8 via 7000::2 dev p7p1  metric 1024 dead linkdown  pref medium
      1000::/8 via 8000::2 dev p8p1  metric 1024  pref medium
      7000::/8 dev p7p1  proto kernel  metric 256 dead linkdown  pref medium
      8000::/8 dev p8p1  proto kernel  metric 256  pref medium
      9000::/8 via 8000::2 dev p8p1  metric 2048  pref medium
      9000::/8 via 7000::2 dev p7p1  metric 1024 dead linkdown  pref medium
      fe80::/64 dev p7p1  proto kernel  metric 256 dead linkdown  pref medium
      fe80::/64 dev p8p1  proto kernel  metric 256  pref medium
      
      This also adds devconf support and notification when sysctl values
      change.
      
      v2: drop use of rt6i_nhflags since it is not needed right now
      Signed-off-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: NDinesh Dutt <ddutt@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35103d11
  28. 31 7月, 2015 1 次提交
    • H
      net/ipv6: add sysctl option accept_ra_min_hop_limit · 8013d1d7
      Hangbin Liu 提交于
      Commit 6fd99094 ("ipv6: Don't reduce hop limit for an interface")
      disabled accept hop limit from RA if it is smaller than the current hop
      limit for security stuff. But this behavior kind of break the RFC definition.
      
      RFC 4861, 6.3.4.  Processing Received Router Advertisements
         A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
         and Retrans Timer) may contain a value denoting that it is
         unspecified.  In such cases, the parameter should be ignored and the
         host should continue using whatever value it is already using.
      
         If the received Cur Hop Limit value is non-zero, the host SHOULD set
         its CurHopLimit variable to the received value.
      
      So add sysctl option accept_ra_min_hop_limit to let user choose the minimum
      hop limit value they can accept from RA. And set default to 1 to meet RFC
      standards.
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Acked-by: NYOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8013d1d7
  29. 23 7月, 2015 1 次提交
  30. 16 7月, 2015 2 次提交
  31. 11 7月, 2015 1 次提交
  32. 02 5月, 2015 1 次提交
    • M
      ipv6: Consider RTF_CACHE when searching the fib6 tree · 1f56a01f
      Martin KaFai Lau 提交于
      It is a prep work for the later bug-fix patch which will stop /128 route
      from disappearing after pmtu update.
      
      The later bug-fix patch will allow a /128 route and its RTF_CACHE clone
      both exist at the same fib6_node.  To do this, we need to prepare the
      existing fib6 tree search to expect RTF_CACHE for /128 route.
      
      Note that the fn->leaf is sorted by rt6i_metric.  Hence,
      RTF_CACHE (if there is any) is always at the front.  This property
      leads to the following:
      
      1. When doing ip6_route_del(), it should honor the RTF_CACHE flag which
         the caller is used to ask for deleting clone or non-clone.
         The rtm_to_fib6_config() should also check the RTM_F_CLONED and
         then set RTF_CACHE accordingly so that:
         - 'ip -6 r del...' will make ip6_route_del() to delete a route
           and all its clones. Note that its clones is flushed by fib6_del()
         - 'ip -6 r flush table cache' will make ip6_route_del() to
            only delete clone(s).
      
      2. Exclude RTF_CACHE from addrconf_get_prefix_route() which
         should not configure on a cloned route.
      
      3. No change is need for rt6_device_match() since it currently could
         return a RTF_CACHE clone route, so the later bug-fix patch will not
         affect it.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1f56a01f
  33. 03 4月, 2015 1 次提交
  34. 01 4月, 2015 1 次提交