提交 · 607ea7cda6315be0ad8be2f98bc9de6f2d656ae6 · openanolis / cloud-kernel

20 4月, 2016 1 次提交

net/ipv6/addrconf: simplify sysctl registration · 607ea7cd

由 Konstantin Khlebnikov 提交于 4月 18, 2016

Struct ctl_table_header holds pointer to sysctl table which could be used
for freeing it after unregistration. IPv4 sysctls already use that.
Remove redundant NULL assignment: ndev allocated using kzalloc.

This also saves some bytes: sysctl table could be shorter than
DEVCONF_MAX+1 if some options are disable in config.
Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

607ea7cd

14 4月, 2016 1 次提交

ipv6, token: allow for clearing the current device token · 47e27d5e

由 Daniel Borkmann 提交于 4月 08, 2016

The original tokenized iid support implemented via f53adae4 ("net: ipv6:
add tokenized interface identifier support") didn't allow for clearing a
device token as it was intended that this addressing mode was the only one
active for globally scoped IPv6 addresses. Later we relaxed that restriction
via 617fe29d ("net: ipv6: only invalidate previously tokenized addresses"),
and we should also allow for clearing tokens as there's no good reason why
it shouldn't be allowed.

Fixes: 617fe29d ("net: ipv6: only invalidate previously tokenized addresses")
Reported-by: NRobin H. Johnson <robbat2@gentoo.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

47e27d5e

14 3月, 2016 1 次提交

netconf: add macro to represent all attributes · 136ba622

由 Zhang Shengju 提交于 3月 10, 2016

This patch adds macro NETCONFA_ALL to represent all type of netconf
attributes for IPv4 and IPv6.
Signed-off-by: NZhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

136ba622

04 3月, 2016 1 次提交

net: ipv6: Fix refcnt on host routes · 799977d9

由 David Ahern 提交于 3月 02, 2016

Andrew and Ying Huang's test robot both reported usage count problems that
trace back to the 'keep address on ifdown' patch.

>From Andrew:
We execute CRIU test on linux-next. On the current linux-next kernel
they hangs on creating a network namespace.

The kernel log contains many massages like this:
[ 1036.122108] unregister_netdevice: waiting for lo to become free.
Usage count = 2
[ 1046.165156] unregister_netdevice: waiting for lo to become free.
Usage count = 2
[ 1056.210287] unregister_netdevice: waiting for lo to become free.
Usage count = 2

I tried to revert this patch and the bug disappeared.

Here is a set of commands to reproduce this bug:

[root@linux-next-test linux-next]# uname -a
Linux linux-next-test 4.5.0-rc6-next-20160301+ #3 SMP Wed Mar 2
17:32:18 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

[root@linux-next-test ~]# unshare -n
[root@linux-next-test ~]# ip link set up dev lo
[root@linux-next-test ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
[root@linux-next-test ~]# logout
[root@linux-next-test ~]# unshare -n

 -----

The problem is a change made to RTM_DELADDR case in __ipv6_ifa_notify that
was added in an early version of the offending patch and is no longer
needed.

Fixes: f1705ec1 ("net: ipv6: Make address flushing on ifdown optional")
Cc: Andrey Wagin <avagin@gmail.com>
Cc: Ying Huang <ying.huang@linux.intel.com>
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Tested-by: NJeremiah Mahler <jmmahler@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

799977d9

02 3月, 2016 1 次提交

net: ipv6/l3mdev: Move host route on saved address if necessary · 4f25a111

由 David Ahern 提交于 2月 27, 2016

Commit f1705ec1 allows IPv6 addresses to be retained on a link down.
The address can have a cached host route which can point to the wrong
FIB table if the L3 enslavement is changed (e.g., route can point to local
table instead of VRF table if device is added to an L3 domain).

On link up check the table of the cached host route against the FIB
table associated with the device and correct if needed.
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4f25a111

26 2月, 2016 1 次提交

net: ipv6: Make address flushing on ifdown optional · f1705ec1

由 David Ahern 提交于 2月 24, 2016

Currently, all ipv6 addresses are flushed when the interface is configured
down, including global, static addresses:

    $ ip -6 addr show dev eth1
    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
        inet6 2100:1::2/120 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
           valid_lft forever preferred_lft forever
    $ ip link set dev eth1 down
    $ ip -6 addr show dev eth1
    << nothing; all addresses have been flushed>>

Add a new sysctl to make this behavior optional. The new setting defaults to
flush all addresses to maintain backwards compatibility. When the set global
addresses with no expire times are not flushed on an admin down. The sysctl
is per-interface or system-wide for all interfaces

    $ sysctl -w net.ipv6.conf.eth1.keep_addr_on_down=1
or
    $ sysctl -w net.ipv6.conf.all.keep_addr_on_down=1

Will keep addresses on eth1 on an admin down.

    $ ip -6 addr show dev eth1
    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
        inet6 2100:1::2/120 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
           valid_lft forever preferred_lft forever
    $ ip link set dev eth1 down
    $ ip -6 addr show dev eth1
    3: eth1: <BROADCAST,MULTICAST> mtu 1500 state DOWN qlen 1000
        inet6 2100:1::2/120 scope global tentative
           valid_lft forever preferred_lft forever
        inet6 fe80::e0:f9ff:fe79:34bd/64 scope link tentative
           valid_lft forever preferred_lft forever
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f1705ec1

20 2月, 2016 1 次提交

rtnl: RTM_GETNETCONF: fix wrong return value · a97eb33f

由 Anton Protopopov 提交于 2月 16, 2016

An error response from a RTM_GETNETCONF request can return the positive
error value EINVAL in the struct nlmsgerr that can mislead userspace.
Signed-off-by: NAnton Protopopov <a.s.protopopov@gmail.com>
Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a97eb33f

11 2月, 2016 2 次提交

ipv6: add option to drop unsolicited neighbor advertisements · 7a02bf89

由 Johannes Berg 提交于 2月 04, 2016

In certain 802.11 wireless deployments, there will be NA proxies
that use knowledge of the network to correctly answer requests.
To prevent unsolicitd advertisements on the shared medium from
being a problem, on such deployments wireless needs to drop them.

Enable this by providing an option called "drop_unsolicited_na".
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7a02bf89

ipv6: add option to drop unicast encapsulated in L2 multicast · abbc3043

由 Johannes Berg 提交于 2月 04, 2016

In order to solve a problem with 802.11, the so-called hole-196 attack,
add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if
enabled, causes the stack to drop IPv6 unicast packets encapsulated in
link-layer multi- or broadcast frames. Such frames can (as an attack)
be created by any member of the same wireless network and transmitted
as valid encrypted frames since the symmetric key for broadcast frames
is shared between all stations.
Reviewed-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

abbc3043

06 2月, 2016 1 次提交

ipv6: addrconf: Fix recursive spin lock call · 16186a82

由 subashab@codeaurora.org 提交于 2月 02, 2016

A rcu stall with the following backtrace was seen on a system with
forwarding, optimistic_dad and use_optimistic set. To reproduce,
set these flags and allow ipv6 autoconf.

This occurs because the device write_lock is acquired while already
holding the read_lock. Back trace below -

INFO: rcu_preempt self-detected stall on CPU { 1}  (t=2100 jiffies
 g=3992 c=3991 q=4471)
<6> Task dump for CPU 1:
<2> kworker/1:0     R  running task    12168    15   2 0x00000002
<2> Workqueue: ipv6_addrconf addrconf_dad_work
<6> Call trace:
<2> [<ffffffc000084da8>] el1_irq+0x68/0xdc
<2> [<ffffffc000cc4e0c>] _raw_write_lock_bh+0x20/0x30
<2> [<ffffffc000bc5dd8>] __ipv6_dev_ac_inc+0x64/0x1b4
<2> [<ffffffc000bcbd2c>] addrconf_join_anycast+0x9c/0xc4
<2> [<ffffffc000bcf9f0>] __ipv6_ifa_notify+0x160/0x29c
<2> [<ffffffc000bcfb7c>] ipv6_ifa_notify+0x50/0x70
<2> [<ffffffc000bd035c>] addrconf_dad_work+0x314/0x334
<2> [<ffffffc0000b64c8>] process_one_work+0x244/0x3fc
<2> [<ffffffc0000b7324>] worker_thread+0x2f8/0x418
<2> [<ffffffc0000bb40c>] kthread+0xe0/0xec

v2: do addrconf_dad_kick inside read lock and then acquire write
lock for ipv6_ifa_notify as suggested by Eric

Fixes: 7fd2561e ("net: ipv6: Add a sysctl to make optimistic
addresses useful candidates")

Cc: Eric Dumazet <edumazet@google.com>
Cc: Erik Kline <ek@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

16186a82

11 1月, 2016 1 次提交

ipv6: always add flag an address that failed DAD with DADFAILED · 3d171f39

由 Lubomir Rintel 提交于 1月 08, 2016

The userspace needs to know why is the address being removed so that it can
perhaps obtain a new address.

Without the DADFAILED flag it's impossible to distinguish removal of a
temporary and tentative address due to DAD failure from other reasons (device
removed, manual address removal).
Signed-off-by: NLubomir Rintel <lkundrak@v3.sk>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3d171f39

23 12月, 2015 1 次提交

addrconf: always initialize sysctl table data · 5449a5ca

由 WANG Cong 提交于 12月 21, 2015

When sysctl performs restrict writes, it allows to write from
a middle position of a sysctl file, which requires us to initialize
the table data before calling proc_dostring() for the write case.

Fixes: 3d1bec99 ("ipv6: introduce secret_stable to ipv6_devconf")
Reported-by: NSasha Levin <sasha.levin@oracle.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Tested-by: NSasha Levin <sasha.levin@oracle.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5449a5ca

19 12月, 2015 1 次提交

ipv6: addrconf: use stable address generator for ARPHRD_NONE · cc9da6cc

由 Bjørn Mork 提交于 12月 16, 2015

Add a new address generator mode, using the stable address generator
with an automatically generated secret. This is intended as a default
address generator mode for device types with no EUI64 implementation.
The new generator is used for ARPHRD_NONE interfaces initially, adding
default IPv6 autoconf support to e.g. tun interfaces.

If the addrgenmode is set to 'random', either by default or manually,
and no stable secret is available, then a random secret is used as
input for the stable-privacy address generator. The secret can be
read and modified like manually configured secrets, using the proc
interface. Modifying the secret will change the addrgen mode to
'stable-privacy' to indicate that it operates on a known secret.

Existing behaviour of the 'stable-privacy' mode is kept unchanged. If
a known secret is available when the device is created, then the mode
will default to 'stable-privacy' as before. The mode can be manually
set to 'random' but it will behave exactly like 'stable-privacy' in
this case. The secret will not change.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: 吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: NBjørn Mork <bjorn@mork.no>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cc9da6cc

16 12月, 2015 1 次提交

ipv6: automatically enable stable privacy mode if stable_secret set · 9b29c696

由 Hannes Frederic Sowa 提交于 12月 15, 2015

Bjørn reported that while we switch all interfaces to privacy stable mode
when setting the secret, we don't set this mode for new interfaces. This
does not make sense, so change this behaviour.

Fixes: 622c81d5 ("ipv6: generation of stable privacy addresses for link-local and autoconf")
Reported-by: NBjørn Mork <bjorn@mork.no>
Cc: Bjørn Mork <bjorn@mork.no>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9b29c696

15 12月, 2015 1 次提交

ipv6: addrconf: drop ieee802154 specific things · 5241c2d7

由 Alexander Aring 提交于 12月 14, 2015

This patch removes ARPHRD_IEEE802154 from addrconf handling. In the
earlier days of 802.15.4 6LoWPAN, the interface type was ARPHRD_IEEE802154
which introduced several issues, because 802.15.4 interfaces used the
same type.

Since commit 965e613d ("ieee802154: 6lowpan: fix ARPHRD to
ARPHRD_6LOWPAN") we use ARPHRD_6LOWPAN for 6LoWPAN interfaces. This
patch will remove ARPHRD_IEEE802154 which is currently deadcode, because
ARPHRD_IEEE802154 doesn't reach the minimum 1280 MTU of IPv6.

Also we use 6LoWPAN EUI64 specific defines instead using link-layer
constanst from 802.15.4 link-layer header.

Cc: David S. Miller <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: NAlexander Aring <alex.aring@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5241c2d7

06 12月, 2015 2 次提交

ipv6: keep existing flags when setting IFA_F_OPTIMISTIC · 9a1ec461

由 Bjørn Mork 提交于 12月 04, 2015

Commit 64236f3f ("ipv6: introduce IFA_F_STABLE_PRIVACY flag")
failed to update the setting of the IFA_F_OPTIMISTIC flag, causing
the IFA_F_STABLE_PRIVACY flag to be lost if IFA_F_OPTIMISTIC is set.

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Fixes: 64236f3f ("ipv6: introduce IFA_F_STABLE_PRIVACY flag")
Signed-off-by: NBjørn Mork <bjorn@mork.no>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9a1ec461

ipv6: Only act upon NETDEV_*_TYPE_CHANGE if we have ipv6 addresses · 3ef0952c

由 Andrew Lunn 提交于 12月 03, 2015

An interface changing type may not have IPv6 addresses. Don't
call the address configuration type change in this case.
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3ef0952c

04 12月, 2015 1 次提交

net: ipv6: restrict hop_limit sysctl setting to range [1; 255] · d6df198d

由 Phil Sutter 提交于 12月 01, 2015

Setting a value bigger than 255 resulted in using only the lower eight
bits of that value as it is assigned to the u8 header field. To avoid
this unexpected result, reject such values.

Setting a value of zero is technically possible, but hosts receiving
such a packet have to treat it like hop_limit was set to one, according
to RFC2460. Therefore I don't see a use-case for that.

Setting a route's hop_limit to zero in iproute2 means to use the sysctl
default, which is not the case here: Setting e.g.
net.conf.eth0.hop_limit=0 will not make the kernel use
net.conf.all.hop_limit for outgoing packets on eth0. To avoid these
kinds of confusion, reject zero.
Signed-off-by: NPhil Sutter <phil@nwl.cc>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d6df198d

02 12月, 2015 1 次提交

Revert "ipv6: ndisc: inherit metadata dst when creating ndisc requests" · 304d888b

由 Nicolas Dichtel 提交于 11月 27, 2015

This reverts commit ab450605.

In IPv6, we cannot inherit the dst of the original dst. ndisc packets
are IPv6 packets and may take another route than the original packet.

This patch breaks the following scenario: a packet comes from eth0 and
is forwarded through vxlan1. The encapsulated packet triggers an NS
which cannot be sent because of the wrong route.

CC: Jiri Benc <jbenc@redhat.com>
CC: Thomas Graf <tgraf@suug.ch>
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

304d888b

05 11月, 2015 1 次提交

ipv6: clean up dev_snmp6 proc entry when we fail to initialize inet6_dev · 2a189f9e

由 Sabrina Dubroca 提交于 11月 04, 2015

In ipv6_add_dev, when addrconf_sysctl_register fails, we do not clean up
the dev_snmp6 entry that we have already registered for this device.
Call snmp6_unregister_dev in this case.

Fixes: a317a2f1 ("ipv6: fail early when creating netdev named all or default")
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2a189f9e

30 10月, 2015 1 次提交

ipv6: recreate ipv6 link-local addresses when increasing MTU over IPV6_MIN_MTU · b7b0b1d2

由 Alexander Duyck 提交于 10月 26, 2015

This change makes it so that we reinitialize the interface if the MTU is
increased back above IPV6_MIN_MTU and the interface is up.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b7b0b1d2

22 10月, 2015 1 次提交

netlink: Rightsize IFLA_AF_SPEC size calculation · b1974ed0

由 Arad, Ronen 提交于 10月 19, 2015

if_nlmsg_size() overestimates the minimum allocation size of netlink
dump request (when called from rtnl_calcit()) or the size of the
message (when called from rtnl_getlink()). This is because
ext_filter_mask is not supported by rtnl_link_get_af_size() and
rtnl_link_get_size().

The over-estimation is significant when at least one netdev has many
VLANs configured (8 bytes for each configured VLAN).

This patch-set "rightsizes" the protocol specific attribute size
calculation by propagating ext_filter_mask to rtnl_link_get_af_size()
and adding this a argument to get_link_af_size op in rtnl_af_ops.

Bridge module already used filtering aware sizing for notifications.
br_get_link_af_size_filtered() is consistent with the modified
get_link_af_size op so it replaces br_get_link_af_size() in br_af_ops.
br_get_link_af_size() becomes unused and thus removed.
Signed-off-by: NRonen Arad <ronen.arad@intel.com>
Acked-by: NSridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b1974ed0

13 10月, 2015 1 次提交

net: Add VRF support to IPv6 stack · ca254490

由 David Ahern 提交于 10月 12, 2015

As with IPv4 support for VRFs added to IPv6 stack by replacing hardcoded
table ids with possibly device specific ones and manipulating the oif in
the flowi6. The flow flags are used to skip oif compare in nexthop lookups
if the device is enslaved to a VRF via the L3 master device.
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ca254490

11 10月, 2015 1 次提交

ipv6: gre: setup default multicast routes over PtP links · d9e4ce65

由 Hannes Frederic Sowa 提交于 10月 08, 2015

GRE point-to-point interfaces should also support ipv6 multicast. Setting
up default multicast routes on interface creation was forgotten. Add it.

Bugzilla: <https://bugzilla.kernel.org/show_bug.cgi?id=103231>
Cc: Julien Muchembled <jm@jmuchemb.eu>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Nicolas Dumazet <ndumazet@google.com>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d9e4ce65

25 9月, 2015 1 次提交

ipv6: remove unused neigh parameter from ndisc functions · 38cf595b

由 Jiri Benc 提交于 9月 22, 2015

Since commit 12fd84f4 ("ipv6: Remove unused neigh argument for
icmp6_dst_alloc() and its callers."), the neigh parameter of ndisc_send_na
and ndisc_send_ns is unused.

CC: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

38cf595b

16 9月, 2015 2 次提交

rtnetlink: RTEXT_FILTER_SKIP_STATS support to avoid dumping inet/inet6 stats · d5566fd7

由 Sowmini Varadhan 提交于 9月 11, 2015

Many commonly used functions like getifaddrs() invoke RTM_GETLINK
to dump the interface information, and do not need the
the AF_INET6 statististics that are always returned by default
from rtnl_fill_ifinfo().

Computing the statistics can be an expensive operation that impacts
scaling, so it is desirable to avoid this if the information is
not needed.

This patch adds a the RTEXT_FILTER_SKIP_STATS extended info flag that
can be passed with netlink_request() to avoid statistics computation
for the ifinfo path.
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d5566fd7

ipv6: Avoid double dst_free · 8e3d5be7

由 Martin KaFai Lau 提交于 9月 15, 2015

It is a prep work to get dst freeing from fib tree undergo
a rcu grace period.

The following is a common paradigm:
if (ip6_del_rt(rt))
	dst_free(rt)

which means, if rt cannot be deleted from the fib tree, dst_free(rt) now.
1. We don't know the ip6_del_rt(rt) failure is because it
   was not managed by fib tree (e.g. DST_NOCACHE) or it had already been
   removed from the fib tree.
2. If rt had been managed by the fib tree, ip6_del_rt(rt) failure means
   dst_free(rt) has been called already.  A second
   dst_free(rt) is not always obviously safe.  The rt may have
   been destroyed already.
3. If rt is a DST_NOCACHE, dst_free(rt) should not be called.
4. It is a stopper to make dst freeing from fib tree undergo a
   rcu grace period.

This patch is to use a DST_NOCACHE flag to indicate a rt is
not managed by the fib tree.
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8e3d5be7

31 8月, 2015 2 次提交

net: Optimize snmp stat aggregation by walking all the percpu data at once · a3a77372

由 Raghavendra K T 提交于 8月 30, 2015

Docker container creation linearly increased from around 1.6 sec to 7.5 sec
(at 1000 containers) and perf data showed 50% ovehead in snmp_fold_field.

reason: currently __snmp6_fill_stats64 calls snmp_fold_field that walks
through per cpu data of an item (iteratively for around 36 items).

idea: This patch tries to aggregate the statistics by going through
all the items of each cpu sequentially which is reducing cache
misses.

Docker creation got faster by more than 2x after the patch.

Result:
                       Before           After
Docker creation time   6.836s           3.25s
cache miss             2.7%             1.41%

perf before:
    50.73%  docker           [kernel.kallsyms]       [k] snmp_fold_field
     9.07%  swapper          [kernel.kallsyms]       [k] snooze_loop
     3.49%  docker           [kernel.kallsyms]       [k] veth_stats_one
     2.85%  swapper          [kernel.kallsyms]       [k] _raw_spin_lock

perf after:
    10.57%  docker           docker                [.] scanblock
     8.37%  swapper          [kernel.kallsyms]     [k] snooze_loop
     6.91%  docker           [kernel.kallsyms]     [k] snmp_get_cpu_field
     6.67%  docker           [kernel.kallsyms]     [k] veth_stats_one

changes/ideas suggested:
Using buffer in stack (Eric), Usage of memset (David), Using memcpy in
place of unaligned_put (Joe).
Signed-off-by: NRaghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a3a77372

net/ipv6: Export addrconf_ifid_eui48 · 399e6f95

由 Matan Barak 提交于 7月 30, 2015

For loopback purposes, RoCE devices should have a default GID in the
port GID table, even when the interface is down. In order to do so,
we use the IPv6 link local address which would have been genenrated
for the related Ethernet netdevice when it goes up as a default GID.

addrconf_ifid_eui48 is used to gernerate this address, export it.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

399e6f95

21 8月, 2015 1 次提交

ipv6: ndisc: inherit metadata dst when creating ndisc requests · ab450605

由 Jiri Benc 提交于 8月 20, 2015

If output device wants to see the dst, inherit the dst of the original skb
in the ndisc request.

This is an IPv6 counterpart of commit 0accfc26 ("arp: Inherit metadata
dst when creating ARP requests").
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ab450605

14 8月, 2015 2 次提交

net: addr IFLA_OPERSTATE to netlink message for ipv6 ifinfo · 0344338b

由 Andy Gospodarek 提交于 8月 13, 2015

This is useful information to include in ipv6 netlink messages that
report interface information. IFLA_OPERSTATE is already included in
ipv4 messages, but missing for ipv6. This closes that gap.
Signed-off-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0344338b

net: ipv6 sysctl option to ignore routes when nexthop link is down · 35103d11

由 Andy Gospodarek 提交于 8月 13, 2015

Like the ipv4 patch with a similar title, this adds a sysctl to allow
the user to change routing behavior based on whether or not the
interface associated with the nexthop was an up or down link.  The
default setting preserves the current behavior, but anyone that enables
it will notice that nexthops on down interfaces will no longer be
selected:

net.ipv6.conf.all.ignore_routes_with_linkdown = 0
net.ipv6.conf.default.ignore_routes_with_linkdown = 0
net.ipv6.conf.lo.ignore_routes_with_linkdown = 0
...

When the above sysctls are set, not only will link status be reported to
userspace, but an indication that a nexthop is dead and will not be used
is also reported.

1000::/8 via 7000::2 dev p7p1  metric 1024 dead linkdown  pref medium
1000::/8 via 8000::2 dev p8p1  metric 1024  pref medium
7000::/8 dev p7p1  proto kernel  metric 256 dead linkdown  pref medium
8000::/8 dev p8p1  proto kernel  metric 256  pref medium
9000::/8 via 8000::2 dev p8p1  metric 2048  pref medium
9000::/8 via 7000::2 dev p7p1  metric 1024 dead linkdown  pref medium
fe80::/64 dev p7p1  proto kernel  metric 256 dead linkdown  pref medium
fe80::/64 dev p8p1  proto kernel  metric 256  pref medium

This also adds devconf support and notification when sysctl values
change.

v2: drop use of rt6i_nhflags since it is not needed right now
Signed-off-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: NDinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

35103d11

31 7月, 2015 1 次提交

net/ipv6: add sysctl option accept_ra_min_hop_limit · 8013d1d7

由 Hangbin Liu 提交于 7月 30, 2015

Commit 6fd99094 ("ipv6: Don't reduce hop limit for an interface")
disabled accept hop limit from RA if it is smaller than the current hop
limit for security stuff. But this behavior kind of break the RFC definition.

RFC 4861, 6.3.4. Processing Received Router Advertisements
A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
and Retrans Timer) may contain a value denoting that it is
unspecified. In such cases, the parameter should be ignored and the
host should continue using whatever value it is already using.

If the received Cur Hop Limit value is non-zero, the host SHOULD set
its CurHopLimit variable to the received value.

So add sysctl option accept_ra_min_hop_limit to let user choose the minimum
hop limit value they can accept from RA. And set default to 1 to meet RFC
standards.
Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
Acked-by: NYOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8013d1d7

23 7月, 2015 1 次提交

ipv6: sysctl to restrict candidate source addresses · 3985e8a3

由 Erik Kline 提交于 7月 22, 2015

Per RFC 6724, section 4, "Candidate Source Addresses":

    It is RECOMMENDED that the candidate source addresses be the set
    of unicast addresses assigned to the interface that will be used
    to send to the destination (the "outgoing" interface).

Add a sysctl to enable this behaviour.
Signed-off-by: NErik Kline <ek@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3985e8a3

16 7月, 2015 2 次提交

ipv6: Remove unused arguments for __ipv6_dev_get_saddr(). · c15df306

由 YOSHIFUJI Hideaki 提交于 7月 16, 2015

Signed-off-by: NYOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c15df306

ipv6: Fix finding best source address in ipv6_dev_get_saddr(). · c0b8da1e

由 YOSHIFUJI Hideaki/吉藤英明提交于 7月 13, 2015

Commit 9131f3de ("ipv6: Do not iterate over all interfaces when
finding source address on specific interface.") did not properly
update best source address available.  Plus, it introduced
possible NULL pointer dereference.

Bug was reported by Erik Kline <ek@google.com>.
Based on patch proposed by Hajime Tazaki <thehajime@gmail.com>.

Fixes: 9131f3de ("ipv6: Do not
	iterate over all interfaces when finding source address
	on specific interface.")
Signed-off-by: NYOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Acked-by: NHajime Tazaki <thehajime@gmail.com>
Acked-by: NErik Kline <ek@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c0b8da1e

11 7月, 2015 1 次提交

ipv6: Do not iterate over all interfaces when finding source address on specific interface. · 9131f3de

由 YOSHIFUJI Hideaki/吉藤英明提交于 7月 10, 2015

If outgoing interface is specified and the candidate address is
restricted to the outgoing interface, it is enough to iterate
over that given interface only.
Signed-off-by: NYOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Acked-by: NErik Kline <ek@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9131f3de

02 5月, 2015 1 次提交

ipv6: Consider RTF_CACHE when searching the fib6 tree · 1f56a01f

由 Martin KaFai Lau 提交于 4月 28, 2015

It is a prep work for the later bug-fix patch which will stop /128 route
from disappearing after pmtu update.

The later bug-fix patch will allow a /128 route and its RTF_CACHE clone
both exist at the same fib6_node.  To do this, we need to prepare the
existing fib6 tree search to expect RTF_CACHE for /128 route.

Note that the fn->leaf is sorted by rt6i_metric.  Hence,
RTF_CACHE (if there is any) is always at the front.  This property
leads to the following:

1. When doing ip6_route_del(), it should honor the RTF_CACHE flag which
   the caller is used to ask for deleting clone or non-clone.
   The rtm_to_fib6_config() should also check the RTM_F_CLONED and
   then set RTF_CACHE accordingly so that:
   - 'ip -6 r del...' will make ip6_route_del() to delete a route
     and all its clones. Note that its clones is flushed by fib6_del()
   - 'ip -6 r flush table cache' will make ip6_route_del() to
      only delete clone(s).

2. Exclude RTF_CACHE from addrconf_get_prefix_route() which
   should not configure on a cloned route.

3. No change is need for rt6_device_match() since it currently could
   return a RTF_CACHE clone route, so the later bug-fix patch will not
   affect it.
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1f56a01f

03 4月, 2015 1 次提交

dev: introduce dev_get_iflink() · a54acb3a

由 Nicolas Dichtel 提交于 4月 02, 2015

The goal of this patch is to prepare the removal of the iflink field. It
introduces a new ndo function, which will be implemented by virtual interfaces.

There is no functional change into this patch. All readers of iflink field
now call dev_get_iflink().
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a54acb3a

01 4月, 2015 1 次提交

netlink: implement nla_put_in_addr and nla_put_in6_addr · 930345ea

由 Jiri Benc 提交于 3月 29, 2015

IP addresses are often stored in netlink attributes. Add generic functions
to do that.

For nla_put_in_addr, it would be nicer to pass struct in_addr but this is
not used universally throughout the kernel, in way too many places __be32 is
used to store IPv4 address.
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

930345ea

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功