提交 · 8f5daf2a49545374b5b433f0ee3ae62fc6e63d4d · gsplhtlxg / clone-Linux

23 12月, 2015 1 次提交

addrconf: always initialize sysctl table data · 5449a5ca

由 WANG Cong 提交于 12月 21, 2015

When sysctl performs restrict writes, it allows to write from
a middle position of a sysctl file, which requires us to initialize
the table data before calling proc_dostring() for the write case.

Fixes: 3d1bec99 ("ipv6: introduce secret_stable to ipv6_devconf")
Reported-by: NSasha Levin <sasha.levin@oracle.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Tested-by: NSasha Levin <sasha.levin@oracle.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5449a5ca

16 12月, 2015 1 次提交

ipv6: automatically enable stable privacy mode if stable_secret set · 9b29c696

由 Hannes Frederic Sowa 提交于 12月 15, 2015

Bjørn reported that while we switch all interfaces to privacy stable mode
when setting the secret, we don't set this mode for new interfaces. This
does not make sense, so change this behaviour.

Fixes: 622c81d5 ("ipv6: generation of stable privacy addresses for link-local and autoconf")
Reported-by: NBjørn Mork <bjorn@mork.no>
Cc: Bjørn Mork <bjorn@mork.no>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9b29c696

06 12月, 2015 1 次提交

ipv6: keep existing flags when setting IFA_F_OPTIMISTIC · 9a1ec461

由 Bjørn Mork 提交于 12月 04, 2015

Commit 64236f3f ("ipv6: introduce IFA_F_STABLE_PRIVACY flag")
failed to update the setting of the IFA_F_OPTIMISTIC flag, causing
the IFA_F_STABLE_PRIVACY flag to be lost if IFA_F_OPTIMISTIC is set.

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Fixes: 64236f3f ("ipv6: introduce IFA_F_STABLE_PRIVACY flag")
Signed-off-by: NBjørn Mork <bjorn@mork.no>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9a1ec461

02 12月, 2015 1 次提交

Revert "ipv6: ndisc: inherit metadata dst when creating ndisc requests" · 304d888b

由 Nicolas Dichtel 提交于 11月 27, 2015

This reverts commit ab450605.

In IPv6, we cannot inherit the dst of the original dst. ndisc packets
are IPv6 packets and may take another route than the original packet.

This patch breaks the following scenario: a packet comes from eth0 and
is forwarded through vxlan1. The encapsulated packet triggers an NS
which cannot be sent because of the wrong route.

CC: Jiri Benc <jbenc@redhat.com>
CC: Thomas Graf <tgraf@suug.ch>
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

304d888b

05 11月, 2015 1 次提交

ipv6: clean up dev_snmp6 proc entry when we fail to initialize inet6_dev · 2a189f9e

由 Sabrina Dubroca 提交于 11月 04, 2015

In ipv6_add_dev, when addrconf_sysctl_register fails, we do not clean up
the dev_snmp6 entry that we have already registered for this device.
Call snmp6_unregister_dev in this case.

Fixes: a317a2f1 ("ipv6: fail early when creating netdev named all or default")
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2a189f9e

30 10月, 2015 1 次提交

ipv6: recreate ipv6 link-local addresses when increasing MTU over IPV6_MIN_MTU · b7b0b1d2

由 Alexander Duyck 提交于 10月 26, 2015

This change makes it so that we reinitialize the interface if the MTU is
increased back above IPV6_MIN_MTU and the interface is up.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b7b0b1d2

22 10月, 2015 1 次提交

netlink: Rightsize IFLA_AF_SPEC size calculation · b1974ed0

由 Arad, Ronen 提交于 10月 19, 2015

if_nlmsg_size() overestimates the minimum allocation size of netlink
dump request (when called from rtnl_calcit()) or the size of the
message (when called from rtnl_getlink()). This is because
ext_filter_mask is not supported by rtnl_link_get_af_size() and
rtnl_link_get_size().

The over-estimation is significant when at least one netdev has many
VLANs configured (8 bytes for each configured VLAN).

This patch-set "rightsizes" the protocol specific attribute size
calculation by propagating ext_filter_mask to rtnl_link_get_af_size()
and adding this a argument to get_link_af_size op in rtnl_af_ops.

Bridge module already used filtering aware sizing for notifications.
br_get_link_af_size_filtered() is consistent with the modified
get_link_af_size op so it replaces br_get_link_af_size() in br_af_ops.
br_get_link_af_size() becomes unused and thus removed.
Signed-off-by: NRonen Arad <ronen.arad@intel.com>
Acked-by: NSridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b1974ed0

13 10月, 2015 1 次提交

net: Add VRF support to IPv6 stack · ca254490

由 David Ahern 提交于 10月 12, 2015

As with IPv4 support for VRFs added to IPv6 stack by replacing hardcoded
table ids with possibly device specific ones and manipulating the oif in
the flowi6. The flow flags are used to skip oif compare in nexthop lookups
if the device is enslaved to a VRF via the L3 master device.
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ca254490

11 10月, 2015 1 次提交

ipv6: gre: setup default multicast routes over PtP links · d9e4ce65

由 Hannes Frederic Sowa 提交于 10月 08, 2015

GRE point-to-point interfaces should also support ipv6 multicast. Setting
up default multicast routes on interface creation was forgotten. Add it.

Bugzilla: <https://bugzilla.kernel.org/show_bug.cgi?id=103231>
Cc: Julien Muchembled <jm@jmuchemb.eu>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Nicolas Dumazet <ndumazet@google.com>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d9e4ce65

25 9月, 2015 1 次提交

ipv6: remove unused neigh parameter from ndisc functions · 38cf595b

由 Jiri Benc 提交于 9月 22, 2015

Since commit 12fd84f4 ("ipv6: Remove unused neigh argument for
icmp6_dst_alloc() and its callers."), the neigh parameter of ndisc_send_na
and ndisc_send_ns is unused.

CC: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

38cf595b

16 9月, 2015 2 次提交

rtnetlink: RTEXT_FILTER_SKIP_STATS support to avoid dumping inet/inet6 stats · d5566fd7

由 Sowmini Varadhan 提交于 9月 11, 2015

Many commonly used functions like getifaddrs() invoke RTM_GETLINK
to dump the interface information, and do not need the
the AF_INET6 statististics that are always returned by default
from rtnl_fill_ifinfo().

Computing the statistics can be an expensive operation that impacts
scaling, so it is desirable to avoid this if the information is
not needed.

This patch adds a the RTEXT_FILTER_SKIP_STATS extended info flag that
can be passed with netlink_request() to avoid statistics computation
for the ifinfo path.
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d5566fd7

ipv6: Avoid double dst_free · 8e3d5be7

由 Martin KaFai Lau 提交于 9月 15, 2015

It is a prep work to get dst freeing from fib tree undergo
a rcu grace period.

The following is a common paradigm:
if (ip6_del_rt(rt))
	dst_free(rt)

which means, if rt cannot be deleted from the fib tree, dst_free(rt) now.
1. We don't know the ip6_del_rt(rt) failure is because it
   was not managed by fib tree (e.g. DST_NOCACHE) or it had already been
   removed from the fib tree.
2. If rt had been managed by the fib tree, ip6_del_rt(rt) failure means
   dst_free(rt) has been called already.  A second
   dst_free(rt) is not always obviously safe.  The rt may have
   been destroyed already.
3. If rt is a DST_NOCACHE, dst_free(rt) should not be called.
4. It is a stopper to make dst freeing from fib tree undergo a
   rcu grace period.

This patch is to use a DST_NOCACHE flag to indicate a rt is
not managed by the fib tree.
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8e3d5be7

31 8月, 2015 2 次提交

net: Optimize snmp stat aggregation by walking all the percpu data at once · a3a77372

由 Raghavendra K T 提交于 8月 30, 2015

Docker container creation linearly increased from around 1.6 sec to 7.5 sec
(at 1000 containers) and perf data showed 50% ovehead in snmp_fold_field.

reason: currently __snmp6_fill_stats64 calls snmp_fold_field that walks
through per cpu data of an item (iteratively for around 36 items).

idea: This patch tries to aggregate the statistics by going through
all the items of each cpu sequentially which is reducing cache
misses.

Docker creation got faster by more than 2x after the patch.

Result:
                       Before           After
Docker creation time   6.836s           3.25s
cache miss             2.7%             1.41%

perf before:
    50.73%  docker           [kernel.kallsyms]       [k] snmp_fold_field
     9.07%  swapper          [kernel.kallsyms]       [k] snooze_loop
     3.49%  docker           [kernel.kallsyms]       [k] veth_stats_one
     2.85%  swapper          [kernel.kallsyms]       [k] _raw_spin_lock

perf after:
    10.57%  docker           docker                [.] scanblock
     8.37%  swapper          [kernel.kallsyms]     [k] snooze_loop
     6.91%  docker           [kernel.kallsyms]     [k] snmp_get_cpu_field
     6.67%  docker           [kernel.kallsyms]     [k] veth_stats_one

changes/ideas suggested:
Using buffer in stack (Eric), Usage of memset (David), Using memcpy in
place of unaligned_put (Joe).
Signed-off-by: NRaghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a3a77372

net/ipv6: Export addrconf_ifid_eui48 · 399e6f95

由 Matan Barak 提交于 7月 30, 2015

For loopback purposes, RoCE devices should have a default GID in the
port GID table, even when the interface is down. In order to do so,
we use the IPv6 link local address which would have been genenrated
for the related Ethernet netdevice when it goes up as a default GID.

addrconf_ifid_eui48 is used to gernerate this address, export it.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

399e6f95

21 8月, 2015 1 次提交

ipv6: ndisc: inherit metadata dst when creating ndisc requests · ab450605

由 Jiri Benc 提交于 8月 20, 2015

If output device wants to see the dst, inherit the dst of the original skb
in the ndisc request.

This is an IPv6 counterpart of commit 0accfc26 ("arp: Inherit metadata
dst when creating ARP requests").
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ab450605

14 8月, 2015 2 次提交

net: addr IFLA_OPERSTATE to netlink message for ipv6 ifinfo · 0344338b

由 Andy Gospodarek 提交于 8月 13, 2015

This is useful information to include in ipv6 netlink messages that
report interface information. IFLA_OPERSTATE is already included in
ipv4 messages, but missing for ipv6. This closes that gap.
Signed-off-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0344338b

net: ipv6 sysctl option to ignore routes when nexthop link is down · 35103d11

由 Andy Gospodarek 提交于 8月 13, 2015

Like the ipv4 patch with a similar title, this adds a sysctl to allow
the user to change routing behavior based on whether or not the
interface associated with the nexthop was an up or down link.  The
default setting preserves the current behavior, but anyone that enables
it will notice that nexthops on down interfaces will no longer be
selected:

net.ipv6.conf.all.ignore_routes_with_linkdown = 0
net.ipv6.conf.default.ignore_routes_with_linkdown = 0
net.ipv6.conf.lo.ignore_routes_with_linkdown = 0
...

When the above sysctls are set, not only will link status be reported to
userspace, but an indication that a nexthop is dead and will not be used
is also reported.

1000::/8 via 7000::2 dev p7p1  metric 1024 dead linkdown  pref medium
1000::/8 via 8000::2 dev p8p1  metric 1024  pref medium
7000::/8 dev p7p1  proto kernel  metric 256 dead linkdown  pref medium
8000::/8 dev p8p1  proto kernel  metric 256  pref medium
9000::/8 via 8000::2 dev p8p1  metric 2048  pref medium
9000::/8 via 7000::2 dev p7p1  metric 1024 dead linkdown  pref medium
fe80::/64 dev p7p1  proto kernel  metric 256 dead linkdown  pref medium
fe80::/64 dev p8p1  proto kernel  metric 256  pref medium

This also adds devconf support and notification when sysctl values
change.

v2: drop use of rt6i_nhflags since it is not needed right now
Signed-off-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: NDinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

35103d11

31 7月, 2015 1 次提交

net/ipv6: add sysctl option accept_ra_min_hop_limit · 8013d1d7

由 Hangbin Liu 提交于 7月 30, 2015

Commit 6fd99094 ("ipv6: Don't reduce hop limit for an interface")
disabled accept hop limit from RA if it is smaller than the current hop
limit for security stuff. But this behavior kind of break the RFC definition.

RFC 4861, 6.3.4. Processing Received Router Advertisements
A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
and Retrans Timer) may contain a value denoting that it is
unspecified. In such cases, the parameter should be ignored and the
host should continue using whatever value it is already using.

If the received Cur Hop Limit value is non-zero, the host SHOULD set
its CurHopLimit variable to the received value.

So add sysctl option accept_ra_min_hop_limit to let user choose the minimum
hop limit value they can accept from RA. And set default to 1 to meet RFC
standards.
Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
Acked-by: NYOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8013d1d7

23 7月, 2015 1 次提交

ipv6: sysctl to restrict candidate source addresses · 3985e8a3

由 Erik Kline 提交于 7月 22, 2015

Per RFC 6724, section 4, "Candidate Source Addresses":

    It is RECOMMENDED that the candidate source addresses be the set
    of unicast addresses assigned to the interface that will be used
    to send to the destination (the "outgoing" interface).

Add a sysctl to enable this behaviour.
Signed-off-by: NErik Kline <ek@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3985e8a3

16 7月, 2015 2 次提交

ipv6: Remove unused arguments for __ipv6_dev_get_saddr(). · c15df306

由 YOSHIFUJI Hideaki 提交于 7月 16, 2015

Signed-off-by: NYOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c15df306

ipv6: Fix finding best source address in ipv6_dev_get_saddr(). · c0b8da1e

由 YOSHIFUJI Hideaki/吉藤英明提交于 7月 13, 2015

Commit 9131f3de ("ipv6: Do not iterate over all interfaces when
finding source address on specific interface.") did not properly
update best source address available.  Plus, it introduced
possible NULL pointer dereference.

Bug was reported by Erik Kline <ek@google.com>.
Based on patch proposed by Hajime Tazaki <thehajime@gmail.com>.

Fixes: 9131f3de ("ipv6: Do not
	iterate over all interfaces when finding source address
	on specific interface.")
Signed-off-by: NYOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Acked-by: NHajime Tazaki <thehajime@gmail.com>
Acked-by: NErik Kline <ek@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c0b8da1e

11 7月, 2015 1 次提交

ipv6: Do not iterate over all interfaces when finding source address on specific interface. · 9131f3de

由 YOSHIFUJI Hideaki/吉藤英明提交于 7月 10, 2015

If outgoing interface is specified and the candidate address is
restricted to the outgoing interface, it is enough to iterate
over that given interface only.
Signed-off-by: NYOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Acked-by: NErik Kline <ek@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9131f3de

02 5月, 2015 1 次提交

ipv6: Consider RTF_CACHE when searching the fib6 tree · 1f56a01f

由 Martin KaFai Lau 提交于 4月 28, 2015

It is a prep work for the later bug-fix patch which will stop /128 route
from disappearing after pmtu update.

The later bug-fix patch will allow a /128 route and its RTF_CACHE clone
both exist at the same fib6_node.  To do this, we need to prepare the
existing fib6 tree search to expect RTF_CACHE for /128 route.

Note that the fn->leaf is sorted by rt6i_metric.  Hence,
RTF_CACHE (if there is any) is always at the front.  This property
leads to the following:

1. When doing ip6_route_del(), it should honor the RTF_CACHE flag which
   the caller is used to ask for deleting clone or non-clone.
   The rtm_to_fib6_config() should also check the RTM_F_CLONED and
   then set RTF_CACHE accordingly so that:
   - 'ip -6 r del...' will make ip6_route_del() to delete a route
     and all its clones. Note that its clones is flushed by fib6_del()
   - 'ip -6 r flush table cache' will make ip6_route_del() to
      only delete clone(s).

2. Exclude RTF_CACHE from addrconf_get_prefix_route() which
   should not configure on a cloned route.

3. No change is need for rt6_device_match() since it currently could
   return a RTF_CACHE clone route, so the later bug-fix patch will not
   affect it.
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1f56a01f

03 4月, 2015 1 次提交

dev: introduce dev_get_iflink() · a54acb3a

由 Nicolas Dichtel 提交于 4月 02, 2015

The goal of this patch is to prepare the removal of the iflink field. It
introduces a new ndo function, which will be implemented by virtual interfaces.

There is no functional change into this patch. All readers of iflink field
now call dev_get_iflink().
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a54acb3a

01 4月, 2015 2 次提交

netlink: implement nla_put_in_addr and nla_put_in6_addr · 930345ea

由 Jiri Benc 提交于 3月 29, 2015

IP addresses are often stored in netlink attributes. Add generic functions
to do that.

For nla_put_in_addr, it would be nicer to pass struct in_addr but this is
not used universally throughout the kernel, in way too many places __be32 is
used to store IPv4 address.
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

930345ea

ipv6: coding style: comparison for equality with NULL · 63159f29

由 Ian Morris 提交于 3月 29, 2015

The ipv6 code uses a mixture of coding styles. In some instances check for NULL
pointer is done as x == NULL and sometimes as !x. !x is preferred according to
checkpatch and this patch makes the code consistent by adopting the latter
form.

No changes detected by objdiff.
Signed-off-by: NIan Morris <ipm@chirality.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

63159f29

25 3月, 2015 1 次提交

ipv6: fix sparse warnings in privacy stable addresses generation · ff40217e

由 Hannes Frederic Sowa 提交于 3月 24, 2015

Those warnings reported by sparse endianness check (via kbuild test robot)
are harmless, nevertheless fix them up and make the code a little bit
easier to read.
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Fixes: 622c81d5 ("ipv6: generation of stable privacy addresses for link-local and autoconf")
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ff40217e

24 3月, 2015 6 次提交

ipv6: introduce idgen_delay and idgen_retries knobs · 1855b7c3

由 Hannes Frederic Sowa 提交于 3月 23, 2015

This is specified by RFC 7217.

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1855b7c3

ipv6: do retries on stable privacy addresses · 5f40ef77

由 Hannes Frederic Sowa 提交于 3月 23, 2015

If a DAD conflict is detected, we want to retry privacy stable address
generation up to idgen_retries (= 3) times with a delay of idgen_delay
(= 1 second). Add the logic to addrconf_dad_failure.

By design, we don't clean up dad failed permanent addresses.

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5f40ef77

ipv6: collapse state_lock and lock · 8e8e676d

由 Hannes Frederic Sowa 提交于 3月 23, 2015

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8e8e676d

ipv6: introduce IFA_F_STABLE_PRIVACY flag · 64236f3f

由 Hannes Frederic Sowa 提交于 3月 23, 2015

We need to mark appropriate addresses so we can do retries in case their
DAD failed.

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

64236f3f

ipv6: generation of stable privacy addresses for link-local and autoconf · 622c81d5

由 Hannes Frederic Sowa 提交于 3月 23, 2015

This patch implements the stable privacy address generation for
link-local and autoconf addresses as specified in RFC7217.

  RID = F(Prefix, Net_Iface, Network_ID, DAD_Counter, secret_key)

is the RID (random identifier). As the hash function F we chose one
round of sha1. Prefix will be either the link-local prefix or the
router advertised one. As Net_Iface we use the MAC address of the
device. DAD_Counter and secret_key are implemented as specified.

We don't use Network_ID, as it couples the code too closely to other
subsystems. It is specified as optional in the RFC.

As Net_Iface we only use the MAC address: we simply have no stable
identifier in the kernel we could possibly use: because this code might
run very early, we cannot depend on names, as they might be changed by
user space early on during the boot process.

A new address generation mode is introduced,
IN6_ADDR_GEN_MODE_STABLE_PRIVACY. With iproute2 one can switch back to
none or eui64 address configuration mode although the stable_secret is
already set.

We refuse writes to ipv6/conf/all/stable_secret but only allow
ipv6/conf/default/stable_secret and the interface specific file to be
written to. The default stable_secret is used as the parameter for the
namespace, the interface specific can overwrite the secret, e.g. when
switching a network configuration from one system to another while
inheriting the secret.

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

622c81d5

ipv6: introduce secret_stable to ipv6_devconf · 3d1bec99

由 Hannes Frederic Sowa 提交于 3月 23, 2015

This patch implements the procfs logic for the stable_address knob:
The secret is formatted as an ipv6 address and will be stored per
interface and per namespace. We track initialized flag and return EIO
errors until the secret is set.

We don't inherit the secret to newly created namespaces.

Cc: Erik Kline <ek@google.com>
Cc: Fernando Gont <fgont@si6networks.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3d1bec99

19 3月, 2015 1 次提交

ipv4, ipv6: kill ip_mc_{join, leave}_group and ipv6_sock_mc_{join, drop} · 54ff9ef3

由 Marcelo Ricardo Leitner 提交于 3月 18, 2015

in favor of their inner __ ones, which doesn't grab rtnl.

As these functions need to operate on a locked socket, we can't be
grabbing rtnl by then. It's too late and doing so causes reversed
locking.

So this patch:
- move rtnl handling to callers instead while already fixing some
  reversed locking situations, like on vxlan and ipvs code.
- renames __ ones to not have the __ mark:
  __ip_mc_{join,leave}_group -> ip_mc_{join,leave}_group
  __ipv6_sock_mc_{join,drop} -> ipv6_sock_mc_{join,drop}
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

54ff9ef3

28 2月, 2015 1 次提交

multicast: Extend ip address command to enable multicast group join/leave on · 93a714d6

由 Madhu Challa 提交于 2月 25, 2015

Joining multicast group on ethernet level via "ip maddr" command would
not work if we have an Ethernet switch that does igmp snooping since
the switch would not replicate multicast packets on ports that did not
have IGMP reports for the multicast addresses.

Linux vxlan interfaces created via "ip link add vxlan" have the group option
that enables then to do the required join.

By extending ip address command with option "autojoin" we can get similar
functionality for openvswitch vxlan interfaces as well as other tunneling
mechanisms that need to receive multicast traffic. The kernel code is
structured similar to how the vxlan driver does a group join / leave.

example:
ip address add 224.1.1.10/24 dev eth5 autojoin
ip address del 224.1.1.10/24 dev eth5
Signed-off-by: NMadhu Challa <challa@noironetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

93a714d6

24 2月, 2015 1 次提交

ipv6: addrconf: validate new MTU before applying it · 77751427

由 Marcelo Leitner 提交于 2月 23, 2015

Currently we don't check if the new MTU is valid or not and this allows
one to configure a smaller than minimum allowed by RFCs or even bigger
than interface own MTU, which is a problem as it may lead to packet
drops.

If you have a daemon like NetworkManager running, this may be exploited
by remote attackers by forging RA packets with an invalid MTU, possibly
leading to a DoS. (NetworkManager currently only validates for values
too small, but not for too big ones.)

The fix is just to make sure the new value is valid. That is, between
IPV6_MIN_MTU and interface's MTU.

Note that similar check is already performed at
ndisc_router_discovery(), for when kernel itself parses the RA.
Signed-off-by: NMarcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

77751427

07 2月, 2015 1 次提交

ipv6: addrconf: add missing validate_link_af handler · 11b1f828

由 Daniel Borkmann 提交于 2月 05, 2015

We still need a validate_link_af() handler with an appropriate nla policy,
similarly as we have in IPv4 case, otherwise size validations are not being
done properly in that case.

Fixes: f53adae4 ("net: ipv6: add tokenized interface identifier support")
Fixes: bc91b0f0 ("ipv6: addrconf: implement address generation modes")
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJiri Pirko <jiri@resnulli.us>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

11b1f828

06 2月, 2015 1 次提交

net: ipv6: allow explicitly choosing optimistic addresses · c58da4c6

由 Erik Kline 提交于 2月 04, 2015

RFC 4429 ("Optimistic DAD") states that optimistic addresses
should be treated as deprecated addresses.  From section 2.1:

   Unless noted otherwise, components of the IPv6 protocol stack
   should treat addresses in the Optimistic state equivalently to
   those in the Deprecated state, indicating that the address is
   available for use but should not be used if another suitable
   address is available.

Optimistic addresses are indeed avoided when other addresses are
available (i.e. at source address selection time), but they have
not heretofore been available for things like explicit bind() and
sendmsg() with struct in6_pktinfo, etc.

This change makes optimistic addresses treated more like
deprecated addresses than tentative ones.
Signed-off-by: NErik Kline <ek@google.com>
Acked-by: NLorenzo Colitti <lorenzo@google.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c58da4c6

31 1月, 2015 1 次提交

net: mark some potential candidates __read_mostly · 207895fd

由 Daniel Borkmann 提交于 1月 29, 2015

They are all either written once or extremly rarely (e.g. from init
code), so we can move them to the .data..read_mostly section.
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

207895fd

26 1月, 2015 1 次提交

net: ipv6: Add sysctl entry to disable MTU updates from RA · c2943f14

由 Harout Hedeshian 提交于 1月 20, 2015

The kernel forcefully applies MTU values received in router
advertisements provided the new MTU is less than the current. This
behavior is undesirable when the user space is managing the MTU. Instead
a sysctl flag 'accept_ra_mtu' is introduced such that the user space
can control whether or not RA provided MTU updates should be applied. The
default behavior is unchanged; user space must explicitly set this flag
to 0 for RA MTUs to be ignored.
Signed-off-by: NHarout Hedeshian <harouth@codeaurora.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c2943f14