提交 · af4d768ad28cbf6542ba70dba10b49127b31b762 · openanolis / cloud-kernel

29 5月, 2018 1 次提交

net/ipv4: Add support for specifying metric of connected routes · af4d768a

由 David Ahern 提交于 5月 27, 2018

Add support for IFA_RT_PRIORITY to ipv4 addresses.

If the metric is changed on an existing address then the new route
is inserted before removing the old one. Since the metric is one
of the route keys, the prefix route can not be replaced.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

af4d768a

28 3月, 2018 1 次提交

net: Drop pernet_operations::async · 2f635cee

由 Kirill Tkhai 提交于 3月 27, 2018

Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.
Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2f635cee

13 2月, 2018 1 次提交

net: Convert pernet_subsys, registered from inet_init() · f84c6821

由 Kirill Tkhai 提交于 2月 13, 2018

arp_net_ops just addr/removes /proc entry.

devinet_ops allocates and frees duplicate of init_net tables
and (un)registers sysctl entries.

fib_net_ops allocates and frees pernet tables, creates/destroys
netlink socket and (un)initializes /proc entries. Foreign
pernet_operations do not touch them.

ip_rt_proc_ops only modifies pernet /proc entries.

xfrm_net_ops creates/destroys /proc entries, allocates/frees
pernet statistics, hashes and tables, and (un)initializes
sysctl files. These are not touched by foreigh pernet_operations

xfrm4_net_ops allocates/frees private pernet memory, and
configures sysctls.

sysctl_route_ops creates/destroys sysctls.

rt_genid_ops only initializes fields of just allocated net.

ipv4_inetpeer_ops allocated/frees net private memory.

igmp_net_ops just creates/destroys /proc files and socket,
noone else interested in.

tcp_sk_ops seems to be safe, because tcp_sk_init() does not
depend on any other pernet_operations modifications. Iteration
over hash table in inet_twsk_purge() is made under RCU lock,
and it's safe to iterate the table this way. Removing from
the table happen from inet_twsk_deschedule_put(), but this
function is safe without any extern locks, as it's synchronized
inside itself. There are many examples, it's used in different
context. So, it's safe to leave tcp_sk_exit_batch() unlocked.

tcp_net_metrics_ops is synchronized on tcp_metrics_lock and safe.

udplite4_net_ops only creates/destroys pernet /proc file.

icmp_sk_ops creates percpu sockets, not touched by foreign
pernet_operations.

ipmr_net_ops creates/destroys pernet fib tables, (un)registers
fib rules and /proc files. This seem to be safe to execute
in parallel with foreign pernet_operations.

af_inet_ops just sets up default parameters of newly created net.

ipv4_mib_ops creates and destroys pernet percpu statistics.

raw_net_ops, tcp4_net_ops, udp4_net_ops, ping_v4_net_ops
and ip_proc_ops only create/destroy pernet /proc files.

ip4_frags_ops creates and destroys sysctl file.

So, it's safe to make the pernet_operations async.
Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: NAndrei Vagin <avagin@virtuozzo.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f84c6821

30 1月, 2018 1 次提交

ipv4: Get the address of interface correctly. · 30e948a3

由 Tonghao Zhang 提交于 1月 28, 2018

When using ioctl to get address of interface, we can't
get it anymore. For example, the command is show as below.

	# ifconfig eth0

In the patch ("03aef17b"), the devinet_ioctl does not
return a suitable value, even though we can find it in
the kernel. Then fix it now.

Fixes: 03aef17b ("devinet_ioctl(): take copyin/copyout to caller")
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

30e948a3

25 1月, 2018 2 次提交

devinet_ioctl(): take copyin/copyout to caller · 03aef17b

由 Al Viro 提交于 7月 01, 2017

Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

03aef17b

net: separate SIOCGIFCONF handling from dev_ioctl() · 36fd633e

由 Al Viro 提交于 6月 26, 2017

Only two of dev_ioctl() callers may pass SIOCGIFCONF to it.
Separating that codepath from the rest of dev_ioctl() allows both
to simplify dev_ioctl() itself (all other cases work with struct ifreq *)
*and* seriously simplify the compat side of that beast: all it takes
is passing to inet_gifconf() an extra argument - the size of individual
records (sizeof(struct ifreq) or sizeof(struct compat_ifreq)).  With
dev_ifconf() called directly from sock_do_ioctl()/compat_dev_ifconf()
that's easy to arrange.

As the result, compat side of SIOCGIFCONF doesn't need any
allocations, copy_in_user() back and forth, etc.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

36fd633e

14 12月, 2017 1 次提交

ipv4: igmp: guard against silly MTU values · b5476022

由 Eric Dumazet 提交于 12月 11, 2017

IPv4 stack reacts to changes to small MTU, by disabling itself under
RTNL.

But there is a window where threads not using RTNL can see a wrong
device mtu. This can lead to surprises, in igmp code where it is
assumed the mtu is suitable.

Fix this by reading device mtu once and checking IPv4 minimal MTU.

This patch adds missing IPV4_MIN_MTU define, to not abuse
ETH_MIN_MTU anymore.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b5476022

20 10月, 2017 1 次提交

net: Add extack to validator_info structs used for address notifier · de95e047

由 David Ahern 提交于 10月 18, 2017

Add extack to in_validator_info and in6_validator_info. Update the one
user of each, ipvlan, to return an error message for failures.

Only manual configuration of an address is plumbed in the IPv6 code path.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

de95e047

18 10月, 2017 1 次提交

ipv4: mark expected switch fall-throughs · fcfd6dfa

由 Gustavo A. R. Silva 提交于 10月 16, 2017

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Notice that in some cases I placed the "fall through" comment
on its own line, which is what GCC is expecting to find.

Addresses-Coverity-ID: 115108
Signed-off-by: NGustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fcfd6dfa

17 10月, 2017 1 次提交

net: core: rcu-ify rtnl af_ops · 5fa85a09

由 Florian Westphal 提交于 10月 16, 2017

rtnl af_ops currently rely on rtnl mutex: unregister (called from module
exit functions) takes the rtnl mutex and all users that do af_ops lookup
also take the rtnl mutex. IOW, parallel rmmod will block until doit()
callback is done.

As none of the af_ops implementation sleep we can use rcu instead.

doit functions that need the af_ops can now use rcu instead of the
rtnl mutex provided the mutex isn't needed for other reasons.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5fa85a09

22 9月, 2017 1 次提交

net: avoid a full fib lookup when rp_filter is disabled. · 6e617de8

由 Paolo Abeni 提交于 9月 20, 2017

Since commit 1dced6a8 ("ipv4: Restore accept_local behaviour
in fib_validate_source()") a full fib lookup is needed even if
the rp_filter is disabled, if accept_local is false - which is
the default.

What we really need in the above scenario is just checking
that the source IP address is not local, and in most case we
can do that is a cheaper way looking up the ifaddr hash table.

This commit adds a helper for such lookup, and uses it to
validate the src address when rp_filter is disabled and no
'local' routes are created by the user space in the relevant
namespace.

A new ipv4 netns flag is added to account for such routes.
We need that to preserve the same behavior we had before this
patch.

It also drops the checks to bail early from __fib_validate_source,
added by the commit 1dced6a8 ("ipv4: Restore accept_local
behaviour in fib_validate_source()") they do not give any
measurable performance improvement: if we do the lookup with are
on a slower path.

This improves UDP performances for unconnected sockets
when rp_filter is disabled by 5% and also gives small but
measurable performance improvement for TCP flood scenarios.

v1 -> v2:
 - use the ifaddr lookup helper in __ip_dev_find(), as suggested
   by Eric
 - fall-back to full lookup if custom local routes are present
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6e617de8

10 8月, 2017 1 次提交

rtnetlink: make rtnl_register accept a flags parameter · b97bac64

由 Florian Westphal 提交于 8月 09, 2017

This change allows us to later indicate to rtnetlink core that certain
doit functions should be called without acquiring rtnl_mutex.

This change should have no effect, we simply replace the last (now
unused) calcit argument with the new flag.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b97bac64

01 7月, 2017 1 次提交

net: convert in_device.refcnt from atomic_t to refcount_t · 7658b36f

由 Reshetova, Elena 提交于 6月 30, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7658b36f

10 6月, 2017 1 次提交

Ipvlan should return an error when an address is already in use. · 3ad7d246

由 Krister Johansen 提交于 6月 08, 2017

The ipvlan code already knows how to detect when a duplicate address is
about to be assigned to an ipvlan device. However, that failure is not
propogated outward and leads to a silent failure.

Introduce a validation step at ip address creation time and allow device
drivers to register to validate the incoming ip addresses. The ipvlan
code is the first consumer. If it detects an address in use, we can
return an error to the user before beginning to commit the new ifa in
the networking code.

This can be especially useful if it is necessary to provision many
ipvlans in containers. The provisioning software (or operator) can use
this to detect situations where an ip address is unexpectedly in use.
Signed-off-by: NKrister Johansen <kjlx@templeofstupid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3ad7d246

18 4月, 2017 1 次提交

net: rtnetlink: plumb extended ack to doit function · c21ef3e3

由 David Ahern 提交于 4月 16, 2017

Add netlink_ext_ack arg to rtnl_doit_func. Pass extack arg to nlmsg_parse
for doit functions that call it directly.

This is the first step to using extended error reporting in rtnetlink.
>From here individual subsystems can be updated to set netlink_ext_ack as
needed.
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c21ef3e3

14 4月, 2017 1 次提交

netlink: pass extended ACK struct to parsing functions · fceb6435

由 Johannes Berg 提交于 4月 12, 2017

Pass the new extended ACK reporting struct to all of the generic
netlink parsing functions. For now, pass NULL in almost all callers
(except for some in the core.)
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fceb6435

29 3月, 2017 2 次提交

net: devinet: Add support for RTM_DELNETCONF · b5c9641d

由 David Ahern 提交于 3月 28, 2017

Send RTM_DELNETCONF notifications when a device is deleted. The message only
needs the device index, so modify inet_netconf_fill_devconf to skip devconf
references if it is NULL.

Allows a userspace cache to remove entries as devices are deleted.
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b5c9641d

net: devinet: Refactor inet_netconf_notify_devconf to take event · 3b022865

由 David Ahern 提交于 3月 28, 2017

Refactor inet_netconf_notify_devconf to take the event as an input arg.
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3b022865

13 3月, 2017 1 次提交

net: Eliminate duplicated codes by creating one new function in_dev_select_addr · 8b57fd1e

由 Gao Feng 提交于 3月 10, 2017

There are two duplicated loops codes which used to select right
address in current codes. Now eliminate these codes by creating
one new function in_dev_select_addr.
Signed-off-by: NGao Feng <fgao@ikuai8.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8b57fd1e

02 3月, 2017 1 次提交

sched/headers: Prepare to move signal wakeup & sigpending methods from... · 174cd4b1

由 Ingo Molnar 提交于 2月 02, 2017

sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>

Fix up affected files that include this signal functionality via sched.h.
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

174cd4b1

03 2月, 2017 1 次提交

net: ipv4: remove fib_lookup.h from devinet.c include list · 66109109

由 David Ahern 提交于 2月 01, 2017

nothing in devinet.c relies on fib_lookup.h; remove it from the includes
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

66109109

25 12月, 2016 1 次提交

Replace <asm/uaccess.h> with <linux/uaccess.h> globally · 7c0f6ba6

由 Linus Torvalds 提交于 12月 24, 2016

This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.
Requested-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7c0f6ba6

02 9月, 2016 1 次提交

netconf: add a notif when settings are created · 29c994e3

由 Nicolas Dichtel 提交于 8月 30, 2016

All changes are notified, but the initial state was missing.
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

29c994e3

10 7月, 2016 1 次提交

ipv4: do not abuse GFP_ATOMIC in inet_netconf_notify_devconf() · fa17806c

由 Eric Dumazet 提交于 7月 08, 2016

inet_forward_change() runs with RTNL held.
We are allowed to sleep if required.

If we use __in_dev_get_rtnl() instead of __in_dev_get_rcu(),
we no longer have to use GFP_ATOMIC allocations in
inet_netconf_notify_devconf(), meaning we are less likely to miss
notifications under memory pressure, and wont touch precious memory
reserves either and risk dropping incoming packets.

inet_netconf_get_devconf() can also use GFP_KERNEL allocation.

Fixes: edc9e748 ("rtnl/ipv4: use netconf msg to advertise forwarding status")
Fixes: 9e551110 ("rtnl/ipv4: add support of RTM_GETNETCONF")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fa17806c

14 3月, 2016 2 次提交

ipv4: Don't do expensive useless work during inetdev destroy. · fbd40ea0

由 David S. Miller 提交于 3月 13, 2016

When an inetdev is destroyed, every address assigned to the interface
is removed.  And in this scenerio we do two pointless things which can
be very expensive if the number of assigned interfaces is large:

1) Address promotion.  We are deleting all addresses, so there is no
   point in doing this.

2) A full nf conntrack table purge for every address.  We only need to
   do this once, as is already caught by the existing
   masq_dev_notifier so masq_inet_event() can skip this.
Reported-by: NSolar Designer <solar@openwall.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Tested-by: NCyrill Gorcunov <gorcunov@openvz.org>

fbd40ea0

netconf: add macro to represent all attributes · 136ba622

由 Zhang Shengju 提交于 3月 10, 2016

This patch adds macro NETCONFA_ALL to represent all type of netconf
attributes for IPv4 and IPv6.
Signed-off-by: NZhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

136ba622

27 2月, 2016 2 次提交

net: l3mdev: prefer VRF master for source address selection · 17b693cd

由 David Lamparter 提交于 2月 24, 2016

When selecting an address in context of a VRF, the vrf master should be
preferred for address selection.  If it isn't, the user has a hard time
getting the system to select to their preference - the code will pick
the address off the first in-VRF interface it can find, which on a
router could well be a non-routable address.
Signed-off-by: NDavid Lamparter <equinox@diac24.net>
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
[dsa: Fixed comment style and removed extra blank link ]
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

17b693cd

net: l3mdev: address selection should only consider devices in L3 domain · 3f2fb9a8

由 David Ahern 提交于 2月 24, 2016

David Lamparter noted a use case where the source address selection fails
to pick an address from a VRF interface - unnumbered interfaces.

Relevant commands from his script:
    ip addr add 9.9.9.9/32 dev lo
    ip link set lo up

    ip link add name vrf0 type vrf table 101
    ip rule add oif vrf0 table 101
    ip rule add iif vrf0 table 101
    ip link set vrf0 up
    ip addr add 10.0.0.3/32 dev vrf0

    ip link add name dummy2 type dummy
    ip link set dummy2 master vrf0 up

    --> note dummy2 has no address - unnumbered device

    ip route add 10.2.2.2/32 dev dummy2 table 101
    ip neigh add 10.2.2.2 dev dummy2 lladdr 02:00:00:00:00:02

    tcpdump -ni dummy2 &

And using ping instead of his socat example:
    $ ping -I vrf0 -c1 10.2.2.2
    ping: Warning: source address might be selected on device other than vrf0.
    PING 10.2.2.2 (10.2.2.2) from 9.9.9.9 vrf0: 56(84) bytes of data.

>From tcpdump:
    12:57:29.449128 IP 9.9.9.9 > 10.2.2.2: ICMP echo request, id 2491, seq 1, length 64

Note the source address is from lo and is not a VRF local address. With
this patch:

    $ ping -I vrf0 -c1 10.2.2.2
    PING 10.2.2.2 (10.2.2.2) from 10.0.0.3 vrf0: 56(84) bytes of data.

>From tcpdump:
    12:59:25.096426 IP 10.0.0.3 > 10.2.2.2: ICMP echo request, id 2113, seq 1, length 64

Now the source address comes from vrf0.

The ipv4 function for selecting source address takes a const argument.
Removing the const requires touching a lot of places, so instead
l3mdev_master_ifindex_rcu is changed to take a const argument and then
do the typecast to non-const as required by netdev_master_upper_dev_get_rcu.
This is similar to what l3mdev_fib_table_rcu does.

IPv6 for unnumbered interfaces appears to be selecting the addresses
properly.

Cc: David Lamparter <david@opensourcerouting.org>
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3f2fb9a8

20 2月, 2016 1 次提交

rtnl: RTM_GETNETCONF: fix wrong return value · a97eb33f

由 Anton Protopopov 提交于 2月 16, 2016

An error response from a RTM_GETNETCONF request can return the positive
error value EINVAL in the struct nlmsgerr that can mislead userspace.
Signed-off-by: NAnton Protopopov <a.s.protopopov@gmail.com>
Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a97eb33f

11 2月, 2016 2 次提交

ipv4: add option to drop gratuitous ARP packets · 97daf331

由 Johannes Berg 提交于 2月 04, 2016

In certain 802.11 wireless deployments, there will be ARP proxies
that use knowledge of the network to correctly answer requests.
To prevent gratuitous ARP frames on the shared medium from being
a problem, on such deployments wireless needs to drop them.

Enable this by providing an option called "drop_gratuitous_arp".
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

97daf331

ipv4: add option to drop unicast encapsulated in L2 multicast · 12b74dfa

由 Johannes Berg 提交于 2月 04, 2016

In order to solve a problem with 802.11, the so-called hole-196 attack,
add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if
enabled, causes the stack to drop IPv4 unicast packets encapsulated in
link-layer multi- or broadcast frames. Such frames can (as an attack)
be created by any member of the same wireless network and transmitted
as valid encrypted frames since the symmetric key for broadcast frames
is shared between all stations.

Additionally, enabling this option provides compliance with a SHOULD
clause of RFC 1122.
Reviewed-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

12b74dfa

22 10月, 2015 1 次提交

netlink: Rightsize IFLA_AF_SPEC size calculation · b1974ed0

由 Arad, Ronen 提交于 10月 19, 2015

if_nlmsg_size() overestimates the minimum allocation size of netlink
dump request (when called from rtnl_calcit()) or the size of the
message (when called from rtnl_getlink()). This is because
ext_filter_mask is not supported by rtnl_link_get_af_size() and
rtnl_link_get_size().

The over-estimation is significant when at least one netdev has many
VLANs configured (8 bytes for each configured VLAN).

This patch-set "rightsizes" the protocol specific attribute size
calculation by propagating ext_filter_mask to rtnl_link_get_af_size()
and adding this a argument to get_link_af_size op in rtnl_af_ops.

Bridge module already used filtering aware sizing for notifications.
br_get_link_af_size_filtered() is consistent with the modified
get_link_af_size op so it replaces br_get_link_af_size() in br_af_ops.
br_get_link_af_size() becomes unused and thus removed.
Signed-off-by: NRonen Arad <ronen.arad@intel.com>
Acked-by: NSridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b1974ed0

16 9月, 2015 1 次提交

rtnetlink: RTEXT_FILTER_SKIP_STATS support to avoid dumping inet/inet6 stats · d5566fd7

由 Sowmini Varadhan 提交于 9月 11, 2015

Many commonly used functions like getifaddrs() invoke RTM_GETLINK
to dump the interface information, and do not need the
the AF_INET6 statististics that are always returned by default
from rtnl_fill_ifinfo().

Computing the statistics can be an expensive operation that impacts
scaling, so it is desirable to avoid this if the information is
not needed.

This patch adds a the RTEXT_FILTER_SKIP_STATS extended info flag that
can be passed with netlink_request() to avoid statistics computation
for the ifinfo path.
Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d5566fd7

29 7月, 2015 1 次提交

net/ipv4: suppress NETDEV_UP notification on address lifetime update · 865b8042

由 David Ward 提交于 7月 26, 2015

This notification causes the FIB to be updated, which is not needed
because the address already exists, and more importantly it may undo
intentional changes that were made to the FIB after the address was
originally added. (As a point of comparison, when an address becomes
deprecated because its preferred lifetime expired, a notification on
this chain is not generated.)

The motivation for this commit is fixing an incompatibility between
DHCP clients which set and update the address lifetime according to
the lease, and a commercial VPN client which replaces kernel routes
in a way that outbound traffic is sent only through the tunnel (and
disconnects if any further route changes are detected via netlink).
Signed-off-by: NDavid Ward <david.ward@ll.mit.edu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

865b8042

09 7月, 2015 1 次提交

ipv4: add support for linkdown sysctl to netconf · 974d7af5

由 Andy Gospodarek 提交于 7月 07, 2015

This kernel patch exports the value of the new
ignore_routes_with_linkdown via netconf.

v2: changes to notify userspace via netlink when sysctl values change
and proposed for 'net' since this could be considered a bugfix
Signed-off-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
Suggested-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

974d7af5

24 6月, 2015 1 次提交

net: ipv4 sysctl option to ignore routes when nexthop link is down · 0eeb075f

由 Andy Gospodarek 提交于 6月 23, 2015

This feature is only enabled with the new per-interface or ipv4 global
sysctls called 'ignore_routes_with_linkdown'.

net.ipv4.conf.all.ignore_routes_with_linkdown = 0
net.ipv4.conf.default.ignore_routes_with_linkdown = 0
net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
...

When the above sysctls are set, will report to userspace that a route is
dead and will no longer resolve to this nexthop when performing a fib
lookup.  This will signal to userspace that the route will not be
selected.  The signalling of a RTNH_F_DEAD is only passed to userspace
if the sysctl is enabled and link is down.  This was done as without it
the netlink listeners would have no idea whether or not a nexthop would
be selected.   The kernel only sets RTNH_F_DEAD internally if the
interface has IFF_UP cleared.

With the new sysctl set, the following behavior can be observed
(interface p8p1 is link-down):

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 dead linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 dead linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
90.0.0.1 via 70.0.0.2 dev p7p1  src 70.0.0.1
    cache
local 80.0.0.1 dev lo  src 80.0.0.1
    cache <local>
80.0.0.2 via 10.0.5.2 dev p9p1  src 10.0.5.15
    cache

While the route does remain in the table (so it can be modified if
needed rather than being wiped away as it would be if IFF_UP was
cleared), the proper next-hop is chosen automatically when the link is
down.  Now interface p8p1 is linked-up:

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2
192.168.56.0/24 dev p2p1  proto kernel  scope link  src 192.168.56.2
90.0.0.1 via 80.0.0.2 dev p8p1  src 80.0.0.1
    cache
local 80.0.0.1 dev lo  src 80.0.0.1
    cache <local>
80.0.0.2 dev p8p1  src 80.0.0.1
    cache

and the output changes to what one would expect.

If the sysctl is not set, the following output would be expected when
p8p1 is down:

default via 10.0.5.2 dev p9p1
10.0.5.0/24 dev p9p1  proto kernel  scope link  src 10.0.5.15
70.0.0.0/24 dev p7p1  proto kernel  scope link  src 70.0.0.1
80.0.0.0/24 dev p8p1  proto kernel  scope link  src 80.0.0.1 linkdown
90.0.0.0/24 via 80.0.0.2 dev p8p1  metric 1 linkdown
90.0.0.0/24 via 70.0.0.2 dev p7p1  metric 2

Since the dead flag does not appear, there should be no expectation that
the kernel would skip using this route due to link being down.

v2: Split kernel changes into 2 patches, this actually makes a
behavioral change if the sysctl is set.  Also took suggestion from Alex
to simplify code by only checking sysctl during fib lookup and
suggestion from Scott to add a per-interface sysctl.

v3: Code clean-ups to make it more readable and efficient as well as a
reverse path check fix.

v4: Drop binary sysctl

v5: Whitespace fixups from Dave

v6: Style changes from Dave and checkpatch suggestions

v7: One more checkpatch fixup
Signed-off-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: NDinesh Dutt <ddutt@cumulusnetworks.com>
Acked-by: NScott Feldman <sfeldma@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0eeb075f

04 4月, 2015 2 次提交

ipv4: coding style: comparison for inequality with NULL · 00db4124

由 Ian Morris 提交于 4月 03, 2015

The ipv4 code uses a mixture of coding styles. In some instances check
for non-NULL pointer is done as x != NULL and sometimes as x. x is
preferred according to checkpatch and this patch makes the code
consistent by adopting the latter form.

No changes detected by objdiff.
Signed-off-by: NIan Morris <ipm@chirality.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

00db4124

ipv4: coding style: comparison for equality with NULL · 51456b29

由 Ian Morris 提交于 4月 03, 2015

The ipv4 code uses a mixture of coding styles. In some instances check
for NULL pointer is done as x == NULL and sometimes as !x. !x is
preferred according to checkpatch and this patch makes the code
consistent by adopting the latter form.

No changes detected by objdiff.
Signed-off-by: NIan Morris <ipm@chirality.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

51456b29

01 4月, 2015 2 次提交

netlink: implement nla_get_in_addr and nla_get_in6_addr · 67b61f6c

由 Jiri Benc 提交于 3月 29, 2015

Those are counterparts to nla_put_in_addr and nla_put_in6_addr.
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

67b61f6c

netlink: implement nla_put_in_addr and nla_put_in6_addr · 930345ea

由 Jiri Benc 提交于 3月 29, 2015

IP addresses are often stored in netlink attributes. Add generic functions
to do that.

For nla_put_in_addr, it would be nicer to pass struct in_addr but this is
not used universally throughout the kernel, in way too many places __be32 is
used to store IPv4 address.
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

930345ea

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功