提交 · e6e6659446c87057aede26a39d9f16b19001716f · openeuler / Kernel

17 10月, 2017 3 次提交

rtnetlink: bring NETDEV_POST_TYPE_CHANGE event process back in rtnetlink_event · e6e66594

由 Xin Long 提交于 10月 15, 2017

As I said in patch 'rtnetlink: bring NETDEV_CHANGEMTU event process back
in rtnetlink_event', removing NETDEV_POST_TYPE_CHANGE event was not the
right fix for the redundant notifications issue.

So bring this event process back to rtnetlink_event and the old redundant
notifications issue would be fixed in the later patch 'rtnetlink: check
DO_SETLINK_NOTIFY correctly in do_setlink'.

Fixes: aef091ae ("rtnetlink: Do not generate notifications for POST_TYPE_CHANGE event")
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e6e66594

rtnetlink: bring NETDEV_CHANGE_TX_QUEUE_LEN event process back in rtnetlink_event · ebdcf045

由 Xin Long 提交于 10月 15, 2017

The same fix for changing mtu in the patch 'rtnetlink: bring
NETDEV_CHANGEMTU event process back in rtnetlink_event' is
needed for changing tx_queue_len.

Note that the redundant notifications issue for tx_queue_len
will be fixed in the later patch 'rtnetlink: do not send
notification for tx_queue_len in do_setlink'.

Fixes: 27b3b551 ("rtnetlink: Do not generate notifications for NETDEV_CHANGE_TX_QUEUE_LEN event")
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ebdcf045

rtnetlink: bring NETDEV_CHANGEMTU event process back in rtnetlink_event · 8a212589

由 Xin Long 提交于 10月 15, 2017

Commit 085e1a65 ("rtnetlink: Do not generate notifications for MTU
events") tried to fix the redundant notifications issue when ip link
set mtu by removing NETDEV_CHANGEMTU event process in rtnetlink_event.

But it also resulted in no notification generated when dev's mtu is
changed via other methods, like:
  'ifconfig eth1 mtu 1400' or 'echo 1400 > /sys/class/net/eth1/mtu'
It would cause users not to be notified by this change.

This patch is to fix it by bringing NETDEV_CHANGEMTU event back into
rtnetlink_event, and the redundant notifications issue will be fixed
in the later patch 'rtnetlink: check DO_SETLINK_NOTIFY correctly in
do_setlink'.

Fixes: 085e1a65 ("rtnetlink: Do not generate notifications for MTU events")
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8a212589

15 10月, 2017 2 次提交

l2tp: check ps->sock before running pppol2tp_session_ioctl() · 5903f594

由 Guillaume Nault 提交于 10月 13, 2017

When pppol2tp_session_ioctl() is called by pppol2tp_tunnel_ioctl(),
the session may be unconnected. That is, it was created by
pppol2tp_session_create() and hasn't been connected with
pppol2tp_connect(). In this case, ps->sock is NULL, so we need to check
for this case in order to avoid dereferencing a NULL pointer.

Fixes: 309795f4 ("l2tp: Add netlink control API for L2TP")
Signed-off-by: NGuillaume Nault <g.nault@alphalink.fr>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5903f594

W
net: fix typo in skbuff.c · 09001b03
由 Wenhua Shi 提交于 10月 14, 2017
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
09001b03

13 10月, 2017 1 次提交

ip: update policy routing config help · 12ed3772

由 Stephen Hemminger 提交于 10月 11, 2017

The kernel config help for policy routing was still pointing at
an ancient document from 2000 that refers to Linux 2.1. Update it
to point to something that is at least occasionally updated.
Signed-off-by: NStephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

12ed3772

12 10月, 2017 1 次提交

net/ncsi: Don't limit vids based on hot_channel · 6e9c0075

由 Samuel Mendoza-Jonas 提交于 10月 11, 2017

Currently we drop any new VLAN ids if there are more than the current
(or last used) channel can support. Most importantly this is a problem
if no channel has been selected yet, resulting in a segfault.

Secondly this does not necessarily reflect the capabilities of any other
channels. Instead only drop a new VLAN id if we are already tracking the
maximum allowed by the NCSI specification. Per-channel limits are
already handled by ncsi_add_filter(), but add a message to set_one_vid()
to make it obvious that the channel can not support any more VLAN ids.
Signed-off-by: NSamuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6e9c0075

11 10月, 2017 2 次提交

net: call cgroup_sk_alloc() earlier in sk_clone_lock() · c0576e39

由 Eric Dumazet 提交于 10月 10, 2017

If for some reason, the newly allocated child need to be freed,
we will call cgroup_put() (via sk_free_unlock_clone()) while the
corresponding cgroup_get() was not yet done, and we will free memory
too soon.

Fixes: d979a39d ("cgroup: duplicate cgroup reference when cloning sockets")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c0576e39

Revert "net: defer call to cgroup_sk_alloc()" · 75cb0709

由 Eric Dumazet 提交于 10月 10, 2017

This reverts commit fbb1fb4a.

This was not the proper fix, lets cleanly revert it, so that
following patch can be carried to stable versions.

sock_cgroup_ptr() callers do not expect a NULL return value.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

75cb0709

10 10月, 2017 6 次提交

net: defer call to cgroup_sk_alloc() · fbb1fb4a

由 Eric Dumazet 提交于 10月 08, 2017

sk_clone_lock() might run while TCP/DCCP listener already vanished.

In order to prevent use after free, it is better to defer cgroup_sk_alloc()
to the point we know both parent and child exist, and from process context.

Fixes: e994b2f0 ("tcp: do not lock listener to process SYN packets")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fbb1fb4a

net: memcontrol: defer call to mem_cgroup_sk_alloc() · 9f1c2674

由 Eric Dumazet 提交于 10月 08, 2017

Instead of calling mem_cgroup_sk_alloc() from BH context,
it is better to call it from inet_csk_accept() in process context.

Not only this removes code in mem_cgroup_sk_alloc(), but it also
fixes a bug since listener might have been dismantled and css_get()
might cause a use-after-free.

Fixes: e994b2f0 ("tcp: do not lock listener to process SYN packets")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9f1c2674

udp: fix bcast packet reception · 996b44fc

由 Paolo Abeni 提交于 10月 09, 2017

The commit bc044e8d ("udp: perform source validation for
mcast early demux") does not take into account that broadcast packets
lands in the same code path and they need different checks for the
source address - notably, zero source address are valid for bcast
and invalid for mcast.

As a result, 2nd and later broadcast packets with 0 source address
landing to the same socket are dropped. This breaks dhcp servers.

Since we don't have stringent performance requirements for ingress
broadcast traffic, fix it by disabling UDP early demux such traffic.
Reported-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Fixes: bc044e8d ("udp: perform source validation for mcast early demux")
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

996b44fc

netlink: do not set cb_running if dump's start() errs · 41c87425

由 Jason A. Donenfeld 提交于 10月 09, 2017

It turns out that multiple places can call netlink_dump(), which means
it's still possible to dereference partially initialized values in
dump() that were the result of a faulty returned start().

This fixes the issue by calling start() _before_ setting cb_running to
true, so that there's no chance at all of hitting the dump() function
through any indirect paths.

It also moves the call to start() to be when the mutex is held. This has
the nice side effect of serializing invocations to start(), which is
likely desirable anyway. It also prevents any possible other races that
might come out of this logic.

In testing this with several different pieces of tricky code to trigger
these issues, this commit fixes all avenues that I'm aware of.
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: NJohannes Berg <johannes@sipsolutions.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

41c87425

ipv4: Fix traffic triggered IPsec connections. · 6c0e7284

由 Steffen Klassert 提交于 10月 09, 2017

A recent patch removed the dst_free() on the allocated
dst_entry in ipv4_blackhole_route(). The dst_free() marked the
dst_entry as dead and added it to the gc list. I.e. it was setup
for a one time usage. As a result we may now have a blackhole
route cached at a socket on some IPsec scenarios. This makes the
connection unusable.

Fix this by marking the dst_entry directly at allocation time
as 'dead', so it is used only once.

Fixes: b838d5e1 ("ipv4: mark DST_NOGC and remove the operation of dst_free()")
Reported-by: NTobias Brunner <tobias@strongswan.org>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6c0e7284

ipv6: Fix traffic triggered IPsec connections. · 62cf27e5

由 Steffen Klassert 提交于 10月 09, 2017

A recent patch removed the dst_free() on the allocated
dst_entry in ipv6_blackhole_route(). The dst_free() marked
the dst_entry as dead and added it to the gc list. I.e. it
was setup for a one time usage. As a result we may now have
a blackhole route cached at a socket on some IPsec scenarios.
This makes the connection unusable.

Fix this by marking the dst_entry directly at allocation time
as 'dead', so it is used only once.

Fixes: 587fea74 ("ipv6: mark DST_NOGC and remove the operation of dst_free()")
Reported-by: NTobias Brunner <tobias@strongswan.org>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

62cf27e5

09 10月, 2017 5 次提交

netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1' · 98589a09

由 Shmulik Ladkani 提交于 10月 09, 2017

Commit 2c16d603 ("netfilter: xt_bpf: support ebpf") introduced
support for attaching an eBPF object by an fd, with the
'bpf_mt_check_v1' ABI expecting the '.fd' to be specified upon each
IPT_SO_SET_REPLACE call.

However this breaks subsequent iptables calls:

 # iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/xxx -j ACCEPT
 # iptables -A INPUT -s 5.6.7.8 -j ACCEPT
 iptables: Invalid argument. Run `dmesg' for more information.

That's because iptables works by loading existing rules using
IPT_SO_GET_ENTRIES to userspace, then issuing IPT_SO_SET_REPLACE with
the replacement set.

However, the loaded 'xt_bpf_info_v1' has an arbitrary '.fd' number
(from the initial "iptables -m bpf" invocation) - so when 2nd invocation
occurs, userspace passes a bogus fd number, which leads to
'bpf_mt_check_v1' to fail.

One suggested solution [1] was to hack iptables userspace, to perform a
"entries fixup" immediatley after IPT_SO_GET_ENTRIES, by opening a new,
process-local fd per every 'xt_bpf_info_v1' entry seen.

However, in [2] both Pablo Neira Ayuso and Willem de Bruijn suggested to
depricate the xt_bpf_info_v1 ABI dealing with pinned ebpf objects.

This fix changes the XT_BPF_MODE_FD_PINNED behavior to ignore the given
'.fd' and instead perform an in-kernel lookup for the bpf object given
the provided '.path'.

It also defines an alias for the XT_BPF_MODE_FD_PINNED mode, named
XT_BPF_MODE_PATH_PINNED, to better reflect the fact that the user is
expected to provide the path of the pinned object.

Existing XT_BPF_MODE_FD_ELF behavior (non-pinned fd mode) is preserved.

References: [1] https://marc.info/?l=netfilter-devel&m=150564724607440&w=2
            [2] https://marc.info/?l=netfilter-devel&m=150575727129880&w=2Reported-by: NRafael Buchbinder <rafi@rbk.ms>
Signed-off-by: NShmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

98589a09

netfilter: SYNPROXY: skip non-tcp packet in {ipv4, ipv6}_synproxy_hook · 49f817d7

由 Lin Zhang 提交于 10月 06, 2017

In function {ipv4,ipv6}_synproxy_hook we expect a normal tcp packet, but
the real server maybe reply an icmp error packet related to the exist
tcp conntrack, so we will access wrong tcp data.

Fix it by checking for the protocol field and only process tcp traffic.
Signed-off-by: NLin Zhang <xiaolou4617@gmail.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

49f817d7

tipc: Unclone message at secondary destination lookup · a9e2971b

由 Jon Maloy 提交于 10月 07, 2017

When a bundling message is received, the function tipc_link_input()
calls function tipc_msg_extract() to unbundle all inner messages of
the bundling message before adding them to input queue.

The function tipc_msg_extract() just clones all inner skb for all
inner messagges from the bundling skb. This means that the skb
headroom of an inner message overlaps with the data part of the
preceding message in the bundle.

If the message in question is a name addressed message, it may be
subject to a secondary destination lookup, and eventually be sent out
on one of the interfaces again. But, since what is perceived as headroom
by the device driver in reality is the last bytes of the preceding
message in the bundle, the latter will be overwritten by the MAC
addresses of the L2 header. If the preceding message has not yet been
consumed by the user, it will evenually be delivered with corrupted
contents.

This commit fixes this by uncloning all messages passing through the
function tipc_msg_lookup_dest(), hence ensuring that the headroom
is always valid when the message is passed on.
Signed-off-by: NTung Nguyen <tung.q.nguyen@dektech.com.au>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a9e2971b

tipc: correct initialization of skb list · 3382605f

由 Jon Maloy 提交于 10月 07, 2017

We change the initialization of the skb transmit buffer queues
in the functions tipc_bcast_xmit() and tipc_rcast_xmit() to also
initialize their spinlocks. This is needed because we may, during
error conditions, need to call skb_queue_purge() on those queues
further down the stack.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3382605f

gso: fix payload length when gso_size is zero · 3d0241d5

由 Alexey Kodanev 提交于 10月 06, 2017

When gso_size reset to zero for the tail segment in skb_segment(), later
in ipv6_gso_segment(), __skb_udp_tunnel_segment() and gre_gso_segment()
we will get incorrect results (payload length, pcsum) for that segment.
inet_gso_segment() already has a check for gso_size before calculating
payload.

The issue was found with LTP vxlan & gre tests over ixgbe NIC.

Fixes: 07b26c94 ("gso: Support partial splitting at the frag_list pointer")
Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3d0241d5

08 10月, 2017 1 次提交

ipv6: fix net.ipv6.conf.all.accept_dad behaviour for real · a2d3f3e3

由 Matteo Croce 提交于 10月 05, 2017

Commit 35e015e1 ("ipv6: fix net.ipv6.conf.all interface DAD handlers")
was intended to affect accept_dad flag handling in such a way that
DAD operation and mode on a given interface would be selected
according to the maximum value of conf/{all,interface}/accept_dad.

However, addrconf_dad_begin() checks for particular cases in which we
need to skip DAD, and this check was modified in the wrong way.

Namely, it was modified so that, if the accept_dad flag is 0 for the
given interface *or* for all interfaces, DAD would be skipped.

We have instead to skip DAD if accept_dad is 0 for the given interface
*and* for all interfaces.

Fixes: 35e015e1 ("ipv6: fix net.ipv6.conf.all interface DAD handlers")
Acked-by: NStefano Brivio <sbrivio@redhat.com>
Signed-off-by: NMatteo Croce <mcroce@redhat.com>
Reported-by: NErik Kline <ek@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a2d3f3e3

06 10月, 2017 2 次提交

netfilter: x_tables: avoid stack-out-of-bounds read in xt_copy_counters_from_user · e466af75

由 Eric Dumazet 提交于 10月 05, 2017

syzkaller reports an out of bound read in strlcpy(), triggered
by xt_copy_counters_from_user()

Fix this by using memcpy(), then forcing a zero byte at the last position
of the destination, as Florian did for the non COMPAT code.

Fixes: d7591f0c ("netfilter: x_tables: introduce and use xt_copy_counters_from_user")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

e466af75

netfilter: nf_tables: do not dump chain counters if not enabled · 5f9bfe0e

由 Pablo Neira Ayuso 提交于 10月 04, 2017

Chain counters are only enabled on demand since 9f08ea84, skip them
when dumping them via netlink.

Fixes: 9f08ea84 ("netfilter: nf_tables: keep chain counters away from hot path")
Reported-by: NJohny Mattsson <johny.mattsson+kernel@gmail.com>
Tested-by: NJohny Mattsson <johny.mattsson+kernel@gmail.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

5f9bfe0e

05 10月, 2017 1 次提交

net: 8021q: skip packets if the vlan is down · e769fcec

由 Vishakha Narvekar 提交于 10月 03, 2017

If the vlan is down, free the packet instead of proceeding with other
processing, or counting it as received.  If vlan interfaces are used
as slaves for bonding, with arp monitoring for connectivity, if the rx
counter is seen to be incrementing, then the bond device will not
observe that the interface is down.

CC: David S. Miller <davem@davemloft.net>
Signed-off-by: NVishakha Narvekar <Vishakha.Narvekar@dell.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e769fcec

04 10月, 2017 2 次提交

nl80211: Define policy for packet pattern attributes · ad670233

由 Peng Xu 提交于 10月 03, 2017

Define a policy for packet pattern attributes in order to fix a
potential read over the end of the buffer during nla_get_u32()
of the NL80211_PKTPAT_OFFSET attribute.

Note that the data there can always be read due to SKB allocation
(with alignment and struct skb_shared_info at the end), but the
data might be uninitialized. This could be used to leak some data
from uninitialized vmalloc() memory, but most drivers don't allow
an offset (so you'd just get -EINVAL if the data is non-zero) or
just allow it with a fixed value - 100 or 128 bytes, so anything
above that would get -EINVAL. With brcmfmac the limit is 1500 so
(at least) one byte could be obtained.

Cc: stable@kernel.org
Signed-off-by: NPeng Xu <pxu@qti.qualcomm.com>
Signed-off-by: NJouni Malinen <jouni@qca.qualcomm.com>
[rewrite description based on SKB allocation knowledge]
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

ad670233

net: rtnetlink: fix info leak in RTM_GETSTATS call · ce024f42

由 Nikolay Aleksandrov 提交于 10月 03, 2017

When RTM_GETSTATS was added the fields of its header struct were not all
initialized when returning the result thus leaking 4 bytes of information
to user-space per rtnl_fill_statsinfo call, so initialize them now. Thanks
to Alexander Potapenko for the detailed report and bisection.
Reported-by: NAlexander Potapenko <glider@google.com>
Fixes: 10c9ead9 ("rtnetlink: add new RTM_GETSTATS message to dump link stats")
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ce024f42

03 10月, 2017 2 次提交

netfilter: nf_tables: Release memory obtained by kasprintf · e63aaaa6

由 Arvind Yadav 提交于 9月 20, 2017

Free memory region, if nf_tables_set_alloc_name is not successful.

Fixes: 38745490 ("netfilter: nf_tables: Allow set names of up to 255 chars")
Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

e63aaaa6

socket, bpf: fix possible use after free · eefca20e

由 Eric Dumazet 提交于 10月 02, 2017

Starting from linux-4.4, 3WHS no longer takes the listener lock.

Since this time, we might hit a use-after-free in sk_filter_charge(),
if the filter we got in the memcpy() of the listener content
just happened to be replaced by a thread changing listener BPF filter.

To fix this, we need to make sure the filter refcount is not already
zero before incrementing it again.

Fixes: e994b2f0 ("tcp: do not lock listener to process SYN packets")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eefca20e

02 10月, 2017 6 次提交

l2tp: fix l2tp_eth module loading · 9f775ead

由 Guillaume Nault 提交于 9月 28, 2017

The l2tp_eth module crashes if its netlink callbacks are run when the
pernet data aren't initialised.

We should normally register_pernet_device() before the genl callbacks.
However, the pernet data only maintain a list of l2tpeth interfaces,
and this list is never used. So let's just drop pernet handling
instead.

Fixes: d9e31d17 ("l2tp: Add L2TP ethernet pseudowire support")
Signed-off-by: NGuillaume Nault <g.nault@alphalink.fr>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9f775ead

ip_gre: erspan device should keep dst · c84bed44

由 Xin Long 提交于 10月 01, 2017

The patch 'ip_gre: ipgre_tap device should keep dst' fixed
the issue ipgre_tap dev mtu couldn't be updated in tx path.

The same fix is needed for erspan as well.

Fixes: 84e54fe0 ("gre: introduce native tunnel support for ERSPAN")
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c84bed44

ip_gre: set tunnel hlen properly in erspan_tunnel_init · c122fda2

由 Xin Long 提交于 10月 01, 2017

According to __gre_tunnel_init, tunnel->hlen should be set as the
headers' length between inner packet and outer iphdr.

It would be used especially to calculate a proper mtu when updating
mtu in tnl_update_pmtu. Now without setting it, a bigger mtu value
than expected would be updated, which hurts performance a lot.

This patch is to fix it by setting tunnel->hlen with:
   tunnel->tun_hlen + tunnel->encap_hlen + sizeof(struct erspanhdr)

Fixes: 84e54fe0 ("gre: introduce native tunnel support for ERSPAN")
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c122fda2

ip_gre: check packet length and mtu correctly in erspan_xmit · 5513d08d

由 Xin Long 提交于 10月 01, 2017

As a ARPHRD_ETHER device, skb->len in erspan_xmit is the length
of the whole ether packet. So before checking if a packet size
exceeds the mtu, skb->len should subtract dev->hard_header_len.

Otherwise, all packets with max size according to mtu would be
trimmed to be truncated packet.

Fixes: 84e54fe0 ("gre: introduce native tunnel support for ERSPAN")
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5513d08d

ip_gre: get key from session_id correctly in erspan_rcv · 935a9749

由 Xin Long 提交于 10月 01, 2017

erspan only uses the first 10 bits of session_id as the key to look
up the tunnel. But in erspan_rcv, it missed 'session_id & ID_MASK'
when getting the key from session_id.

If any other flag is also set in session_id in a packet, it would
fail to find the tunnel due to incorrect key in erspan_rcv.

This patch is to add 'session_id & ID_MASK' there and also remove
the unnecessary variable session_id.

Fixes: 84e54fe0 ("gre: introduce native tunnel support for ERSPAN")
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

935a9749

sunrpc: remove redundant initialization of sock · d099b8af

由 Colin Ian King 提交于 9月 18, 2017

sock is being initialized and then being almost immediately updated
hence the initialized value is not being used and is redundant. Remove
the initialization. Cleans up clang warning:

warning: Value stored to 'sock' during its initialization is never read
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

d099b8af

01 10月, 2017 6 次提交

tipc: use only positive error codes in messages · aad06212

由 Parthasarathy Bhuvaragan 提交于 9月 29, 2017

In commit e3a77561 ("tipc: split up function tipc_msg_eval()"),
we have updated the function tipc_msg_lookup_dest() to set the error
codes to negative values at destination lookup failures. Thus when
the function sets the error code to -TIPC_ERR_NO_NAME, its inserted
into the 4 bit error field of the message header as 0xf instead of
TIPC_ERR_NO_NAME (1). The value 0xf is an unknown error code.

In this commit, we set only positive error code.

Fixes: e3a77561 ("tipc: split up function tipc_msg_eval()")
Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aad06212

udp: perform source validation for mcast early demux · bc044e8d

由 Paolo Abeni 提交于 9月 28, 2017

The UDP early demux can leverate the rx dst cache even for
multicast unconnected sockets.

In such scenario the ipv4 source address is validated only on
the first packet in the given flow. After that, when we fetch
the dst entry  from the socket rx cache, we stop enforcing
the rp_filter and we even start accepting any kind of martian
addresses.

Disabling the dst cache for unconnected multicast socket will
cause large performace regression, nearly reducing by half the
max ingress tput.

Instead we factor out a route helper to completely validate an
skb source address for multicast packets and we call it from
the UDP early demux for mcast packets landing on unconnected
sockets, after successful fetching the related cached dst entry.

This still gives a measurable, but limited performance
regression:

		rp_filter = 0		rp_filter = 1
edmux disabled:	1182 Kpps		1127 Kpps
edmux before:	2238 Kpps		2238 Kpps
edmux after:	2037 Kpps		2019 Kpps

The above figures are on top of current net tree.
Applying the net-next commit 6e617de8 ("net: avoid a full
fib lookup when rp_filter is disabled.") the delta with
rp_filter == 0 will decrease even more.

Fixes: 421b3885 ("udp: ipv4: Add udp early demux")
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bc044e8d

IPv4: early demux can return an error code · 7487449c

由 Paolo Abeni 提交于 9月 28, 2017

Currently no error is emitted, but this infrastructure will
used by the next patch to allow source address validation
for mcast sockets.
Since early demux can do a route lookup and an ipv4 route
lookup can return an error code this is consistent with the
current ipv4 route infrastructure.
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7487449c

ip6_tunnel: update mtu properly for ARPHRD_ETHER tunnel device in tx path · d41bb33b

由 Xin Long 提交于 9月 28, 2017

Now when updating mtu in tx path, it doesn't consider ARPHRD_ETHER tunnel
device, like ip6gre_tap tunnel, for which it should also subtract ether
header to get the correct mtu.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d41bb33b

ip6_gre: ip6gre_tap device should keep dst · 2d40557c

由 Xin Long 提交于 9月 28, 2017

The patch 'ip_gre: ipgre_tap device should keep dst' fixed
a issue that ipgre_tap mtu couldn't be updated in tx path.

The same fix is needed for ip6gre_tap as well.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2d40557c

ip_gre: ipgre_tap device should keep dst · d51711c0

由 Xin Long 提交于 9月 28, 2017

Without keeping dst, the tunnel will not update any mtu/pmtu info,
since it does not have a dst on the skb.

Reproducer:
  client(ipgre_tap1 - eth1) <-----> (eth1 - ipgre_tap1)server

After reducing eth1's mtu on client, then perforamnce became 0.

This patch is to netif_keep_dst in gre_tap_init, as ipgre does.
Reported-by: NJianlin Shi <jishi@redhat.com>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d51711c0

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功