提交 · 95c2027bfeda21a28eb245121e6a249f38d0788e · openanolis / cloud-kernel

30 11月, 2016 1 次提交

net/sched: pedit: make sure that offset is valid · 95c2027b

由 Amir Vadai 提交于 11月 28, 2016

Add a validation function to make sure offset is valid:
1. Not below skb head (could happen when offset is negative).
2. Validate both 'offset' and 'at'.
Signed-off-by: NAmir Vadai <amir@vadai.me>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

95c2027b

29 11月, 2016 3 次提交

net: dsa: fix unbalanced dsa_switch_tree reference counting · 7a99cd6e

由 Nikita Yushchenko 提交于 11月 28, 2016

_dsa_register_switch() gets a dsa_switch_tree object either via
dsa_get_dst() or via dsa_add_dst(). Former path does not increase kref
in returned object (resulting into caller not owning a reference),
while later path does create a new object (resulting into caller owning
a reference).

The rest of _dsa_register_switch() assumes that it owns a reference, and
calls dsa_put_dst().

This causes a memory breakage if first switch in the tree initialized
successfully, but second failed to initialize. In particular, freed
dsa_swith_tree object is left referenced by switch that was initialized,
and later access to sysfs attributes of that switch cause OOPS.

To fix, need to add kref_get() call to dsa_get_dst().

Fixes: 83c0afae ("net: dsa: Add new binding implementation")
Signed-off-by: NNikita Yushchenko <nikita.yoush@cogentembedded.com>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7a99cd6e

net: handle no dst on skb in icmp6_send · 79dc7e3f

由 David Ahern 提交于 11月 27, 2016

Andrey reported the following while fuzzing the kernel with syzkaller:

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 3859 Comm: a.out Not tainted 4.9.0-rc6+ #429
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff8800666d4200 task.stack: ffff880067348000
RIP: 0010:[<ffffffff833617ec>]  [<ffffffff833617ec>]
icmp6_send+0x5fc/0x1e30 net/ipv6/icmp.c:451
RSP: 0018:ffff88006734f2c0  EFLAGS: 00010206
RAX: ffff8800666d4200 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000018
RBP: ffff88006734f630 R08: ffff880064138418 R09: 0000000000000003
R10: dffffc0000000000 R11: 0000000000000005 R12: 0000000000000000
R13: ffffffff84e7e200 R14: ffff880064138484 R15: ffff8800641383c0
FS:  00007fb3887a07c0(0000) GS:ffff88006cc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000000 CR3: 000000006b040000 CR4: 00000000000006f0
Stack:
 ffff8800666d4200 ffff8800666d49f8 ffff8800666d4200 ffffffff84c02460
 ffff8800666d4a1a 1ffff1000ccdaa2f ffff88006734f498 0000000000000046
 ffff88006734f440 ffffffff832f4269 ffff880064ba7456 0000000000000000
Call Trace:
 [<ffffffff83364ddc>] icmpv6_param_prob+0x2c/0x40 net/ipv6/icmp.c:557
 [<     inline     >] ip6_tlvopt_unknown net/ipv6/exthdrs.c:88
 [<ffffffff83394405>] ip6_parse_tlv+0x555/0x670 net/ipv6/exthdrs.c:157
 [<ffffffff8339a759>] ipv6_parse_hopopts+0x199/0x460 net/ipv6/exthdrs.c:663
 [<ffffffff832ee773>] ipv6_rcv+0xfa3/0x1dc0 net/ipv6/ip6_input.c:191
 ...

icmp6_send / icmpv6_send is invoked for both rx and tx paths. In both
cases the dst->dev should be preferred for determining the L3 domain
if the dst has been set on the skb. Fallback to the skb->dev if it has
not. This covers the case reported here where icmp6_send is invoked on
Rx before the route lookup.

Fixes: 5d41ce29 ("net: icmp6_send should use dst dev to determine L3 domain")
Reported-by: NAndrey Konovalov <andreyknvl@google.com>
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

79dc7e3f

J
tcp: Set DEFAULT_TCP_CONG to bbr if DEFAULT_BBR is set · 4df21dfc
由 Julian Wollrath 提交于 11月 25, 2016
```
Signed-off-by: NJulian Wollrath <jwollrath@web.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
4df21dfc

28 11月, 2016 3 次提交

net, sched: respect rcu grace period on cls destruction · d9363774

由 Daniel Borkmann 提交于 11月 27, 2016

Roi reported a crash in flower where tp->root was NULL in ->classify()
callbacks. Reason is that in ->destroy() tp->root is set to NULL via
RCU_INIT_POINTER(). It's problematic for some of the classifiers, because
this doesn't respect RCU grace period for them, and as a result, still
outstanding readers from tc_classify() will try to blindly dereference
a NULL tp->root.

The tp->root object is strictly private to the classifier implementation
and holds internal data the core such as tc_ctl_tfilter() doesn't know
about. Within some classifiers, such as cls_bpf, cls_basic, etc, tp->root
is only checked for NULL in ->get() callback, but nowhere else. This is
misleading and seemed to be copied from old classifier code that was not
cleaned up properly. For example, d3fa76ee ("[NET_SCHED]: cls_basic:
fix NULL pointer dereference") moved tp->root initialization into ->init()
routine, where before it was part of ->change(), so ->get() had to deal
with tp->root being NULL back then, so that was indeed a valid case, after
d3fa76ee, not really anymore. We used to set tp->root to NULL long
ago in ->destroy(), see 47a1a1d4 ("pkt_sched: remove unnecessary xchg()
in packet classifiers"); but the NULLifying was reintroduced with the
RCUification, but it's not correct for every classifier implementation.

In the cases that are fixed here with one exception of cls_cgroup, tp->root
object is allocated and initialized inside ->init() callback, which is always
performed at a point in time after we allocate a new tp, which means tp and
thus tp->root was not globally visible in the tp chain yet (see tc_ctl_tfilter()).
Also, on destruction tp->root is strictly kfree_rcu()'ed in ->destroy()
handler, same for the tp which is kfree_rcu()'ed right when we return
from ->destroy() in tcf_destroy(). This means, the head object's lifetime
for such classifiers is always tied to the tp lifetime. The RCU callback
invocation for the two kfree_rcu() could be out of order, but that's fine
since both are independent.

Dropping the RCU_INIT_POINTER(tp->root, NULL) for these classifiers here
means that 1) we don't need a useless NULL check in fast-path and, 2) that
outstanding readers of that tp in tc_classify() can still execute under
respect with RCU grace period as it is actually expected.

Things that haven't been touched here: cls_fw and cls_route. They each
handle tp->root being NULL in ->classify() path for historic reasons, so
their ->destroy() implementation can stay as is. If someone actually
cares, they could get cleaned up at some point to avoid the test in fast
path. cls_u32 doesn't set tp->root to NULL. For cls_rsvp, I just added a
!head should anyone actually be using/testing it, so it at least aligns with
cls_fw and cls_route. For cls_flower we additionally need to defer rhashtable
destruction (to a sleepable context) after RCU grace period as concurrent
readers might still access it. (Note that in this case we need to hold module
reference to keep work callback address intact, since we only wait on module
unload for all call_rcu()s to finish.)

This fixes one race to bring RCU grace period guarantees back. Next step
as worked on by Cong however is to fix 1e052be6 ("net_sched: destroy
proto tp when all filters are gone") to get the order of unlinking the tp
in tc_ctl_tfilter() for the RTM_DELTFILTER case right by moving
RCU_INIT_POINTER() before tcf_destroy() and let the notification for
removal be done through the prior ->delete() callback. Both are independant
issues. Once we have that right, we can then clean tp->root up for a number
of classifiers by not making them RCU pointers, which requires a new callback
(->uninit) that is triggered from tp's RCU callback, where we just kfree()
tp->root from there.

Fixes: 1f947bf1 ("net: sched: rcu'ify cls_bpf")
Fixes: 9888faef ("net: sched: cls_basic use RCU")
Fixes: 70da9f0b ("net: sched: cls_flow use RCU")
Fixes: 77b9900e ("tc: introduce Flower classifier")
Fixes: bf3994d2 ("net/sched: introduce Match-all classifier")
Fixes: 952313bd ("net: sched: cls_cgroup use RCU")
Reported-by: NRoi Dayan <roid@mellanox.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Roi Dayan <roid@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Acked-by: NJohn Fastabend <john.r.fastabend@intel.com>
Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d9363774

tipc: fix link statistics counter errors · 95901122

由 Jon Paul Maloy 提交于 11月 25, 2016

In commit e4bf4f76 ("tipc: simplify packet sequence number
handling") we changed the internal representation of the packet
sequence number counters from u32 to u16, reflecting what is really
sent over the wire.

Since then some link statistics counters have been displaying incorrect
values, partially because the counters meant to be used as sequence
number snapshots are now used as direct counters, stored as u32, and
partially because some counter updates are just missing in the code.

In this commit we correct this in two ways. First, we base the
displayed packet sent/received values on direct counters instead
of as previously a calculated difference between current sequence
number and a snapshot. Second, we add the missing updates of the
counters.

This change is compatible with the current netlink API, and requires
no changes to the user space tools.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

95901122

net: dsa: fix fixed-link-phy device leaks · fd05d7b1

由 Johan Hovold 提交于 11月 24, 2016

Make sure to drop the reference taken by of_phy_find_device() when
registering and deregistering the fixed-link PHY-device.

Fixes: 39b0c705 ("net: dsa: Allow configuration of CPU & DSA port
speeds/duplex")
Signed-off-by: NJohan Hovold <johan@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fd05d7b1

26 11月, 2016 4 次提交

tipc: resolve connection flow control compatibility problem · 6998cc6e

由 Jon Paul Maloy 提交于 11月 24, 2016

In commit 10724cc7 ("tipc: redesign connection-level flow control")
we replaced the previous message based flow control with one based on
1k blocks. In order to ensure backwards compatibility the mechanism
falls back to using message as base unit when it senses that the peer
doesn't support the new algorithm. The default flow control window,
i.e., how many units can be sent before the sender blocks and waits
for an acknowledge (aka advertisement) is 512. This was tested against
the previous version, which uses an acknowledge frequency of on ack per
256 received message, and found to work fine.

However, we missed the fact that versions older than Linux 3.15 use an
acknowledge frequency of 512, which is exactly the limit where a 4.6+
sender will stop and wait for acknowledge. This would also work fine if
it weren't for the fact that if the first sent message on a 4.6+ server
side is an empty SYNACK, this one is also is counted as a sent message,
while it is not counted as a received message on a legacy 3.15-receiver.
This leads to the sender always being one step ahead of the receiver, a
scenario causing the sender to block after 512 sent messages, while the
receiver only has registered 511 read messages. Hence, the legacy
receiver is not trigged to send an acknowledge, with a permanently
blocked sender as result.

We solve this deadlock by simply allowing the sender to send one more
message before it blocks, i.e., by a making minimal change to the
condition used for determining connection congestion.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6998cc6e

net: ethtool: don't require CAP_NET_ADMIN for ETHTOOL_GLINKSETTINGS · 8006f6bf

由 Miroslav Lichvar 提交于 11月 24, 2016

The ETHTOOL_GLINKSETTINGS command is deprecating the ETHTOOL_GSET
command and likewise it shouldn't require the CAP_NET_ADMIN capability.
Signed-off-by: NMiroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8006f6bf

tipc: improve sanity check for received domain records · d876a4d2

由 Jon Paul Maloy 提交于 11月 23, 2016

In commit 35c55c98 ("tipc: add neighbor monitoring framework") we
added a data area to the link monitor STATE messages under the
assumption that previous versions did not use any such data area.

For versions older than Linux 4.3 this assumption is not correct. In
those version, all STATE messages sent out from a node inadvertently
contain a 16 byte data area containing a string; -a leftover from
previous RESET messages which were using this during the setup phase.
This string serves no purpose in STATE messages, and should no be there.

Unfortunately, this data area is delivered to the link monitor
framework, where a sanity check catches that it is not a correct domain
record, and drops it. It also issues a rate limited warning about the
event.

Since such events occur much more frequently than anticipated, we now
choose to remove the warning in order to not fill the kernel log with
useless contents. We also make the sanity check stricter, to further
reduce the risk that such data is inavertently admitted.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d876a4d2

tipc: fix compatibility bug in link monitoring · f7967556

由 Jon Paul Maloy 提交于 11月 23, 2016

commit 81729810 ("tipc: fix link priority propagation") introduced a
compatibility problem between TIPC versions newer than Linux 4.6 and
those older than Linux 4.4. In versions later than 4.4, link STATE
messages only contain a non-zero link priority value when the sender
wants the receiver to change its priority. This has the effect that the
receiver resets itself in order to apply the new priority. This works
well, and is consistent with the said commit.

However, in versions older than 4.4 a valid link priority is present in
all sent link STATE messages, leading to cyclic link establishment and
reset on the 4.6+ node.

We fix this by adding a test that the received value should not only
be valid, but also differ from the current value in order to cause the
receiving link endpoint to reset.
Reported-by: NAmar Nv <amar.nv005@gmail.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f7967556

25 11月, 2016 3 次提交

net sched filters: fix filter handle ID in tfilter_notify_chain() · 19a8bb28

由 Roman Mashak 提交于 11月 22, 2016

Should pass valid filter handle, not the netlink flags.

Fixes: 30a391a1 ("net sched filters: pass netlink message flags in event notification")
Signed-off-by: NRoman Mashak <mrv@mojatatu.com>
Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
Reported-by: NCong Wang <xiyou.wangcong@gmail.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

19a8bb28

udplite: call proper backlog handlers · 30c7be26

由 Eric Dumazet 提交于 11月 22, 2016

In commits 93821778 ("udp: Fix rcv socket locking") and
f7ad74fe ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into
__udpv6_queue_rcv_skb") UDP backlog handlers were renamed, but UDPlite
was forgotten.

This leads to crashes if UDPlite header is pulled twice, which happens
starting from commit e6afc8ac ("udp: remove headers from UDP packets
before queueing")

Bug found by syzkaller team, thanks a lot guys !

Note that backlog use in UDP/UDPlite is scheduled to be removed starting
from linux-4.10, so this patch is only needed up to linux-4.9

Fixes: 93821778 ("udp: Fix rcv socket locking")
Fixes: f7ad74fe ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb")
Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NAndrey Konovalov <andreyknvl@google.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

30c7be26

ipv6: bump genid when the IFA_F_TENTATIVE flag is clear · 764d3be6

由 Paolo Abeni 提交于 11月 22, 2016

When an ipv6 address has the tentative flag set, it can't be
used as source for egress traffic, while the associated route,
if any, can be looked up and even stored into some dst_cache.

In the latter scenario, the source ipv6 address selected and
stored in the cache is most probably wrong (e.g. with
link-local scope) and the entity using the dst_cache will
experience lack of ipv6 connectivity until said cache is
cleared or invalidated.

Overall this may cause lack of connectivity over most IPv6 tunnels
(comprising geneve and vxlan), if the first egress packet reaches
the tunnel before the DaD is completed for the used ipv6
address.

This patch bumps a new genid after that the IFA_F_TENTATIVE flag
is cleared, so that dst_cache will be invalidated on
next lookup and ipv6 connectivity restored.

Fixes: 0c1d70af ("net: use dst_cache for vxlan device")
Fixes: 468dfffc ("geneve: add dst caching support")
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

764d3be6

24 11月, 2016 2 次提交

net: revert "net: l2tp: Treat NET_XMIT_CN as success in l2tp_eth_dev_xmit" · a4cd0271

由 WANG Cong 提交于 11月 21, 2016

This reverts commit 7c6ae610, because l2tp_xmit_skb() never
returns NET_XMIT_CN, it ignores the return value of l2tp_xmit_core().

Cc: Gao Feng <gfree.wind@gmail.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a4cd0271

rtnetlink: fix the wrong minimal dump size getting from rtnl_calcit() · 93af2056

由 Zhang Shengju 提交于 11月 22, 2016

For RT netlink, calcit() function should return the minimal size for
netlink dump message. This will make sure that dump message for every
network device can be stored.

Currently, rtnl_calcit() function doesn't account the size of header of
netlink message, this patch will fix it.
Signed-off-by: NZhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

93af2056

23 11月, 2016 4 次提交

can: bcm: fix support for CAN FD frames · 5499a6b2

由 Oliver Hartkopp 提交于 11月 23, 2016

Since commit 6f3b911d ("can: bcm: add support for CAN FD frames") the
CAN broadcast manager supports CAN and CAN FD data frames.

As these data frames are embedded in struct can[fd]_frames which have a
different length the access to the provided array of CAN frames became
dependend of op->cfsiz. By using a struct canfd_frame pointer for the array of
CAN frames the new offset calculation based on op->cfsiz was accidently applied
to CAN FD frame element lengths.

This fix makes the pointer to the arrays of the different CAN frame types a
void pointer so that the offset calculation in bytes accesses the correct CAN
frame elements.

Reference: http://marc.info/?l=linux-netdev&m=147980658909653Reported-by: NAndrey Konovalov <andreyknvl@google.com>
Signed-off-by: NOliver Hartkopp <socketcan@hartkopp.net>
Tested-by: NAndrey Konovalov <andreyknvl@google.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>

5499a6b2

flowcache: Increase threshold for refusing new allocations · 6b226487

由 Miroslav Urbanek 提交于 11月 21, 2016

The threshold for OOM protection is too small for systems with large
number of CPUs. Applications report ENOBUFs on connect() every 10
minutes.

The problem is that the variable net->xfrm.flow_cache_gc_count is a
global counter while the variable fc->high_watermark is a per-CPU
constant. Take the number of CPUs into account as well.

Fixes: 6ad3122a ("flowcache: Avoid OOM condition under preasure")
Reported-by: NLukáš Koldrt <lk@excello.cz>
Tested-by: NJan Hejl <jh@excello.cz>
Signed-off-by: NMiroslav Urbanek <mu@miroslavurbanek.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

6b226487

Bluetooth: Fix using the correct source address type · 39385cb5

由 Johan Hedberg 提交于 11月 12, 2016

The hci_get_route() API is used to look up local HCI devices, however
so far it has been incapable of dealing with anything else than the
public address of HCI devices. This completely breaks with LE-only HCI
devices that do not come with a public address, but use a static
random address instead.

This patch exteds the hci_get_route() API with a src_type parameter
that's used for comparing with the right address of each HCI device.
Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>

39385cb5

flow_dissect: call init_default_flow_dissectors() earlier · c9b8af13

由 Eric Dumazet 提交于 11月 22, 2016

Andre Noll reported panics after my recent fix (commit 34fad54c
"net: __skb_flow_dissect() must cap its return value")

After some more headaches, Alexander root caused the problem to
init_default_flow_dissectors() being called too late, in case
a network driver like IGB is not a module and receives DHCP message
very early.

Fix is to call init_default_flow_dissectors() much earlier,
as it is a core infrastructure and does not depend on another
kernel service.

Fixes: 06635a35 ("flow_dissect: use programable dissector in skb_flow_dissect and friends")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NAndre Noll <maan@tuebingen.mpg.de>
Diagnosed-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c9b8af13

22 11月, 2016 2 次提交

tcp: zero ca_priv area when switching cc algorithms · 7082c5c3

由 Florian Westphal 提交于 11月 21, 2016

We need to zero out the private data area when application switches
connection to different algorithm (TCP_CONGESTION setsockopt).

When congestion ops get assigned at connect time everything is already
zeroed because sk_alloc uses GFP_ZERO flag.  But in the setsockopt case
this contains whatever previous cc placed there.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7082c5c3

net: l2tp: Treat NET_XMIT_CN as success in l2tp_eth_dev_xmit · 7c6ae610

由 Gao Feng 提交于 11月 21, 2016

The tc could return NET_XMIT_CN as one congestion notification, but
it does not mean the packe is lost. Other modules like ipvlan,
macvlan, and others treat NET_XMIT_CN as success too.
So l2tp_eth_dev_xmit should add the NET_XMIT_CN check.
Signed-off-by: NGao Feng <gfree.wind@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7c6ae610

20 11月, 2016 3 次提交

tipc: eliminate obsolete socket locking policy description · 51b9a31c

由 Jon Paul Maloy 提交于 11月 19, 2016

The comment block in socket.c describing the locking policy is
obsolete, and does not reflect current reality. We remove it in this
commit.

Since the current locking policy is much simpler and follows a
mainstream approach, we see no need to add a new description.
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

51b9a31c

rtnl: fix the loop index update error in rtnl_dump_ifinfo() · 3f0ae05d

由 Zhang Shengju 提交于 11月 19, 2016

If the link is filtered out, loop index should also be updated. If not,
loop index will not be correct.

Fixes: dc599f76 ("net: Add support for filtering link dump by master device and kind")
Signed-off-by: NZhang Shengju <zhangshengju@cmss.chinamobile.com>
Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3f0ae05d

l2tp: fix racy SOCK_ZAPPED flag check in l2tp_ip{,6}_bind() · 32c23116

由 Guillaume Nault 提交于 11月 18, 2016

Lock socket before checking the SOCK_ZAPPED flag in l2tp_ip6_bind().
Without lock, a concurrent call could modify the socket flags between
the sock_flag(sk, SOCK_ZAPPED) test and the lock_sock() call. This way,
a socket could be inserted twice in l2tp_ip6_bind_table. Releasing it
would then leave a stale pointer there, generating use-after-free
errors when walking through the list or modifying adjacent entries.

BUG: KASAN: use-after-free in l2tp_ip6_close+0x22e/0x290 at addr ffff8800081b0ed8
Write of size 8 by task syz-executor/10987
CPU: 0 PID: 10987 Comm: syz-executor Not tainted 4.8.0+ #39
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
 ffff880031d97838 ffffffff829f835b ffff88001b5a1640 ffff8800081b0ec0
 ffff8800081b15a0 ffff8800081b6d20 ffff880031d97860 ffffffff8174d3cc
 ffff880031d978f0 ffff8800081b0e80 ffff88001b5a1640 ffff880031d978e0
Call Trace:
 [<ffffffff829f835b>] dump_stack+0xb3/0x118 lib/dump_stack.c:15
 [<ffffffff8174d3cc>] kasan_object_err+0x1c/0x70 mm/kasan/report.c:156
 [<     inline     >] print_address_description mm/kasan/report.c:194
 [<ffffffff8174d666>] kasan_report_error+0x1f6/0x4d0 mm/kasan/report.c:283
 [<     inline     >] kasan_report mm/kasan/report.c:303
 [<ffffffff8174db7e>] __asan_report_store8_noabort+0x3e/0x40 mm/kasan/report.c:329
 [<     inline     >] __write_once_size ./include/linux/compiler.h:249
 [<     inline     >] __hlist_del ./include/linux/list.h:622
 [<     inline     >] hlist_del_init ./include/linux/list.h:637
 [<ffffffff8579047e>] l2tp_ip6_close+0x22e/0x290 net/l2tp/l2tp_ip6.c:239
 [<ffffffff850b2dfd>] inet_release+0xed/0x1c0 net/ipv4/af_inet.c:415
 [<ffffffff851dc5a0>] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
 [<ffffffff84c4581d>] sock_release+0x8d/0x1d0 net/socket.c:570
 [<ffffffff84c45976>] sock_close+0x16/0x20 net/socket.c:1017
 [<ffffffff817a108c>] __fput+0x28c/0x780 fs/file_table.c:208
 [<ffffffff817a1605>] ____fput+0x15/0x20 fs/file_table.c:244
 [<ffffffff813774f9>] task_work_run+0xf9/0x170
 [<ffffffff81324aae>] do_exit+0x85e/0x2a00
 [<ffffffff81326dc8>] do_group_exit+0x108/0x330
 [<ffffffff81348cf7>] get_signal+0x617/0x17a0 kernel/signal.c:2307
 [<ffffffff811b49af>] do_signal+0x7f/0x18f0
 [<ffffffff810039bf>] exit_to_usermode_loop+0xbf/0x150 arch/x86/entry/common.c:156
 [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:190
 [<ffffffff81006060>] syscall_return_slowpath+0x1a0/0x1e0 arch/x86/entry/common.c:259
 [<ffffffff85e4d726>] entry_SYSCALL_64_fastpath+0xc4/0xc6
Object at ffff8800081b0ec0, in cache L2TP/IPv6 size: 1448
Allocated:
PID = 10987
 [ 1116.897025] [<ffffffff811ddcb6>] save_stack_trace+0x16/0x20
 [ 1116.897025] [<ffffffff8174c736>] save_stack+0x46/0xd0
 [ 1116.897025] [<ffffffff8174c9ad>] kasan_kmalloc+0xad/0xe0
 [ 1116.897025] [<ffffffff8174cee2>] kasan_slab_alloc+0x12/0x20
 [ 1116.897025] [<     inline     >] slab_post_alloc_hook mm/slab.h:417
 [ 1116.897025] [<     inline     >] slab_alloc_node mm/slub.c:2708
 [ 1116.897025] [<     inline     >] slab_alloc mm/slub.c:2716
 [ 1116.897025] [<ffffffff817476a8>] kmem_cache_alloc+0xc8/0x2b0 mm/slub.c:2721
 [ 1116.897025] [<ffffffff84c4f6a9>] sk_prot_alloc+0x69/0x2b0 net/core/sock.c:1326
 [ 1116.897025] [<ffffffff84c58ac8>] sk_alloc+0x38/0xae0 net/core/sock.c:1388
 [ 1116.897025] [<ffffffff851ddf67>] inet6_create+0x2d7/0x1000 net/ipv6/af_inet6.c:182
 [ 1116.897025] [<ffffffff84c4af7b>] __sock_create+0x37b/0x640 net/socket.c:1153
 [ 1116.897025] [<     inline     >] sock_create net/socket.c:1193
 [ 1116.897025] [<     inline     >] SYSC_socket net/socket.c:1223
 [ 1116.897025] [<ffffffff84c4b46f>] SyS_socket+0xef/0x1b0 net/socket.c:1203
 [ 1116.897025] [<ffffffff85e4d685>] entry_SYSCALL_64_fastpath+0x23/0xc6
Freed:
PID = 10987
 [ 1116.897025] [<ffffffff811ddcb6>] save_stack_trace+0x16/0x20
 [ 1116.897025] [<ffffffff8174c736>] save_stack+0x46/0xd0
 [ 1116.897025] [<ffffffff8174cf61>] kasan_slab_free+0x71/0xb0
 [ 1116.897025] [<     inline     >] slab_free_hook mm/slub.c:1352
 [ 1116.897025] [<     inline     >] slab_free_freelist_hook mm/slub.c:1374
 [ 1116.897025] [<     inline     >] slab_free mm/slub.c:2951
 [ 1116.897025] [<ffffffff81748b28>] kmem_cache_free+0xc8/0x330 mm/slub.c:2973
 [ 1116.897025] [<     inline     >] sk_prot_free net/core/sock.c:1369
 [ 1116.897025] [<ffffffff84c541eb>] __sk_destruct+0x32b/0x4f0 net/core/sock.c:1444
 [ 1116.897025] [<ffffffff84c5aca4>] sk_destruct+0x44/0x80 net/core/sock.c:1452
 [ 1116.897025] [<ffffffff84c5ad33>] __sk_free+0x53/0x220 net/core/sock.c:1460
 [ 1116.897025] [<ffffffff84c5af23>] sk_free+0x23/0x30 net/core/sock.c:1471
 [ 1116.897025] [<ffffffff84c5cb6c>] sk_common_release+0x28c/0x3e0 ./include/net/sock.h:1589
 [ 1116.897025] [<ffffffff8579044e>] l2tp_ip6_close+0x1fe/0x290 net/l2tp/l2tp_ip6.c:243
 [ 1116.897025] [<ffffffff850b2dfd>] inet_release+0xed/0x1c0 net/ipv4/af_inet.c:415
 [ 1116.897025] [<ffffffff851dc5a0>] inet6_release+0x50/0x70 net/ipv6/af_inet6.c:422
 [ 1116.897025] [<ffffffff84c4581d>] sock_release+0x8d/0x1d0 net/socket.c:570
 [ 1116.897025] [<ffffffff84c45976>] sock_close+0x16/0x20 net/socket.c:1017
 [ 1116.897025] [<ffffffff817a108c>] __fput+0x28c/0x780 fs/file_table.c:208
 [ 1116.897025] [<ffffffff817a1605>] ____fput+0x15/0x20 fs/file_table.c:244
 [ 1116.897025] [<ffffffff813774f9>] task_work_run+0xf9/0x170
 [ 1116.897025] [<ffffffff81324aae>] do_exit+0x85e/0x2a00
 [ 1116.897025] [<ffffffff81326dc8>] do_group_exit+0x108/0x330
 [ 1116.897025] [<ffffffff81348cf7>] get_signal+0x617/0x17a0 kernel/signal.c:2307
 [ 1116.897025] [<ffffffff811b49af>] do_signal+0x7f/0x18f0
 [ 1116.897025] [<ffffffff810039bf>] exit_to_usermode_loop+0xbf/0x150 arch/x86/entry/common.c:156
 [ 1116.897025] [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:190
 [ 1116.897025] [<ffffffff81006060>] syscall_return_slowpath+0x1a0/0x1e0 arch/x86/entry/common.c:259
 [ 1116.897025] [<ffffffff85e4d726>] entry_SYSCALL_64_fastpath+0xc4/0xc6
Memory state around the buggy address:
 ffff8800081b0d80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8800081b0e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8800081b0e80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
                                                    ^
 ffff8800081b0f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8800081b0f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

==================================================================

The same issue exists with l2tp_ip_bind() and l2tp_ip_bind_table.

Fixes: c51ce497 ("l2tp: fix oops in L2TP IP sockets for connect() AF_UNSPEC case")
Reported-by: NBaozeng Ding <sploving1@gmail.com>
Reported-by: NAndrey Konovalov <andreyknvl@google.com>
Tested-by: NBaozeng Ding <sploving1@gmail.com>
Signed-off-by: NGuillaume Nault <g.nault@alphalink.fr>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

32c23116

19 11月, 2016 2 次提交

rtnetlink: fix FDB size computation · f82ef3e1

由 Sabrina Dubroca 提交于 11月 18, 2016

Add missing NDA_VLAN attribute's size.

Fixes: 1e53d5bb ("net: Pass VLAN ID to rtnl_fdb_notify.")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f82ef3e1

af_unix: conditionally use freezable blocking calls in read · 06a77b07

由 WANG Cong 提交于 11月 17, 2016

Commit 2b15af6f ("af_unix: use freezable blocking calls in read")
converts schedule_timeout() to its freezable version, it was probably
correct at that time, but later, commit 2b514574
("net: af_unix: implement splice for stream af_unix sockets") breaks
the strong requirement for a freezable sleep, according to
commit 0f9548ca:

    We shouldn't try_to_freeze if locks are held.  Holding a lock can cause a
    deadlock if the lock is later acquired in the suspend or hibernate path
    (e.g.  by dpm).  Holding a lock can also cause a deadlock in the case of
    cgroup_freezer if a lock is held inside a frozen cgroup that is later
    acquired by a process outside that group.

The pipe_lock is still held at that point.

So use freezable version only for the recvmsg call path, avoid impact for
Android.

Fixes: 2b514574 ("net: af_unix: implement splice for stream af_unix sockets")
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Colin Cross <ccross@android.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

06a77b07

18 11月, 2016 5 次提交

cfg80211: limit scan results cache size · 9853a55e

由 Johannes Berg 提交于 11月 15, 2016

It's possible to make scanning consume almost arbitrary amounts
of memory, e.g. by sending beacon frames with random BSSIDs at
high rates while somebody is scanning.

Limit the number of BSS table entries we're willing to cache to
1000, limiting maximum memory usage to maybe 4-5MB, but lower
in practice - that would be the case for having both full-sized
beacon and probe response frames for each entry; this seems not
possible in practice, so a limit of 1000 entries will likely be
closer to 0.5 MB.

Cc: stable@vger.kernel.org
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

9853a55e

xfrm: unbreak xfrm_sk_policy_lookup · 330e832a

由 Florian Westphal 提交于 11月 17, 2016

if we succeed grabbing the refcount, then
  if (err && !xfrm_pol_hold_rcu)

will evaluate to false so this hits last else branch which then
sets policy to ERR_PTR(0).

Fixes: ae33786f ("xfrm: policy: only use rcu in xfrm_sk_policy_lookup")
Reported-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Tested-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

330e832a

net sched filters: pass netlink message flags in event notification · 30a391a1

由 Roman Mashak 提交于 11月 16, 2016

Userland client should be able to read an event, and reflect it back to
the kernel, therefore it needs to extract complete set of netlink flags.

For example, this will allow "tc monitor" to distinguish Add and Replace
operations.
Signed-off-by: NRoman Mashak <mrv@mojatatu.com>
Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

30a391a1

ip6_tunnel: disable caching when the traffic class is inherited · b5c2d495

由 Paolo Abeni 提交于 11月 16, 2016

If an ip6 tunnel is configured to inherit the traffic class from
the inner header, the dst_cache must be disabled or it will foul
the policy routing.

The issue is apprently there since at leat Linux-2.6.12-rc2.
Reported-by: NLiam McBirnie <liam.mcbirnie@boeing.com>
Cc: Liam McBirnie <liam.mcbirnie@boeing.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b5c2d495

net: check dead netns for peernet2id_alloc() · cfc44a4d

由 WANG Cong 提交于 11月 16, 2016

Andrei reports we still allocate netns ID from idr after we destroy
it in cleanup_net().

cleanup_net():
  ...
  idr_destroy(&net->netns_ids);
  ...
  list_for_each_entry_reverse(ops, &pernet_list, list)
    ops_exit_list(ops, &net_exit_list);
      -> rollback_registered_many()
        -> rtmsg_ifinfo_build_skb()
         -> rtnl_fill_ifinfo()
           -> peernet2id_alloc()

After that point we should not even access net->netns_ids, we
should check the death of the current netns as early as we can in
peernet2id_alloc().

For net-next we can consider to avoid sending rtmsg totally,
it is a good optimization for netns teardown path.

Fixes: 0c7aecd4 ("netns: add rtnl cmd to add and get peer netns ids")
Reported-by: NAndrei Vagin <avagin@gmail.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Acked-by: NAndrei Vagin <avagin@openvz.org>
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cfc44a4d

17 11月, 2016 3 次提交

xattr: Fix setting security xattrs on sockfs · 4a590153

由 Andreas Gruenbacher 提交于 11月 13, 2016

The IOP_XATTR flag is set on sockfs because sockfs supports getting the
"system.sockprotoname" xattr.  Since commit 6c6ef9f2, this flag is checked for
setxattr support as well.  This is wrong on sockfs because security xattr
support there is supposed to be provided by security_inode_setsecurity.  The
smack security module relies on socket labels (xattrs).

Fix this by adding a security xattr handler on sockfs that returns
-EAGAIN, and by checking for -EAGAIN in setxattr.

We cannot simply check for -EOPNOTSUPP in setxattr because there are
filesystems that neither have direct security xattr support nor support
via security_inode_setsecurity.  A more proper fix might be to move the
call to security_inode_setsecurity into sockfs, but it's not clear to me
if that is safe: we would end up calling security_inode_post_setxattr after
that as well.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

4a590153

ipv4: Fix memory leak in exception case for splitting tries · 3114cdfe

由 Alexander Duyck 提交于 11月 15, 2016

Fix a small memory leak that can occur where we leak a fib_alias in the
event of us not being able to insert it into the local table.

Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse")
Reported-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3114cdfe

ipv4: Restore fib_trie_flush_external function and fix call ordering · 3b709334

由 Alexander Duyck 提交于 11月 15, 2016

The patch that removed the FIB offload infrastructure was a bit too
aggressive and also removed code needed to clean up us splitting the table
if additional rules were added.  Specifically the function
fib_trie_flush_external was called at the end of a new rule being added to
flush the foreign trie entries from the main trie.

I updated the code so that we only call fib_trie_flush_external on the main
table so that we flush the entries for local from main.  This way we don't
call it for every rule change which is what was happening previously.

Fixes: 347e3b28 ("switchdev: remove FIB offload infrastructure")
Reported-by: NEric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3b709334

16 11月, 2016 4 次提交

rtnetlink: fix rtnl message size computation for XDP · b3cfaa31

由 Sabrina Dubroca 提交于 11月 15, 2016

rtnl_xdp_size() only considers the size of the actual payload attribute,
and misses the space taken by the attribute used for nesting (IFLA_XDP).

Fixes: d1fdd913 ("rtnl: add option for setting link xdp prog")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Reviewed-by: NBrenden Blanco <bblanco@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b3cfaa31

rtnetlink: fix rtnl_vfinfo_size · 7e75f74a

由 Sabrina Dubroca 提交于 11月 15, 2016

The size reported by rtnl_vfinfo_size doesn't match the space used by
rtnl_fill_vfinfo.

rtnl_vfinfo_size currently doesn't account for the nest attributes
used by statistics (added in commit 3b766cd8), nor for struct
ifla_vf_tx_rate (since commit ed616689, which added ifla_vf_rate
to the dump without removing ifla_vf_tx_rate, but replaced
ifla_vf_tx_rate with ifla_vf_rate in the size computation).

Fixes: 3b766cd8 ("net/core: Add reading VF statistics through the PF netdevice")
Fixes: ed616689 ("net-next:v4: Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7e75f74a

udp: restore UDPlite many-cast delivery · 73e2d5e3

由 Pablo Neira 提交于 11月 14, 2016

Honor udptable parameter that is passed to __udp*_lib_mcast_deliver(),
otherwise udplite broadcast/multicast use the wrong table and it breaks.

Fixes: 2dc41cff ("udp: Use hash2 for long hash1 chains in __udp*_lib_mcast_deliver.")
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

73e2d5e3

igmp: do not remove igmp souce list info when set link down · 24803f38

由 Hangbin Liu 提交于 11月 14, 2016

In commit 24cf3af3 ("igmp: call ip_mc_clear_src..."), we forgot to remove
igmpv3_clear_delrec() in ip_mc_down(), which also called ip_mc_clear_src().
This make us clear all IGMPv3 source filter info after NETDEV_DOWN.
Move igmpv3_clear_delrec() to ip_mc_destroy_dev() and then no need
ip_mc_clear_src() in ip_mc_destroy_dev().

On the other hand, we should restore back instead of free all source filter
info in igmpv3_del_delrec(). Or we will not able to restore IGMPv3 source
filter info after NETDEV_UP and NETDEV_POST_TYPE_CHANGE.

Fixes: 24cf3af3 ("igmp: call ip_mc_clear_src() only when ...")
Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

24803f38

15 11月, 2016 1 次提交

mac80211: fix A-MSDU aggregation with fast-xmit + txq · a786f96d

由 Felix Fietkau 提交于 11月 04, 2016

A-MSDU aggregation alters the QoS header after a frame has been
enqueued, so it needs to be ready before enqueue and not overwritten
again afterwards

Fixes: bb42f2d1 ("mac80211: Move reorder-sensitive TX handlers to after TXQ dequeue")
Signed-off-by: NFelix Fietkau <nbd@nbd.name>
Acked-by: NToke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

a786f96d

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功