提交 · 5f74f82ea34c0da80ea0b49192bb5ea06e063593 · openeuler / raspberrypi-kernel

09 2月, 2016 2 次提交

由 Hans Westgaard Ry 提交于 2月 03, 2016

Devices may have limits on the number of fragments in an skb they support.
Current codebase uses a constant as maximum for number of fragments one
skb can hold and use.
When enabling scatter/gather and running traffic with many small messages
the codebase uses the maximum number of fragments and may thereby violate
the max for certain devices.
The patch introduces a global variable as max number of fragments.
Signed-off-by: NHans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: NHåkon Bugge <haakon.bugge@oracle.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5f74f82e

tcp: do not drop syn_recv on all icmp reports · 9cf74903

由 Eric Dumazet 提交于 2月 02, 2016

Petr Novopashenniy reported that ICMP redirects on SYN_RECV sockets
were leading to RST.

This is of course incorrect.

A specific list of ICMP messages should be able to drop a SYN_RECV.

For instance, a REDIRECT on SYN_RECV shall be ignored, as we do
not hold a dst per SYN_RECV pseudo request.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111751
Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
Reported-by: NPetr Novopashenniy <pety@rusnet.ru>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9cf74903

08 2月, 2016 2 次提交

ipv6: fix a lockdep splat · 44c3d0c1

由 Eric Dumazet 提交于 2月 02, 2016

Silence lockdep false positive about rcu_dereference() being
used in the wrong context.

First one should use rcu_dereference_protected() as we own the spinlock.

Second one should be a normal assignation, as no barrier is needed.

Fixes: 18367681 ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.")
Reported-by: NDave Jones <davej@codemonkey.org.uk>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

44c3d0c1

unix: correctly track in-flight fds in sending process user_struct · 415e3d3e

由 Hannes Frederic Sowa 提交于 2月 03, 2016

The commit referenced in the Fixes tag incorrectly accounted the number
of in-flight fds over a unix domain socket to the original opener
of the file-descriptor. This allows another process to arbitrary
deplete the original file-openers resource limit for the maximum of
open files. Instead the sending processes and its struct cred should
be credited.

To do so, we add a reference counted struct user_struct pointer to the
scm_fp_list and use it to account for the number of inflight unix fds.

Fixes: 712f4aad ("unix: properly account for FDs passed over unix sockets")
Reported-by: NDavid Herrmann <dh.herrmann@gmail.com>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

415e3d3e

06 2月, 2016 1 次提交

ipv6: addrconf: Fix recursive spin lock call · 16186a82

由 subashab@codeaurora.org 提交于 2月 02, 2016

A rcu stall with the following backtrace was seen on a system with
forwarding, optimistic_dad and use_optimistic set. To reproduce,
set these flags and allow ipv6 autoconf.

This occurs because the device write_lock is acquired while already
holding the read_lock. Back trace below -

INFO: rcu_preempt self-detected stall on CPU { 1}  (t=2100 jiffies
 g=3992 c=3991 q=4471)
<6> Task dump for CPU 1:
<2> kworker/1:0     R  running task    12168    15   2 0x00000002
<2> Workqueue: ipv6_addrconf addrconf_dad_work
<6> Call trace:
<2> [<ffffffc000084da8>] el1_irq+0x68/0xdc
<2> [<ffffffc000cc4e0c>] _raw_write_lock_bh+0x20/0x30
<2> [<ffffffc000bc5dd8>] __ipv6_dev_ac_inc+0x64/0x1b4
<2> [<ffffffc000bcbd2c>] addrconf_join_anycast+0x9c/0xc4
<2> [<ffffffc000bcf9f0>] __ipv6_ifa_notify+0x160/0x29c
<2> [<ffffffc000bcfb7c>] ipv6_ifa_notify+0x50/0x70
<2> [<ffffffc000bd035c>] addrconf_dad_work+0x314/0x334
<2> [<ffffffc0000b64c8>] process_one_work+0x244/0x3fc
<2> [<ffffffc0000b7324>] worker_thread+0x2f8/0x418
<2> [<ffffffc0000bb40c>] kthread+0xe0/0xec

v2: do addrconf_dad_kick inside read lock and then acquire write
lock for ipv6_ifa_notify as suggested by Eric

Fixes: 7fd2561e ("net: ipv6: Add a sysctl to make optimistic
addresses useful candidates")

Cc: Eric Dumazet <edumazet@google.com>
Cc: Erik Kline <ek@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

16186a82

30 1月, 2016 10 次提交

tcp: avoid cwnd undo after receiving ECN · 99b4dd9f

由 Yuchung Cheng 提交于 1月 29, 2016

RFC 4015 section 3.4 says the TCP sender MUST refrain from
reversing the congestion control state when the ACK signals
congestion through the ECN-Echo flag. Currently we may not
always do that when prior_ssthresh is reset upon receiving
ACKs with ECE marks. This patch fixes that.
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

99b4dd9f

irda: fix a potential use-after-free in ircomm_param_request · 3d45296a

由 WANG Cong 提交于 1月 29, 2016

self->ctrl_skb is protected by self->spinlock, we should not
access it out of the lock. Move the debugging printk inside.
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Cc: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3d45296a

ipv6/udp: use sticky pktinfo egress ifindex on connect() · 1cdda918

由 Paolo Abeni 提交于 1月 29, 2016

Currently, the egress interface index specified via IPV6_PKTINFO
is ignored by __ip6_datagram_connect(), so that RFC 3542 section 6.7
can be subverted when the user space application calls connect()
before sendmsg().
Fix it by initializing properly flowi6_oif in connect() before
performing the route lookup.
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1cdda918

ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail() · 6f21c96a

由 Paolo Abeni 提交于 1月 29, 2016

The current implementation of ip6_dst_lookup_tail basically
ignore the egress ifindex match: if the saddr is set,
ip6_route_output() purposefully ignores flowi6_oif, due
to the commit d46a9d67 ("net: ipv6: Dont add RT6_LOOKUP_F_IFACE
flag if saddr set"), if the saddr is 'any' the first route lookup
in ip6_dst_lookup_tail fails, but upon failure a second lookup will
be performed with saddr set, thus ignoring the ifindex constraint.

This commit adds an output route lookup function variant, which
allows the caller to specify lookup flags, and modify
ip6_dst_lookup_tail() to enforce the ifindex match on the second
lookup via said helper.

ip6_route_output() becames now a static inline function build on
top of ip6_route_output_flags(); as a side effect, out-of-tree
modules need now a GPL license to access the output route lookup
functionality.
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6f21c96a

netlink: not trim skb for mmaped socket when dump · aa3a0220

由 Ken-ichirou MATSUZAWA 提交于 1月 29, 2016

We should not trim skb for mmaped socket since its buf size is fixed
and userspace will read as frame which data equals head. mmaped
socket will not call recvmsg, means max_recvmsg_len is 0,
skb_reserve was not called before commit: db65a3aa.

Fixes: db65a3aa (netlink: Trim skb to alloc size to avoid MSG_TRUNC)
Signed-off-by: NKen-ichirou MATSUZAWA <chamas@h4.dion.ne.jp>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa3a0220

fib_trie: Fix shift by 32 in fib_table_lookup · a5829f53

由 Alexander Duyck 提交于 1月 28, 2016

The fib_table_lookup function had a shift by 32 that triggered a UBSAN
warning. This was due to the fact that I had placed the shift first and
then followed it with the check for the suffix length to ignore the
undefined behavior. If we reorder this so that we verify the suffix is
less than 32 before shifting the value we can avoid the issue.
Reported-by: NToralf Förster <toralf.foerster@gmx.de>
Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a5829f53

ipv4: ipconfig: avoid unused ic_proto_used symbol · 52b79e2b

由 Arnd Bergmann 提交于 1月 28, 2016

When CONFIG_PROC_FS, CONFIG_IP_PNP_BOOTP, CONFIG_IP_PNP_DHCP and
CONFIG_IP_PNP_RARP are all disabled, we get a warning about the
ic_proto_used variable being unused:

net/ipv4/ipconfig.c:146:12: error: 'ic_proto_used' defined but not used [-Werror=unused-variable]

This avoids the warning, by making the definition conditional on
whether a dynamic IP configuration protocol is configured. If not,
we know that the value is always zero, so we can optimize away the
variable and all code that depends on it.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

52b79e2b

net_sched: drr: check for NULL pointer in drr_dequeue · df3eb6cd

由 Bernie Harris 提交于 1月 28, 2016

There are cases where qdisc_dequeue_peeked can return NULL, and the result
is dereferenced later on in the function.

Similarly to the other qdisc dequeue functions, check whether the skb
pointer is NULL and if it is, goto out.
Signed-off-by: NBernie Harris <bernie.harris@alliedtelesis.co.nz>
Reviewed-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

df3eb6cd

tipc: fix connection abort during subscription cancel · 4d5cfcba

由 Parthasarathy Bhuvaragan 提交于 1月 27, 2016

In 'commit 7fe8097c ("tipc: fix nullpointer bug when subscribing
to events")', we terminate the connection if the subscription
creation fails.
In the same commit, the subscription creation result was based on
the value of the subscription pointer (set in the function) instead
of the return code.

Unfortunately, the same function tipc_subscrp_create() handles
subscription cancel request. For a subscription cancellation request,
the subscription pointer cannot be set. Thus if a subscriber has
several subscriptions and cancels any of them, the connection is
terminated.

In this commit, we terminate the connection based on the return value
of tipc_subscrp_create().
Fixes: commit 7fe8097c ("tipc: fix nullpointer bug when subscribing to events")
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4d5cfcba

ipv4: early demux should be aware of fragments · 63e51b6a

由 Eric Dumazet 提交于 1月 26, 2016

We should not assume a valid protocol header is present,
as this is not the case for IPv4 fragments.

Lets avoid extra cache line misses and potential bugs
if we actually find a socket and incorrectly uses its dst.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

63e51b6a

29 1月, 2016 11 次提交

Bluetooth: Fix incorrect removing of IRKs · cff10ce7

由 Johan Hedberg 提交于 1月 26, 2016

The commit cad20c27 was supposed to
fix handling of devices first using public addresses and then
switching to RPAs after pairing. Unfortunately it missed a couple of
key places in the code.

1. When evaluating which devices should be removed from the existing
white list we also need to consider whether we have an IRK for them or
not, i.e. a call to hci_find_irk_by_addr() is needed.

2. In smp_notify_keys() we should not be requiring the knowledge of
the RPA, but should simply keep the IRK around if the other conditions
require it.
Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org # 4.4+

cff10ce7

Bluetooth: L2CAP: Fix setting chan src info before adding PSM/CID · a2342c5f

由 Johan Hedberg 提交于 1月 26, 2016

At least the l2cap_add_psm() routine depends on the source address
type being properly set to know what auto-allocation ranges to use, so
the assignment to l2cap_chan needs to happen before this.
Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>

a2342c5f

Bluetooth: L2CAP: Fix auto-allocating LE PSM values · 92594a51

由 Johan Hedberg 提交于 1月 26, 2016

The LE dynamic PSM range is different from BR/EDR (0x0080 - 0x00ff)
and doesn't have requirements relating to parity, so separate checks
are needed.
Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>

92594a51

Bluetooth: L2CAP: Introduce proper defines for PSM ranges · 114f9f1e

由 Johan Hedberg 提交于 1月 26, 2016

Having proper defines makes the code a bit readable, it also avoids
duplicating hard-coded values since these are also needed when
auto-allocating PSM values (in a subsequent patch).
Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>

114f9f1e

tcp: beware of alignments in tcp_get_info() · ff5d7497

由 Eric Dumazet 提交于 1月 27, 2016

With some combinations of user provided flags in netlink command,
it is possible to call tcp_get_info() with a buffer that is not 8-bytes
aligned.

It does matter on some arches, so we need to use put_unaligned() to
store the u64 fields.

Current iproute2 package does not trigger this particular issue.

Fixes: 0df48c26 ("tcp: add tcpi_bytes_acked to tcp_info")
Fixes: 977cb0ec ("tcp: add pacing_rate information into tcp_info")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ff5d7497

switchdev: Require RTNL mutex to be held when sending FDB notifications · 4f2c6ae5

由 Ido Schimmel 提交于 1月 27, 2016

When switchdev drivers process FDB notifications from the underlying
device they resolve the netdev to which the entry points to and notify
the bridge using the switchdev notifier.

However, since the RTNL mutex is not held there is nothing preventing
the netdev from disappearing in the middle, which will cause
br_switchdev_event() to dereference a non-existing netdev.

Make switchdev drivers hold the lock at the beginning of the
notification processing session and release it once it ends, after
notifying the bridge.

Also, remove switchdev_mutex and fdb_lock, as they are no longer needed
when RTNL mutex is held.

Fixes: 03bf0c28 ("switchdev: introduce switchdev notifier")
Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4f2c6ae5

tcp: fix tcp_mark_head_lost to check skb len before fragmenting · d88270ee

由 Neal Cardwell 提交于 1月 25, 2016

This commit fixes a corner case in tcp_mark_head_lost() which was
causing the WARN_ON(len > skb->len) in tcp_fragment() to fire.

tcp_mark_head_lost() was assuming that if a packet has
tcp_skb_pcount(skb) of N, then it's safe to fragment off a prefix of
M*mss bytes, for any M < N. But with the tricky way TCP pcounts are
maintained, this is not always true.

For example, suppose the sender sends 4 1-byte packets and have the
last 3 packet sacked. It will merge the last 3 packets in the write
queue into an skb with pcount = 3 and len = 3 bytes. If another
recovery happens after a sack reneging event, tcp_mark_head_lost()
may attempt to split the skb assuming it has more than 2*MSS bytes.

This sounds very counterintuitive, but as the commit description for
the related commit c0638c24 ("tcp: don't fragment SACKed skbs in
tcp_mark_head_lost()") notes, this is because tcp_shifted_skb()
coalesces adjacent regions of SACKed skbs, and when doing this it
preserves the sum of their packet counts in order to reflect the
real-world dynamics on the wire. The c0638c24 commit tried to
avoid problems by not fragmenting SACKed skbs, since SACKed skbs are
where the non-proportionality between pcount and skb->len/mss is known
to be possible. However, that commit did not handle the case where
during a reneging event one of these weird SACKed skbs becomes an
un-SACKed skb, which tcp_mark_head_lost() can then try to fragment.

The fix is to simply mark the entire skb lost when this happens.
This makes the recovery slightly more aggressive in such corner
cases before we detect reordering. But once we detect reordering
this code path is by-passed because FACK is disabled.
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d88270ee

inet: frag: Always orphan skbs inside ip_defrag() · 8282f274

由 Joe Stringer 提交于 1月 22, 2016

Later parts of the stack (including fragmentation) expect that there is
never a socket attached to frag in a frag_list, however this invariant
was not enforced on all defrag paths. This could lead to the
BUG_ON(skb->sk) during ip_do_fragment(), as per the call stack at the
end of this commit message.

While the call could be added to openvswitch to fix this particular
error, the head and tail of the frags list are already orphaned
indirectly inside ip_defrag(), so it seems like the remaining fragments
should all be orphaned in all circumstances.

kernel BUG at net/ipv4/ip_output.c:586!
[...]
Call Trace:
 <IRQ>
 [<ffffffffa0205270>] ? do_output.isra.29+0x1b0/0x1b0 [openvswitch]
 [<ffffffffa02167a7>] ovs_fragment+0xcc/0x214 [openvswitch]
 [<ffffffff81667830>] ? dst_discard_out+0x20/0x20
 [<ffffffff81667810>] ? dst_ifdown+0x80/0x80
 [<ffffffffa0212072>] ? find_bucket.isra.2+0x62/0x70 [openvswitch]
 [<ffffffff810e0ba5>] ? mod_timer_pending+0x65/0x210
 [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
 [<ffffffffa03205a2>] ? nf_conntrack_in+0x252/0x500 [nf_conntrack]
 [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
 [<ffffffffa02051a3>] do_output.isra.29+0xe3/0x1b0 [openvswitch]
 [<ffffffffa0206411>] do_execute_actions+0xe11/0x11f0 [openvswitch]
 [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
 [<ffffffffa0206822>] ovs_execute_actions+0x32/0xd0 [openvswitch]
 [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch]
 [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
 [<ffffffffa02068a2>] ovs_execute_actions+0xb2/0xd0 [openvswitch]
 [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch]
 [<ffffffffa0215019>] ? ovs_ct_get_labels+0x49/0x80 [openvswitch]
 [<ffffffffa0213a1d>] ovs_vport_receive+0x5d/0xa0 [openvswitch]
 [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
 [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
 [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
 [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch]
 [<ffffffffa02148fc>] internal_dev_xmit+0x6c/0x140 [openvswitch]
 [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch]
 [<ffffffff81660299>] dev_hard_start_xmit+0x2b9/0x5e0
 [<ffffffff8165fc21>] ? netif_skb_features+0xd1/0x1f0
 [<ffffffff81660f20>] __dev_queue_xmit+0x800/0x930
 [<ffffffff81660770>] ? __dev_queue_xmit+0x50/0x930
 [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90
 [<ffffffff81669876>] ? neigh_resolve_output+0x106/0x220
 [<ffffffff81661060>] dev_queue_xmit+0x10/0x20
 [<ffffffff816698e8>] neigh_resolve_output+0x178/0x220
 [<ffffffff816a8e6f>] ? ip_finish_output2+0x1ff/0x590
 [<ffffffff816a8e6f>] ip_finish_output2+0x1ff/0x590
 [<ffffffff816a8cee>] ? ip_finish_output2+0x7e/0x590
 [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0
 [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0
 [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80
 [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340
 [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190
 [<ffffffff816ab4c0>] ip_output+0x70/0x110
 [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80
 [<ffffffff816aa9f9>] ip_local_out+0x39/0x70
 [<ffffffff816abf89>] ip_send_skb+0x19/0x40
 [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40
 [<ffffffff816df21a>] icmp_push_reply+0xea/0x120
 [<ffffffff816df93d>] icmp_reply.constprop.23+0x1ed/0x230
 [<ffffffff816df9ce>] icmp_echo.part.21+0x4e/0x50
 [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
 [<ffffffff810d5f9e>] ? rcu_read_lock_held+0x5e/0x70
 [<ffffffff816dfa06>] icmp_echo+0x36/0x70
 [<ffffffff816e0d11>] icmp_rcv+0x271/0x450
 [<ffffffff816a4ca7>] ip_local_deliver_finish+0x127/0x3a0
 [<ffffffff816a4bc1>] ? ip_local_deliver_finish+0x41/0x3a0
 [<ffffffff816a5160>] ip_local_deliver+0x60/0xd0
 [<ffffffff816a4b80>] ? ip_rcv_finish+0x560/0x560
 [<ffffffff816a46fd>] ip_rcv_finish+0xdd/0x560
 [<ffffffff816a5453>] ip_rcv+0x283/0x3e0
 [<ffffffff810b6302>] ? match_held_lock+0x192/0x200
 [<ffffffff816a4620>] ? inet_del_offload+0x40/0x40
 [<ffffffff8165d062>] __netif_receive_skb_core+0x392/0xae0
 [<ffffffff8165e68e>] ? process_backlog+0x8e/0x230
 [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90
 [<ffffffff8165d7c8>] __netif_receive_skb+0x18/0x60
 [<ffffffff8165e678>] process_backlog+0x78/0x230
 [<ffffffff8165e6dd>] ? process_backlog+0xdd/0x230
 [<ffffffff8165e355>] net_rx_action+0x155/0x400
 [<ffffffff8106b48c>] __do_softirq+0xcc/0x420
 [<ffffffff816a8e87>] ? ip_finish_output2+0x217/0x590
 [<ffffffff8178e78c>] do_softirq_own_stack+0x1c/0x30
 <EOI>
 [<ffffffff8106b88e>] do_softirq+0x4e/0x60
 [<ffffffff8106b948>] __local_bh_enable_ip+0xa8/0xb0
 [<ffffffff816a8eb0>] ip_finish_output2+0x240/0x590
 [<ffffffff816a9a31>] ? ip_do_fragment+0x831/0x8a0
 [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0
 [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0
 [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80
 [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340
 [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190
 [<ffffffff816ab4c0>] ip_output+0x70/0x110
 [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80
 [<ffffffff816aa9f9>] ip_local_out+0x39/0x70
 [<ffffffff816abf89>] ip_send_skb+0x19/0x40
 [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40
 [<ffffffff816d55d3>] raw_sendmsg+0x7d3/0xc30
 [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
 [<ffffffff816e7557>] ? inet_sendmsg+0xc7/0x1d0
 [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
 [<ffffffff816e759a>] inet_sendmsg+0x10a/0x1d0
 [<ffffffff816e7495>] ? inet_sendmsg+0x5/0x1d0
 [<ffffffff8163e398>] sock_sendmsg+0x38/0x50
 [<ffffffff8163ec5f>] ___sys_sendmsg+0x25f/0x270
 [<ffffffff811aadad>] ? handle_mm_fault+0x8dd/0x1320
 [<ffffffff8178c147>] ? _raw_spin_unlock+0x27/0x40
 [<ffffffff810529b2>] ? __do_page_fault+0x1e2/0x460
 [<ffffffff81204886>] ? __fget_light+0x66/0x90
 [<ffffffff8163f8e2>] __sys_sendmsg+0x42/0x80
 [<ffffffff8163f932>] SyS_sendmsg+0x12/0x20
 [<ffffffff8178cb17>] entry_SYSCALL_64_fastpath+0x12/0x6f
Code: 00 00 44 89 e0 e9 7c fb ff ff 4c 89 ff e8 e7 e7 ff ff 41 8b 9d 80 00 00 00 2b 5d d4 89 d8 c1 f8 03 0f b7 c0 e9 33 ff ff f
 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48
RIP  [<ffffffff816a9a92>] ip_do_fragment+0x892/0x8a0
 RSP <ffff88006d603170>

Fixes: 7f8a436e ("openvswitch: Add conntrack action")
Signed-off-by: NJoe Stringer <joe@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8282f274

sctp: remove the dead field of sctp_transport · 47faa1e4

由 Xin Long 提交于 1月 22, 2016

After we use refcnt to check if transport is alive, the dead can be
removed from sctp_transport.

The traversal of transport_addr_list in procfs dump is using
list_for_each_entry_rcu, no need to check if it has been freed.

sctp_generate_t3_rtx_event and sctp_generate_heartbeat_event is
protected by sock lock, it's not necessary to check dead, either.
also, the timers are cancelled when sctp_transport_free() is
called, that it doesn't wait for refcnt to reach 0 to cancel them.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

47faa1e4

sctp: hold transport before we access t->asoc in sctp proc · fba4c330

由 Xin Long 提交于 1月 22, 2016

Previously, before rhashtable, /proc assoc listing was done by
read-locking the entire hash entry and dumping all assocs at once, so we
were sure that the assoc wasn't freed because it wouldn't be possible to
remove it from the hash meanwhile.

Now we use rhashtable to list transports, and dump entries one by one.
That is, now we have to check if the assoc is still a good one, as the
transport we got may be being freed.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Reviewed-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fba4c330

sctp: fix the transport dead race check by using atomic_add_unless on refcnt · 1eed6779

由 Xin Long 提交于 1月 22, 2016

Now when __sctp_lookup_association is running in BH, it will try to
check if t->dead is set, but meanwhile other CPUs may be freeing this
transport and this assoc and if it happens that
__sctp_lookup_association checked t->dead a bit too early, it may think
that the association is still good while it was already freed.

So we fix this race by using atomic_add_unless in sctp_transport_hold.
After we get one transport from hashtable, we will hold it only when
this transport's refcnt is not 0, so that we can make sure t->asoc
cannot be freed before we hold the asoc again.

Note that sctp association is not freed using RCU so we can't use
atomic_add_unless() with it as it may just be too late for that either.

Fixes: 4f008781 ("sctp: apply rhashtable api to send/recv path")
Reported-by: NVlad Yasevich <vyasevich@gmail.com>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1eed6779

26 1月, 2016 4 次提交

rfkill: fix rfkill_fop_read wait_event usage · 6736fde9

由 Johannes Berg 提交于 1月 26, 2016

The code within wait_event_interruptible() is called with
!TASK_RUNNING, so mustn't call any functions that can sleep,
like mutex_lock().

Since we re-check the list_empty() in a loop after the wait,
it's safe to simply use list_empty() without locking.

This bug has existed forever, but was only discovered now
because all userspace implementations, including the default
'rfkill' tool, use poll() or select() to get a readable fd
before attempting to read.

Cc: stable@vger.kernel.org
Fixes: c64fb016 ("rfkill: create useful userspace interface")
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

6736fde9

mac80211: Requeue work after scan complete for all VIF types. · 4fa11ec7

由 Sachin Kulkarni 提交于 1月 12, 2016

During a sw scan ieee80211_iface_work ignores work items for all vifs.
However after the scan complete work is requeued only for STA, ADHOC
and MESH iftypes.

This occasionally results in event processing getting delayed/not
processed for iftype AP when it coexists with a STA. This can result
in data halt and eventually disconnection on the AP interface.

Cc: stable@vger.kernel.org
Signed-off-by: NSachin Kulkarni <Sachin.Kulkarni@imgtec.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

4fa11ec7

sit: set rtnl_link_ops before calling register_netdevice · 87e57399

由 Thadeu Lima de Souza Cascardo 提交于 1月 25, 2016

When creating a SIT tunnel with ip tunnel, rtnl_link_ops is not set before
ipip6_tunnel_create is called. When register_netdevice is called, there is
no linkinfo attribute in the NEWLINK message because of that.

Setting rtnl_link_ops before calling register_netdevice fixes that.
Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

87e57399

ipv4+ipv6: Make INET*_ESP select CRYPTO_ECHAINIV · 32b6170c

由 Thomas Egerer 提交于 1月 25, 2016

The ESP algorithms using CBC mode require echainiv. Hence INET*_ESP have
to select CRYPTO_ECHAINIV in order to work properly. This solves the
issues caused by a misconfiguration as described in [1].
The original approach, patching crypto/Kconfig was turned down by
Herbert Xu [2].

[1] https://lists.strongswan.org/pipermail/users/2015-December/009074.html
[2] http://marc.info/?l=linux-crypto-vger&m=145224655809562&w=2Signed-off-by: NThomas Egerer <hakke_007@gmx.de>
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

32b6170c

25 1月, 2016 2 次提交

sctp: allow setting SCTP_SACK_IMMEDIATELY by the application · 27f7ed2b

由 Marcelo Ricardo Leitner 提交于 1月 22, 2016

This patch extends commit b93d6471 ("sctp: implement the sender side
for SACK-IMMEDIATELY extension") as it didn't white list
SCTP_SACK_IMMEDIATELY on sctp_msghdr_parse(), causing it to be
understood as an invalid flag and returning -EINVAL to the application.

Note that the actual handling of the flag is already there in
sctp_datamsg_from_user().

https://tools.ietf.org/html/rfc7053#section-7

Fixes: b93d6471 ("sctp: implement the sender side for SACK-IMMEDIATELY extension")
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: NVlad Yasevich <vyasevich@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

27f7ed2b

af_unix: fix struct pid memory leak · fa0dc04d

由 Eric Dumazet 提交于 1月 24, 2016

Dmitry reported a struct pid leak detected by a syzkaller program.

Bug happens in unix_stream_recvmsg() when we break the loop when a
signal is pending, without properly releasing scm.

Fixes: b3ca9b02 ("net: fix multithreaded signal handling in unix recv routines")
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fa0dc04d

23 1月, 2016 4 次提交

Bluetooth: 6lowpan: Fix handling of uncompressed IPv6 packets · 87f5fedb

由 Lukasz Duda 提交于 1月 13, 2016

This patch fixes incorrect handling of the 6lowpan packets that contain
uncompressed IPv6 header.

RFC4944 specifies a special dispatch for 6lowpan to carry uncompressed
IPv6 header. This dispatch (1 byte long) has to be removed during
reception and skb data pointer has to be moved. To correctly point in
the beginning of the IPv6 header the dispatch byte has to be pulled off
before packet can be processed by netif_rx_in().

Test scenario: IPv6 packets are not correctly interpreted by the network
layer when IPv6 header is not compressed (e.g. ICMPv6 Echo Reply is not
propagated correctly to the ICMPv6 layer because the extra byte will make
the header look corrupted).

Similar approach is done for IEEE 802.15.4.
Signed-off-by: NLukasz Duda <lukasz.duda@nordicsemi.no>
Signed-off-by: NGlenn Ruben Bakke <glenn.ruben.bakke@nordicsemi.no>
Acked-by: NJukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com>
Cc: stable@vger.kernel.org # 4.4+

87f5fedb

Bluetooth: 6lowpan: Fix kernel NULL pointer dereferences · 4c58f328

由 Glenn Ruben Bakke 提交于 1月 13, 2016

The fixes provided in this patch assigns a valid net_device structure to
skb before dispatching it for further processing.

Scenario #1:
============

Bluetooth 6lowpan receives an uncompressed IPv6 header, and dispatches it
to netif. The following error occurs:

Null pointer dereference error #1 crash log:

[  845.854013] BUG: unable to handle kernel NULL pointer dereference at
               0000000000000048
[  845.855785] IP: [<ffffffff816e3d36>] enqueue_to_backlog+0x56/0x240
...
[  845.909459] Call Trace:
[  845.911678]  [<ffffffff816e3f64>] netif_rx_internal+0x44/0xf0

The first modification fixes the NULL pointer dereference error by
assigning dev to the local_skb in order to set a valid net_device before
processing the skb by netif_rx_ni().

Scenario #2:
============

Bluetooth 6lowpan receives an UDP compressed message which needs further
decompression by nhc_udp. The following error occurs:

Null pointer dereference error #2 crash log:

[   63.295149] BUG: unable to handle kernel NULL pointer dereference at
               0000000000000840
[   63.295931] IP: [<ffffffffc0559540>] udp_uncompress+0x320/0x626
               [nhc_udp]

The second modification fixes the NULL pointer dereference error by
assigning dev to the local_skb in the case of a udp compressed packet.
The 6lowpan udp_uncompress function expects that the net_device is set in
the skb when checking lltype.
Signed-off-by: NGlenn Ruben Bakke <glenn.ruben.bakke@nordicsemi.no>
Signed-off-by: NLukasz Duda <lukasz.duda@nordicsemi.no>
Acked-by: NJukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com>
Cc: stable@vger.kernel.org # 4.4+

4c58f328

tree wide: use kvfree() than conditional kfree()/vfree() · 1d5cfdb0

由 Tetsuo Handa 提交于 1月 22, 2016

There are many locations that do

  if (memory_was_allocated_by_vmalloc)
    vfree(ptr);
  else
    kfree(ptr);

but kvfree() can handle both kmalloc()ed memory and vmalloc()ed memory
using is_vmalloc_addr().  Unless callers have special reasons, we can
replace this branch with kvfree().  Please check and reply if you found
problems.
Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NJan Kara <jack@suse.com>
Acked-by: NRussell King <rmk+kernel@arm.linux.org.uk>
Reviewed-by: NAndreas Dilger <andreas.dilger@intel.com>
Acked-by: N"Rafael J. Wysocki" <rjw@rjwysocki.net>
Acked-by: NDavid Rientjes <rientjes@google.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Boris Petkov <bp@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1d5cfdb0

wrappers for ->i_mutex access · 5955102c

由 Al Viro 提交于 1月 22, 2016

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

5955102c

22 1月, 2016 4 次提交

tcp: fix NULL deref in tcp_v4_send_ack() · e62a123b

由 Eric Dumazet 提交于 1月 21, 2016

Neal reported crashes with this stack trace :

 RIP: 0010:[<ffffffff8c57231b>] tcp_v4_send_ack+0x41/0x20f
...
 CR2: 0000000000000018 CR3: 000000044005c000 CR4: 00000000001427e0
...
  [<ffffffff8c57258e>] tcp_v4_reqsk_send_ack+0xa5/0xb4
  [<ffffffff8c1a7caa>] tcp_check_req+0x2ea/0x3e0
  [<ffffffff8c19e420>] tcp_rcv_state_process+0x850/0x2500
  [<ffffffff8c1a6d21>] tcp_v4_do_rcv+0x141/0x330
  [<ffffffff8c56cdb2>] sk_backlog_rcv+0x21/0x30
  [<ffffffff8c098bbd>] tcp_recvmsg+0x75d/0xf90
  [<ffffffff8c0a8700>] inet_recvmsg+0x80/0xa0
  [<ffffffff8c17623e>] sock_aio_read+0xee/0x110
  [<ffffffff8c066fcf>] do_sync_read+0x6f/0xa0
  [<ffffffff8c0673a1>] SyS_read+0x1e1/0x290
  [<ffffffff8c5ca262>] system_call_fastpath+0x16/0x1b

The problem here is the skb we provide to tcp_v4_send_ack() had to
be parked in the backlog of a new TCP fastopen child because this child
was owned by the user at the time an out of window packet arrived.

Before queuing a packet, TCP has to set skb->dev to NULL as the device
could disappear before packet is removed from the queue.

Fix this issue by using the net pointer provided by the socket (being a
timewait or a request socket).

IPv6 is immune to the bug : tcp_v6_send_response() already gets the net
pointer from the socket if provided.

Fixes: 168a8f58 ("tcp: TCP Fast Open Server - main code path")
Reported-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Jerry Chu <hkchu@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e62a123b

libceph: remove outdated comment · 7e01726a

由 Ilya Dryomov 提交于 1月 18, 2016

MClientMount{,Ack} are long gone.  The receipt of bare monmap doesn't
actually indicate a mount success as we are yet to authenticate at that
point in time.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

7e01726a

libceph: kill off ceph_x_ticket_handler::validity · f6cdb292

由 Ilya Dryomov 提交于 1月 15, 2016

With it gone, no need to preserve ceph_timespec in process_one_ticket()
either.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NSage Weil <sage@redhat.com>

f6cdb292

libceph: invalidate AUTH in addition to a service ticket · 187d131d

由 Ilya Dryomov 提交于 1月 14, 2016

If we fault due to authentication, we invalidate the service ticket we
have and request a new one - the idea being that if a service rejected
our authorizer, it must have expired, despite mon_client's attempts at
periodic renewal.  (The other possibility is that our ticket is too new
and the service hasn't gotten it yet, in which case invalidating isn't
necessary but doesn't hurt.)

Invalidating just the service ticket is not enough, though.  If we
assume a failure on mon_client's part to renew a service ticket, we
have to assume the same for the AUTH ticket.  If our AUTH ticket is
bad, we won't get any service tickets no matter how hard we try, so
invalidate AUTH ticket along with the service ticket.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NSage Weil <sage@redhat.com>

187d131d