提交 · b5f0549231ffb025337be5a625b0ff9f52b016f0 · openanolis / cloud-kernel

20 2月, 2016 5 次提交

unix_diag: fix incorrect sign extension in unix_lookup_by_ino · b5f05492

由 Dmitry V. Levin 提交于 2月 19, 2016

The value passed by unix_diag_get_exact to unix_lookup_by_ino has type
__u32, but unix_lookup_by_ino's argument ino has type int, which is not
a problem yet.
However, when ino is compared with sock_i_ino return value of type
unsigned long, ino is sign extended to signed long, and this results
to incorrect comparison on 64-bit architectures for inode numbers
greater than INT_MAX.

This bug was found by strace test suite.

Fixes: 5d3cae8b ("unix_diag: Dumping exact socket core")
Signed-off-by: NDmitry V. Levin <ldv@altlinux.org>
Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b5f05492

lwt: fix rx checksum setting for lwt devices tunneling over ipv6 · c868ee70

由 Paolo Abeni 提交于 2月 17, 2016

the commit 35e2d115 ("tunnels: Allow IPv6 UDP checksums to be
correctly controlled.") changed the default xmit checksum setting
for lwt vxlan/geneve ipv6 tunnels, so that now the checksum is not
set into external UDP header.
This commit changes the rx checksum setting for both lwt vxlan/geneve
devices created by openvswitch accordingly, so that lwt over ipv6
tunnel pairs are again able to communicate with default values.
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NJiri Benc <jbenc@redhat.com>
Acked-by: NJesse Gross <jesse@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c868ee70

tipc: unlock in error path · b53ce3e7

由 Insu Yun 提交于 2月 17, 2016

tipc_bcast_unlock need to be unlocked in error path.
Signed-off-by: NInsu Yun <wuninsu@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b53ce3e7

rtnl: RTM_GETNETCONF: fix wrong return value · a97eb33f

由 Anton Protopopov 提交于 2月 16, 2016

An error response from a RTM_GETNETCONF request can return the positive
error value EINVAL in the struct nlmsgerr that can mislead userspace.
Signed-off-by: NAnton Protopopov <a.s.protopopov@gmail.com>
Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a97eb33f

net: make netdev_for_each_lower_dev safe for device removal · cfdd28be

由 Nikolay Aleksandrov 提交于 2月 17, 2016

When I used netdev_for_each_lower_dev in commit bad53162 ("vrf:
remove slave queue and private slave struct") I thought that it acts
like netdev_for_each_lower_private and can be used to remove the current
device from the list while walking, but unfortunately it acts more like
netdev_for_each_lower_private_rcu and doesn't allow it. The difference
is where the "iter" points to, right now it points to the current element
and that makes it impossible to remove it. Change the logic to be
similar to netdev_for_each_lower_private and make it point to the "next"
element so we can safely delete the current one. VRF is the only such
user right now, there's no change for the read-only users.

Here's what can happen now:
[98423.249858] general protection fault: 0000 [#1] SMP
[98423.250175] Modules linked in: vrf bridge(O) stp llc nfsd auth_rpcgss
oid_registry nfs_acl nfs lockd grace sunrpc crct10dif_pclmul
crc32_pclmul crc32c_intel ghash_clmulni_intel jitterentropy_rng
sha256_generic hmac drbg ppdev aesni_intel aes_x86_64 glue_helper lrw
gf128mul ablk_helper cryptd evdev serio_raw pcspkr virtio_balloon
parport_pc parport i2c_piix4 i2c_core virtio_console acpi_cpufreq button
9pnet_virtio 9p 9pnet fscache ipv6 autofs4 ext4 crc16 mbcache jbd2 sg
virtio_blk virtio_net sr_mod cdrom e1000 ata_generic ehci_pci uhci_hcd
ehci_hcd usbcore usb_common virtio_pci ata_piix libata floppy
virtio_ring virtio scsi_mod [last unloaded: bridge]
[98423.255040] CPU: 1 PID: 14173 Comm: ip Tainted: G           O
4.5.0-rc2+ #81
[98423.255386] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.8.1-20150318_183358- 04/01/2014
[98423.255777] task: ffff8800547f5540 ti: ffff88003428c000 task.ti:
ffff88003428c000
[98423.256123] RIP: 0010:[<ffffffff81514f3e>]  [<ffffffff81514f3e>]
netdev_lower_get_next+0x1e/0x30
[98423.256534] RSP: 0018:ffff88003428f940  EFLAGS: 00010207
[98423.256766] RAX: 0002000100000004 RBX: ffff880054ff9000 RCX:
0000000000000000
[98423.257039] RDX: ffff88003428f8b8 RSI: ffff88003428f950 RDI:
ffff880054ff90c0
[98423.257287] RBP: ffff88003428f940 R08: 0000000000000000 R09:
0000000000000000
[98423.257537] R10: 0000000000000001 R11: 0000000000000000 R12:
ffff88003428f9e0
[98423.257802] R13: ffff880054a5fd00 R14: ffff88003428f970 R15:
0000000000000001
[98423.258055] FS:  00007f3d76881700(0000) GS:ffff88005d000000(0000)
knlGS:0000000000000000
[98423.258418] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[98423.258650] CR2: 00007ffe5951ffa8 CR3: 0000000052077000 CR4:
00000000000406e0
[98423.258902] Stack:
[98423.259075]  ffff88003428f960 ffffffffa0442636 0002000100000004
ffff880054ff9000
[98423.259647]  ffff88003428f9b0 ffffffff81518205 ffff880054ff9000
ffff88003428f978
[98423.260208]  ffff88003428f978 ffff88003428f9e0 ffff88003428f9e0
ffff880035b35f00
[98423.260739] Call Trace:
[98423.260920]  [<ffffffffa0442636>] vrf_dev_uninit+0x76/0xa0 [vrf]
[98423.261156]  [<ffffffff81518205>]
rollback_registered_many+0x205/0x390
[98423.261401]  [<ffffffff815183ec>] unregister_netdevice_many+0x1c/0x70
[98423.261641]  [<ffffffff8153223c>] rtnl_delete_link+0x3c/0x50
[98423.271557]  [<ffffffff815335bb>] rtnl_dellink+0xcb/0x1d0
[98423.271800]  [<ffffffff811cd7da>] ? __inc_zone_state+0x4a/0x90
[98423.272049]  [<ffffffff815337b4>] rtnetlink_rcv_msg+0x84/0x200
[98423.272279]  [<ffffffff810cfe7d>] ? trace_hardirqs_on+0xd/0x10
[98423.272513]  [<ffffffff8153370b>] ? rtnetlink_rcv+0x1b/0x40
[98423.272755]  [<ffffffff81533730>] ? rtnetlink_rcv+0x40/0x40
[98423.272983]  [<ffffffff8155d6e7>] netlink_rcv_skb+0x97/0xb0
[98423.273209]  [<ffffffff8153371a>] rtnetlink_rcv+0x2a/0x40
[98423.273476]  [<ffffffff8155ce8b>] netlink_unicast+0x11b/0x1a0
[98423.273710]  [<ffffffff8155d2f1>] netlink_sendmsg+0x3e1/0x610
[98423.273947]  [<ffffffff814fbc98>] sock_sendmsg+0x38/0x70
[98423.274175]  [<ffffffff814fc253>] ___sys_sendmsg+0x2e3/0x2f0
[98423.274416]  [<ffffffff810d841e>] ? do_raw_spin_unlock+0xbe/0x140
[98423.274658]  [<ffffffff811e1bec>] ? handle_mm_fault+0x26c/0x2210
[98423.274894]  [<ffffffff811e19cd>] ? handle_mm_fault+0x4d/0x2210
[98423.275130]  [<ffffffff81269611>] ? __fget_light+0x91/0xb0
[98423.275365]  [<ffffffff814fcd42>] __sys_sendmsg+0x42/0x80
[98423.275595]  [<ffffffff814fcd92>] SyS_sendmsg+0x12/0x20
[98423.275827]  [<ffffffff81611bb6>] entry_SYSCALL_64_fastpath+0x16/0x7a
[98423.276073] Code: c3 31 c0 5d c3 0f 1f 84 00 00 00 00 00 66 66 66 66
90 48 8b 06 55 48 81 c7 c0 00 00 00 48 89 e5 48 8b 00 48 39 f8 74 09 48
89 06 <48> 8b 40 e8 5d c3 31 c0 5d c3 0f 1f 84 00 00 00 00 00 66 66 66
[98423.279639] RIP  [<ffffffff81514f3e>] netdev_lower_get_next+0x1e/0x30
[98423.279920]  RSP <ffff88003428f940>

CC: David Ahern <dsa@cumulusnetworks.com>
CC: David S. Miller <davem@davemloft.net>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
CC: Vlad Yasevich <vyasevic@redhat.com>
Fixes: bad53162 ("vrf: remove slave queue and private slave struct")
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: NDavid Ahern <dsa@cumulusnetworks.com>
Tested-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cfdd28be

19 2月, 2016 7 次提交

net: caif: fix erroneous return value · 449f14f0

由 Anton Protopopov 提交于 2月 17, 2016

The cfrfml_receive() function might return positive value EPROTO
Signed-off-by: NAnton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

449f14f0

appletalk: fix erroneous return value · 48bb230e

由 Anton Protopopov 提交于 2月 17, 2016

The atalk_sendmsg() function might return wrong value ENETUNREACH
instead of -ENETUNREACH.
Signed-off-by: NAnton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

48bb230e

IFF_NO_QUEUE: Fix for drivers not calling ether_setup() · a813104d

由 Phil Sutter 提交于 2月 17, 2016

My implementation around IFF_NO_QUEUE driver flag assumed that leaving
tx_queue_len untouched (specifically: not setting it to zero) by drivers
would make it possible to assign a regular qdisc to them without having
to worry about setting tx_queue_len to a useful value. This was only
partially true: I overlooked that some drivers don't call ether_setup()
and therefore not initialize tx_queue_len to the default value of 1000.
Consequently, removing the workarounds in place for that case in qdisc
implementations which cared about it (namely, pfifo, bfifo, gred, htb,
plug and sfb) leads to problems with these specific interface types and
qdiscs.

Luckily, there's already a sanitization point for drivers setting
tx_queue_len to zero, which can be reused to assign the fallback value
most qdisc implementations used, which is 1.

Fixes: 348e3435 ("net: sched: drop all special handling of tx_queue_len == 0")
Tested-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: NPhil Sutter <phil@nwl.cc>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a813104d

gre: clear IFF_TX_SKB_SHARING · d13b161c

由 Jiri Benc 提交于 2月 17, 2016

ether_setup sets IFF_TX_SKB_SHARING but this is not supported by gre
as it modifies the skb on xmit.

Also, clean up whitespace in ipgre_tap_setup when we're already touching it.
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d13b161c

tcp/dccp: fix another race at listener dismantle · 7716682c

由 Eric Dumazet 提交于 2月 18, 2016

Ilya reported following lockdep splat:

kernel: =========================
kernel: [ BUG: held lock freed! ]
kernel: 4.5.0-rc1-ceph-00026-g5e0a311 #1 Not tainted
kernel: -------------------------
kernel: swapper/5/0 is freeing memory
ffff880035c9d200-ffff880035c9dbff, with a lock still held there!
kernel: (&(&queue->rskq_lock)->rlock){+.-...}, at:
[<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0
kernel: 4 locks held by swapper/5/0:
kernel: #0:  (rcu_read_lock){......}, at: [<ffffffff8169ef6b>]
netif_receive_skb_internal+0x4b/0x1f0
kernel: #1:  (rcu_read_lock){......}, at: [<ffffffff816e977f>]
ip_local_deliver_finish+0x3f/0x380
kernel: #2:  (slock-AF_INET){+.-...}, at: [<ffffffff81685ffb>]
sk_clone_lock+0x19b/0x440
kernel: #3:  (&(&queue->rskq_lock)->rlock){+.-...}, at:
[<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0

To properly fix this issue, inet_csk_reqsk_queue_add() needs
to return to its callers if the child as been queued
into accept queue.

We also need to make sure listener is still there before
calling sk->sk_data_ready(), by holding a reference on it,
since the reference carried by the child can disappear as
soon as the child is put on accept queue.
Reported-by: NIlya Dryomov <idryomov@gmail.com>
Fixes: ebb516af ("tcp/dccp: fix race at listener dismantle phase")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7716682c

route: check and remove route cache when we get route · deed49df

由 Xin Long 提交于 2月 18, 2016

Since the gc of ipv4 route was removed, the route cached would has
no chance to be removed, and even it has been timeout, it still could
be used, cause no code to check it's expires.

Fix this issue by checking  and removing route cache when we get route.
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

deed49df

net_sched fix: reclassification needs to consider ether protocol changes · 619fe326

由 Jamal Hadi Salim 提交于 2月 18, 2016

actions could change the etherproto in particular with ethernet
tunnelled data. Typically such actions, after peeling the outer header,
will ask for the packet to be  reclassified. We then need to restart
the classification with the new proto header.

Example setup used to catch this:
sudo tc qdisc add dev $ETH ingress
sudo $TC filter add dev $ETH parent ffff: pref 1 protocol 802.1Q \
u32 match u32 0 0 flowid 1:1 \
action  vlan pop reclassify

Fixes: 3b3ae880 ("net: sched: consolidate tc_classify{,_compat}")
Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

619fe326

18 2月, 2016 3 次提交

tcp: correctly crypto_alloc_hash return check · 1eea84b7

由 Insu Yun 提交于 2月 15, 2016

crypto_alloc_hash never returns NULL
Signed-off-by: NInsu Yun <wuninsu@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1eea84b7

net: dsa: Unregister slave_dev in error path · 73dcb556

由 Florian Fainelli 提交于 2月 17, 2016

With commit 0071f56e ("dsa: Register netdev before phy"), we are now trying
to free a network device that has been previously registered, and in case of
errors, this will make us hit the BUG_ON(dev->reg_state != NETREG_UNREGISTERED)
condition.

Fix this by adding a missing unregister_netdev() before free_netdev().

Fixes: 0071f56e ("dsa: Register netdev before phy")
Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

73dcb556

l2tp: Fix error creating L2TP tunnels · 853effc5

由 Mark Tomlinson 提交于 2月 15, 2016

A previous commit (33f72e6f) added notification via netlink for tunnels
when created/modified/deleted. If the notification returned an error,
this error was returned from the tunnel function. If there were no
listeners, the error code ESRCH was returned, even though having no
listeners is not an error. Other calls to this and other similar
notification functions either ignore the error code, or filter ESRCH.
This patch checks for ESRCH and does not flag this as an error.
Reviewed-by: NHamish Martin <hamish.martin@alliedtelesis.co.nz>
Signed-off-by: NMark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

853effc5

17 2月, 2016 8 次提交

tcp: md5: release request socket instead of listener · 72923555

由 Eric Dumazet 提交于 2月 11, 2016

If tcp_v4_inbound_md5_hash() returns an error, we must release
the refcount on the request socket, not on the listener.

The bug was added for IPv4 only.

Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

72923555

tcp: do not set rtt_min to 1 · 37202283

由 Eric Dumazet 提交于 2月 11, 2016

There are some cases where rtt_us derives from deltas of jiffies,
instead of using usec timestamps.

Since we want to track minimal rtt, better to assume a delta of 0 jiffie
might be in fact be very close to 1 jiffie.

It is kind of sad jiffies_to_usecs(1) calls a function instead of simply
using a constant.

Fixes: f6722583 ("tcp: track min RTT using windowed min-filter")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

37202283

net: dsa: remove phy_disconnect from error path · a407054f

由 Sascha Hauer 提交于 2月 11, 2016

The phy has not been initialized, disconnecting it in the error
path results in a NULL pointer exception. Drop the phy_disconnect
from the error path.
Signed-off-by: NSascha Hauer <s.hauer@pengutronix.de>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Acked-by: NNeil Armstrong <narmstrong@baylibre.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a407054f

tipc: fix premature addition of node to lookup table · d5c91fb7

由 Jon Paul Maloy 提交于 2月 10, 2016

In commit 52666986 ("tipc: let broadcast packet reception
use new link receive function") we introduced a new per-node
broadcast reception link instance. This link is created at the
moment the node itself is created. Unfortunately, the allocation
is done after the node instance has already been added to the node
lookup hash table. This creates a potential race condition, where
arriving broadcast packets are able to find and access the node
before it has been fully initialized, and before the above mentioned
link has been created. The result is occasional crashes in the function
tipc_bcast_rcv(), which is trying to access the not-yet existing link.

We fix this by deferring the addition of the node instance until after
it has been fully initialized in the function tipc_node_create().
Acked-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d5c91fb7

bridge: mdb: avoid uninitialized variable warning · 56bb7fd9

由 Arnd Bergmann 提交于 2月 10, 2016

A recent change to the mdb code confused the compiler to the point
where it did not realize that the port-group returned from
br_mdb_add_group() is always valid when the function returns a nonzero
return value, so we get a spurious warning:

net/bridge/br_mdb.c: In function 'br_mdb_add':
net/bridge/br_mdb.c:542:4: error: 'pg' may be used uninitialized in this function [-Werror=maybe-uninitialized]
    __br_mdb_notify(dev, entry, RTM_NEWMDB, pg);

Slightly rearranging the code in br_mdb_add_group() makes the problem
go away, as gcc is clever enough to see that both functions check
for 'ret != 0'.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Fixes: 9e8430f8 ("bridge: mdb: Passing the port-group pointer to br_mdb module")
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

56bb7fd9

net: Copy inner L3 and L4 headers as unaligned on GRE TEB · 78565208

由 Alexander Duyck 提交于 2月 09, 2016

This patch corrects the unaligned accesses seen on GRE TEB tunnels when
generating hash keys.  Specifically what this patch does is make it so that
we force the use of skb_copy_bits when the GRE inner headers will be
unaligned due to NET_IP_ALIGNED being a non-zero value.
Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
Acked-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

78565208

af_unix: Guard against other == sk in unix_dgram_sendmsg · a5527dda

由 Rainer Weikusat 提交于 2月 11, 2016

The unix_dgram_sendmsg routine use the following test

if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {

to determine if sk and other are in an n:1 association (either
established via connect or by using sendto to send messages to an
unrelated socket identified by address). This isn't correct as the
specified address could have been bound to the sending socket itself or
because this socket could have been connected to itself by the time of
the unix_peer_get but disconnected before the unix_state_lock(other). In
both cases, the if-block would be entered despite other == sk which
might either block the sender unintentionally or lead to trying to unlock
the same spin lock twice for a non-blocking send. Add a other != sk
check to guard against this.

Fixes: 7d267278 ("unix: avoid use-after-free in ep_remove_wait_queue")
Reported-By: NPhilipp Hahn <pmhahn@pmhahn.de>
Signed-off-by: NRainer Weikusat <rweikusat@mobileactivedefense.com>
Tested-by: NPhilipp Hahn <pmhahn@pmhahn.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a5527dda

af_unix: Don't set err in unix_stream_read_generic unless there was an error · 1b92ee3d

由 Rainer Weikusat 提交于 2月 08, 2016

The present unix_stream_read_generic contains various code sequences of
the form

err = -EDISASTER;
if (<test>)
	goto out;

This has the unfortunate side effect of possibly causing the error code
to bleed through to the final

out:
	return copied ? : err;

and then to be wrongly returned if no data was copied because the caller
didn't supply a data buffer, as demonstrated by the program available at

http://pad.lv/1540731

Change it such that err is only set if an error condition was detected.

Fixes: 3822b5c2 ("af_unix: Revert 'lock_interruptible' in stream receive code")
Reported-by: NJoseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: NRainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1b92ee3d

16 2月, 2016 3 次提交

batman-adv: Avoid endless loop in bat-on-bat netdevice check · 1bc4e2b0

由 Andrew Lunn 提交于 2月 11, 2016

batman-adv checks in different situation if a new device is already on top
of a different batman-adv device. This is done by getting the iflink of a
device and all its parent. It assumes that this iflink is always a parent
device in an acyclic graph. But this assumption is broken by devices like
veth which are actually a pair of two devices linked to each other. The
recursive check would therefore get veth0 when calling dev_get_iflink on
veth1. And it gets veth0 when calling dev_get_iflink with veth1.

Creating a veth pair and loading batman-adv freezes parts of the system

    ip link add veth0 type veth peer name veth1
    modprobe batman-adv

An RCU stall will be detected on the system which cannot be fixed.

    INFO: rcu_sched self-detected stall on CPU
            1: (5264 ticks this GP) idle=3e9/140000000000001/0
    softirq=144683/144686 fqs=5249
             (t=5250 jiffies g=46 c=45 q=43)
    Task dump for CPU 1:
    insmod          R  running task        0   247    245 0x00000008
     ffffffff8151f140 ffffffff8107888e ffff88000fd141c0 ffffffff8151f140
     0000000000000000 ffffffff81552df0 ffffffff8107b420 0000000000000001
     ffff88000e3fa700 ffffffff81540b00 ffffffff8107d667 0000000000000001
    Call Trace:
     <IRQ>  [<ffffffff8107888e>] ? rcu_dump_cpu_stacks+0x7e/0xd0
     [<ffffffff8107b420>] ? rcu_check_callbacks+0x3f0/0x6b0
     [<ffffffff8107d667>] ? hrtimer_run_queues+0x47/0x180
     [<ffffffff8107cf9d>] ? update_process_times+0x2d/0x50
     [<ffffffff810873fb>] ? tick_handle_periodic+0x1b/0x60
     [<ffffffff810290ae>] ? smp_trace_apic_timer_interrupt+0x5e/0x90
     [<ffffffff813bbae2>] ? apic_timer_interrupt+0x82/0x90
     <EOI>  [<ffffffff812c3fd7>] ? __dev_get_by_index+0x37/0x40
     [<ffffffffa0031f3e>] ? batadv_hard_if_event+0xee/0x3a0 [batman_adv]
     [<ffffffff812c5801>] ? register_netdevice_notifier+0x81/0x1a0
    [...]

This can be avoided by checking if two devices are each others parent and
stopping the check in this situation.

Fixes: b7eddd0b ("batman-adv: prevent using any virtual device created on batman-adv as hard-interface")
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
[sven@narfation.org: rewritten description, extracted fix]
Signed-off-by: NSven Eckelmann <sven@narfation.org>
Signed-off-by: NMarek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: NAntonio Quartulli <a@unstable.cc>

1bc4e2b0

batman-adv: Only put orig_node_vlan list reference when removed · 3db15209

由 Sven Eckelmann 提交于 1月 31, 2016

The batadv_orig_node_vlan reference counter in batadv_tt_global_size_mod
can only be reduced when the list entry was actually removed. Otherwise the
reference counter may reach zero when batadv_tt_global_size_mod is called
from two different contexts for the same orig_node_vlan but only one
context is actually removing the entry from the list.

The release function for this orig_node_vlan is not called inside the
vlan_list_lock spinlock protected region because the function
batadv_tt_global_size_mod still holds a orig_node_vlan reference for the
object pointer on the stack. Thus the actual release function (when
required) will be called only at the end of the function.

Fixes: 7ea7b4a1 ("batman-adv: make the TT CRC logic VLAN specific")
Signed-off-by: NSven Eckelmann <sven@narfation.org>
Signed-off-by: NMarek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: NAntonio Quartulli <a@unstable.cc>

3db15209

batman-adv: Only put gw_node list reference when removed · c18bdd01

由 Sven Eckelmann 提交于 1月 31, 2016

The batadv_gw_node reference counter in batadv_gw_node_update can only be
reduced when the list entry was actually removed. Otherwise the reference
counter may reach zero when batadv_gw_node_update is called from two
different contexts for the same gw_node but only one context is actually
removing the entry from the list.

The release function for this gw_node is not called inside the list_lock
spinlock protected region because the function batadv_gw_node_update still
holds a gw_node reference for the object pointer on the stack. Thus the
actual release function (when required) will be called only at the end of
the function.

Fixes: bd3524c1 ("batman-adv: remove obsolete deleted attribute for gateway node")
Signed-off-by: NSven Eckelmann <sven@narfation.org>
Signed-off-by: NMarek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: NAntonio Quartulli <a@unstable.cc>

c18bdd01

13 2月, 2016 2 次提交

vsock: Fix blocking ops call in prepare_to_wait · 59888180

由 Laura Abbott 提交于 2月 04, 2016

We receoved a bug report from someone using vmware:

WARNING: CPU: 3 PID: 660 at kernel/sched/core.c:7389
__might_sleep+0x7d/0x90()
do not call blocking ops when !TASK_RUNNING; state=1 set at
[<ffffffff810fa68d>] prepare_to_wait+0x2d/0x90
Modules linked in: vmw_vsock_vmci_transport vsock snd_seq_midi
snd_seq_midi_event snd_ens1371 iosf_mbi gameport snd_rawmidi
snd_ac97_codec ac97_bus snd_seq coretemp snd_seq_device snd_pcm
snd_timer snd soundcore ppdev crct10dif_pclmul crc32_pclmul
ghash_clmulni_intel vmw_vmci vmw_balloon i2c_piix4 shpchp parport_pc
parport acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc btrfs
xor raid6_pq 8021q garp stp llc mrp crc32c_intel serio_raw mptspi vmwgfx
drm_kms_helper ttm drm scsi_transport_spi mptscsih e1000 ata_generic
mptbase pata_acpi
CPU: 3 PID: 660 Comm: vmtoolsd Not tainted
4.2.0-0.rc1.git3.1.fc23.x86_64 #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop
Reference Platform, BIOS 6.00 05/20/2014
 0000000000000000 0000000049e617f3 ffff88006ac37ac8 ffffffff818641f5
 0000000000000000 ffff88006ac37b20 ffff88006ac37b08 ffffffff810ab446
 ffff880068009f40 ffffffff81c63bc0 0000000000000061 0000000000000000
Call Trace:
 [<ffffffff818641f5>] dump_stack+0x4c/0x65
 [<ffffffff810ab446>] warn_slowpath_common+0x86/0xc0
 [<ffffffff810ab4d5>] warn_slowpath_fmt+0x55/0x70
 [<ffffffff8112551d>] ? debug_lockdep_rcu_enabled+0x1d/0x20
 [<ffffffff810fa68d>] ? prepare_to_wait+0x2d/0x90
 [<ffffffff810fa68d>] ? prepare_to_wait+0x2d/0x90
 [<ffffffff810da2bd>] __might_sleep+0x7d/0x90
 [<ffffffff812163b3>] __might_fault+0x43/0xa0
 [<ffffffff81430477>] copy_from_iter+0x87/0x2a0
 [<ffffffffa039460a>] __qp_memcpy_to_queue+0x9a/0x1b0 [vmw_vmci]
 [<ffffffffa0394740>] ? qp_memcpy_to_queue+0x20/0x20 [vmw_vmci]
 [<ffffffffa0394757>] qp_memcpy_to_queue_iov+0x17/0x20 [vmw_vmci]
 [<ffffffffa0394d50>] qp_enqueue_locked+0xa0/0x140 [vmw_vmci]
 [<ffffffffa039593f>] vmci_qpair_enquev+0x4f/0xd0 [vmw_vmci]
 [<ffffffffa04847bb>] vmci_transport_stream_enqueue+0x1b/0x20
[vmw_vsock_vmci_transport]
 [<ffffffffa047ae05>] vsock_stream_sendmsg+0x2c5/0x320 [vsock]
 [<ffffffff810fabd0>] ? wake_atomic_t_function+0x70/0x70
 [<ffffffff81702af8>] sock_sendmsg+0x38/0x50
 [<ffffffff81702ff4>] SYSC_sendto+0x104/0x190
 [<ffffffff8126e25a>] ? vfs_read+0x8a/0x140
 [<ffffffff817042ee>] SyS_sendto+0xe/0x10
 [<ffffffff8186d9ae>] entry_SYSCALL_64_fastpath+0x12/0x76

transport->stream_enqueue may call copy_to_user so it should
not be called inside a prepare_to_wait. Narrow the scope of
the prepare_to_wait to avoid the bad call. This also applies
to vsock_stream_recvmsg as well.
Reported-by: NVinson Lee <vlee@freedesktop.org>
Tested-by: NVinson Lee <vlee@freedesktop.org>
Signed-off-by: NLaura Abbott <labbott@fedoraproject.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

59888180

ipv4: fix memory leaks in ip_cmsg_send() callers · 91948309

由 Eric Dumazet 提交于 2月 04, 2016

Dmitry reported memory leaks of IP options allocated in
ip_cmsg_send() when/if this function returns an error.

Callers are responsible for the freeing.

Many thanks to Dmitry for the report and diagnostic.
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

91948309

10 2月, 2016 1 次提交

vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices · 7e059158

由 David Wragg 提交于 2月 10, 2016

Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could
transmit vxlan packets of any size, constrained only by the ability to
send out the resulting packets. 4.3 introduced netdevs corresponding
to tunnel vports. These netdevs have an MTU, which limits the size of
a packet that can be successfully encapsulated. The default MTU
values are low (1500 or less), which is awkwardly small in the context
of physical networks supporting jumbo frames, and leads to a
conspicuous change in behaviour for userspace.

Instead, set the MTU on openvswitch-created netdevs to be the relevant
maximum (i.e. the maximum IP packet size minus any relevant overhead),
effectively restoring the behaviour prior to 4.3.
Signed-off-by: NDavid Wragg <david@weave.works>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7e059158

09 2月, 2016 4 次提交

flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen · 461547f3

由 Alexander Duyck 提交于 2月 09, 2016

This patch fixes an issue with unaligned accesses when using
eth_get_headlen on a page that was DMA aligned instead of being IP aligned.
The fact is when trying to check the length we don't need to be looking at
the flow label so we can reorder the checks to first check if we are
supposed to gather the flow label and then make the call to actually get
it.

v2:  Updated path so that either STOP_AT_FLOW_LABEL or KEY_FLOW_LABEL can
     cause us to check for the flow label.
Reported-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

461547f3

sctp: translate network order to host order when users get a hmacid · 7a84bd46

由 Xin Long 提交于 2月 03, 2016

Commit ed5a377d ("sctp: translate host order to network order when
setting a hmacid") corrected the hmacid byte-order when setting a hmacid.
but the same issue also exists on getting a hmacid.

We fix it by changing hmacids to host order when users get them with
getsockopt.

Fixes: Commit ed5a377d ("sctp: translate host order to network order when setting a hmacid")
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7a84bd46

net:Add sysctl_max_skb_frags · 5f74f82e

由 Hans Westgaard Ry 提交于 2月 03, 2016

Devices may have limits on the number of fragments in an skb they support.
Current codebase uses a constant as maximum for number of fragments one
skb can hold and use.
When enabling scatter/gather and running traffic with many small messages
the codebase uses the maximum number of fragments and may thereby violate
the max for certain devices.
The patch introduces a global variable as max number of fragments.
Signed-off-by: NHans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: NHåkon Bugge <haakon.bugge@oracle.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5f74f82e

tcp: do not drop syn_recv on all icmp reports · 9cf74903

由 Eric Dumazet 提交于 2月 02, 2016

Petr Novopashenniy reported that ICMP redirects on SYN_RECV sockets
were leading to RST.

This is of course incorrect.

A specific list of ICMP messages should be able to drop a SYN_RECV.

For instance, a REDIRECT on SYN_RECV shall be ignored, as we do
not hold a dst per SYN_RECV pseudo request.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111751
Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
Reported-by: NPetr Novopashenniy <pety@rusnet.ru>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9cf74903

08 2月, 2016 5 次提交

ipv6: fix a lockdep splat · 44c3d0c1

由 Eric Dumazet 提交于 2月 02, 2016

Silence lockdep false positive about rcu_dereference() being
used in the wrong context.

First one should use rcu_dereference_protected() as we own the spinlock.

Second one should be a normal assignation, as no barrier is needed.

Fixes: 18367681 ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.")
Reported-by: NDave Jones <davej@codemonkey.org.uk>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

44c3d0c1

unix: correctly track in-flight fds in sending process user_struct · 415e3d3e

由 Hannes Frederic Sowa 提交于 2月 03, 2016

The commit referenced in the Fixes tag incorrectly accounted the number
of in-flight fds over a unix domain socket to the original opener
of the file-descriptor. This allows another process to arbitrary
deplete the original file-openers resource limit for the maximum of
open files. Instead the sending processes and its struct cred should
be credited.

To do so, we add a reference counted struct user_struct pointer to the
scm_fp_list and use it to account for the number of inflight unix fds.

Fixes: 712f4aad ("unix: properly account for FDs passed over unix sockets")
Reported-by: NDavid Herrmann <dh.herrmann@gmail.com>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

415e3d3e

netfilter: nft_counter: fix erroneous return values · 5cc6ce9f

由 Anton Protopopov 提交于 2月 06, 2016

The nft_counter_init() and nft_counter_clone() functions should return
negative error value -ENOMEM instead of positive ENOMEM.
Signed-off-by: NAnton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

5cc6ce9f

netfilter: tee: select NF_DUP_IPV6 unconditionally · 08a7f5d3

由 Arnd Bergmann 提交于 2月 05, 2016

The NETFILTER_XT_TARGET_TEE option selects NF_DUP_IPV6 whenever
IP6_NF_IPTABLES is enabled, and it ensures that it cannot be
builtin itself if NF_CONNTRACK is a loadable module, as that
is a dependency for NF_DUP_IPV6.

However, NF_DUP_IPV6 can be enabled even if IP6_NF_IPTABLES is
turned off, and it only really depends on IPV6. With the current
check in tee_tg6, we call nf_dup_ipv6() whenever NF_DUP_IPV6
is enabled. This can however be a loadable module which is
unreachable from a built-in xt_TEE:

net/built-in.o: In function `tee_tg6':
:(.text+0x67728): undefined reference to `nf_dup_ipv6'

The bug was originally introduced in the split of the xt_TEE module
into separate modules for ipv4 and ipv6, and two patches tried
to fix it unsuccessfully afterwards.

This is a revert of the the first incorrect attempt to fix it,
going back to depending on IPV6 as the dependency, and we
adapt the 'select' condition accordingly.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Fixes: bbde9fc1 ("netfilter: factor out packet duplication for IPv4/IPv6")
Fixes: 116984a3 ("netfilter: xt_TEE: use IS_ENABLED(CONFIG_NF_DUP_IPV6)")
Fixes: 74ec4d55 ("netfilter: fix xt_TEE and xt_TPROXY dependencies")
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

08a7f5d3

netfilter: nfnetlink: correctly validate length of batch messages · c58d6c93

由 Phil Turnbull 提交于 2月 02, 2016

If nlh->nlmsg_len is zero then an infinite loop is triggered because
'skb_pull(skb, msglen);' pulls zero bytes.

The calculation in nlmsg_len() underflows if 'nlh->nlmsg_len <
NLMSG_HDRLEN' which bypasses the length validation and will later
trigger an out-of-bound read.

If the length validation does fail then the malformed batch message is
copied back to userspace. However, we cannot do this because the
nlh->nlmsg_len can be invalid. This leads to an out-of-bounds read in
netlink_ack:

    [   41.455421] ==================================================================
    [   41.456431] BUG: KASAN: slab-out-of-bounds in memcpy+0x1d/0x40 at addr ffff880119e79340
    [   41.456431] Read of size 4294967280 by task a.out/987
    [   41.456431] =============================================================================
    [   41.456431] BUG kmalloc-512 (Not tainted): kasan: bad access detected
    [   41.456431] -----------------------------------------------------------------------------
    ...
    [   41.456431] Bytes b4 ffff880119e79310: 00 00 00 00 d5 03 00 00 b0 fb fe ff 00 00 00 00  ................
    [   41.456431] Object ffff880119e79320: 20 00 00 00 10 00 05 00 00 00 00 00 00 00 00 00   ...............
    [   41.456431] Object ffff880119e79330: 14 00 0a 00 01 03 fc 40 45 56 11 22 33 10 00 05  .......@EV."3...
    [   41.456431] Object ffff880119e79340: f0 ff ff ff 88 99 aa bb 00 14 00 0a 00 06 fe fb  ................
                                            ^^ start of batch nlmsg with
                                               nlmsg_len=4294967280
    ...
    [   41.456431] Memory state around the buggy address:
    [   41.456431]  ffff880119e79400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    [   41.456431]  ffff880119e79480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    [   41.456431] >ffff880119e79500: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
    [   41.456431]                                ^
    [   41.456431]  ffff880119e79580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    [   41.456431]  ffff880119e79600: fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb fb
    [   41.456431] ==================================================================

Fix this with better validation of nlh->nlmsg_len and by setting
NFNL_BATCH_FAILURE if any batch message fails length validation.

CAP_NET_ADMIN is required to trigger the bugs.

Fixes: 9ea2aa8b ("netfilter: nfnetlink: validate nfnetlink header from batch")
Signed-off-by: NPhil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

c58d6c93

06 2月, 2016 1 次提交

ipv6: addrconf: Fix recursive spin lock call · 16186a82

由 subashab@codeaurora.org 提交于 2月 02, 2016

A rcu stall with the following backtrace was seen on a system with
forwarding, optimistic_dad and use_optimistic set. To reproduce,
set these flags and allow ipv6 autoconf.

This occurs because the device write_lock is acquired while already
holding the read_lock. Back trace below -

INFO: rcu_preempt self-detected stall on CPU { 1}  (t=2100 jiffies
 g=3992 c=3991 q=4471)
<6> Task dump for CPU 1:
<2> kworker/1:0     R  running task    12168    15   2 0x00000002
<2> Workqueue: ipv6_addrconf addrconf_dad_work
<6> Call trace:
<2> [<ffffffc000084da8>] el1_irq+0x68/0xdc
<2> [<ffffffc000cc4e0c>] _raw_write_lock_bh+0x20/0x30
<2> [<ffffffc000bc5dd8>] __ipv6_dev_ac_inc+0x64/0x1b4
<2> [<ffffffc000bcbd2c>] addrconf_join_anycast+0x9c/0xc4
<2> [<ffffffc000bcf9f0>] __ipv6_ifa_notify+0x160/0x29c
<2> [<ffffffc000bcfb7c>] ipv6_ifa_notify+0x50/0x70
<2> [<ffffffc000bd035c>] addrconf_dad_work+0x314/0x334
<2> [<ffffffc0000b64c8>] process_one_work+0x244/0x3fc
<2> [<ffffffc0000b7324>] worker_thread+0x2f8/0x418
<2> [<ffffffc0000bb40c>] kthread+0xe0/0xec

v2: do addrconf_dad_kick inside read lock and then acquire write
lock for ipv6_ifa_notify as suggested by Eric

Fixes: 7fd2561e ("net: ipv6: Add a sysctl to make optimistic
addresses useful candidates")

Cc: Eric Dumazet <edumazet@google.com>
Cc: Erik Kline <ek@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

16186a82

05 2月, 2016 1 次提交

libceph: MOSDOpReply v7 encoding · b0b31a8f

由 Ilya Dryomov 提交于 2月 03, 2016

Empty request_redirect_t (struct ceph_request_redirect in the kernel
client) is now encoded with a bool.  NEW_OSDOPREPLY_ENCODING feature
bit overlaps with already supported CRUSH_TUNABLES5.
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Reviewed-by: NSage Weil <sage@redhat.com>

b0b31a8f

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功