提交 · b664f008b0d885db1d5617ed1c51d29a8c04da93 · bug2833 / cloud-kernel

04 12月, 2015 12 次提交

bnxt_en: Setup uc_list mac filters after resetting the chip. · b664f008

由 Michael Chan 提交于 12月 02, 2015

Call bnxt_cfg_rx_mode() in bnxt_init_chip() to setup uc_list and
mc_list mac address filters.  Before the patch, uc_list is not
setup again after chip reset (such as ethtool ring size change)
and macvlans don't work any more after that.

Modify bnxt_cfg_rx_mode() to return error codes appropriately so
that the init chip sequence can detect any failures.
Signed-off-by: NMichael Chan <mchan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b664f008

bnxt_en: enforce proper storing of MAC address · bdd4347b

由 Jeffrey Huang 提交于 12月 02, 2015

For PF, the bp->pf.mac_addr always holds the permanent MAC
addr assigned by the HW.  For VF, the bp->vf.mac_addr always
holds the administrator assigned VF MAC addr. The random
generated VF MAC addr should never get stored to bp->vf.mac_addr.
This way, when the VF wants to change the MAC address, we can tell
if the adminstrator has already set it and disallow the VF from
changing it.

v2: Fix compile error if CONFIG_BNXT_SRIOV is not set.
Signed-off-by: NJeffrey Huang <huangjw@broadcom.com>
Signed-off-by: NMichael Chan <mchan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bdd4347b

bnxt_en: Fixed incorrect implementation of ndo_set_mac_address · 1fc2cfd0

由 Jeffrey Huang 提交于 12月 02, 2015

The existing ndo_set_mac_address only copies the new MAC addr
and didn't set the new MAC addr to the HW. The correct way is
to delete the existing default MAC filter from HW and add
the new one. Because of RFS filters are also dependent on the
default mac filter l2 context, the driver must go thru
close_nic() to delete the default MAC and RFS filters, then
open_nic() to set the default MAC address to HW.
Signed-off-by: NJeffrey Huang <huangjw@broadcom.com>
Signed-off-by: NMichael Chan <mchan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1fc2cfd0

net: lpc_eth: remove irq > NR_IRQS check from probe() · 39198ec9

由 Vladimir Zapolskiy 提交于 12月 02, 2015

If the driver is used on an ARM platform with SPARSE_IRQ defined,
semantics of NR_IRQS is different (minimal value of virtual irqs) and
by default it is set to 16, see arch/arm/include/asm/irq.h.

This value may be less than the actual number of virtual irqs, which
may break the driver initialization. The check removal allows to use
the driver on such a platform, and, if irq controller driver works
correctly, the check is not needed on legacy platforms.

Fixes a runtime problem:

    lpc-eth 31060000.ethernet: error getting resources.
    lpc_eth: lpc-eth: not found (-6).
Signed-off-by: NVladimir Zapolskiy <vz@mleia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

39198ec9

net_sched: fix qdisc_tree_decrease_qlen() races · 4eaf3b84

由 Eric Dumazet 提交于 12月 01, 2015

qdisc_tree_decrease_qlen() suffers from two problems on multiqueue
devices.

One problem is that it updates sch->q.qlen and sch->qstats.drops
on the mq/mqprio root qdisc, while it should not : Daniele
reported underflows errors :
[  681.774821] PAX: sch->q.qlen: 0 n: 1
[  681.774825] PAX: size overflow detected in function qdisc_tree_decrease_qlen net/sched/sch_api.c:769 cicus.693_49 min, count: 72, decl: qlen; num: 0; context: sk_buff_head;
[  681.774954] CPU: 2 PID: 19 Comm: ksoftirqd/2 Tainted: G           O    4.2.6.201511282239-1-grsec #1
[  681.774955] Hardware name: ASUSTeK COMPUTER INC. X302LJ/X302LJ, BIOS X302LJ.202 03/05/2015
[  681.774956]  ffffffffa9a04863 0000000000000000 0000000000000000 ffffffffa990ff7c
[  681.774959]  ffffc90000d3bc38 ffffffffa95d2810 0000000000000007 ffffffffa991002b
[  681.774960]  ffffc90000d3bc68 ffffffffa91a44f4 0000000000000001 0000000000000001
[  681.774962] Call Trace:
[  681.774967]  [<ffffffffa95d2810>] dump_stack+0x4c/0x7f
[  681.774970]  [<ffffffffa91a44f4>] report_size_overflow+0x34/0x50
[  681.774972]  [<ffffffffa94d17e2>] qdisc_tree_decrease_qlen+0x152/0x160
[  681.774976]  [<ffffffffc02694b1>] fq_codel_dequeue+0x7b1/0x820 [sch_fq_codel]
[  681.774978]  [<ffffffffc02680a0>] ? qdisc_peek_dequeued+0xa0/0xa0 [sch_fq_codel]
[  681.774980]  [<ffffffffa94cd92d>] __qdisc_run+0x4d/0x1d0
[  681.774983]  [<ffffffffa949b2b2>] net_tx_action+0xc2/0x160
[  681.774985]  [<ffffffffa90664c1>] __do_softirq+0xf1/0x200
[  681.774987]  [<ffffffffa90665ee>] run_ksoftirqd+0x1e/0x30
[  681.774989]  [<ffffffffa90896b0>] smpboot_thread_fn+0x150/0x260
[  681.774991]  [<ffffffffa9089560>] ? sort_range+0x40/0x40
[  681.774992]  [<ffffffffa9085fe4>] kthread+0xe4/0x100
[  681.774994]  [<ffffffffa9085f00>] ? kthread_worker_fn+0x170/0x170
[  681.774995]  [<ffffffffa95d8d1e>] ret_from_fork+0x3e/0x70

mq/mqprio have their own ways to report qlen/drops by folding stats on
all their queues, with appropriate locking.

A second problem is that qdisc_tree_decrease_qlen() calls qdisc_lookup()
without proper locking : concurrent qdisc updates could corrupt the list
that qdisc_match_from_root() parses to find a qdisc given its handle.

Fix first problem adding a TCQ_F_NOPARENT qdisc flag that
qdisc_tree_decrease_qlen() can use to abort its tree traversal,
as soon as it meets a mq/mqprio qdisc children.

Second problem can be fixed by RCU protection.
Qdisc are already freed after RCU grace period, so qdisc_list_add() and
qdisc_list_del() simply have to use appropriate rcu list variants.

A future patch will add a per struct netdev_queue list anchor, so that
qdisc_tree_decrease_qlen() can have more efficient lookups.
Reported-by: NDaniele Fucini <dfucini@gmail.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4eaf3b84

openvswitch: fix hangup on vxlan/gre/geneve device deletion · 13175303

由 Paolo Abeni 提交于 12月 01, 2015

Each openvswitch tunnel vport (vxlan,gre,geneve) holds a reference
to the underlying tunnel device, but never released it when such
device is deleted.
Deleting the underlying device via the ip tool cause the kernel to
hangup in the netdev_wait_allrefs() loop.
This commit ensure that on device unregistration dp_detach_port_notify()
is called for all vports that hold the device reference, properly
releasing it.

Fixes: 614732ea ("openvswitch: Use regular VXLAN net_device device")
Fixes: b2acd1dc ("openvswitch: Use regular GRE net_device instead of vport")
Fixes: 6b001e68 ("openvswitch: Use Geneve device.")
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NFlavio Leitner <fbl@sysclose.org>
Acked-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

13175303

ipv4: igmp: Allow removing groups from a removed interface · 4eba7bb1

由 Andrew Lunn 提交于 12月 01, 2015

When a multicast group is joined on a socket, a struct ip_mc_socklist
is appended to the sockets mc_list containing information about the
joined group.

If the interface is hot unplugged, this entry becomes stale. Prior to
commit 52ad353a ("igmp: fix the problem when mc leave group") it
was possible to remove the stale entry by performing a
IP_DROP_MEMBERSHIP, passing either the old ifindex or ip address on
the interface. However, this fix enforces that the interface must
still exist. Thus with time, the number of stale entries grows, until
sysctl_igmp_max_memberships is reached and then it is not possible to
join and more groups.

The previous patch fixes an issue where a IP_DROP_MEMBERSHIP is
performed without specifying the interface, either by ifindex or ip
address. However here we do supply one of these. So loosen the
restriction on device existence to only apply when the interface has
not been specified. This then restores the ability to clean up the
stale entries.
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
Fixes: 52ad353a "(igmp: fix the problem when mc leave group")
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4eba7bb1

ipv6: sctp: implement sctp_v6_destroy_sock() · 602dd62d

由 Eric Dumazet 提交于 12月 01, 2015

Dmitry Vyukov reported a memory leak using IPV6 SCTP sockets.

We need to call inet6_destroy_sock() to properly release
inet6 specific fields.
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

602dd62d

Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · 79aecc72

由 David S. Miller 提交于 12月 03, 2015

Johan Hedberg says:

====================
pull request: bluetooth 2015-12-01

Here's a Bluetooth fix for the 4.4-rc series that fixes a memory leak of
the Security Manager L2CAP channel that'll happen for every LE
connection.

Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

79aecc72

arm64: bpf: add 'store immediate' instruction · df849ba3

由 Yang Shi 提交于 11月 30, 2015

aarch64 doesn't have native store immediate instruction, such operation
has to be implemented by the below instruction sequence:

Load immediate to register
Store register
Signed-off-by: NYang Shi <yang.shi@linaro.org>
CC: Zi Shen Lim <zlim.lnx@gmail.com>
CC: Xi Wang <xi.wang@gmail.com>
Reviewed-by: NZi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

df849ba3

ipv6: kill sk_dst_lock · 6bd4f355

由 Eric Dumazet 提交于 12月 02, 2015

While testing the np->opt RCU conversion, I found that UDP/IPv6 was
using a mixture of xchg() and sk_dst_lock to protect concurrent changes
to sk->sk_dst_cache, leading to possible corruptions and crashes.

ip6_sk_dst_lookup_flow() uses sk_dst_check() anyway, so the simplest
way to fix the mess is to remove sk_dst_lock completely, as we did for
IPv4.

__ip6_dst_store() and ip6_dst_store() share same implementation.

sk_setup_caps() being called with socket lock being held or not,
we have to use sk_dst_set() instead of __sk_dst_set()

Note that I had to move the "np->dst_cookie = rt6_get_cookie(rt);"
in ip6_dst_store() before the sk_setup_caps(sk, dst) call.

This is because ip6_dst_store() can be called from process context,
without any lock held.

As soon as the dst is installed in sk->sk_dst_cache, dst can be freed
from another cpu doing a concurrent ip6_dst_store()

Doing the dst dereference before doing the install is needed to make
sure no use after free would trigger.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6bd4f355

ipv6: sctp: add rcu protection around np->opt · c836a8ba

由 Eric Dumazet 提交于 12月 02, 2015

This patch completes the work I did in commit 45f6fad8
("ipv6: add complete rcu protection around np->opt"), as I missed
sctp part.

This simply makes sure np->opt is used with proper RCU locking
and accessors.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c836a8ba

03 12月, 2015 19 次提交

net/neighbour: fix crash at dumping device-agnostic proxy entries · 6adc5fd6

由 Konstantin Khlebnikov 提交于 12月 01, 2015

Proxy entries could have null pointer to net-device.
Signed-off-by: NKonstantin Khlebnikov <koct9i@gmail.com>
Fixes: 84920c14 ("net: Allow ipv6 proxies and arp proxies be shown with iproute2")
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6adc5fd6

sctp: use GFP_USER for user-controlled kmalloc · cacc0621

由 Marcelo Ricardo Leitner 提交于 11月 30, 2015

Dmitry Vyukov reported that the user could trigger a kernel warning by
using a large len value for getsockopt SCTP_GET_LOCAL_ADDRS, as that
value directly affects the value used as a kmalloc() parameter.

This patch thus switches the allocation flags from all user-controllable
kmalloc size to GFP_USER to put some more restrictions on it and also
disables the warn, as they are not necessary.
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cacc0621

sctp: convert sack_needed and sack_generation to bits · 38ee8fb6

由 Marcelo Ricardo Leitner 提交于 11月 30, 2015

They don't need to be any bigger than that and with this we start a new
bitfield for tracking association runtime stuff, like zero window
situation.
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: NVlad Yasevich <vyasevich@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

38ee8fb6

ipv6: add complete rcu protection around np->opt · 45f6fad8

由 Eric Dumazet 提交于 11月 29, 2015

This patch addresses multiple problems :

UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
while socket is not locked : Other threads can change np->opt
concurrently. Dmitry posted a syzkaller
(http://github.com/google/syzkaller) program desmonstrating
use-after-free.

Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
and dccp_v6_request_recv_sock() also need to use RCU protection
to dereference np->opt once (before calling ipv6_dup_options())

This patch adds full RCU protection to np->opt
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

45f6fad8

bpf: fix allocation warnings in bpf maps and integer overflow · 01b3f521

由 Alexei Starovoitov 提交于 11月 29, 2015

For large map->value_size the user space can trigger memory allocation warnings like:
WARNING: CPU: 2 PID: 11122 at mm/page_alloc.c:2989
__alloc_pages_nodemask+0x695/0x14e0()
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff82743b56>] dump_stack+0x68/0x92 lib/dump_stack.c:50
 [<ffffffff81244ec9>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:460
 [<ffffffff812450f9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:493
 [<     inline     >] __alloc_pages_slowpath mm/page_alloc.c:2989
 [<ffffffff81554e95>] __alloc_pages_nodemask+0x695/0x14e0 mm/page_alloc.c:3235
 [<ffffffff816188fe>] alloc_pages_current+0xee/0x340 mm/mempolicy.c:2055
 [<     inline     >] alloc_pages include/linux/gfp.h:451
 [<ffffffff81550706>] alloc_kmem_pages+0x16/0xf0 mm/page_alloc.c:3414
 [<ffffffff815a1c89>] kmalloc_order+0x19/0x60 mm/slab_common.c:1007
 [<ffffffff815a1cef>] kmalloc_order_trace+0x1f/0xa0 mm/slab_common.c:1018
 [<     inline     >] kmalloc_large include/linux/slab.h:390
 [<ffffffff81627784>] __kmalloc+0x234/0x250 mm/slub.c:3525
 [<     inline     >] kmalloc include/linux/slab.h:463
 [<     inline     >] map_update_elem kernel/bpf/syscall.c:288
 [<     inline     >] SYSC_bpf kernel/bpf/syscall.c:744

To avoid never succeeding kmalloc with order >= MAX_ORDER check that
elem->value_size and computed elem_size are within limits for both hash and
array type maps.
Also add __GFP_NOWARN to kmalloc(value_size | elem_size) to avoid OOM warnings.
Note kmalloc(key_size) is highly unlikely to trigger OOM, since key_size <= 512,
so keep those kmalloc-s as-is.

Large value_size can cause integer overflows in elem_size and map.pages
formulas, so check for that as well.

Fixes: aaac3ba9 ("bpf: charge user for creation of BPF maps and programs")
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

01b3f521

Merge branch 'mvneta-fixes' · a64f3f83

由 David S. Miller 提交于 12月 02, 2015

Marcin Wojtas says:

====================
Marvell Armada XP/370/38X Neta fixes

I'm sending v4 with corrected commit log of the last patch, in order to
avoid possible conflicts between the branches as suggested by Gregory
Clement.

Best regards,
Marcin Wojtas

Changes from v4:
* Correct commit log of patch 6/6

Changes from v2:
* Style fixes in patch updating mbus protection
* Remove redundant stable notifications except for patch 4/6

Changes from v1:
* update MBUS windows access protection register once, after whole loop
* add fixing value of MVNETA_RXQ_INTR_ENABLE_ALL_MASK
* add fixing error path for skb_build()
* add possibility of setting custom TX IP checksum limit in DT property
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a64f3f83

mvebu: dts: enable IP checksum with jumbo frames for Armada 38x on Port0 · c4a25007