Commits · 1a3065a26807b4cdd65d3b696ddb18385610f7da · openeuler / Kernel

14 May, 2021 · 7 commits

net: bridge: mcast: prepare is-router function for mcast router split · 1a3065a2

Committed by Linus Lüssing on May 13, 2021

In preparation for the upcoming split of multicast router state into
their IPv4 and IPv6 variants make br_multicast_is_router() protocol
family aware.

Note that br_ip6_multicast_is_router() uses the currently still
common ip4_mc_router_timer for now. It will be renamed to
ip6_mc_router_timer later when the split is performed.

While at it, also rename the "1" and "2" constants in
br_multicast_is_router() to the MDB_RTR_TYPE_TEMP_QUERY and
MDB_RTR_TYPE_PERM enums.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: mcast: prepare query reception for mcast router split · b19232ef

Committed by Linus Lüssing on May 13, 2021

In preparation for the upcoming split of multicast router state into
their IPv4 and IPv6 variants and as the br_multicast_mark_router() will
be split for that remove the select querier wrapper and instead add
ip4 and ip6 variants for br_multicast_query_received().
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: mcast: prepare mdb netlink for mcast router split · ff391c5d

Committed by Linus Lüssing on May 13, 2021

In preparation for the upcoming split of multicast router state into
their IPv4 and IPv6 variants and to avoid IPv6 #ifdef clutter later add
some inline functions for the protocol specific parts in the mdb router
netlink code. Also, we need to iterate over the port list instead of the
router list to be able to put one router port entry with both the IPv4
and IPv6 multicast router info later.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: mcast: add wrappers for router node retrieval · 44ebb081

Committed by Linus Lüssing on May 13, 2021

In preparation for the upcoming split of multicast router state into
their IPv4 and IPv6 variants and to avoid IPv6 #ifdef clutter later add
two wrapper functions for router node retrieval in the payload
forwarding code.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: mcast: rename multicast router lists and timers · ce6f7097

Committed by Linus Lüssing on May 13, 2021

In preparation for the upcoming split of multicast router state into
their IPv4 and IPv6 variants, rename the affected variable to the IPv4
version first to avoid some renames in later commits.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT · 8380c81d

Committed by Sebastian Andrzej Siewior on May 12, 2021

__napi_schedule_irqoff() is an optimized version of __napi_schedule()
which can be used where it is known that interrupts are disabled,
e.g. in interrupt-handlers, spin_lock_irq() sections or hrtimer
callbacks.

On PREEMPT_RT enabled kernels this assumption is not true. Force-threaded
interrupt handlers and spinlocks are not disabling interrupts
and the NAPI hrtimer callback is forced into softirq context which runs
with interrupts enabled as well.

Chasing all usage sites of __napi_schedule_irqoff() is a whack-a-mole
game so make __napi_schedule_irqoff() invoke __napi_schedule() for
PREEMPT_RT kernels.

The callers of ____napi_schedule() in the networking core have been
audited and are correct on PREEMPT_RT kernels as well.
Reported-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: taprio offload: enforce qdisc to netdev queue mapping · 13511704

Committed by Yannick Vignon on May 11, 2021

Even though the taprio qdisc is designed for multiqueue devices, all the
queues still point to the same top-level taprio qdisc. This works and is
probably required for software taprio, but at least with offload taprio,
it has an undesirable side effect: because the whole qdisc is run when a
packet has to be sent, it allows packets in a best-effort class to be
processed in the context of a task sending higher priority traffic. If
there are packets left in the qdisc after that first run, the NET_TX
softirq is raised and gets executed immediately in the same process
context. As with any other softirq, it runs up to 10 times and for up to
2ms, during which the calling process is waiting for the sendmsg call (or
similar) to return. In my use case, that calling process is a real-time
task scheduled to send a packet every 2ms, so the long sendmsg calls are
leading to missed timeslots.

By attaching each netdev queue to its own qdisc, as it is done with
the "classic" mq qdisc, each traffic class can be processed independently
without touching the other classes. A high-priority process can then send
packets without getting stuck in the sendmsg call anymore.
Signed-off-by: Yannick Vignon <yannick.vignon@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

13 May, 2021 · 1 commit

tls splice: remove inappropriate flags checking for MSG_PEEK · d8654f4f

Committed by Jim Ma on May 12, 2021

In tls_sw_splice_read(), before calling tls_sw_advance_skb(), the code
checks likely(!(flags & MSG_PEEK)). However, MSG_PEEK is a recvmsg flag,
whereas splice supports SPLICE_F_NONBLOCK, SPLICE_F_MOVE and
SPLICE_F_MORE, so this check should be removed.
Signed-off-by: Jim Ma <majinjing3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 May, 2021 · 1 commit

net/sched: taprio: Drop unnecessary NULL check after container_of · faa5f5da

Committed by Guenter Roeck on May 11, 2021

The rcu_head pointer passed to taprio_free_sched_cb is never NULL.
That means that the result of container_of() operations on it is also
never NULL, even though rcu_head is the first element of the structure
embedding it. On top of that, it is misleading to perform a NULL check
on the result of container_of() because the position of the contained
element could change, which would make the check invalid. Remove the
unnecessary NULL check.

This change was made automatically with the following Coccinelle script.

@@
type t;
identifier v;
statement s;
@@

<+...
(
  t v = container_of(...);
|
  v = container_of(...);
)
  ...
  when != v
- if (\( !v \| v == NULL \) ) s
...+>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
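
To illustrate why a NULL check on the result of container_of() is misleading, here is a minimal userspace sketch. The macro is a simplified stand-in for the kernel's container_of(), and the structure and function names are hypothetical, chosen only for this example. The embedded member is deliberately not the first field, so the recovered pointer differs from the member pointer by a non-zero offset.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical structures, named only for this illustration. */
struct callback_head {
    void (*func)(struct callback_head *head);
};

struct sched_entry {
    int id;
    struct callback_head rcu;   /* embedded member, deliberately not first */
};

/* Recover the enclosing object from a pointer to its embedded member. */
static struct sched_entry *entry_from_head(struct callback_head *head)
{
    /* The precondition lives on the input; the result needs no NULL check. */
    assert(head != NULL);
    return container_of(head, struct sched_entry, rcu);
}
```

Because the result is just the member address minus a constant offset, a valid member pointer always yields a non-NULL enclosing pointer, which is why the check removed by the script carries no information.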

11 May, 2021 · 3 commits

rtnetlink: avoid RCU read lock when holding RTNL · a100243d

Committed by Cong Wang on May 08, 2021

When we call af_ops->set_link_af() we hold a RCU read lock
as we retrieve af_ops from the RCU protected list, but this
is unnecessary because we already hold RTNL lock, which is
the writer lock for protecting rtnl_af_ops, so it is safer
than RCU read lock. Similar for af_ops->validate_link_af().

This was not a problem until we began to take a mutex lock
down the path of ->set_link_af() in __ipv6_dev_mc_dec()
recently. We can just drop the RCU read lock there and
assert RTNL lock.

Reported-and-tested-by: syzbot+7d941e89dd48bcf42573@syzkaller.appspotmail.com
Fixes: 63ed8de4 ("mld: add mc_lock for protecting per-interface mld data")
Tested-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

neighbour: Remove redundant initialization of 'bucket' · 48de7c0c

Committed by Yang Li on May 08, 2021

Integer variable 'bucket' is being initialized however
this value is never read as 'bucket' is assigned zero
in for statement. Remove the redundant assignment.

Cleans up clang warning:

net/core/neighbour.c:3144:6: warning: Value stored to 'bucket' during
its initialization is never read [clang-analyzer-deadcode.DeadStores]
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: openvswitch: Remove unnecessary skb_nfct() · d2792e91

Committed by Yejune Deng on May 08, 2021

There is no need for the 'if (skb_nfct(skb))' check, as
nf_conntrack_put() performs it itself.
Signed-off-by: Yejune Deng <yejunedeng@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

08 May, 2021 · 1 commit

mptcp: fix splat when closing unaccepted socket · 578c18ef

Committed by Paolo Abeni on May 06, 2021

If userspace exits before calling accept() on a listener that had at least
one new connection ready, we get:

Attempt to release TCP socket in state 8

This happens because the mptcp socket gets cloned when the TCP connection
is ready, but the socket is never exposed to userspace.

The client additionally sends a DATA_FIN, which brings connection into
CLOSE_WAIT state. This in turn prevents the orphan+state reset fixup
in mptcp_sock_destruct() from doing its job.

Fixes: 3721b9b6 ("mptcp: Track received DATA_FIN sequence number and add related helpers")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/185
Tested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Link: https://lore.kernel.org/r/20210507001638.225468-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

07 May, 2021 · 3 commits

netfilter: nftables: avoid potential overflows on 32bit arches · 6c8774a9

Committed by Eric Dumazet on May 06, 2021

User space could ask for very large hash tables, we need to make sure
our size computations won't overflow.

nf_tables_newset() needs to double check the u64 size
will fit into size_t field.

Fixes: 0ed6389c ("netfilter: nf_tables: rename set implementations")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nftables: avoid overflows in nft_hash_buckets() · a54754ec

Committed by Eric Dumazet on May 06, 2021

The number of buckets is stored in 32-bit variables, so we have to
ensure that no overflows occur in nft_hash_buckets().

syzbot injected a size == 0x40000000 and reported:

UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
shift exponent 64 is too large for 64-bit type 'long unsigned int'
CPU: 1 PID: 29539 Comm: syz-executor.4 Not tainted 5.12.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x141/0x1d7 lib/dump_stack.c:120
 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327
 __roundup_pow_of_two include/linux/log2.h:57 [inline]
 nft_hash_buckets net/netfilter/nft_set_hash.c:411 [inline]
 nft_hash_estimate.cold+0x19/0x1e net/netfilter/nft_set_hash.c:652
 nft_select_set_ops net/netfilter/nf_tables_api.c:3586 [inline]
 nf_tables_newset+0xe62/0x3110 net/netfilter/nf_tables_api.c:4322
 nfnetlink_rcv_batch+0xa09/0x24b0 net/netfilter/nfnetlink.c:488
 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:612 [inline]
 nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:630
 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338
 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927
 sock_sendmsg_nosec net/socket.c:654 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:674
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46

Fixes: 0ed6389c ("netfilter: nf_tables: rename set implementations")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
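
The overflow can be reproduced in userspace with a small sketch. It assumes the bucket count is derived by scaling the requested size by 4/3 and rounding up to a power of two; the function names below are hypothetical and the loop-based round-up is a simple substitute for the kernel helper, not a copy of it.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit arithmetic: size * 4 wraps modulo 2^32 before the division. */
static uint32_t scale_32bit(uint32_t size)
{
    return size * 4 / 3;
}

/* Widening to 64 bits first keeps the intermediate product exact. */
static uint64_t scale_64bit(uint32_t size)
{
    return (uint64_t)size * 4 / 3;
}

/* Smallest power of two >= n (simple loop, not the kernel macro). */
static uint64_t roundup_pow2(uint64_t n)
{
    uint64_t p = 1;

    assert(n >= 1);     /* rounding up zero is not meaningful */
    while (p < n)
        p <<= 1;
    return p;
}
```

With size == 0x40000000, the 32-bit product wraps to 0, so the round-up step would be handed a zero, which is consistent with the out-of-range shift in the report. The widened variant yields 0x55555555, which rounds up to 0x80000000.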

tcp: Specify cmsgbuf is user pointer for receive zerocopy. · a6f8ee58

Committed by Arjun Roy on May 06, 2021

A prior change (1f466e1f) introduces separate handling for
->msg_control depending on whether the pointer is a kernel or user
pointer. However, while tcp receive zerocopy is using this field, it
is not properly annotating that the buffer in this case is a user
pointer. This can cause faults when the improper mechanism is used
within put_cmsg().

This patch simply annotates tcp receive zerocopy's use as explicitly
being a user pointer.

Fixes: 7eeba170 ("tcp: Add receive timestamp support for receive zerocopy.")
Signed-off-by: Arjun Roy <arjunroy@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20210506223530.2266456-1-arjunroy.kdev@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

06 May, 2021 · 5 commits

netfilter: nftables: Fix a memleak from userdata error path in new objects · 85dfd816

Committed by Pablo Neira Ayuso on May 05, 2021

Release object name if userdata allocation fails.

Fixes: b131c964 ("netfilter: nf_tables: add userdata support for nft_object")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
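
The general shape of such an error-path fix can be sketched in userspace. This is not the nf_tables code: the structure, the counting allocator and the injectable userdata allocator are all hypothetical, introduced only so that the failure path can be exercised.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object with two separately allocated parts. */
struct object {
    char *name;
    void *udata;
};

typedef void *(*alloc_fn)(size_t);

static int live_allocs;     /* allocations not yet released */

static void *counting_alloc(size_t n)
{
    void *p = malloc(n);

    if (p)
        live_allocs++;
    return p;
}

static void counting_free(void *p)
{
    if (p) {
        live_allocs--;
        free(p);
    }
}

/* Allocator that always fails, used to force the error path. */
static void *failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}

static int object_init(struct object *o, const char *name, size_t ulen,
                       alloc_fn udata_alloc)
{
    assert(o != NULL && name != NULL);

    o->name = counting_alloc(strlen(name) + 1);
    if (!o->name)
        return -1;
    strcpy(o->name, name);

    o->udata = udata_alloc(ulen);
    if (!o->udata)
        goto err_free_name;     /* release what was already allocated */
    return 0;

err_free_name:
    counting_free(o->name);
    o->name = NULL;
    return -1;
}
```

Calling object_init() with failing_alloc leaves live_allocs at zero, which is the property the fix restores: every allocation made before the failing step is undone on the way out.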

netfilter: remove BUG_ON() after skb_header_pointer() · 198ad973

Committed by Pablo Neira Ayuso on May 05, 2021

Several conntrack helpers and the TCP tracker assume that
skb_header_pointer() never fails based on upfront header validation.
Even if this should not ever happen, BUG_ON() is too drastic a measure;
remove these checks.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nfnetlink_osf: Fix a missing skb_header_pointer() NULL check · 5e024c32

Committed by Pablo Neira Ayuso on May 05, 2021

Do not assume that the tcph->doff field is correct when parsing for TCP
options, skb_header_pointer() might fail to fetch these bits.

Fixes: 11eeef41 ("netfilter: passive OS fingerprint xtables match")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
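
A userspace sketch of the pattern follows. The header_pointer() helper here is a hypothetical stand-in that only mimics the "copy out or return NULL" contract of skb_header_pointer() on a flat byte buffer; it is not the kernel function. The point is that the data offset field is supplied by the sender, so the fetch of the options it implies must be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy len bytes at offset from a packet of pkt_len bytes into buf.
 * Returns buf on success, NULL if the range is not fully inside the packet.
 */
static const void *header_pointer(const uint8_t *pkt, size_t pkt_len,
                                  size_t offset, size_t len, void *buf)
{
    assert(pkt != NULL && buf != NULL);
    if (offset > pkt_len || len > pkt_len - offset)
        return NULL;
    memcpy(buf, pkt + offset, len);
    return buf;
}

/* Returns the TCP option length in bytes, or -1 if the packet is malformed. */
static int tcp_options_len(const uint8_t *pkt, size_t pkt_len,
                           uint8_t *opts, size_t opts_cap)
{
    uint8_t hdr[20];
    size_t hdr_len;

    if (!header_pointer(pkt, pkt_len, 0, sizeof(hdr), hdr))
        return -1;

    /* Data offset: upper four bits of byte 12, in 32-bit words. */
    hdr_len = (size_t)(hdr[12] >> 4) * 4;
    if (hdr_len < sizeof(hdr) || hdr_len - sizeof(hdr) > opts_cap)
        return -1;

    /* The claimed length may exceed the packet, so this fetch can fail. */
    if (!header_pointer(pkt, pkt_len, sizeof(hdr), hdr_len - sizeof(hdr), opts))
        return -1;
    return (int)(hdr_len - sizeof(hdr));
}
```

A 20-byte packet that claims a 24-byte header is rejected by the second check rather than read past its end.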

smc: disallow TCP_ULP in smc_setsockopt() · 86214366

Committed by Cong Wang on May 05, 2021

syzbot is able to setup kTLS on an SMC socket which coincidentally
uses sk_user_data too. Later, kTLS treats it as psock so triggers a
refcnt warning. The root cause is that smc_setsockopt() simply calls
TCP setsockopt() which includes TCP_ULP. I do not think it makes
sense to setup kTLS on top of SMC sockets, so we should just disallow
this setup.

It is hard to find a commit to blame, but we can apply this patch
since the beginning of TCP_ULP.

Reported-and-tested-by: syzbot+b54a1ce86ba4a623b7f0@syzkaller.appspotmail.com
Fixes: 734942cc ("tcp: ULP infrastructure")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: fix missing NLM_F_MULTI flag when dumping · cf754ae3

Committed by Fernando Fernandez Mancera on May 05, 2021

When dumping the ethtool information from all the interfaces, the
netlink reply should contain the NLM_F_MULTI flag. This flag allows
userspace tools to identify that multiple messages are expected.

Link: https://bugzilla.redhat.com/1953847
Fixes: 365f9ae4 ("ethtool: fix genlmsg_put() failure handling in ethnl_default_dumpit()")
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

05 May, 2021 · 3 commits

netfilter: nfnetlink: add a missing rcu_read_unlock() · 7072a355

Committed by Eric Dumazet on May 05, 2021

Reported by syzbot :
BUG: sleeping function called from invalid context at include/linux/sched/mm.h:201
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 26899, name: syz-executor.5
1 lock held by syz-executor.5/26899:
 #0: ffffffff8bf797a0 (rcu_read_lock){....}-{1:2}, at: nfnetlink_get_subsys net/netfilter/nfnetlink.c:148 [inline]
 #0: ffffffff8bf797a0 (rcu_read_lock){....}-{1:2}, at: nfnetlink_rcv_msg+0x1da/0x1300 net/netfilter/nfnetlink.c:226
Preemption disabled at:
[<ffffffff8917799e>] preempt_schedule_irq+0x3e/0x90 kernel/sched/core.c:5533
CPU: 1 PID: 26899 Comm: syz-executor.5 Not tainted 5.12.0-next-20210504-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x141/0x1d7 lib/dump_stack.c:120
 ___might_sleep.cold+0x1f1/0x237 kernel/sched/core.c:8338
 might_alloc include/linux/sched/mm.h:201 [inline]
 slab_pre_alloc_hook mm/slab.h:500 [inline]
 slab_alloc_node mm/slub.c:2845 [inline]
 kmem_cache_alloc_node+0x33d/0x3e0 mm/slub.c:2960
 __alloc_skb+0x20b/0x340 net/core/skbuff.c:413
 alloc_skb include/linux/skbuff.h:1107 [inline]
 nlmsg_new include/net/netlink.h:953 [inline]
 netlink_ack+0x1ed/0xaa0 net/netlink/af_netlink.c:2437
 netlink_rcv_skb+0x33d/0x420 net/netlink/af_netlink.c:2508
 nfnetlink_rcv+0x1ac/0x420 net/netfilter/nfnetlink.c:650
 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338
 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927
 sock_sendmsg_nosec net/socket.c:654 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:674
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433
 do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4665f9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fa8a03ee188 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000000000056bf60 RCX: 00000000004665f9
RDX: 0000000000000000 RSI: 0000000020000480 RDI: 0000000000000004
RBP: 00000000004bfce1 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056bf60
R13: 00007fffe864480f R14: 00007fa8a03ee300 R15: 0000000000022000

================================================
WARNING: lock held when returning to user space!
5.12.0-next-20210504-syzkaller #0 Tainted: G        W
------------------------------------------------
syz-executor.5/26899 is leaving the kernel with locks still held!
1 lock held by syz-executor.5/26899:
 #0: ffffffff8bf797a0 (rcu_read_lock){....}-{1:2}, at: nfnetlink_get_subsys net/netfilter/nfnetlink.c:148 [inline]
 #0: ffffffff8bf797a0 (rcu_read_lock){....}-{1:2}, at: nfnetlink_rcv_msg+0x1da/0x1300 net/netfilter/nfnetlink.c:226
------------[ cut here ]------------
WARNING: CPU: 0 PID: 26899 at kernel/rcu/tree_plugin.h:359 rcu_note_context_switch+0xfd/0x16e0 kernel/rcu/tree_plugin.h:359
Modules linked in:
CPU: 0 PID: 26899 Comm: syz-executor.5 Tainted: G        W         5.12.0-next-20210504-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:rcu_note_context_switch+0xfd/0x16e0 kernel/rcu/tree_plugin.h:359
Code: 48 89 fa 48 c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 2e 0d 00 00 8b bd cc 03 00 00 85 ff 7e 02 <0f> 0b 65 48 8b 2c 25 00 f0 01 00 48 8d bd cc 03 00 00 48 b8 00 00
RSP: 0000:ffffc90002fffdb0 EFLAGS: 00010002
RAX: 0000000000000007 RBX: ffff8880b9c36080 RCX: ffffffff8dc99bac
RDX: 0000000000000000 RSI: 0000000000000008 RDI: 0000000000000001
RBP: ffff88808b9d1c80 R08: 0000000000000000 R09: ffffffff8dc96917
R10: fffffbfff1b92d22 R11: 0000000000000000 R12: 0000000000000000
R13: ffff88808b9d1c80 R14: ffff88808b9d1c80 R15: ffffc90002ff8000
FS:  00007fa8a03ee700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f09896ed000 CR3: 0000000032070000 CR4: 00000000001526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 __schedule+0x214/0x23e0 kernel/sched/core.c:5044
 schedule+0xcf/0x270 kernel/sched/core.c:5226
 exit_to_user_mode_loop kernel/entry/common.c:162 [inline]
 exit_to_user_mode_prepare+0x13e/0x280 kernel/entry/common.c:208
 irqentry_exit_to_user_mode+0x5/0x40 kernel/entry/common.c:314
 asm_sysvec_reschedule_ipi+0x12/0x20 arch/x86/include/asm/idtentry.h:637
RIP: 0033:0x4665f9

Fixes: 50f2db9e ("netfilter: nfnetlink: consolidate callback types")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

net/nfc: fix use-after-free llcp_sock_bind/connect · c61760e6

Committed by Or Cohen on May 04, 2021

Commits 8a4cd82d ("nfc: fix refcount leak in llcp_sock_connect()")
and c33b1cc6 ("nfc: fix refcount leak in llcp_sock_bind()")
fixed a refcount leak bug in bind/connect but introduced a
use-after-free if the same local is assigned to 2 different sockets.

This can be triggered by the following simple program:
    struct sockaddr_nfc_llcp addr;
    int sock1 = socket( AF_NFC, SOCK_STREAM, NFC_SOCKPROTO_LLCP );
    int sock2 = socket( AF_NFC, SOCK_STREAM, NFC_SOCKPROTO_LLCP );
    memset( &addr, 0, sizeof(struct sockaddr_nfc_llcp) );
    addr.sa_family = AF_NFC;
    addr.nfc_protocol = NFC_PROTO_NFC_DEP;
    bind( sock1, (struct sockaddr*) &addr, sizeof(struct sockaddr_nfc_llcp) );
    bind( sock2, (struct sockaddr*) &addr, sizeof(struct sockaddr_nfc_llcp) );
    close(sock1);
    close(sock2);

Fix this by assigning NULL to llcp_sock->local after calling
nfc_llcp_local_put.

This addresses CVE-2021-23134.
Reported-by: Or Cohen <orcohen@paloaltonetworks.com>
Reported-by: Nadav Markus <nmarkus@paloaltonetworks.com>
Fixes: c33b1cc6 ("nfc: fix refcount leak in llcp_sock_bind()")
Signed-off-by: Or Cohen <orcohen@paloaltonetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
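
The "put, then clear the pointer" idiom described in the fix can be modelled with a small reference-counted object. All names below are hypothetical; the sketch records the final put in a flag instead of freeing memory, so the outcome can be inspected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shared object; 'freed' marks where a real free would occur. */
struct local {
    int refcnt;
    int freed;
};

struct sock_model {
    struct local *local;
};

static void local_get(struct local *l)
{
    l->refcnt++;
}

static void local_put(struct local *l)
{
    assert(l->refcnt > 0);
    if (--l->refcnt == 0)
        l->freed = 1;
}

/* Successful part of bind: the socket takes its own reference. */
static void sock_attach(struct sock_model *s, struct local *l)
{
    local_get(l);
    s->local = l;
}

/* Error path of bind: drop the reference AND forget the pointer. */
static void sock_bind_failed(struct sock_model *s)
{
    local_put(s->local);
    s->local = NULL;
}

/* Release path: only drops a reference the socket still owns. */
static void sock_release(struct sock_model *s)
{
    if (s->local) {
        local_put(s->local);
        s->local = NULL;
    }
}
```

Without the assignment to NULL in sock_bind_failed(), the later sock_release() would drop a second reference that the socket no longer owns, which is the double-put behind the use-after-free.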

net: Only allow init netns to set default tcp cong to a restricted algo · 8d432592

Committed by Jonathon Reinhart on May 01, 2021

tcp_set_default_congestion_control() is netns-safe in that it writes
to &net->ipv4.tcp_congestion_control, but it also sets
ca->flags |= TCP_CONG_NON_RESTRICTED which is not namespaced.
This has the unintended side-effect of changing the global
net.ipv4.tcp_allowed_congestion_control sysctl, despite the fact that it
is read-only: 97684f09 ("net: Make tcp_allowed_congestion_control
readonly in non-init netns")

Resolve this netns "leak" by only allowing the init netns to set the
default algorithm to one that is restricted. This restriction could be
removed if tcp_allowed_congestion_control were namespace-ified in the
future.

This bug was uncovered with
https://github.com/JonathonReinhart/linux-netns-sysctl-verify

Fixes: 6670e152 ("tcp: Namespace-ify sysctl_tcp_default_congestion_control")
Signed-off-by: Jonathon Reinhart <jonathon.reinhart@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 May, 2021 · 10 commits

libceph: allow addrvecs with a single NONE/blank address · 3f1c6f21

Committed by Ilya Dryomov on May 03, 2021

Normally, an unused OSD id/slot is represented by an empty addrvec.
However, it also appears to be possible to generate an osdmap where
an unused OSD id/slot has an addrvec with a single blank address of
type NONE.  Allow such addrvecs and make the end result be exactly
the same as for the empty addrvec case -- leave addr intact.

Cc: stable@vger.kernel.org # 5.11+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

xsk: Fix for xp_aligned_validate_desc() when len == chunk_size · ac31565c

Committed by Xuan Zhuo on April 28, 2021

When desc->len is equal to chunk_size, it is legal. But when the
xp_aligned_validate_desc() got chunk_end from desc->addr + desc->len
pointing to the next chunk during the check, it caused the check to
fail.

This problem was first introduced in bbff2f32 ("xsk: new descriptor
addressing scheme"). Later in 2b43470a ("xsk: Introduce AF_XDP buffer
allocation API") this piece of code was moved into the new function called
xp_aligned_validate_desc(). This function was then moved into xsk_queue.h
via 26062b18 ("xsk: Explicitly inline functions and move definitions").

Fixes: bbff2f32 ("xsk: new descriptor addressing scheme")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20210428094424.54435-1-xuanzhuo@linux.alibaba.com
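
The boundary case can be shown with a short sketch. It assumes chunk_size is a power of two and that a descriptor is valid when all of its bytes lie within one chunk. The function names are hypothetical, and the second variant is one plausible way to express the corrected check, not necessarily the exact code of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Base address of the chunk containing addr. */
static uint64_t chunk_base(uint64_t addr, uint64_t chunk_size)
{
    /* chunk_size must be a non-zero power of two */
    assert(chunk_size && (chunk_size & (chunk_size - 1)) == 0);
    return addr & ~(chunk_size - 1);
}

/* Compares against addr + len, i.e. one past the last byte used. */
static bool fits_exclusive_end(uint64_t addr, uint64_t len, uint64_t chunk_size)
{
    return chunk_base(addr, chunk_size) == chunk_base(addr + len, chunk_size);
}

/* Compares against the last byte actually used. */
static bool fits_last_byte(uint64_t addr, uint64_t len, uint64_t chunk_size)
{
    if (len == 0)
        return true;
    return chunk_base(addr, chunk_size) ==
           chunk_base(addr + len - 1, chunk_size);
}
```

For addr == 0 and len == chunk_size == 4096, the exclusive end is 4096, which already belongs to the next chunk, so the first check rejects a legal descriptor. The second check looks at byte 4095 and accepts it, while still rejecting a descriptor that starts at address 1 with the same length.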

netfilter: arptables: use pernet ops struct during unregister · 43016d02

Committed by Florian Westphal on May 03, 2021

Like with iptables and ebtables, hook unregistration has to use the
pernet ops struct, not the template.

This triggered following splat:
  hook not found, pf 3 num 0
  WARNING: CPU: 0 PID: 224 at net/netfilter/core.c:480 __nf_unregister_net_hook+0x1eb/0x610 net/netfilter/core.c:480
[..]
 nf_unregister_net_hook net/netfilter/core.c:502 [inline]
 nf_unregister_net_hooks+0x117/0x160 net/netfilter/core.c:576
 arpt_unregister_table_pre_exit+0x67/0x80 net/ipv4/netfilter/arp_tables.c:1565

Fixes: f9006acc ("netfilter: arp_tables: pass table pointer via nf_hook_ops")
Reported-by: syzbot+dcccba8a1e41a38cb9df@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: xt_SECMARK: add new revision to fix structure layout · c7d13358

Committed by Pablo Neira Ayuso on April 30, 2021

This extension breaks when trying to delete rules, add a new revision to
fix this.

Fixes: 5e6874cd ("[SECMARK]: Add xtables SECMARK target")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

sctp: delay auto_asconf init until binding the first addr · 34e5b011

Committed by Xin Long on May 03, 2021

As Or Cohen described:

  If sctp_destroy_sock is called without sock_net(sk)->sctp.addr_wq_lock
  held and sp->do_auto_asconf is true, then an element is removed
  from the auto_asconf_splist without any proper locking.

  This can happen in the following functions:
  1. In sctp_accept, if sctp_sock_migrate fails.
  2. In inet_create or inet6_create, if there is a bpf program
     attached to BPF_CGROUP_INET_SOCK_CREATE which denies
     creation of the sctp socket.

This patch is to fix it by moving the auto_asconf init out of
sctp_init_sock(), by which inet_create()/inet6_create() won't
need to operate it in sctp_destroy_sock() when calling
sk_common_release().

It also makes more sense to do auto_asconf init while binding the
first addr, as auto_asconf actually requires an ANY addr bind,
see it in sctp_addr_wq_timeout_handler().

This addresses CVE-2021-23133.

Fixes: 61023658 ("bpf: Add new cgroup attach type to enable sock modifications")
Reported-by: Or Cohen <orcohen@paloaltonetworks.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "net/sctp: fix race condition in sctp_destroy_sock" · 01bfe5e8

Committed by Xin Long on May 03, 2021

This reverts commit b166a20b.

This one has to be reverted as it introduced a deadlock, as
syzbot reported:

       CPU0                    CPU1
       ----                    ----
  lock(&net->sctp.addr_wq_lock);
                               lock(slock-AF_INET6);
                               lock(&net->sctp.addr_wq_lock);
  lock(slock-AF_INET6);

CPU0 is the thread of sctp_addr_wq_timeout_handler(), and CPU1
is that of sctp_close().

The original issue this commit fixed will be fixed in the next
patch.

Reported-by: syzbot+959223586843e69a2674@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hsr: check skb can contain struct hsr_ethhdr in fill_frame_info · 2e9f6093

Committed by Phillip Potter on May 02, 2021

Check at start of fill_frame_info that the MAC header in the supplied
skb is large enough to fit a struct hsr_ethhdr, as otherwise this is
not a valid HSR frame. If it is too small, return an error which will
then cause the callers to clean up the skb. Fixes a KMSAN-found
uninit-value bug reported by syzbot at:
https://syzkaller.appspot.com/bug?id=f7e9b601f1414f814f7602a82b6619a8d80bce3f

Reported-by: syzbot+e267bed19bfc5478fb33@syzkaller.appspotmail.com
Signed-off-by: Phillip Potter <phil@philpotter.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: fix a SCTP_MIB_CURRESTAB leak in sctp_sf_do_dupcook_b · f282df03

Committed by Xin Long on May 03, 2021

Normally SCTP_MIB_CURRESTAB is incremented once when an asoc enters
ESTABLISHED from a state < ESTABLISHED and decremented when the asoc
is being deleted.

However, in sctp_sf_do_dupcook_b(), the asoc's state can be changed to
ESTABLISHED from the state >= ESTABLISHED where it shouldn't increment
SCTP_MIB_CURRESTAB. Otherwise, one asoc may increment MIB_CURRESTAB
multiple times but only decrement once at the end.

I was able to reproduce it by using scapy to do the 4-way handshake,
after that I replayed the COOKIE-ECHO chunk with 'peer_vtag' field
changed to different values, and SCTP_MIB_CURRESTAB was incremented
multiple times and never went back to 0 even when the asoc was freed.

This patch is to fix it by only incrementing SCTP_MIB_CURRESTAB when
the state < ESTABLISHED in sctp_sf_do_dupcook_b().

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
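
The counting rule can be captured in a tiny state model. The enum values, the gauge variable and the function names are hypothetical and much simpler than the SCTP state machine; the sketch only shows why the increment must be guarded by the previous state.

```c
#include <assert.h>

/* Hypothetical, reduced set of association states in ascending order. */
enum assoc_state {
    ST_CLOSED,
    ST_COOKIE_WAIT,
    ST_COOKIE_ECHOED,
    ST_ESTABLISHED,
    ST_SHUTDOWN_PENDING
};

struct assoc {
    enum assoc_state state;
};

static int curr_estab;  /* gauge of currently established associations */

/* Count the association only on its first transition into ESTABLISHED. */
static void enter_established(struct assoc *a)
{
    if (a->state < ST_ESTABLISHED)
        curr_estab++;
    a->state = ST_ESTABLISHED;
}

/* Deleting an association that was counted decrements the gauge once. */
static void assoc_delete(struct assoc *a)
{
    if (a->state >= ST_ESTABLISHED) {
        assert(curr_estab > 0);
        curr_estab--;
    }
    a->state = ST_CLOSED;
}
```

Replaying the transition into ESTABLISHED several times, as the replayed COOKIE-ECHO chunks do in the reproducer, leaves the gauge at 1, and it returns to 0 when the association is deleted.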

Revert "sctp: Fix SHUTDOWN CTSN Ack in the peer restart case" · 7aa4e547

Committed by Xin Long on May 03, 2021

This reverts commit 12dfd78e.

This can be reverted as shutdown and cookie_ack chunk are using the
same asoc since commit 35b4f244 ("sctp: do asoc update earlier
in sctp_sf_do_dupcook_a").
Reported-by: Jere Leppänen <jere.leppanen@nokia.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "Revert "sctp: Fix bundling of SHUTDOWN with COOKIE-ACK"" · 22008f56

Committed by Xin Long on May 03, 2021

This reverts commit 7e9269a5.

As Jere noticed, commit 35b4f244 ("sctp: do asoc update earlier
in sctp_sf_do_dupcook_a") only keeps the SHUTDOWN and COOKIE-ACK
with the same asoc, not transport. So we have to bring this patch
back.
Reported-by: Jere Leppänen <jere.leppanen@nokia.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

02 May, 2021 · 2 commits

xprtrdma: Fix a NULL dereference in frwr_unmap_sync() · 9e895cd9

Committed by Chuck Lever on May 01, 2021

The normal mechanism that invalidates and unmaps MRs is
frwr_unmap_async(). frwr_unmap_sync() is used only when an RPC
Reply bearing Write or Reply chunks has been lost (ie, almost
never).

Coverity found that after commit 9a301caf ("xprtrdma: Move
fr_linv_done field to struct rpcrdma_mr"), the while() loop in
frwr_unmap_sync() exits only once @mr is NULL, unconditionally
causing subsequent dereferences of @mr to Oops.

I've tested this fix by creating a client that skips invoking
frwr_unmap_async() when RPC Replies complete. That forces all
invalidation tasks to fall upon frwr_unmap_sync(). Simple workloads
with this fix applied to the adulterated client work as designed.
Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID: 1504556 ("Null pointer dereferences")
Fixes: 9a301caf ("xprtrdma: Move fr_linv_done field to struct rpcrdma_mr")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

sunrpc: Fix misplaced barrier in call_decode · f8f7e0fb

Committed by Baptiste Lepers on May 01, 2021

Fix a misplaced barrier in call_decode. The struct rpc_rqst is modified
as follows by xprt_complete_rqst:

req->rq_private_buf.len = copied;
/* Ensure all writes are done before we update */
/* req->rq_reply_bytes_recvd */
smp_wmb();
req->rq_reply_bytes_recvd = copied;

And currently read as follows by call_decode:

smp_rmb(); // misplaced
if (!req->rq_reply_bytes_recvd)
   goto out;
req->rq_rcv_buf.len = req->rq_private_buf.len;

This patch places the smp_rmb after the if to ensure that
rq_reply_bytes_recvd and rq_private_buf.len are read in order.

Fixes: 9ba82886 ("SUNRPC: Don't try to parse incomplete RPC messages")
Signed-off-by: Baptiste Lepers <baptiste.lepers@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
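
The ordering requirement can be expressed as a userspace analogue with C11 atomics. This is a sketch under assumptions: the structure and function names are hypothetical, and release and acquire fences are used in place of the kernel's smp_wmb() and smp_rmb(). The reader's fence sits after the flag has been observed and before the dependent data is read.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical reply descriptor: buf_len is published by bytes_recvd. */
struct reply {
    size_t buf_len;             /* plain data */
    atomic_size_t bytes_recvd;  /* non-zero once the reply is complete */
};

/* Writer: fill in the data, then publish the flag. */
static void reply_complete(struct reply *r, size_t copied)
{
    assert(copied > 0);         /* zero would mean "not received yet" */
    r->buf_len = copied;
    /* Order the data write before the flag write. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&r->bytes_recvd, copied, memory_order_relaxed);
}

/* Reader: returns 1 and stores the length if the reply is complete. */
static int reply_decode(struct reply *r, size_t *len)
{
    if (!atomic_load_explicit(&r->bytes_recvd, memory_order_relaxed))
        return 0;
    /* Order the flag read before the data read. */
    atomic_thread_fence(memory_order_acquire);
    *len = r->buf_len;
    return 1;
}
```

Placing the acquire fence before the flag load would leave the two reads unordered relative to each other, which mirrors the misplaced barrier the patch moves.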

01 May, 2021 · 4 commits

sctp: do asoc update earlier in sctp_sf_do_dupcook_b · 51eac7f2

Committed by Xin Long on May 01, 2021

The same thing should be done for sctp_sf_do_dupcook_b().
Meanwhile, SCTP_CMD_UPDATE_ASSOC cmd can be removed.

v1->v2:
  - Fix the return value in sctp_sf_do_assoc_update().
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "sctp: Fix bundling of SHUTDOWN with COOKIE-ACK" · 7e9269a5

Committed by Xin Long on May 01, 2021

This can be reverted as shutdown and cookie_ack chunk are using the
same asoc since the last patch.

This reverts commit 145cb2f7.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: do asoc update earlier in sctp_sf_do_dupcook_a · 35b4f244

Committed by Xin Long on May 01, 2021

There's a panic that occurs in a few environments; the call trace is as below:

  [] general protection fault, ... 0x29acd70f1000a: 0000 [#1] SMP PTI
  [] RIP: 0010:sctp_ulpevent_notify_peer_addr_change+0x4b/0x1fa [sctp]
  []  sctp_assoc_control_transport+0x1b9/0x210 [sctp]
  []  sctp_do_8_2_transport_strike.isra.16+0x15c/0x220 [sctp]
  []  sctp_cmd_interpreter.isra.21+0x1231/0x1a10 [sctp]
  []  sctp_do_sm+0xc3/0x2a0 [sctp]
  []  sctp_generate_timeout_event+0x81/0xf0 [sctp]

This is caused by a transport use-after-free issue. When processing a
duplicate COOKIE-ECHO chunk in sctp_sf_do_dupcook_a(), both COOKIE-ACK
and SHUTDOWN chunks are allocated with the transport from the new asoc.
However, later in the sideeffect machine, the old asoc is used to send
them out and the old asoc's shutdown_last_sent_to is set to the transport
that the SHUTDOWN chunk is attached to in sctp_cmd_setup_t2(), which
actually belongs to the new asoc. After the new asoc is freed and the old
asoc's T2 timer expires, the old asoc's shutdown_last_sent_to, which is
already freed, would be accessed in sctp_sf_t2_timer_expire().

Thanks Alexander and Jere for helping dig into this issue.

To fix it, this patch is to do the asoc update first, then allocate
the COOKIE-ACK and SHUTDOWN chunks with the 'updated' old asoc. This
would make more sense, as a chunk from an asoc shouldn't be sent out
with another asoc. We had fixed quite a few issues caused by this.

Fixes: 145cb2f7 ("sctp: Fix bundling of SHUTDOWN with COOKIE-ACK")
Reported-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Reported-by: syzbot+bbe538efd1046586f587@syzkaller.appspotmail.com
Reported-by: Michal Tesar <mtesar@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vsock/vmci: Remove redundant assignment to err · f0a5818b

Committed by Yang Li on April 30, 2021

Variable 'err' is set to zero but this value is never read as it is
overwritten with a new value later on, hence it is a redundant
assignment and can be removed.

Clean up the following clang-analyzer warning:

net/vmw_vsock/vmci_transport.c:948:2: warning: Value stored to 'err' is
never read [clang-analyzer-deadcode.DeadStores]
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
