1. 20 Jul 2022 (3 commits)
  2. 19 Jul 2022 (1 commit)
    • amt: use workqueue for gateway side message handling · 30e22a6e
      Authored by Taehee Yoo
      There are synchronization issues (amt->status, amt->req_cnt, etc.)
      when the interface is in gateway mode, because the gateway message
      handlers are processed concurrently.
      This applies a work queue for processing these messages instead of
      expanding the locking context.
      
      So, the purposes of this patch are to fix the existing race conditions
      and to let the gateway validate its status more reliably.
      
      When the AMT gateway interface is created, it tries to establish a
      connection to the relay. The establishment step looks stateless, but
      it must be managed carefully.
      In order to handle messages in the gateway, it saves the current
      status (i.e. AMT_STATUS_XXX).
      This patch makes the gateway code run in a single thread.
      
      Now, all messages except multicast data are triggered (received or
      delay expired), and these messages are stored in the event
      queue (amt->events).
      Then, the single worker processes the stored messages asynchronously,
      one by one.
      The multicast data message type is still processed immediately.
      
      Now, amt->lock is only needed to access the event queue (amt->events)
      when an interface is in gateway mode.
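      
      Below is a minimal sketch of this queue-plus-single-worker pattern; the
      structure and field names (amt_dev_sketch, amt_event, events, event_wq)
      are illustrative assumptions, not the driver's exact definitions.
      
      struct amt_event {
              struct list_head list;
              int type;                       /* AMT_STATUS_XXX-style event id */
              struct sk_buff *skb;            /* message to process, may be NULL */
      };
      
      struct amt_dev_sketch {
              spinlock_t lock;                /* protects the event queue only */
              struct list_head events;
              struct work_struct event_wq;
              int status;                     /* AMT_STATUS_XXX */
      };
      
      static void amt_queue_event(struct amt_dev_sketch *amt, struct amt_event *ev)
      {
              spin_lock_bh(&amt->lock);
              list_add_tail(&ev->list, &amt->events);
              spin_unlock_bh(&amt->lock);
              schedule_work(&amt->event_wq);  /* wake the single worker */
      }
      
      static void amt_event_work(struct work_struct *work)
      {
              struct amt_dev_sketch *amt =
                      container_of(work, struct amt_dev_sketch, event_wq);
              struct amt_event *ev;
      
              for (;;) {
                      spin_lock_bh(&amt->lock);
                      ev = list_first_entry_or_null(&amt->events,
                                                    struct amt_event, list);
                      if (ev)
                              list_del(&ev->list);
                      spin_unlock_bh(&amt->lock);
                      if (!ev)
                              break;
                      /* handle one gateway message; amt->status is now only
                       * touched from this single context */
                      kfree(ev);
              }
      }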
      
      Fixes: cbc21dc1 ("amt: add data plane of amt interface")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  3. 18 Jul 2022 (3 commits)
  4. 16 Jul 2022 (1 commit)
    • tcp/udp: Make early_demux back namespacified. · 11052589
      Authored by Kuniyuki Iwashima
      Commit e21145a9 ("ipv4: namespacify ip_early_demux sysctl knob") made
      it possible to enable/disable early_demux on a per-netns basis.  Then, we
      introduced two knobs, tcp_early_demux and udp_early_demux, to switch it for
      TCP/UDP in commit dddb64bc ("net: Add sysctl to toggle early demux for
      tcp and udp").  However, the .proc_handler() was wrong and actually
      prevented us from changing the behaviour in each netns.
      
      We can execute early_demux if net.ipv4.ip_early_demux is on and each proto
      .early_demux() handler is not NULL.  When we toggle (tcp|udp)_early_demux,
      the change itself is saved in each netns variable, but the .early_demux()
      handler is a global variable, so the handler is switched based on the
      init_net's sysctl variable.  Thus, netns (tcp|udp)_early_demux knobs have
      nothing to do with the logic.  Whether we CAN execute proto .early_demux()
      is always decided by init_net's sysctl knob, and whether we DO it or not is
      by each netns ip_early_demux knob.
      
      This patch namespacifies (tcp|udp)_early_demux again.  For now, the users
      of the .early_demux() handler are TCP and UDP only, and they are called
      directly to avoid retpoline.  So, we can remove the .early_demux() handler
      from inet6?_protos and need not dereference them in ip6?_rcv_finish_core().
      If another proto needs .early_demux(), we can restore it at that time.
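      
      A rough sketch of the resulting per-netns dispatch is shown below; it is
      a simplification based on this description, not the exact hunk from the
      patch, and the sysctl field names are assumptions.
      
      static void ip_rcv_early_demux_sketch(struct net *net, struct sk_buff *skb,
                                            const struct iphdr *iph)
      {
              if (!READ_ONCE(net->ipv4.sysctl_ip_early_demux))
                      return;
      
              switch (iph->protocol) {
              case IPPROTO_TCP:
                      /* direct call, no indirect .early_demux() pointer */
                      if (READ_ONCE(net->ipv4.sysctl_tcp_early_demux))
                              tcp_v4_early_demux(skb);
                      break;
              case IPPROTO_UDP:
                      if (READ_ONCE(net->ipv4.sysctl_udp_early_demux))
                              udp_v4_early_demux(skb);
                      break;
              }
      }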
      
      Fixes: dddb64bc ("net: Add sysctl to toggle early demux for tcp and udp")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20220713175207.7727-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  5. 15 Jul 2022 (7 commits)
  6. 13 Jul 2022 (1 commit)
  7. 09 Jul 2022 (1 commit)
  8. 08 Jul 2022 (1 commit)
  9. 06 Jul 2022 (1 commit)
  10. 29 Jun 2022 (1 commit)
  11. 28 Jun 2022 (1 commit)
  12. 23 Jun 2022 (1 commit)
  13. 17 Jun 2022 (1 commit)
  14. 09 Jun 2022 (1 commit)
    • ipv6: Fix signed integer overflow in __ip6_append_data · f93431c8
      Authored by Wang Yufen
      Resurrecting the UBSAN overflow checks produced the report below; fix it
      by changing the type of the variable [length] to size_t.
      
      UBSAN: signed-integer-overflow in net/ipv6/ip6_output.c:1489:19
      2147479552 + 8567 cannot be represented in type 'int'
      CPU: 0 PID: 253 Comm: err Not tainted 5.16.0+ #1
      Hardware name: linux,dummy-virt (DT)
      Call trace:
        dump_backtrace+0x214/0x230
        show_stack+0x30/0x78
        dump_stack_lvl+0xf8/0x118
        dump_stack+0x18/0x30
        ubsan_epilogue+0x18/0x60
        handle_overflow+0xd0/0xf0
        __ubsan_handle_add_overflow+0x34/0x44
        __ip6_append_data.isra.48+0x1598/0x1688
        ip6_append_data+0x128/0x260
        udpv6_sendmsg+0x680/0xdd0
        inet6_sendmsg+0x54/0x90
        sock_sendmsg+0x70/0x88
        ____sys_sendmsg+0xe8/0x368
        ___sys_sendmsg+0x98/0xe0
        __sys_sendmmsg+0xf4/0x3b8
        __arm64_sys_sendmmsg+0x34/0x48
        invoke_syscall+0x64/0x160
        el0_svc_common.constprop.4+0x124/0x300
        do_el0_svc+0x44/0xc8
        el0_svc+0x3c/0x1e8
        el0t_64_sync_handler+0x88/0xb0
        el0t_64_sync+0x16c/0x170
      
      Changes since v1:
      -Change the variable [length] type to unsigned, as Eric Dumazet suggested.
      Changes since v2:
      -Don't change exthdrlen type in ip6_make_skb, as Paolo Abeni suggested.
      Changes since v3:
      -Don't change ulen type in udpv6_sendmsg and l2tp_ip6_sendmsg, as
      Jakub Kicinski suggested.
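      
      As a standalone illustration of the arithmetic in the report above (not
      kernel code): adding 8567 to 2147479552 overflows a signed int, while a
      size_t accumulator, as the patch uses for [length], does not on 64-bit.
      
      #include <stdio.h>
      #include <stddef.h>
      
      int main(void)
      {
              size_t length = 2147479552;     /* value from the UBSAN report */
      
              /* With "int length" this addition would exceed INT_MAX
               * (2147483647) and be undefined behaviour. */
              length += 8567;
              printf("%zu\n", length);        /* 2147488119 */
              return 0;
      }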
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Wang Yufen <wangyufen@huawei.com>
      Link: https://lore.kernel.org/r/20220607120028.845916-1-wangyufen@huawei.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  15. 07 Jun 2022 (1 commit)
  16. 06 Jun 2022 (1 commit)
  17. 02 Jun 2022 (2 commits)
    • ax25: Fix ax25 session cleanup problems · 7d8a3a47
      Authored by Duoming Zhou
      There are session cleanup problems in ax25_release() and
      ax25_disconnect(). If we set up a session and then disconnect,
      the disconnected session is still left in the "LISTENING" state,
      as shown below.
      
      Active AX.25 sockets
      Dest       Source     Device  State        Vr/Vs    Send-Q  Recv-Q
      DL9SAU-4   DL9SAU-3   ???     LISTENING    000/000  0       0
      DL9SAU-3   DL9SAU-4   ???     LISTENING    000/000  0       0
      
      The first problem is caused by del_timer_sync() in ax25_release().
      The ax25 timers are needed for correct session cleanup. If we use
      ax25_release() to close ax25 sessions while ax25_dev is not NULL,
      the del_timer_sync() calls in ax25_release() will execute.
      As a result, the sessions cannot be cleaned up correctly,
      because the timers have been stopped.
      
      In order to solve this problem, this patch adds a device_up flag
      in ax25_dev in order to judge whether the device is up. If there
      are sessions to be cleaned up, the del_timer_sync() in
      ax25_release() will not execute. What's more, we add ax25_cb_del()
      in ax25_kill_by_device(), because the timers have been stopped
      and there are no functions that could delete ax25_cb if we do not
      call ax25_release(). Finally, we reorder the position of
      ax25_list_lock in ax25_cb_del() in order to synchronize among
      different functions that call ax25_cb_del().
      
      The second problem is caused by an improper check in ax25_disconnect().
      Incoming ax25 sessions whose ax25->sk is NULL stop the heartbeat
      timer, because the check "if (!ax25->sk || ..)" is satisfied.
      As a result, the session cannot be cleaned up properly.
      
      In order to solve this problem, this patch changes the improper
      check to "if(ax25->sk && ..)" in ax25_disconnect().
      
      What's more, ax25_disconnect() may be called twice, which is
      not necessary. For example, ax25_kill_by_device() calls
      ax25_disconnect() and sets ax25->state to AX25_STATE_0, but
      ax25_release() then calls ax25_disconnect() again.
      
      In order to solve this problem, this patch adds a check in
      ax25_release(). If the SOCK_DEAD flag of ax25->sk is set, the
      ax25_disconnect() call in ax25_release() is skipped, as illustrated
      by the sketch below.
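      
      A minimal sketch of that guard (the helper name and its placement in
      ax25_release() are simplified assumptions, not the verbatim patch):
      
      static void ax25_release_disconnect(struct sock *sk, ax25_cb *ax25)
      {
              /* ax25_kill_by_device() may already have disconnected the
               * session and marked the socket dead; do not run
               * ax25_disconnect() a second time. */
              if (!sock_flag(sk, SOCK_DEAD))
                      ax25_disconnect(ax25, 0);
      }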
      
      Fixes: 82e31755 ("ax25: Fix UAF bugs in ax25 timers")
      Fixes: 8a367e74 ("ax25: Fix segfault after sock connection timeout")
      Reported-and-tested-by: Thomas Osterried <thomas@osterried.de>
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Link: https://lore.kernel.org/r/20220530152158.108619-1-duoming@zju.edu.cn
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • netfilter: nf_tables: delete flowtable hooks via transaction list · b6d9014a
      Authored by Pablo Neira Ayuso
      Remove the inactive bool field in the nft_hook object that was
      introduced in abadb2f8 ("netfilter: nf_tables: delete devices from
      flowtable"). Move stale flowtable hooks to the transaction list instead.
      
      Deleting the same device twice does not result in ENOENT.
      
      Fixes: abadb2f8 ("netfilter: nf_tables: delete devices from flowtable")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  18. 01 Jun 2022 (2 commits)
    • bonding: guard ns_targets by CONFIG_IPV6 · c4caa500
      Authored by Hangbin Liu
      Guard ns_targets in struct bond_params by CONFIG_IPV6, which saves
      256 bytes if IPv6 is not configured. Also add this protection to the
      functions bond_is_ip6_target_ok() and bond_get_targets_ip6().
      
      Remove the IS_ENABLED() check for bond_opts[], as it would leave
      BOND_OPT_NS_TARGETS uninitialized if CONFIG_IPV6 is not enabled. Add
      a dummy bond_option_ns_ip6_targets_set() for this situation.
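      
      A rough sketch of the guard and the dummy setter (struct layout, return
      value and signatures are simplified assumptions, not the exact patch):
      
      struct bond_params_sketch {
              /* ... other parameters ... */
      #if IS_ENABLED(CONFIG_IPV6)
              struct in6_addr ns_targets[BOND_MAX_NS_TARGETS]; /* absent when IPv6 is off */
      #endif
      };
      
      #if !IS_ENABLED(CONFIG_IPV6)
      /* Dummy handler so bond_opts[] always has a valid .set callback. */
      static int bond_option_ns_ip6_targets_set(struct bonding *bond,
                                                 const struct bond_opt_value *newval)
      {
              return -EPERM;
      }
      #endif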
      
      Fixes: 4e24be01 ("bonding: add new parameter ns_targets")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Acked-by: Jonathan Toppins <jtoppins@redhat.com>
      Link: https://lore.kernel.org/r/20220531063727.224043-1-liuhangbin@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: sched: add barrier to fix packet stuck problem for lockless qdisc · 2e8728c9
      Authored by Guoju Fang
      In qdisc_run_end(), the spin_unlock() only has store-release semantics,
      which guarantees that all earlier memory accesses are visible before it.
      But the subsequent test_bit() has no barrier semantics and so may be
      reordered ahead of the spin_unlock(). This store-load reordering may
      cause a packet stuck problem.
      
      The concurrent operations can be described as below,
               CPU 0                      |          CPU 1
         qdisc_run_end()                  |     qdisc_run_begin()
                .                         |           .
       ----> /* may be reordered here */  |           .
      |         .                         |           .
      |     spin_unlock()                 |         set_bit()
      |         .                         |         smp_mb__after_atomic()
       ---- test_bit()                    |         spin_trylock()
                .                         |          .
      
      Consider the following sequence of events:
          CPU 0 reorders test_bit() ahead and sees MISSED = 0
          CPU 1 calls set_bit()
          CPU 1 calls spin_trylock() and it fails
          CPU 0 executes spin_unlock()
      
      At the end of the sequence, CPU 0 has called spin_unlock() and done
      nothing, because it saw MISSED = 0. The skb on CPU 1 has been enqueued
      but nobody takes it, until the next CPU pushing to the qdisc (if
      ever ...) notices and dequeues it.
      
      This patch fixes this by adding one explicit barrier. As the
      spin_unlock() / test_bit() ordering is a store-load ordering, a full
      memory barrier smp_mb() is needed here.
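      
      A simplified sketch of qdisc_run_end() with the barrier (lockless-qdisc
      path only; not the verbatim hunk from sch_generic.h):
      
      static inline void qdisc_run_end_sketch(struct Qdisc *qdisc)
      {
              spin_unlock(&qdisc->seqlock);
      
              /* spin_unlock() is only a release: it orders earlier accesses
               * before it, but does not stop the later test_bit() load from
               * being hoisted above it.  smp_mb() gives the required
               * store-load ordering. */
              smp_mb();
      
              if (unlikely(test_bit(__QDISC_STATE_MISSED, &qdisc->state)))
                      __netif_schedule(qdisc);
      }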
      
      Fixes: a90c57f2 ("net: sched: fix packet stuck problem for lockless qdisc")
      Signed-off-by: Guoju Fang <gjfang@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20220528101628.120193-1-gjfang@linux.alibaba.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  19. 27 May 2022 (2 commits)
    • netfilter: conntrack: re-fetch conntrack after insertion · 56b14ece
      Authored by Florian Westphal
      In case the conntrack is clashing, insertion can free skb->_nfct and
      set skb->_nfct to the already-confirmed entry.
      
      This wasn't found before because the conntrack entry and the extension
      space used to be freed after an RCU grace period, plus the race needs
      events enabled to trigger.
      
      Reported-by: <syzbot+793a590957d9c1b96620@syzkaller.appspotmail.com>
      Fixes: 71d8c47f ("netfilter: conntrack: introduce clash resolution on insertion race")
      Fixes: 2ad9d774 ("netfilter: conntrack: free extension area immediately")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net: sched: fixed barrier to prevent skbuff sticking in qdisc backlog · a54ce370
      Authored by Vincent Ray
      In qdisc_run_begin(), the smp_mb__before_atomic() used before test_bit()
      does not provide any ordering guarantee, as test_bit() is not an atomic
      operation. This, combined with the fact that the spin_trylock() call at
      the beginning of qdisc_run_begin() does not guarantee acquire
      semantics if it does not grab the lock, makes it possible for the
      following statement:
      
      if (test_bit(__QDISC_STATE_MISSED, &qdisc->state))
      
      to be executed before an enqueue operation called before
      qdisc_run_begin().
      
      As a result the following race can happen:
      
                 CPU 1                             CPU 2
      
            qdisc_run_begin()               qdisc_run_begin() /* true */
              set(MISSED)                            .
            /* returns false */                      .
                .                            /* sees MISSED = 1 */
                .                            /* so qdisc not empty */
                .                            __qdisc_run()
                .                                    .
                .                              pfifo_fast_dequeue()
       ----> /* may be done here */                  .
      |         .                                clear(MISSED)
      |         .                                    .
      |         .                                smp_mb__after_atomic();
      |         .                                    .
      |         .                                /* recheck the queue */
      |         .                                /* nothing => exit   */
      |   enqueue(skb1)
      |         .
      |   qdisc_run_begin()
      |         .
      |     spin_trylock() /* fail */
      |         .
      |     smp_mb__before_atomic() /* not enough */
      |         .
       ---- if (test_bit(MISSED))
              return false;   /* exit */
      
      In the above scenario, CPU 1 and CPU 2 both try to grab the
      qdisc->seqlock at the same time. Only CPU 2 succeeds and enters the
      bypass code path, where it emits its skb then calls __qdisc_run().
      
      CPU 1 fails, sets MISSED and goes down the traditional enqueue() +
      dequeue() code path. But when executing qdisc_run_begin() for the
      second time, after enqueuing its skbuff, it sees the MISSED bit still
      set (by itself) and consequently chooses to exit early without setting
      it again nor trying to grab the spinlock again.
      
      Meanwhile CPU 2 has seen MISSED = 1, cleared it, checked the queue
      and found it empty, so it returned.
      
      At the end of the sequence, we end up with skb1 enqueued in the
      backlog, both CPUs out of __dev_xmit_skb(), the MISSED bit not set,
      and no __netif_schedule() call made. skb1 will now linger in the
      qdisc until somebody later performs a full __qdisc_run(). Combined
      with the bypass capability of the qdisc, and the ability of the TCP
      layer to avoid resending packets which it knows are still in the
      qdisc, this can lead to serious traffic "holes" in a TCP connection.
      
      We fix this by replacing the smp_mb__before_atomic() / test_bit() /
      set_bit() / smp_mb__after_atomic() sequence inside qdisc_run_begin()
      by a single test_and_set_bit() call, which is more concise and
      enforces the needed memory barriers.
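      
      A simplified sketch of the resulting slow path in qdisc_run_begin()
      (lockless-qdisc case only; not the verbatim patch):
      
      static inline bool qdisc_run_begin_sketch(struct Qdisc *qdisc)
      {
              if (spin_trylock(&qdisc->seqlock))
                      return true;
      
              /* test_and_set_bit() is a full barrier and always sets MISSED,
               * replacing the racy smp_mb__before_atomic() / test_bit() /
               * set_bit() / smp_mb__after_atomic() sequence. */
              if (test_and_set_bit(__QDISC_STATE_MISSED, &qdisc->state))
                      return false;
      
              /* MISSED was clear: retry the lock once before giving up, in
               * case the other CPU released it in the meantime. */
              return spin_trylock(&qdisc->seqlock);
      }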
      
      Fixes: 89837eb4 ("net: sched: add barrier to ensure correct ordering for lockless qdisc")
      Signed-off-by: Vincent Ray <vray@kalrayinc.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20220526001746.2437669-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  20. 26 May 2022 (1 commit)
  21. 23 May 2022 (1 commit)
  22. 21 May 2022 (2 commits)
  23. 20 May 2022 (1 commit)
  24. 19 May 2022 (1 commit)
    • tls: Add opt-in zerocopy mode of sendfile() · c1318b39
      Authored by Boris Pismenny
      TLS device offload copies sendfile data to a bounce buffer before
      transmitting. This allows maintaining a valid MAC on TLS records when
      the file contents change and part of a TLS record has to be
      retransmitted at the TCP level.
      
      In many common use cases (like serving static files over HTTPS) the file
      contents are not changed on the fly. In many use cases breaking the
      connection is totally acceptable if the file is changed during
      transmission, because it would be received corrupted in any case.
      
      This commit optimizes performance for such use cases by providing a
      new optional mode of TLS sendfile(), in which the extra copy is
      skipped. Removing this copy improves performance significantly, as
      TLS and TCP sendfile perform the same operations, and the only
      overhead is TLS header/trailer insertion.
      
      The new mode can only be enabled with the new socket option named
      TLS_TX_ZEROCOPY_SENDFILE on a per-socket basis. It preserves backwards
      compatibility with existing applications that rely on the copying
      behavior.
      
      The new mode is safe, meaning that unsolicited modifications of the file
      being sent can't break integrity of the kernel. The worst thing that can
      happen is sending a corrupted TLS record, which is in any case not
      forbidden when using regular TCP sockets.
      
      Sockets other than TLS device offload are not affected by the new socket
      option. The actual status of zerocopy sendfile can be queried with
      sock_diag.
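      
      A userspace sketch of opting in on a kTLS socket is shown below; it
      assumes a socket already upgraded to TLS with the "tls" ULP and TLS_TX
      configured, and the fallback #defines are only for headers that predate
      this option (values should be verified against your linux/tls.h):
      
      #include <sys/socket.h>
      #include <linux/tls.h>
      
      #ifndef SOL_TLS
      #define SOL_TLS 282
      #endif
      #ifndef TLS_TX_ZEROCOPY_SENDFILE
      #define TLS_TX_ZEROCOPY_SENDFILE 3
      #endif
      
      static int enable_zerocopy_sendfile(int fd)
      {
              int one = 1;
      
              /* Opt in per socket; applications that do not set this keep
               * the existing copying behaviour. */
              return setsockopt(fd, SOL_TLS, TLS_TX_ZEROCOPY_SENDFILE,
                                &one, sizeof(one));
      }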
      
      Performance numbers in a single-core test with 24 HTTPS streams on
      nginx, under 100% CPU load:
      
      * non-zerocopy: 33.6 Gbit/s
      * zerocopy: 79.92 Gbit/s
      
      CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
      Signed-off-by: Boris Pismenny <borisp@nvidia.com>
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Link: https://lore.kernel.org/r/20220518092731.1243494-1-maximmi@nvidia.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  25. 17 May 2022 (1 commit)
  26. 16 May 2022 (1 commit)
    • netfilter: nf_conncount: reduce unnecessary GC · d2659299
      Authored by William Tu
      Currently nf_conncount can trigger garbage collection (GC)
      in multiple places. Each GC pass takes a spin_lock_bh
      to traverse the nf_conncount_list. We found that when testing
      port scanning with two parallel nmap runs, because the number of
      connections increases fast, nf_conncount_count() and its
      subsequent call to __nf_conncount_add() take too much time,
      causing several CPU lockups. This happens when the user sets the
      conntrack limit to 20,000 or more, because the larger the limit,
      the longer the list that GC has to traverse.
      
      The patch mitigates the performance issue by avoiding unnecessary
      GC with a timestamp. Whenever nf_conncount has done a GC,
      a timestamp is updated, and before the next GC is triggered,
      we make sure at least one jiffy has passed.
      By doing this we can greatly reduce the CPU cycles and
      avoid the softirq lockup.
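      
      A minimal sketch of the jiffies-based throttle (the field and helper
      names are illustrative assumptions, not the exact patch):
      
      static bool nf_conncount_gc_is_due(const struct nf_conncount_list *list)
      {
              /* Run GC for this list at most once per jiffy. */
              return (u32)jiffies != list->last_gc;
      }
      
      static void nf_conncount_gc_stamp(struct nf_conncount_list *list)
      {
              /* Record when the last GC pass completed. */
              list->last_gc = (u32)jiffies;
      }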
      
      To reproduce it in OVS,
      $ ovs-appctl dpctl/ct-set-limits zone=1,limit=20000
      $ ovs-appctl dpctl/ct-get-limits
      
      On another machine, run two nmap scans
      $ nmap -p1- <IP>
      $ nmap -p1- <IP>
      Signed-off-by: William Tu <u9012063@gmail.com>
      Co-authored-by: Yifeng Sun <pkusunyifeng@gmail.com>
      Reported-by: Greg Rose <gvrose8192@gmail.com>
      Suggested-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>