提交 · fa0241e9ba3fdb89bcd46fafbcf09bfd60c8d668 · openeuler / Kernel

18 7月, 2022 40 次提交

SUNRPC: Handle low memory situations in call_status() · fa0241e9

由 Trond Myklebust 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit 9a45e08636bba2949819cbf4d9ee0b974dd5c7a6
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9a45e08636bba2949819cbf4d9ee0b974dd5c7a6

--------------------------------

[ Upstream commit 9d82819d ]

We need to handle ENFILE, ENOBUFS, and ENOMEM, because
xprt_wake_pending_tasks() can be called with any one of these due to
socket creation failures.

Fixes: b61d59ff ("SUNRPC: xs_tcp_connect_worker{4,6}: merge common code")
Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

fa0241e9

SUNRPC: Handle ENOMEM in call_transmit_status() · 70391d56

由 Trond Myklebust 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit 132cbe2f182ab535e0a7fa309d2c73ed6257a246
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=132cbe2f182ab535e0a7fa309d2c73ed6257a246

--------------------------------

[ Upstream commit d3c15033 ]

Both call_transmit() and call_bc_transmit() can now return ENOMEM, so
let's make sure that we handle the errors gracefully.

Fixes: 0472e476 ("SUNRPC: Convert socket page send code to use iov_iter()")
Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

70391d56

io_uring: don't touch scm_fp_list after queueing skb · 9cf5e90a

由 Pavel Begunkov 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit aed30a2054060e470db02eaaf352c14dc38aa611
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=aed30a2054060e470db02eaaf352c14dc38aa611

--------------------------------

[ Upstream commit a07211e3 ]

It's safer to not touch scm_fp_list after we queued an skb to which it
was assigned, there might be races lurking if we screw subtle sync
guarantees on the io_uring side.

Fixes: 6b06314c ("io_uring: add file set registration")
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

9cf5e90a

drbd: Fix five use after free bugs in get_initial_state · e1119e30

由 Lv Yunlong 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit 594205b4936771a250f9d141e7e0fff21c3dd2d9
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=594205b4936771a250f9d141e7e0fff21c3dd2d9

--------------------------------

[ Upstream commit aadb22ba ]

In get_initial_state, it calls notify_initial_state_done(skb,..) if
cb->args[5]==1. If genlmsg_put() failed in notify_initial_state_done(),
the skb will be freed by nlmsg_free(skb).
Then get_initial_state will goto out and the freed skb will be used by
return value skb->len, which is a uaf bug.

What's worse, the same problem goes even further: skb can also be
freed in the notify_*_state_change -> notify_*_state calls below.
Thus 4 additional uaf bugs happened.

My patch lets the problem callee functions: notify_initial_state_done
and notify_*_state_change return an error code if errors happen.
So that the error codes could be propagated and the uaf bugs can be avoid.

v2 reports a compilation warning. This v3 fixed this warning and built
successfully in my local environment with no additional warnings.
v2: https://lore.kernel.org/patchwork/patch/1435218/

Fixes: a2972846 ("drbd: Backport the "events2" command")
Signed-off-by: NLv Yunlong <lyl2019@mail.ustc.edu.cn>
Reviewed-by: NChristoph Böhmwalder <christoph.boehmwalder@linbit.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

e1119e30

bpf: Support dual-stack sockets in bpf_tcp_check_syncookie · 22c85679

由 Maxim Mikityanskiy 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit 970a6bb72912f2accae52e811a6618e3d4a5b0ff
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=970a6bb72912f2accae52e811a6618e3d4a5b0ff

--------------------------------

[ Upstream commit 2e8702cc ]

bpf_tcp_gen_syncookie looks at the IP version in the IP header and
validates the address family of the socket. It supports IPv4 packets in
AF_INET6 dual-stack sockets.

On the other hand, bpf_tcp_check_syncookie looks only at the address
family of the socket, ignoring the real IP version in headers, and
validates only the packet size. This implementation has some drawbacks:

1. Packets are not validated properly, allowing a BPF program to trick
bpf_tcp_check_syncookie into handling an IPv6 packet on an IPv4
socket.

2. Dual-stack sockets fail the checks on IPv4 packets. IPv4 clients end
up receiving a SYNACK with the cookie, but the following ACK gets
dropped.

This patch fixes these issues by changing the checks in
bpf_tcp_check_syncookie to match the ones in bpf_tcp_gen_syncookie. IP
version from the header is taken into account, and it is validated
properly with address family.

Fixes: 39904084 ("bpf: add helper to check for a valid SYN cookie")
Signed-off-by: NMaxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Reviewed-by: NTariq Toukan <tariqt@nvidia.com>
Acked-by: NArthur Fabre <afabre@cloudflare.com>
Link: https://lore.kernel.org/bpf/20220406124113.2795730-1-maximmi@nvidia.comSigned-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

22c85679

spi: bcm-qspi: fix MSPI only access with bcm_qspi_exec_mem_op() · 8d205a67

由 Kamal Dasu 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit 6c17f4ef3c4f662865ac99ad167a8b896ca70d2f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6c17f4ef3c4f662865ac99ad167a8b896ca70d2f

--------------------------------

[ Upstream commit 2c7d1b28 ]

This fixes case where MSPI controller is used to access spi-nor
flash and BSPI block is not present.

Fixes: 5f195ee7 ("spi: bcm-qspi: Implement the spi_mem interface")
Signed-off-by: NKamal Dasu <kdasu.kdev@gmail.com>
Acked-by: NFlorian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20220328142442.7553-1-kdasu.kdev@gmail.comSigned-off-by: NMark Brown <broonie@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

8d205a67

qede: confirm skb is allocated before using · d44c2828

由 Jamie Bainbridge 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit 8928239e5e2e460d95b8a0b89f61671625e7ece0
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8928239e5e2e460d95b8a0b89f61671625e7ece0

--------------------------------

[ Upstream commit 4e910dbe ]

qede_build_skb() assumes build_skb() always works and goes straight
to skb_reserve(). However, build_skb() can fail under memory pressure.
This results in a kernel panic because the skb to reserve is NULL.

Add a check in case build_skb() failed to allocate and return NULL.

The NULL return is handled correctly in callers to qede_build_skb().

Fixes: 8a863397 ("qede: Add build_skb() support.")
Signed-off-by: NJamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

d44c2828

net: phy: mscc-miim: reject clause 45 register accesses · 993871ac

由 Michael Walle 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit b7893388bb888cb15c7db0c0f71bd58a4fb5312c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b7893388bb888cb15c7db0c0f71bd58a4fb5312c

--------------------------------

[ Upstream commit 8d90991e ]

The driver doesn't support clause 45 register access yet, but doesn't
check if the access is a c45 one either. This leads to spurious register
reads and writes. Add the check.

Fixes: 542671fe ("net: phy: mscc-miim: Add MDIO driver")
Signed-off-by: NMichael Walle <michael@walle.cc>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

993871ac

rxrpc: fix a race in rxrpc_exit_net() · 16ee25f3

由 Eric Dumazet 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit 08ff0e74fab517dbc44e11b8bc683dd4ecc65950
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=08ff0e74fab517dbc44e11b8bc683dd4ecc65950

--------------------------------

[ Upstream commit 1946014c ]

Current code can lead to the following race:

CPU0                                                 CPU1

rxrpc_exit_net()
                                                     rxrpc_peer_keepalive_worker()
                                                       if (rxnet->live)

  rxnet->live = false;
  del_timer_sync(&rxnet->peer_keepalive_timer);

                                                             timer_reduce(&rxnet->peer_keepalive_timer, jiffies + delay);

  cancel_work_sync(&rxnet->peer_keepalive_work);

rxrpc_exit_net() exits while peer_keepalive_timer is still armed,
leading to use-after-free.

syzbot report was:

ODEBUG: free active (active state 0) object type: timer_list hint: rxrpc_peer_keepalive_timeout+0x0/0xb0
WARNING: CPU: 0 PID: 3660 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505
Modules linked in:
CPU: 0 PID: 3660 Comm: kworker/u4:6 Not tainted 5.17.0-syzkaller-13993-g88e6c020 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505
Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 af 00 00 00 48 8b 14 dd 00 1c 26 8a 4c 89 ee 48 c7 c7 00 10 26 8a e8 b1 e7 28 05 <0f> 0b 83 05 15 eb c5 09 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e c3
RSP: 0018:ffffc9000353fb00 EFLAGS: 00010082
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
RDX: ffff888029196140 RSI: ffffffff815efad8 RDI: fffff520006a7f52
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff815ea4ae R11: 0000000000000000 R12: ffffffff89ce23e0
R13: ffffffff8a2614e0 R14: ffffffff816628c0 R15: dffffc0000000000
FS:  0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe1f2908924 CR3: 0000000043720000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 __debug_check_no_obj_freed lib/debugobjects.c:992 [inline]
 debug_check_no_obj_freed+0x301/0x420 lib/debugobjects.c:1023
 kfree+0xd6/0x310 mm/slab.c:3809
 ops_free_list.part.0+0x119/0x370 net/core/net_namespace.c:176
 ops_free_list net/core/net_namespace.c:174 [inline]
 cleanup_net+0x591/0xb00 net/core/net_namespace.c:598
 process_one_work+0x996/0x1610 kernel/workqueue.c:2289
 worker_thread+0x665/0x1080 kernel/workqueue.c:2436
 kthread+0x2e9/0x3a0 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
 </TASK>

Fixes: ace45bec ("rxrpc: Fix firewall route keepalive")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Marc Dionne <marc.dionne@auristor.com>
Cc: linux-afs@lists.infradead.org
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

16ee25f3

net: openvswitch: fix leak of nested actions · b6cea599

由 Ilya Maximets 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit 5ae05b5eb58773cfec307ff88aff4cfd843c4cff
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=5ae05b5eb58773cfec307ff88aff4cfd843c4cff

--------------------------------

[ Upstream commit 1f30fb91 ]

While parsing user-provided actions, openvswitch module may dynamically
allocate memory and store pointers in the internal copy of the actions.
So this memory has to be freed while destroying the actions.

Currently there are only two such actions: ct() and set().  However,
there are many actions that can hold nested lists of actions and
ovs_nla_free_flow_actions() just jumps over them leaking the memory.

For example, removal of the flow with the following actions will lead
to a leak of the memory allocated by nf_ct_tmpl_alloc():

  actions:clone(ct(commit),0)

Non-freed set() action may also leak the 'dst' structure for the
tunnel info including device references.

Under certain conditions with a high rate of flow rotation that may
cause significant memory leak problem (2MB per second in reporter's
case).  The problem is also hard to mitigate, because the user doesn't
have direct control over the datapath flows generated by OVS.

Fix that by iterating over all the nested actions and freeing
everything that needs to be freed recursively.

New build time assertion should protect us from this problem if new
actions will be added in the future.

Unfortunately, openvswitch module doesn't use NLA_F_NESTED, so all
attributes has to be explicitly checked.  sample() and clone() actions
are mixing extra attributes into the user-provided action list.  That
prevents some code generalization too.

Fixes: 34ae932a ("openvswitch: Make tunnel set action attach a metadata dst")
Link: https://mail.openvswitch.org/pipermail/ovs-dev/2022-March/392922.htmlReported-by: NStéphane Graber <stgraber@ubuntu.com>
Signed-off-by: NIlya Maximets <i.maximets@ovn.org>
Acked-by: NAaron Conole <aconole@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

b6cea599

net: openvswitch: don't send internal clone attribute to the userspace. · 25b69957

由 Ilya Maximets 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit 42ab401d22de3429f600a48acdd4c0ee5662783c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=42ab401d22de3429f600a48acdd4c0ee5662783c

--------------------------------

[ Upstream commit 3f2a3050 ]

'OVS_CLONE_ATTR_EXEC' is an internal attribute that is used for
performance optimization inside the kernel.  It's added by the kernel
while parsing user-provided actions and should not be sent during the
flow dump as it's not part of the uAPI.

The issue doesn't cause any significant problems to the ovs-vswitchd
process, because reported actions are not really used in the
application lifecycle and only supposed to be shown to a human via
ovs-dpctl flow dump.  However, the action list is still incorrect
and causes the following error if the user wants to look at the
datapath flows:

  # ovs-dpctl add-dp system@ovs-system
  # ovs-dpctl add-flow "<flow match>" "clone(ct(commit),0)"
  # ovs-dpctl dump-flows
  <flow match>, packets:0, bytes:0, used:never,
    actions:clone(bad length 4, expected -1 for: action0(01 00 00 00),
                  ct(commit),0)

With the fix:

  # ovs-dpctl dump-flows
  <flow match>, packets:0, bytes:0, used:never,
    actions:clone(ct(commit),0)

Additionally fixed an incorrect attribute name in the comment.

Fixes: b2335040 ("openvswitch: kernel datapath clone action")
Signed-off-by: NIlya Maximets <i.maximets@ovn.org>
Acked-by: NAaron Conole <aconole@redhat.com>
Link: https://lore.kernel.org/r/20220404104150.2865736-1-i.maximets@ovn.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

25b69957

ice: synchronize_rcu() when terminating rings · 3fa6cb4b

由 Maciej Fijalkowski 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit e54ea8fc51cae0232982f78c1b168d3b89a46ad2
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e54ea8fc51cae0232982f78c1b168d3b89a46ad2

--------------------------------

[ Upstream commit f9124c68 ]

Unfortunately, the ice driver doesn't respect the RCU critical section that
XSK wakeup is surrounded with. To fix this, add synchronize_rcu() calls to
paths that destroy resources that might be in use.

This was addressed in other AF_XDP ZC enabled drivers, for reference see
for example commit b3873a5b ("net/i40e: Fix concurrency issues
between config flow and XSK")

Fixes: efc2214b ("ice: Add support for XDP")
Fixes: 2d4238f5 ("ice: Add support for AF_XDP")
Signed-off-by: NMaciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: NShwetha Nagaraju <shwetha.nagaraju@intel.com>
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

3fa6cb4b

ipv6: Fix stats accounting in ip6_pkt_drop · a78dd66e

由 David Ahern 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit e3dd1202ab2ef81a69dbc20e18945b369dddea81
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e3dd1202ab2ef81a69dbc20e18945b369dddea81

--------------------------------

[ Upstream commit 1158f79f ]

VRF devices are the loopbacks for VRFs, and a loopback can not be
assigned to a VRF. Accordingly, the condition in ip6_pkt_drop should
be '||' not '&&'.

Fixes: 1d3fd8a1 ("vrf: Use orig netdev to count Ip6InNoRoutes and a fresh route lookup when sending dest unreach")
Reported-by: NPudak, Filip <Filip.Pudak@windriver.com>
Reported-by: NXiao, Jiguang <Jiguang.Xiao@windriver.com>
Signed-off-by: NDavid Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220404150908.2937-1-dsahern@kernel.orgSigned-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

a78dd66e

ice: Do not skip not enabled queues in ice_vc_dis_qs_msg · 97084648

由 Anatolii Gerasymenko 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit ffce126c952e0b849cf280e0344bc087adc7256c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ffce126c952e0b849cf280e0344bc087adc7256c

--------------------------------

[ Upstream commit 05ef6813 ]

Disable check for queue being enabled in ice_vc_dis_qs_msg, because
there could be a case when queues were created, but were not enabled.
We still need to delete those queues.

Normal workflow for VF looks like:
Enable path:
VIRTCHNL_OP_ADD_ETH_ADDR (opcode 10)
VIRTCHNL_OP_CONFIG_VSI_QUEUES (opcode 6)
VIRTCHNL_OP_ENABLE_QUEUES (opcode 8)

Disable path:
VIRTCHNL_OP_DISABLE_QUEUES (opcode 9)
VIRTCHNL_OP_DEL_ETH_ADDR (opcode 11)

The issue appears only in stress conditions when VF is enabled and
disabled very fast.
Eventually there will be a case, when queues are created by
VIRTCHNL_OP_CONFIG_VSI_QUEUES, but are not enabled by
VIRTCHNL_OP_ENABLE_QUEUES.
In turn, these queues are not deleted by VIRTCHNL_OP_DISABLE_QUEUES,
because there is a check whether queues are enabled in
ice_vc_dis_qs_msg.

When we bring up the VF again, we will see the "Failed to set LAN Tx queue
context" error during VIRTCHNL_OP_CONFIG_VSI_QUEUES step. This
happens because old 16 queues were not deleted and VF requests to create
16 more, but ice_sched_get_free_qparent in ice_ena_vsi_txq would fail to
find a parent node for first newly requested queue (because all nodes
are allocated to 16 old queues).

Testing Hints:

Just enable and disable VF fast enough, so it would be disabled before
reaching VIRTCHNL_OP_ENABLE_QUEUES.

while true; do
        ip link set dev ens785f0v0 up
        sleep 0.065 # adjust delay value for you machine
        ip link set dev ens785f0v0 down
done

Fixes: 77ca27c4 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap")
Signed-off-by: NAnatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: NAlice Michael <alice.michael@intel.com>
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

97084648

ice: Set txq_teid to ICE_INVAL_TEID on ring creation · c5bafbbd

由 Anatolii Gerasymenko 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit b003fc4913ea79c3c1458879acb474c66240489a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b003fc4913ea79c3c1458879acb474c66240489a

--------------------------------

[ Upstream commit ccfee182 ]

When VF is freshly created, but not brought up, ring->txq_teid
value is by default set to 0.
But 0 is a valid TEID. On some platforms the Root Node of
Tx scheduler has a TEID = 0. This can cause issues as shown below.

The proper way is to set ring->txq_teid to ICE_INVAL_TEID (0xFFFFFFFF).

Testing Hints:
echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
ip link set dev ens785f0v0 up
ip link set dev ens785f0v0 down

If we have freshly created VF and quickly turn it on and off, so there
would be no time to reach VIRTCHNL_OP_CONFIG_VSI_QUEUES stage, then
VIRTCHNL_OP_DISABLE_QUEUES stage will fail with error:
[  639.531454] disable queue 89 failed 14
[  639.532233] Failed to disable LAN Tx queues, error: ICE_ERR_AQ_ERROR
[  639.533107] ice 0000:02:00.0: Failed to stop Tx ring 0 on VSI 5

The reason for the fail is that we are trying to send AQ command to
delete queue 89, which has never been created and receive an "invalid
argument" error from firmware.

As this queue has never been created, it's teid and ring->txq_teid
have default value 0.
ice_dis_vsi_txq has a check against non-existent queues:

node = ice_sched_find_node_by_teid(pi->root, q_teids[i]);
if (!node)
	continue;

But on some platforms the Root Node of Tx scheduler has a teid = 0.
Hence, ice_sched_find_node_by_teid finds a node with teid = 0 (it is
pi->root), and we go further to submit an erroneous request to firmware.

Fixes: 37bb8390 ("ice: Move common functions out of ice_main.c part 7/7")
Signed-off-by: NAnatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Reviewed-by: NMaciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: NAlice Michael <alice.michael@intel.com>
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

c5bafbbd

dpaa2-ptp: Fix refcount leak in dpaa2_ptp_probe · a18d2070

由 Miaoqian Lin 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit ebd1e3458dbf6643aa0272da398cb5b537d492b7
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ebd1e3458dbf6643aa0272da398cb5b537d492b7

--------------------------------

[ Upstream commit 2b04bd4f ]

This node pointer is returned by of_find_compatible_node() with
refcount incremented. Calling of_node_put() to aovid the refcount leak.

Fixes: d346c9e8 ("dpaa2-ptp: reuse ptp_qoriq driver")
Signed-off-by: NMiaoqian Lin <linmq006@gmail.com>
Link: https://lore.kernel.org/r/20220404125336.13427-1-linmq006@gmail.comSigned-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

a18d2070

IB/rdmavt: add lock to call to rvt_error_qp to prevent a race condition · d087ba14

由 Niels Dossche 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit 43c2d7890ecabe527448a6c391fb2d9a5e6bbfe0
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=43c2d7890ecabe527448a6c391fb2d9a5e6bbfe0

--------------------------------

[ Upstream commit 4d809f69 ]

The documentation of the function rvt_error_qp says both r_lock and s_lock
need to be held when calling that function.  It also asserts using lockdep
that both of those locks are held.  However, the commit I referenced in
Fixes accidentally makes the call to rvt_error_qp in rvt_ruc_loopback no
longer covered by r_lock.  This results in the lockdep assertion failing
and also possibly in a race condition.

Fixes: d757c60e ("IB/rdmavt: Fix concurrency panics in QP post_send and modify to error")
Link: https://lore.kernel.org/r/20220228165330.41546-1-dossche.niels@gmail.comSigned-off-by: NNiels Dossche <dossche.niels@gmail.com>
Acked-by: NDennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

d087ba14

RDMA/mlx5: Don't remove cache MRs when a delay is needed · 3889021d

由 Aharon Landau 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit 3a57babfb6e9871b961e3c36e616af87df4ef1e8
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3a57babfb6e9871b961e3c36e616af87df4ef1e8

--------------------------------

[ Upstream commit 84c2362f ]

Don't remove MRs from the cache if need to delay the removal.

Fixes: b9358bdb ("RDMA/mlx5: Fix locking in MR cache work queue")
Link: https://lore.kernel.org/r/c3087a90ff362c8796c7eaa2715128743ce36722.1649062436.git.leonro@nvidia.comSigned-off-by: NAharon Landau <aharonl@nvidia.com>
Reviewed-by: NShay Drory <shayd@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

3889021d

sfc: Do not free an empty page_ring · 4923c9da

由 Martin Habets 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit d8992b393f974b47c4a72b9bacdbfc4110e29eea
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=d8992b393f974b47c4a72b9bacdbfc4110e29eea

--------------------------------

[ Upstream commit 458f5d92 ]

When the page_ring is not used page_ptr_mask is 0.
Do not dereference page_ring[0] in this case.

Fixes: 2768935a ("sfc: reuse pages to avoid DMA mapping/unmapping costs")
Reported-by: NTaehee Yoo <ap420073@gmail.com>
Signed-off-by: NMartin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

4923c9da

bnxt_en: reserve space inside receive page for skb_shared_info · d1a5b626

由 Andy Gospodarek 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit 0ac74169ebc34686a6ef2a033473a4913aa3e471
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=0ac74169ebc34686a6ef2a033473a4913aa3e471

--------------------------------

[ Upstream commit facc173c ]

Insufficient space was being reserved in the page used for packet
reception, so the interface MTU could be set too large to still have
room for the contents of the packet when doing XDP redirect.  This
resulted in the following message when redirecting a packet between
3520 and 3822 bytes with an MTU of 3822:

[311815.561880] XDP_WARN: xdp_update_frame_from_buff(line:200): Driver BUG: missing reserved tailroom

Fixes: f18c2b77 ("bnxt_en: optimized XDP_REDIRECT support")
Reviewed-by: NSomnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: NPavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: NAndy Gospodarek <gospo@broadcom.com>
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

d1a5b626

drm/imx: Fix memory leak in imx_pd_connector_get_modes · 861e0d90

由 José Expósito 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit f8b0ef0a5889890b50482b6593bd8de544736351
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f8b0ef0a5889890b50482b6593bd8de544736351

--------------------------------

[ Upstream commit bce81feb ]

Avoid leaking the display mode variable if of_get_drm_display_mode
fails.

Fixes: 76ecd9c9 ("drm/imx: parallel-display: check return code from of_get_drm_display_mode()")
Addresses-Coverity-ID: 1443943 ("Resource leak")
Signed-off-by: NJosé Expósito <jose.exposito89@gmail.com>
Signed-off-by: NPhilipp Zabel <p.zabel@pengutronix.de>
Link: https://lore.kernel.org/r/20220108165230.44610-1-jose.exposito89@gmail.comSigned-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

861e0d90

drm/imx: imx-ldb: Check for null pointer after calling kmemdup · ac294abf

由 Jiasheng Jiang 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit 25bc9fd4c8d13036c0656a00ebedc2cff4650a91
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=25bc9fd4c8d13036c0656a00ebedc2cff4650a91

--------------------------------

[ Upstream commit 8027a9ad ]

As the possible failure of the allocation, kmemdup() may return NULL
pointer.
Therefore, it should be better to check the return value of kmemdup()
and return error if fails.

Fixes: dc80d703 ("drm/imx-ldb: Add support to drm-bridge")
Signed-off-by: NJiasheng Jiang <jiasheng@iscas.ac.cn>
Signed-off-by: NPhilipp Zabel <p.zabel@pengutronix.de>
Link: https://lore.kernel.org/r/20220105074729.2363657-1-jiasheng@iscas.ac.cnSigned-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

ac294abf

net: stmmac: Fix unset max_speed difference between DT and non-DT platforms · 1ecc903f

由 Chen-Yu Tsai 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit 02ab4abe5bbf60f6d0e020a974e0c40e42984a1a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=02ab4abe5bbf60f6d0e020a974e0c40e42984a1a

--------------------------------

[ Upstream commit c21cabb0 ]

In commit 9cbadf09 ("net: stmmac: support max-speed device tree
property"), when DT platforms don't set "max-speed", max_speed is set to
-1; for non-DT platforms, it stays the default 0.

Prior to commit eeef2f6b ("net: stmmac: Start adding phylink support"),
the check for a valid max_speed setting was to check if it was greater
than zero. This commit got it right, but subsequent patches just checked
for non-zero, which is incorrect for DT platforms.

In commit 92c3807b ("net: stmmac: convert to phylink_get_linkmodes()")
the conversion switched completely to checking for non-zero value as a
valid value, which caused 1000base-T to stop getting advertised by
default.

Instead of trying to fix all the checks, simply leave max_speed alone if
DT property parsing fails.

Fixes: 9cbadf09 ("net: stmmac: support max-speed device tree property")
Fixes: 92c3807b ("net: stmmac: convert to phylink_get_linkmodes()")
Signed-off-by: NChen-Yu Tsai <wens@csie.org>
Acked-by: NRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: NSrinivas Kandagatla <srinivas.kandagatla@linaro.org>
Link: https://lore.kernel.org/r/20220331184832.16316-1-wens@kernel.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

1ecc903f

net: ipv4: fix route with nexthop object delete warning · 7aa1bfa2

由 Nikolay Aleksandrov 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit 63ea57478aaa3e06a597081a0f537318fc04e49f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=63ea57478aaa3e06a597081a0f537318fc04e49f

--------------------------------

[ Upstream commit 6bf92d70 ]

FRR folks have hit a kernel warning[1] while deleting routes[2] which is
caused by trying to delete a route pointing to a nexthop id without
specifying nhid but matching on an interface. That is, a route is found
but we hit a warning while matching it. The warning is from
fib_info_nh() in include/net/nexthop.h because we run it on a fib_info
with nexthop object. The call chain is:
 inet_rtm_delroute -> fib_table_delete -> fib_nh_match (called with a
nexthop fib_info and also with fc_oif set thus calling fib_info_nh on
the fib_info and triggering the warning). The fix is to not do any
matching in that branch if the fi has a nexthop object because those are
managed separately. I.e. we should match when deleting without nh spec and
should fail when deleting a nexthop route with old-style nh spec because
nexthop objects are managed separately, e.g.:
 $ ip r show 1.2.3.4/32
 1.2.3.4 nhid 12 via 192.168.11.2 dev dummy0

 $ ip r del 1.2.3.4/32
 $ ip r del 1.2.3.4/32 nhid 12
 <both should work>

 $ ip r del 1.2.3.4/32 dev dummy0
 <should fail with ESRCH>

[1]
 [  523.462226] ------------[ cut here ]------------
 [  523.462230] WARNING: CPU: 14 PID: 22893 at include/net/nexthop.h:468 fib_nh_match+0x210/0x460
 [  523.462236] Modules linked in: dummy rpcsec_gss_krb5 xt_socket nf_socket_ipv4 nf_socket_ipv6 ip6table_raw iptable_raw bpf_preload xt_statistic ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_mark nf_tables xt_nat veth nf_conntrack_netlink nfnetlink xt_addrtype br_netfilter overlay dm_crypt nfsv3 nfs fscache netfs vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack 8021q garp mrp ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bridge stp llc rfcomm snd_seq_dummy snd_hrtimer rpcrdma rdma_cm iw_cm ib_cm ib_core ip6table_filter xt_comment ip6_tables vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) qrtr bnep binfmt_misc xfs vfat fat squashfs loop nvidia_drm(POE) nvidia_modeset(POE) nvidia_uvm(POE) nvidia(POE) intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi btusb btrtl iwlmvm uvcvideo btbcm snd_hda_intel edac_mce_amd
 [  523.462274]  videobuf2_vmalloc videobuf2_memops btintel snd_intel_dspcfg videobuf2_v4l2 snd_intel_sdw_acpi bluetooth snd_usb_audio snd_hda_codec mac80211 snd_usbmidi_lib joydev snd_hda_core videobuf2_common kvm_amd snd_rawmidi snd_hwdep snd_seq videodev ccp snd_seq_device libarc4 ecdh_generic mc snd_pcm kvm iwlwifi snd_timer drm_kms_helper snd cfg80211 cec soundcore irqbypass rapl wmi_bmof i2c_piix4 rfkill k10temp pcspkr acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc drm zram ip_tables crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel nvme sp5100_tco r8169 nvme_core wmi ipmi_devintf ipmi_msghandler fuse
 [  523.462300] CPU: 14 PID: 22893 Comm: ip Tainted: P           OE     5.16.18-200.fc35.x86_64 #1
 [  523.462302] Hardware name: Micro-Star International Co., Ltd. MS-7C37/MPG X570 GAMING EDGE WIFI (MS-7C37), BIOS 1.C0 10/29/2020
 [  523.462303] RIP: 0010:fib_nh_match+0x210/0x460
 [  523.462304] Code: 7c 24 20 48 8b b5 90 00 00 00 e8 bb ee f4 ff 48 8b 7c 24 20 41 89 c4 e8 ee eb f4 ff 45 85 e4 0f 85 2e fe ff ff e9 4c ff ff ff <0f> 0b e9 17 ff ff ff 3c 0a 0f 85 61 fe ff ff 48 8b b5 98 00 00 00
 [  523.462306] RSP: 0018:ffffaa53d4d87928 EFLAGS: 00010286
 [  523.462307] RAX: 0000000000000000 RBX: ffffaa53d4d87a90 RCX: ffffaa53d4d87bb0
 [  523.462308] RDX: ffff9e3d2ee6be80 RSI: ffffaa53d4d87a90 RDI: ffffffff920ed380
 [  523.462309] RBP: ffff9e3d2ee6be80 R08: 0000000000000064 R09: 0000000000000000
 [  523.462310] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000031
 [  523.462310] R13: 0000000000000020 R14: 0000000000000000 R15: ffff9e3d331054e0
 [  523.462311] FS:  00007f245517c1c0(0000) GS:ffff9e492ed80000(0000) knlGS:0000000000000000
 [  523.462313] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [  523.462313] CR2: 000055e5dfdd8268 CR3: 00000003ef488000 CR4: 0000000000350ee0
 [  523.462315] Call Trace:
 [  523.462316]  <TASK>
 [  523.462320]  fib_table_delete+0x1a9/0x310
 [  523.462323]  inet_rtm_delroute+0x93/0x110
 [  523.462325]  rtnetlink_rcv_msg+0x133/0x370
 [  523.462327]  ? _copy_to_iter+0xb5/0x6f0
 [  523.462330]  ? rtnl_calcit.isra.0+0x110/0x110
 [  523.462331]  netlink_rcv_skb+0x50/0xf0
 [  523.462334]  netlink_unicast+0x211/0x330
 [  523.462336]  netlink_sendmsg+0x23f/0x480
 [  523.462338]  sock_sendmsg+0x5e/0x60
 [  523.462340]  ____sys_sendmsg+0x22c/0x270
 [  523.462341]  ? import_iovec+0x17/0x20
 [  523.462343]  ? sendmsg_copy_msghdr+0x59/0x90
 [  523.462344]  ? __mod_lruvec_page_state+0x85/0x110
 [  523.462348]  ___sys_sendmsg+0x81/0xc0
 [  523.462350]  ? netlink_seq_start+0x70/0x70
 [  523.462352]  ? __dentry_kill+0x13a/0x180
 [  523.462354]  ? __fput+0xff/0x250
 [  523.462356]  __sys_sendmsg+0x49/0x80
 [  523.462358]  do_syscall_64+0x3b/0x90
 [  523.462361]  entry_SYSCALL_64_after_hwframe+0x44/0xae
 [  523.462364] RIP: 0033:0x7f24552aa337
 [  523.462365] Code: 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
 [  523.462366] RSP: 002b:00007fff7f05a838 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 [  523.462368] RAX: ffffffffffffffda RBX: 000000006245bf91 RCX: 00007f24552aa337
 [  523.462368] RDX: 0000000000000000 RSI: 00007fff7f05a8a0 RDI: 0000000000000003
 [  523.462369] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
 [  523.462370] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000001
 [  523.462370] R13: 00007fff7f05ce08 R14: 0000000000000000 R15: 000055e5dfdd1040
 [  523.462373]  </TASK>
 [  523.462374] ---[ end trace ba537bc16f6bf4ed ]---

[2] https://github.com/FRRouting/frr/issues/6412

Fixes: 4c7e8084 ("ipv4: Plumb support for nexthop object in a fib_info")
Signed-off-by: NNikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: NDavid Ahern <dsahern@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

7aa1bfa2

ice: Clear default forwarding VSI during VSI release · 4052593d

由 Ivan Vecera 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit 4be6ed03107b6e245f5ee0d9e273c868fda66154
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4be6ed03107b6e245f5ee0d9e273c868fda66154

--------------------------------

[ Upstream commit bd8c624c ]

VSI is set as default forwarding one when promisc mode is set for
PF interface, when PF is switched to switchdev mode or when VF
driver asks to enable allmulticast or promisc mode for the VF
interface (when vf-true-promisc-support priv flag is off).
The third case is buggy because in that case VSI associated with
VF remains as default one after VF removal.

Reproducer:
1. Create VF
   echo 1 > sys/class/net/ens7f0/device/sriov_numvfs
2. Enable allmulticast or promisc mode on VF
   ip link set ens7f0v0 allmulticast on
   ip link set ens7f0v0 promisc on
3. Delete VF
   echo 0 > sys/class/net/ens7f0/device/sriov_numvfs
4. Try to enable promisc mode on PF
   ip link set ens7f0 promisc on

Although it looks that promisc mode on PF is enabled the opposite
is true because ice_vsi_sync_fltr() responsible for IFF_PROMISC
handling first checks if any other VSI is set as default forwarding
one and if so the function does not do anything. At this point
it is not possible to enable promisc mode on PF without re-probe
device.

To resolve the issue this patch clear default forwarding VSI
during ice_vsi_release() when the VSI to be released is the default
one.

Fixes: 01b5e89a ("ice: Add VF promiscuous support")
Signed-off-by: NIvan Vecera <ivecera@redhat.com>
Reviewed-by: NMichal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: NMaciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: NAlice Michael <alice.michael@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

4052593d

scsi: zorro7xx: Fix a resource leak in zorro7xx_remove_one() · c10e11a7

由 Christophe JAILLET 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit c5f77b595379b5191316edd365a542f8b1526066
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c5f77b595379b5191316edd365a542f8b1526066

--------------------------------

[ Upstream commit 16ed828b ]

The error handling path of the probe releases a resource that is not freed
in the remove function. In some cases, a ioremap() must be undone.

Add the missing iounmap() call in the remove function.

Link: https://lore.kernel.org/r/247066a3104d25f9a05de8b3270fc3c848763bcc.1647673264.git.christophe.jaillet@wanadoo.fr
Fixes: 45804fbb ("[SCSI] 53c700: Amiga Zorro NCR53c710 SCSI")
Reviewed-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

c10e11a7

Drivers: hv: vmbus: Fix potential crash on module unload · 909b3297

由 Guilherme G. Piccoli 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit cf580d2e3884dbafd6b90269b03a24d661578624
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=cf580d2e3884dbafd6b90269b03a24d661578624

--------------------------------

[ Upstream commit 792f232d ]

The vmbus driver relies on the panic notifier infrastructure to perform
some operations when a panic event is detected. Since vmbus can be built
as module, it is required that the driver handles both registering and
unregistering such panic notifier callback.

After commit 74347a99 ("x86/Hyper-V: Unload vmbus channel in hv panic callback")
though, the panic notifier registration is done unconditionally in the module
initialization routine whereas the unregistering procedure is conditionally
guarded and executes only if HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE capability
is set.

This patch fixes that by unconditionally unregistering the panic notifier
in the module's exit routine as well.

Fixes: 74347a99 ("x86/Hyper-V: Unload vmbus channel in hv panic callback")
Signed-off-by: NGuilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: NMichael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220315203535.682306-1-gpiccoli@igalia.comSigned-off-by: NWei Liu <wei.liu@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

909b3297

drm/amdgpu: fix off by one in amdgpu_gfx_kiq_acquire() · bff89179

由 Dan Carpenter 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit 0c122eb3a109356fb2c39f49c1768321248d0f1d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=0c122eb3a109356fb2c39f49c1768321248d0f1d

--------------------------------

[ Upstream commit 1647b54e ]

This post-op should be a pre-op so that we do not pass -1 as the bit
number to test_bit().  The current code will loop downwards from 63 to
-1.  After changing to a pre-op, it loops from 63 to 0.

Fixes: 71c37505 ("drm/amdgpu/gfx: move more common KIQ code to amdgpu_gfx.c")
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

bff89179

Revert "hv: utils: add PTP_1588_CLOCK to Kconfig to fix build" · 91aae60f

由 Sasha Levin 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit 84e5dfc05f37bbb7203af4ec6bd18a17d37abacf
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=84e5dfc05f37bbb7203af4ec6bd18a17d37abacf

--------------------------------

This reverts commit c4dc584a2d4c8d74b054f09d67e0a076767bdee5.

On Sat, Apr 09, 2022 at 09:07:51AM -0700, Randy Dunlap wrote:
>According to https://bugzilla.kernel.org/show_bug.cgi?id=215823,
>c4dc584a2d4c8d74b054f09d67e0a076767bdee5 ("hv: utils: add PTP_1588_CLOCK to Kconfig to fix build")
>is a problem for 5.10 since CONFIG_PTP_1588_CLOCK_OPTIONAL does not exist in 5.10.
>This prevents the hyper-V NIC timestamping from working, so please revert that commit.
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

91aae60f

mm: fix race between MADV_FREE reclaim and blkdev direct IO read · b873d752

由 Mauricio Faria de Oliveira 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit 3c3fbfa6dddb022e91be18902c3785a82aa4867d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3c3fbfa6dddb022e91be18902c3785a82aa4867d

--------------------------------

commit 6c8e2a25 upstream.

Problem:
Reviewed-by: NWei Li <liwei391@huawei.com>

=======

Userspace might read the zero-page instead of actual data from a direct IO
read on a block device if the buffers have been called madvise(MADV_FREE)
on earlier (this is discussed below) due to a race between page reclaim on
MADV_FREE and blkdev direct IO read.

- Race condition:
  ==============

During page reclaim, the MADV_FREE page check in try_to_unmap_one() checks
if the page is not dirty, then discards its rmap PTE(s) (vs.  remap back
if the page is dirty).

However, after try_to_unmap_one() returns to shrink_page_list(), it might
keep the page _anyway_ if page_ref_freeze() fails (it expects exactly
_one_ page reference, from the isolation for page reclaim).

Well, blkdev_direct_IO() gets references for all pages, and on READ
operations it only sets them dirty _later_.

So, if MADV_FREE'd pages (i.e., not dirty) are used as buffers for direct
IO read from block devices, and page reclaim happens during
__blkdev_direct_IO[_simple]() exactly AFTER bio_iov_iter_get_pages()
returns, but BEFORE the pages are set dirty, the situation happens.

The direct IO read eventually completes.  Now, when userspace reads the
buffers, the PTE is no longer there and the page fault handler
do_anonymous_page() services that with the zero-page, NOT the data!

A synthetic reproducer is provided.

- Page faults:
  ===========

If page reclaim happens BEFORE bio_iov_iter_get_pages() the issue doesn't
happen, because that faults-in all pages as writeable, so
do_anonymous_page() sets up a new page/rmap/PTE, and that is used by
direct IO.  The userspace reads don't fault as the PTE is there (thus
zero-page is not used/setup).

But if page reclaim happens AFTER it / BEFORE setting pages dirty, the PTE
is no longer there; the subsequent page faults can't help:

The data-read from the block device probably won't generate faults due to
DMA (no MMU) but even in the case it wouldn't use DMA, that happens on
different virtual addresses (not user-mapped addresses) because `struct
bio_vec` stores `struct page` to figure addresses out (which are different
from user-mapped addresses) for the read.

Thus userspace reads (to user-mapped addresses) still fault, then
do_anonymous_page() gets another `struct page` that would address/ map to
other memory than the `struct page` used by `struct bio_vec` for the read.
(The original `struct page` is not available, since it wasn't freed, as
page_ref_freeze() failed due to more page refs.  And even if it were
available, its data cannot be trusted anymore.)

Solution:
========

One solution is to check for the expected page reference count in
try_to_unmap_one().

There should be one reference from the isolation (that is also checked in
shrink_page_list() with page_ref_freeze()) plus one or more references
from page mapping(s) (put in discard: label).  Further references mean
that rmap/PTE cannot be unmapped/nuked.

(Note: there might be more than one reference from mapping due to
fork()/clone() without CLONE_VM, which use the same `struct page` for
references, until the copy-on-write page gets copied.)

So, additional page references (e.g., from direct IO read) now prevent the
rmap/PTE from being unmapped/dropped; similarly to the page is not freed
per shrink_page_list()/page_ref_freeze()).

- Races and Barriers:
  ==================

The new check in try_to_unmap_one() should be safe in races with
bio_iov_iter_get_pages() in get_user_pages() fast and slow paths, as it's
done under the PTE lock.

The fast path doesn't take the lock, but it checks if the PTE has changed
and if so, it drops the reference and leaves the page for the slow path
(which does take that lock).

The fast path requires synchronization w/ full memory barrier: it writes
the page reference count first then it reads the PTE later, while
try_to_unmap() writes PTE first then it reads page refcount.

And a second barrier is needed, as the page dirty flag should not be read
before the page reference count (as in __remove_mapping()).  (This can be
a load memory barrier only; no writes are involved.)

Call stack/comments:

- try_to_unmap_one()
  - page_vma_mapped_walk()
    - map_pte()			# see pte_offset_map_lock():
        pte_offset_map()
        spin_lock()

  - ptep_get_and_clear()	# write PTE
  - smp_mb()			# (new barrier) GUP fast path
  - page_ref_count()		# (new check) read refcount

  - page_vma_mapped_walk_done()	# see pte_unmap_unlock():
      pte_unmap()
      spin_unlock()

- bio_iov_iter_get_pages()
  - __bio_iov_iter_get_pages()
    - iov_iter_get_pages()
      - get_user_pages_fast()
        - internal_get_user_pages_fast()

          # fast path
          - lockless_pages_from_mm()
            - gup_{pgd,p4d,pud,pmd,pte}_range()
                ptep = pte_offset_map()		# not _lock()
                pte = ptep_get_lockless(ptep)

                page = pte_page(pte)
                try_grab_compound_head(page)	# inc refcount
                                            	# (RMW/barrier
                                             	#  on success)

                if (pte_val(pte) != pte_val(*ptep)) # read PTE
                        put_compound_head(page) # dec refcount
                        			# go slow path

          # slow path
          - __gup_longterm_unlocked()
            - get_user_pages_unlocked()
              - __get_user_pages_locked()
                - __get_user_pages()
                  - follow_{page,p4d,pud,pmd}_mask()
                    - follow_page_pte()
                        ptep = pte_offset_map_lock()
                        pte = *ptep
                        page = vm_normal_page(pte)
                        try_grab_page(page)	# inc refcount
                        pte_unmap_unlock()

- Huge Pages:
  ==========

Regarding transparent hugepages, that logic shouldn't change, as MADV_FREE
(aka lazyfree) pages are PageAnon() && !PageSwapBacked()
(madvise_free_pte_range() -> mark_page_lazyfree() -> lru_lazyfree_fn())
thus should reach shrink_page_list() -> split_huge_page_to_list() before
try_to_unmap[_one](), so it deals with normal pages only.

(And in case unlikely/TTU_SPLIT_HUGE_PMD/split_huge_pmd_address() happens,
which should not or be rare, the page refcount should be greater than
mapcount: the head page is referenced by tail pages.  That also prevents
checking the head `page` then incorrectly call page_remove_rmap(subpage)
for a tail page, that isn't even in the shrink_page_list()'s page_list (an
effect of split huge pmd/pmvw), as it might happen today in this unlikely
scenario.)

MADV_FREE'd buffers:
===================

So, back to the "if MADV_FREE pages are used as buffers" note.  The case
is arguable, and subject to multiple interpretations.

The madvise(2) manual page on the MADV_FREE advice value says:

1) 'After a successful MADV_FREE ... data will be lost when
   the kernel frees the pages.'
2) 'the free operation will be canceled if the caller writes
   into the page' / 'subsequent writes ... will succeed and
   then [the] kernel cannot free those dirtied pages'
3) 'If there is no subsequent write, the kernel can free the
   pages at any time.'

Thoughts, questions, considerations... respectively:

1) Since the kernel didn't actually free the page (page_ref_freeze()
   failed), should the data not have been lost? (on userspace read.)
2) Should writes performed by the direct IO read be able to cancel
   the free operation?
   - Should the direct IO read be considered as 'the caller' too,
     as it's been requested by 'the caller'?
   - Should the bio technique to dirty pages on return to userspace
     (bio_check_pages_dirty() is called/used by __blkdev_direct_IO())
     be considered in another/special way here?
3) Should an upcoming write from a previously requested direct IO
   read be considered as a subsequent write, so the kernel should
   not free the pages? (as it's known at the time of page reclaim.)

And lastly:

Technically, the last point would seem a reasonable consideration and
balance, as the madvise(2) manual page apparently (and fairly) seem to
assume that 'writes' are memory access from the userspace process (not
explicitly considering writes from the kernel or its corner cases; again,
fairly)..  plus the kernel fix implementation for the corner case of the
largely 'non-atomic write' encompassed by a direct IO read operation, is
relatively simple; and it helps.

Reproducer:
==========

@ test.c (simplified, but works)

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>
	#include <sys/mman.h>

	int main() {
		int fd, i;
		char *buf;

		fd = open(DEV, O_RDONLY | O_DIRECT);

		buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
                	   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		for (i = 0; i < BUF_SIZE; i += PAGE_SIZE)
			buf[i] = 1; // init to non-zero

		madvise(buf, BUF_SIZE, MADV_FREE);

		read(fd, buf, BUF_SIZE);

		for (i = 0; i < BUF_SIZE; i += PAGE_SIZE)
			printf("%p: 0x%x\n", &buf[i], buf[i]);

		return 0;
	}

@ block/fops.c (formerly fs/block_dev.c)

	+#include <linux/swap.h>
	...
	... __blkdev_direct_IO[_simple](...)
	{
	...
	+	if (!strcmp(current->comm, "good"))
	+		shrink_all_memory(ULONG_MAX);
	+
         	ret = bio_iov_iter_get_pages(...);
	+
	+	if (!strcmp(current->comm, "bad"))
	+		shrink_all_memory(ULONG_MAX);
	...
	}

@ shell

        # NUM_PAGES=4
        # PAGE_SIZE=$(getconf PAGE_SIZE)

        # yes | dd of=test.img bs=${PAGE_SIZE} count=${NUM_PAGES}
        # DEV=$(losetup -f --show test.img)

        # gcc -DDEV=\"$DEV\" \
              -DBUF_SIZE=$((PAGE_SIZE * NUM_PAGES)) \
              -DPAGE_SIZE=${PAGE_SIZE} \
               test.c -o test

        # od -tx1 $DEV
        0000000 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a
        *
        0040000

        # mv test good
        # ./good
        0x7f7c10418000: 0x79
        0x7f7c10419000: 0x79
        0x7f7c1041a000: 0x79
        0x7f7c1041b000: 0x79

        # mv good bad
        # ./bad
        0x7fa1b8050000: 0x0
        0x7fa1b8051000: 0x0
        0x7fa1b8052000: 0x0
        0x7fa1b8053000: 0x0

Note: the issue is consistent on v5.17-rc3, but it's intermittent with the
support of MADV_FREE on v4.5 (60%-70% error; needs swap).  [wrap
do_direct_IO() in do_blockdev_direct_IO() @ fs/direct-io.c].

- v5.17-rc3:

        # for i in {1..1000}; do ./good; done \
            | cut -d: -f2 | sort | uniq -c
           4000  0x79

        # mv good bad
        # for i in {1..1000}; do ./bad; done \
            | cut -d: -f2 | sort | uniq -c
           4000  0x0

        # free | grep Swap
        Swap:             0           0           0

- v4.5:

        # for i in {1..1000}; do ./good; done \
            | cut -d: -f2 | sort | uniq -c
           4000  0x79

        # mv good bad
        # for i in {1..1000}; do ./bad; done \
            | cut -d: -f2 | sort | uniq -c
           2702  0x0
           1298  0x79

        # swapoff -av
        swapoff /swap

        # for i in {1..1000}; do ./bad; done \
            | cut -d: -f2 | sort | uniq -c
           4000  0x79

Ceph/TCMalloc:
=============

For documentation purposes, the use case driving the analysis/fix is Ceph
on Ubuntu 18.04, as the TCMalloc library there still uses MADV_FREE to
release unused memory to the system from the mmap'ed page heap (might be
committed back/used again; it's not munmap'ed.) - PageHeap::DecommitSpan()
-> TCMalloc_SystemRelease() -> madvise() - PageHeap::CommitSpan() ->
TCMalloc_SystemCommit() -> do nothing.

Note: TCMalloc switched back to MADV_DONTNEED a few commits after the
release in Ubuntu 18.04 (google-perftools/gperftools 2.5), so the issue
just 'disappeared' on Ceph on later Ubuntu releases but is still present
in the kernel, and can be hit by other use cases.

The observed issue seems to be the old Ceph bug #22464 [1], where checksum
mismatches are observed (and instrumentation with buffer dumps shows
zero-pages read from mmap'ed/MADV_FREE'd page ranges).

The issue in Ceph was reasonably deemed a kernel bug (comment #50) and
mostly worked around with a retry mechanism, but other parts of Ceph could
still hit that (rocksdb).  Anyway, it's less likely to be hit again as
TCMalloc switched out of MADV_FREE by default.

(Some kernel versions/reports from the Ceph bug, and relation with
the MADV_FREE introduction/changes; TCMalloc versions not checked.)
- 4.4 good
- 4.5 (madv_free: introduction)
- 4.9 bad
- 4.10 good? maybe a swapless system
- 4.12 (madv_free: no longer free instantly on swapless systems)
- 4.13 bad

[1] https://tracker.ceph.com/issues/22464

Thanks:
======

Several people contributed to analysis/discussions/tests/reproducers in
the first stages when drilling down on ceph/tcmalloc/linux kernel:

- Dan Hill
- Dan Streetman
- Dongdong Tao
- Gavin Guo
- Gerald Yang
- Heitor Alves de Siqueira
- Ioanna Alifieraki
- Jay Vosburgh
- Matthew Ruffell
- Ponnuvel Palaniyappan

Reviews, suggestions, corrections, comments:

- Minchan Kim
- Yu Zhao
- Huang, Ying
- John Hubbard
- Christoph Hellwig

[mfo@canonical.com: v4]
  Link: https://lkml.kernel.org/r/20220209202659.183418-1-mfo@canonical.comLink: https://lkml.kernel.org/r/20220131230255.789059-1-mfo@canonical.com

Fixes: 802a3a92 ("mm: reclaim MADV_FREE pages")
Signed-off-by: NMauricio Faria de Oliveira <mfo@canonical.com>
Reviewed-by: N"Huang, Ying" <ying.huang@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Dan Hill <daniel.hill@canonical.com>
Cc: Dan Streetman <dan.streetman@canonical.com>
Cc: Dongdong Tao <dongdong.tao@canonical.com>
Cc: Gavin Guo <gavin.guo@canonical.com>
Cc: Gerald Yang <gerald.yang@canonical.com>
Cc: Heitor Alves de Siqueira <halves@canonical.com>
Cc: Ioanna Alifieraki <ioanna-maria.alifieraki@canonical.com>
Cc: Jay Vosburgh <jay.vosburgh@canonical.com>
Cc: Matthew Ruffell <matthew.ruffell@canonical.com>
Cc: Ponnuvel Palaniyappan <ponnuvel.palaniyappan@canonical.com>
Cc: <stable@vger.kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
[mfo: backport: replace folio/test_flag with page/flag equivalents;
 real Fixes: 854e9ed0 ("mm: support madvise(MADV_FREE)") in v4.]
Signed-off-by: NMauricio Faria de Oliveira <mfo@canonical.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

b873d752

parisc: Fix patch code locking and flushing · a6f7b8d4

由 John David Anglin 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit 1753a49e266d8f0433e60d4b59f12c2b2497646a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=1753a49e266d8f0433e60d4b59f12c2b2497646a

--------------------------------

[ Upstream commit a9fe7fa7 ]

This change fixes the following:

1) The flags variable is not initialized. Always use raw_spin_lock_irqsave
and raw_spin_unlock_irqrestore to serialize patching.

2) flush_kernel_vmap_range is primarily intended for DMA flushes. Since
__patch_text_multiple is often called with interrupts disabled, it is
better to directly call flush_kernel_dcache_range_asm and
flush_kernel_icache_range_asm. This avoids an extra call.

3) The final call to flush_icache_range is unnecessary.
Signed-off-by: NJohn David Anglin <dave.anglin@bell.net>
Signed-off-by: NHelge Deller <deller@gmx.de>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

a6f7b8d4

parisc: Fix CPU affinity for Lasi, WAX and Dino chips · 2cfbd978

由 Helge Deller 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit f7c35220305f273bddc0bdaf1e453b4ca280f145
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f7c35220305f273bddc0bdaf1e453b4ca280f145

--------------------------------

[ Upstream commit 939fc856 ]

Add the missing logic to allow Lasi, WAX and Dino to set the
CPU affinity. This fixes IRQ migration to other CPUs when a
CPU is shutdown which currently holds the IRQs for one of those
chips.
Signed-off-by: NHelge Deller <deller@gmx.de>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

2cfbd978

NFS: Avoid writeback threads getting stuck in mempool_alloc() · 0824355f

由 Trond Myklebust 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit c74e2f6ecc51bd08bb5b0335477dba954a50592e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c74e2f6ecc51bd08bb5b0335477dba954a50592e

--------------------------------

[ Upstream commit 0bae835b ]

In a low memory situation, allow the NFS writeback code to fail without
getting stuck in infinite loops in mempool_alloc().
Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

0824355f

NFS: nfsiod should not block forever in mempool_alloc() · 2549ae58

由 Trond Myklebust 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit 34681aeddcfcc0c47d45154f5d39310def87c55f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=34681aeddcfcc0c47d45154f5d39310def87c55f

--------------------------------

[ Upstream commit 515dcdcd ]

The concern is that since nfsiod is sometimes required to kick off a
commit, it can get locked up waiting forever in mempool_alloc() instead
of failing gracefully and leaving the commit until later.

Try to allocate from the slab first, with GFP_KERNEL | __GFP_NORETRY,
then fall back to a non-blocking attempt to allocate from the memory
pool.
Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

2549ae58

SUNRPC: Fix socket waits for write buffer space · 625a6a27

由 Trond Myklebust 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit 7a506fabcfe1a58bcdf6dd18fa3fca42635235f0
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7a506fabcfe1a58bcdf6dd18fa3fca42635235f0

--------------------------------

[ Upstream commit 7496b59f ]

The socket layer requires that we use the socket lock to protect changes
to the sock->sk_write_pending field and others.
Reported-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

625a6a27

jfs: prevent NULL deref in diFree · a44452c4

由 Haimin Zhang 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit b9c5ac0a15f24d63b20f899072fa6dd8c93af136
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b9c5ac0a15f24d63b20f899072fa6dd8c93af136

--------------------------------

[ Upstream commit a5304629 ]

Add validation check for JFS_IP(ipimap)->i_imap to prevent a NULL deref
in diFree since diFree uses it without do any validations.
When function jfs_mount calls diMount to initialize fileset inode
allocation map, it can fail and JFS_IP(ipimap)->i_imap won't be
initialized. Then it calls diFreeSpecial to close fileset inode allocation
map inode and it will flow into jfs_evict_inode. Function jfs_evict_inode
just validates JFS_SBI(inode->i_sb)->ipimap, then calls diFree. diFree use
JFS_IP(ipimap)->i_imap directly, then it will cause a NULL deref.
Reported-by: NTCS Robot <tcs_robot@tencent.com>
Signed-off-by: NHaimin Zhang <tcs_kernel@tencent.com>
Signed-off-by: NDave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

a44452c4

virtio_console: eliminate anonymous module_init & module_exit · 62a0c4b1

由 Randy Dunlap 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit c69b442125bf009fce26e15aa5616caf8a3673c3
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c69b442125bf009fce26e15aa5616caf8a3673c3

--------------------------------

[ Upstream commit fefb8a2a ]

Eliminate anonymous module_init() and module_exit(), which can lead to
confusion or ambiguity when reading System.map, crashes/oops/bugs,
or an initcall_debug log.

Give each of these init and exit functions unique driver-specific
names to eliminate the anonymous names.

Example 1: (System.map)
 ffffffff832fc78c t init
 ffffffff832fc79e t init
 ffffffff832fc8f8 t init

Example 2: (initcall_debug log)
 calling  init+0x0/0x12 @ 1
 initcall init+0x0/0x12 returned 0 after 15 usecs
 calling  init+0x0/0x60 @ 1
 initcall init+0x0/0x60 returned 0 after 2 usecs
 calling  init+0x0/0x9a @ 1
 initcall init+0x0/0x9a returned 0 after 74 usecs
Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
Reviewed-by: NAmit Shah <amit@kernel.org>
Cc: virtualization@lists.linux-foundation.org
Cc: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20220316192010.19001-3-rdunlap@infradead.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

62a0c4b1

serial: samsung_tty: do not unlock port->lock for uart_write_wakeup() · 27f1c36b

由 Jiri Slaby 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit 3309b322171180828564c386443eb5a62eabc862
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3309b322171180828564c386443eb5a62eabc862

--------------------------------

[ Upstream commit 988c7c00 ]

The commit c15c3747 (serial: samsung: fix potential soft lockup
during uart write) added an unlock of port->lock before
uart_write_wakeup() and a lock after it. It was always problematic to
write data from tty_ldisc_ops::write_wakeup and it was even documented
that way. We fixed the line disciplines to conform to this recently.
So if there is still a missed one, we should fix them instead of this
workaround.

On the top of that, s3c24xx_serial_tx_dma_complete() in this driver
still holds the port->lock while calling uart_write_wakeup().

So revert the wrap added by the commit above.

Cc: Thomas Abraham <thomas.abraham@linaro.org>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Hyeonkook Kim <hk619.kim@samsung.com>
Signed-off-by: NJiri Slaby <jslaby@suse.cz>
Link: https://lore.kernel.org/r/20220308115153.4225-1-jslaby@suse.czSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

27f1c36b

x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy · bca15008

由 Nathan Chancellor 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit 9cb90f9ad5975ddfc90d0906b40e9c71c2eca44e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9cb90f9ad5975ddfc90d0906b40e9c71c2eca44e

--------------------------------

[ Upstream commit aaeed6ec ]

There are two outstanding issues with CONFIG_X86_X32_ABI and
llvm-objcopy, with similar root causes:

1. llvm-objcopy does not properly convert .note.gnu.property when going
   from x86_64 to x86_x32, resulting in a corrupted section when
   linking:

   https://github.com/ClangBuiltLinux/linux/issues/1141

2. llvm-objcopy produces corrupted compressed debug sections when going
   from x86_64 to x86_x32, also resulting in an error when linking:

   https://github.com/ClangBuiltLinux/linux/issues/514

After commit 41c5ef31ad71 ("x86/ibt: Base IBT bits"), the
.note.gnu.property section is always generated when
CONFIG_X86_KERNEL_IBT is enabled, which causes the first issue to become
visible with an allmodconfig build:

  ld.lld: error: arch/x86/entry/vdso/vclock_gettime-x32.o:(.note.gnu.property+0x1c): program property is too short

To avoid this error, do not allow CONFIG_X86_X32_ABI to be selected when
using llvm-objcopy. If the two issues ever get fixed in llvm-objcopy,
this can be turned into a feature check.
Signed-off-by: NNathan Chancellor <nathan@kernel.org>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220314194842.3452-3-nathan@kernel.orgSigned-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

bca15008

NFS: swap-out must always use STABLE writes. · 9ae11d7a

由 NeilBrown 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit b3882e78aa0a924fa6c2c09fe74aaaa2299a34ad
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b3882e78aa0a924fa6c2c09fe74aaaa2299a34ad

--------------------------------

[ Upstream commit c265de25 ]

The commit handling code is not safe against memory-pressure deadlocks
when writing to swap.  In particular, nfs_commitdata_alloc() blocks
indefinitely waiting for memory, and this can consume all available
workqueue threads.

swap-out most likely uses STABLE writes anyway as COND_STABLE indicates
that a stable write should be used if the write fits in a single
request, and it normally does.  However if we ever swap with a small
wsize, or gather unusually large numbers of pages for a single write,
this might change.

For safety, make it explicit in the code that direct writes used for swap
must always use FLUSH_STABLE.
Signed-off-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

9ae11d7a

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功