Commits · 201455bc053f4f903e4ef7446f6a77f05b7ba89b · openeuler / Kernel

07 Jul, 2023 34 commits

net: annotate sk->sk_err write from do_recvmmsg() · 201455bc

Committed by Eric Dumazet on Jul 07, 2023

stable inclusion
from stable-v4.19.284
commit 640bce625ccf667a1a80262175a81108889ac41d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF
CVE: NA

--------------------------------

[ Upstream commit e05a5f51 ]

do_recvmmsg() can write to sk->sk_err from multiple threads.

As said before, many other points reading or writing sk_err
need annotations.

Fixes: 34b88a68 ("net: Fix use after free in the recvmmsg exit path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


netlink: annotate accesses to nlk->cb_running · 35734b0e

Committed by Eric Dumazet on Jul 07, 2023

stable inclusion
from stable-v4.19.284
commit 840a647499b093621167de56ffa8756dfc69f242
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF
CVE: NA

--------------------------------

[ Upstream commit a939d149 ]

Both netlink_recvmsg() and netlink_native_seq_show() read
nlk->cb_running locklessly. Use READ_ONCE() there.

Add corresponding WRITE_ONCE() to netlink_dump() and
__netlink_dump_start()
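The annotation pattern described above can be sketched in userspace C. This is only an illustration under stated assumptions: the macros below are simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE() (they rely on the GNU C `__typeof__` extension), and `struct nl_sock_sketch` and both helper functions are hypothetical names, not the real netlink code.

```c
#include <stdbool.h>

/* Simplified userspace stand-ins for the kernel macros: a volatile access
 * makes the compiler emit exactly one load or store for the expression,
 * which it may not tear, fuse, or repeat. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical, cut-down stand-in for struct netlink_sock. */
struct nl_sock_sketch {
	bool cb_running;
};

/* Writer side, in the spirit of netlink_dump() and __netlink_dump_start(). */
static void dump_set_running_sketch(struct nl_sock_sketch *nlk, bool running)
{
	WRITE_ONCE(nlk->cb_running, running);
}

/* Lockless reader side, in the spirit of netlink_recvmsg(). */
static bool dump_is_running_sketch(struct nl_sock_sketch *nlk)
{
	return READ_ONCE(nlk->cb_running);
}
```

The annotations do not add ordering or mutual exclusion; they only mark the race as intentional and keep the compiler from transforming the access.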

syzbot reported:
BUG: KCSAN: data-race in __netlink_dump_start / netlink_recvmsg

write to 0xffff88813ea4db59 of 1 bytes by task 28219 on cpu 0:
__netlink_dump_start+0x3af/0x4d0 net/netlink/af_netlink.c:2399
netlink_dump_start include/linux/netlink.h:308 [inline]
rtnetlink_rcv_msg+0x70f/0x8c0 net/core/rtnetlink.c:6130
netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2577
rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6192
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:724 [inline]
sock_sendmsg net/socket.c:747 [inline]
sock_write_iter+0x1aa/0x230 net/socket.c:1138
call_write_iter include/linux/fs.h:1851 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x463/0x760 fs/read_write.c:584
ksys_write+0xeb/0x1a0 fs/read_write.c:637
__do_sys_write fs/read_write.c:649 [inline]
__se_sys_write fs/read_write.c:646 [inline]
__x64_sys_write+0x42/0x50 fs/read_write.c:646
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

read to 0xffff88813ea4db59 of 1 bytes by task 28222 on cpu 1:
netlink_recvmsg+0x3b4/0x730 net/netlink/af_netlink.c:2022
sock_recvmsg_nosec+0x4c/0x80 net/socket.c:1017
____sys_recvmsg+0x2db/0x310 net/socket.c:2718
___sys_recvmsg net/socket.c:2762 [inline]
do_recvmmsg+0x2e5/0x710 net/socket.c:2856
__sys_recvmmsg net/socket.c:2935 [inline]
__do_sys_recvmmsg net/socket.c:2958 [inline]
__se_sys_recvmmsg net/socket.c:2951 [inline]
__x64_sys_recvmmsg+0xe2/0x160 net/socket.c:2951
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0x00 -> 0x01

Fixes: 16b304f3 ("netlink: Eliminate kmalloc in netlink dump operation.")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


quota: simplify drop_dquot_ref() · 17a06cc9

Committed by Baokun Li on Jul 07, 2023

maillist inclusion
category: bugfix
bugzilla: 188812,https://gitee.com/openeuler/kernel/issues/I7E0YR
CVE: NA

Reference: https://www.spinics.net/lists/kernel/msg4844759.html

----------------------------------------

As Honza said, remove_inode_dquot_ref() currently does not release the
last dquot reference but instead adds the dquot to tofree_head list. This
is because dqput() can sleep while dropping of the last dquot reference
(writing back the dquot and calling ->release_dquot()) and that must not
happen under dq_list_lock. Now that dqput() queues the final dquot cleanup
into a workqueue, remove_inode_dquot_ref() can call dqput() unconditionally
and we can significantly simplify it.

Here we open code the simplified code of remove_inode_dquot_ref() into
remove_dquot_ref() and remove the function put_dquot_list() which is no
longer used.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


quota: fix dqput() to follow the guarantees dquot_srcu should provide · 1715dfd0

Committed by Baokun Li on Jul 07, 2023

maillist inclusion
category: bugfix
bugzilla: 188812,https://gitee.com/openeuler/kernel/issues/I7E0YR
CVE: NA

Reference: https://www.spinics.net/lists/kernel/msg4844759.html

----------------------------------------

The dquot_mark_dquot_dirty() using dquot references from the inode
should be protected by dquot_srcu. quota_off code takes care to call
synchronize_srcu(&dquot_srcu) to not drop dquot references while they
are used by other users. But dquot_transfer() breaks this assumption.
We call dquot_transfer() to drop the last reference of dquot and add
it to free_dquots, but there may still be other users using the dquot
at this time, as shown in the function graph below:

       cpu1              cpu2
_________________|_________________
wb_do_writeback         CHOWN(1)
 ...
  ext4_da_update_reserve_space
   dquot_claim_block
    ...
     dquot_mark_dquot_dirty // try to dirty old quota
      test_bit(DQ_ACTIVE_B, &dquot->dq_flags) // still ACTIVE
      if (test_bit(DQ_MOD_B, &dquot->dq_flags))
      // test no dirty, wait dq_list_lock
                    ...
                     dquot_transfer
                      __dquot_transfer
                      dqput_all(transfer_from) // rls old dquot
                       dqput // last dqput
                        dquot_release
                         clear_bit(DQ_ACTIVE_B, &dquot->dq_flags)
                        atomic_dec(&dquot->dq_count)
                        put_dquot_last(dquot)
                         list_add_tail(&dquot->dq_free, &free_dquots)
                         // add the dquot to free_dquots
      if (!test_and_set_bit(DQ_MOD_B, &dquot->dq_flags))
        add dqi_dirty_list // add released dquot to dirty_list

This can cause various issues, such as dquot being destroyed by
dqcache_shrink_scan() after being added to free_dquots, which can trigger
a UAF in dquot_mark_dquot_dirty(); or after dquot is added to free_dquots
and then to dirty_list, it is added to free_dquots again after
dquot_writeback_dquots() is executed, which causes the free_dquots list to
be corrupted and triggers a UAF when dqcache_shrink_scan() is called for
freeing dquot twice.

As Honza said, we need to fix dquot_transfer() to follow the guarantees
dquot_srcu should provide. But calling synchronize_srcu() directly from
dquot_transfer() is too expensive (and mostly unnecessary). So we add
dquot whose last reference should be dropped to the new global dquot
list releasing_dquots, and then queue work item which would call
synchronize_srcu() and after that perform the final cleanup of all the
dquots on releasing_dquots.
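The deferred-release scheme described above can be modelled in a few lines of C. This is a single-threaded sketch under loud assumptions: every name ending in `_sketch` is hypothetical, `synchronize_sketch()` merely counts calls where the kernel would wait in synchronize_srcu(&dquot_srcu), and the real code takes dq_list_lock and queues a work item instead of being called directly.

```c
#include <stddef.h>

/* Hypothetical, minimal stand-in for struct dquot. */
struct dquot_sketch {
	int count;                  /* reference count */
	int freed;                  /* set once final cleanup has run */
	struct dquot_sketch *next;  /* link on the releasing list */
};

static struct dquot_sketch *releasing_dquots; /* dquots awaiting final cleanup */
static int grace_periods;                     /* how often we "synchronized" */

/* Stand-in for synchronize_srcu(&dquot_srcu): here it only counts calls. */
static void synchronize_sketch(void)
{
	grace_periods++;
}

/* Dropping the last reference no longer cleans up directly; it only parks
 * the dquot on releasing_dquots (the kernel would also queue the work). */
static void dqput_sketch(struct dquot_sketch *dq)
{
	if (--dq->count > 0)
		return;
	dq->next = releasing_dquots;
	releasing_dquots = dq;
}

/* Work item: wait for readers once, then finish every parked dquot.
 * Returns the number of dquots cleaned up. */
static int release_work_sketch(void)
{
	struct dquot_sketch *list = releasing_dquots;
	int cleaned = 0;

	releasing_dquots = NULL;
	if (!list)
		return 0;

	synchronize_sketch(); /* one grace period covers the whole batch */
	while (list) {
		struct dquot_sketch *next = list->next;

		list->next = NULL;
		list->freed = 1;
		cleaned++;
		list = next;
	}
	return cleaned;
}
```

Batching is the point of the design: one grace period is paid per work run rather than per dquot_transfer() call.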

Fixes: 4580b30e ("quota: Do not dirty bad dquots")
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


quota: add new helper dquot_active() · 810fb538

Committed by Baokun Li on Jul 07, 2023

maillist inclusion
category: bugfix
bugzilla: 188812,https://gitee.com/openeuler/kernel/issues/I7E0YR
CVE: NA

Reference: https://www.spinics.net/lists/kernel/msg4844759.html

----------------------------------------

Add new helper function dquot_active() to make the code more concise.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


quota: rename dquot_active() to inode_quota_active() · 4222ec98

Committed by Baokun Li on Jul 07, 2023

maillist inclusion
category: bugfix
bugzilla: 188812,https://gitee.com/openeuler/kernel/issues/I7E0YR
CVE: NA

Reference: https://www.spinics.net/lists/kernel/msg4844759.html

----------------------------------------

Now we have a helper function dquot_dirty() to determine if dquot has
DQ_MOD_B bit. dquot_active() can easily be misunderstood as a helper
function to determine if dquot has DQ_ACTIVE_B bit. So we avoid this by
renaming it to inode_quota_active() and later on we will add the helper
function dquot_active() to determine if dquot has DQ_ACTIVE_B bit.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


quota: factor out dquot_write_dquot() · d78ea078

Committed by Baokun Li on Jul 07, 2023

maillist inclusion
category: bugfix
bugzilla: 188812,https://gitee.com/openeuler/kernel/issues/I7E0YR
CVE: NA

Reference: https://www.spinics.net/lists/kernel/msg4844759.html

----------------------------------------

Factor out dquot_write_dquot() to reduce duplicate code.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


quota: add dqi_dirty_list description to comment of Dquot List Management · 46bbc9c5

Committed by Chengguang Xu on Jul 07, 2023

mainline inclusion
from mainline-v5.3-rc1
commit f44840ad
category: bugfix
bugzilla: 188812,https://gitee.com/openeuler/kernel/issues/I7E0YR
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f44840ad1f822d9ecee6a3f91f2d17825a361307

--------------------------------

Actually there are four lists for dquot management, so add
the description of dqi_dirty_list to the comment.
Signed-off-by: Chengguang Xu <cgxu519@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


quota: avoid increasing DQST_LOOKUPS when iterating over dirty/inuse list · 56993c73

Committed by Chengguang Xu on Jul 07, 2023

mainline inclusion
from mainline-v5.5-rc1
commit 05848db2
category: bugfix
bugzilla: 188812,https://gitee.com/openeuler/kernel/issues/I7E0YR
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=05848db2083d4f232e84e385845dcd98d5c511b2

--------------------------------

It is meaningless to increase DQST_LOOKUPS number while iterating
over dirty/inuse list, so just avoid it.

Link: https://lore.kernel.org/r/20190926083408.4269-1-cgxu519@zoho.com.cn
Signed-off-by: Chengguang Xu <cgxu519@zoho.com.cn>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


kernel/extable.c: use address-of operator on section symbols · dfdd56d1

Committed by Nathan Chancellor on Jul 07, 2023

stable inclusion
from stable-v4.19.285
commit 85d0254e7ec88b9cf6c853d9f6ea9e3af64d7f81
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF
CVE: NA

--------------------------------

commit 63174f61 upstream.

Clang warns:

../kernel/extable.c:37:52: warning: array comparison always evaluates to
a constant [-Wtautological-compare]
        if (main_extable_sort_needed && __stop___ex_table > __start___ex_table) {
                                                          ^
1 warning generated.

These are not true arrays; they are linker-defined symbols, which are just
addresses.  Using the address-of operator silences the warning and does
not change the resulting assembly with either clang/ld.lld or gcc/ld
(tested with diff + objdump -Dr).
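Why the generated code is unchanged can be seen with an ordinary array: the array name (after decay) and the address-of expression denote the same address and differ only in pointer type. The sketch below assumes a hypothetical `table_sketch` array in place of the linker-provided __start___ex_table symbol, which cannot be reproduced without a linker script.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an exception table entry. */
struct ex_entry_sketch {
	int insn;
	int fixup;
};

/* An ordinary array standing in for a linker-provided section symbol. */
static struct ex_entry_sketch table_sketch[3];

/* table_sketch decays to a pointer to its first element, &table_sketch is
 * a pointer to the whole array; both carry the same numeric address. */
static bool same_address_sketch(void)
{
	return (uintptr_t)table_sketch == (uintptr_t)&table_sketch;
}
```

Because only the type of the expression changes, a comparison written with `&` yields the same result while no longer looking like an array-to-array comparison to the compiler.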
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/892
Link: http://lkml.kernel.org/r/20200219202036.45702-1-natechancellor@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


arm64/mm: mark private VM_FAULT_X defines as vm_fault_t · c8079d2b

Committed by Min-Hua Chen on Jul 07, 2023

stable inclusion
from stable-v4.19.285
commit 90e687756316c6d9b97caf3dd29f04b66ce0aeab
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF
CVE: NA

--------------------------------

[ Upstream commit d91d5808 ]

This patch fixes several sparse warnings for fault.c:

arch/arm64/mm/fault.c:493:24: sparse: warning: incorrect type in return expression (different base types)
arch/arm64/mm/fault.c:493:24: sparse:    expected restricted vm_fault_t
arch/arm64/mm/fault.c:493:24: sparse:    got int
arch/arm64/mm/fault.c:501:32: sparse: warning: incorrect type in return expression (different base types)
arch/arm64/mm/fault.c:501:32: sparse:    expected restricted vm_fault_t
arch/arm64/mm/fault.c:501:32: sparse:    got int
arch/arm64/mm/fault.c:503:32: sparse: warning: incorrect type in return expression (different base types)
arch/arm64/mm/fault.c:503:32: sparse:    expected restricted vm_fault_t
arch/arm64/mm/fault.c:503:32: sparse:    got int
arch/arm64/mm/fault.c:511:24: sparse: warning: incorrect type in return expression (different base types)
arch/arm64/mm/fault.c:511:24: sparse:    expected restricted vm_fault_t
arch/arm64/mm/fault.c:511:24: sparse:    got int
arch/arm64/mm/fault.c:670:13: sparse: warning: restricted vm_fault_t degrades to integer
arch/arm64/mm/fault.c:670:13: sparse: warning: restricted vm_fault_t degrades to integer
arch/arm64/mm/fault.c:713:39: sparse: warning: restricted vm_fault_t degrades to integer
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Min-Hua Chen <minhuadotchen@gmail.com>
Link: https://lore.kernel.org/r/20230502151909.128810-1-minhuadotchen@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


x86/mm: Avoid incomplete Global INVLPG flushes · ef3d4916

Committed by Dave Hansen on Jul 07, 2023

stable inclusion
from stable-v4.19.284
commit 4d19f7698681c59b4c1f25dc343a025d91cc4827
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF
CVE: NA

--------------------------------

commit ce0b15d1 upstream.

The INVLPG instruction is used to invalidate TLB entries for a
specified virtual address.  When PCIDs are enabled, INVLPG is supposed
to invalidate TLB entries for the specified address for both the
current PCID *and* Global entries.  (Note: Only kernel mappings set
Global=1.)

Unfortunately, some INVLPG implementations can leave Global
translations unflushed when PCIDs are enabled.

As a workaround, never enable PCIDs on affected processors.

I expect there to eventually be microcode mitigations to replace this
software workaround.  However, the exact version numbers where that
will happen are not known today.  Once the version numbers are set in
stone, the processor list can be tweaked to only disable PCIDs on
affected processors with affected microcode.

Note: if anyone wants a quick fix that doesn't require patching, just
stick 'nopcid' on your kernel command-line.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


sched: Fix KCSAN noinstr violation · f61443e0

Committed by Josh Poimboeuf on Jul 07, 2023

stable inclusion
from stable-v4.19.284
commit 446e8d258ae5067786338723e73c601edfbd8a0e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF
CVE: NA

--------------------------------

[ Upstream commit e0b081d1 ]

With KCSAN enabled, end_of_stack() can get out-of-lined.  Force it
inline.
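Forcing a helper inline can be sketched with the GCC/Clang `always_inline` attribute, which is what the kernel's __always_inline expands to. The macro name `ALWAYS_INLINE_SKETCH`, the struct, and the helper below are hypothetical; the body assumes a downward-growing stack, where the end of the stack is its lowest address.

```c
/* Plain "inline" is only a hint, so instrumentation such as KCSAN can leave
 * the function out of line. The attribute makes inlining mandatory.
 * (GCC/Clang extension; the macro name is made up for this sketch.) */
#define ALWAYS_INLINE_SKETCH static inline __attribute__((always_inline))

/* Hypothetical, minimal stand-in for struct task_struct. */
struct task_sketch {
	unsigned long *stack; /* lowest address of the task's stack area */
};

/* With the attribute, no out-of-line copy is called from the caller, so a
 * caller placed in a no-instrumentation section never leaves that section. */
ALWAYS_INLINE_SKETCH unsigned long *end_of_stack_sketch(const struct task_sketch *t)
{
	return t->stack;
}
```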

Fixes the following warnings:

  vmlinux.o: warning: objtool: check_stackleak_irqoff+0x2b: call to end_of_stack() leaves .noinstr.text section
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/cc1b4d73d3a428a00d206242a68fdf99a934ca7b.1681320026.git.jpoimboe@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


serial: 8250: Reinit port->pm on port specific driver unbind · 707c0d48

Committed by Tony Lindgren on Jul 07, 2023

stable inclusion
from stable-v4.19.284
commit c9e080c3005fd183c56ff8f4d75edb5da0765d2c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF
CVE: NA

--------------------------------

[ Upstream commit 04e82793 ]

When we unbind a serial port hardware specific 8250 driver, the generic
serial8250 driver takes over the port. After that we see an oops about 10
seconds later. This can produce the following at least on some TI SoCs:

Unhandled fault: imprecise external abort (0x1406)
Internal error: : 1406 [#1] SMP ARM

Turns out that we may still have the serial port hardware specific driver
port->pm in use, and serial8250_pm() tries to call it after the port
specific driver is gone:

serial8250_pm [8250_base] from uart_change_pm+0x54/0x8c [serial_base]
uart_change_pm [serial_base] from uart_hangup+0x154/0x198 [serial_base]
uart_hangup [serial_base] from __tty_hangup.part.0+0x328/0x37c
__tty_hangup.part.0 from disassociate_ctty+0x154/0x20c
disassociate_ctty from do_exit+0x744/0xaac
do_exit from do_group_exit+0x40/0x8c
do_group_exit from __wake_up_parent+0x0/0x1c

Let's fix the issue by calling serial8250_set_defaults() in
serial8250_unregister_port(). This will set the port back to using
the serial8250 default functions, and sets the port->pm to point to
serial8250_pm.
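The failure mode and the fix can be illustrated with a callback pointer that is reset when the owning driver goes away. This is a sketch only: all names ending in `_sketch` are hypothetical, and the real fix calls serial8250_set_defaults() from serial8250_unregister_port() rather than the helpers shown here.

```c
struct port_sketch;
typedef int (*pm_fn_sketch)(struct port_sketch *port);

/* Hypothetical, minimal stand-in for struct uart_port. */
struct port_sketch {
	pm_fn_sketch pm; /* power-management callback */
};

/* Generic handler, always available (plays the role of serial8250_pm). */
static int default_pm_sketch(struct port_sketch *port)
{
	(void)port;
	return 0;
}

/* Handler that lives in a hardware-specific driver which may be unbound. */
static int vendor_pm_sketch(struct port_sketch *port)
{
	(void)port;
	return 1;
}

/* Point the port back at the generic callbacks. */
static void set_defaults_sketch(struct port_sketch *port)
{
	port->pm = default_pm_sketch;
}

/* The hardware-specific driver installs its own callback on registration. */
static void register_vendor_port_sketch(struct port_sketch *port)
{
	port->pm = vendor_pm_sketch;
}

/* On unregister, drop every pointer into the departing driver, so a later
 * pm call cannot jump into code that is no longer there. */
static void unregister_port_sketch(struct port_sketch *port)
{
	set_defaults_sketch(port);
}
```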
Signed-off-by: Tony Lindgren <tony@atomide.com>
Link: https://lore.kernel.org/r/20230418101407.12403-1-tony@atomide.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


ACPICA: ACPICA: check null return of ACPI_ALLOCATE_ZEROED in acpi_db_display_objects · 68b2ece6

Committed by void0red on Jul 07, 2023

stable inclusion
from stable-v4.19.284
commit 35d67ffad6f5d78dbd800d354f5334c7b71a19e0
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF
CVE: NA

--------------------------------

[ Upstream commit ae5a0ecc ]

ACPICA commit 0d5f467d6a0ba852ea3aad68663cbcbd43300fd4

ACPI_ALLOCATE_ZEROED may fail, in which case object_info is null and will cause a
null pointer dereference later.
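The missing check follows the usual allocate-then-test pattern. The sketch below is an assumption-laden userspace analogue: the allocator is passed in as a function pointer so that failure can be simulated, the struct and function names are hypothetical, and the -1 return stands in for whatever ACPI status code the real function reports.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-walk bookkeeping structure. */
struct object_info_sketch {
	unsigned int types[8];
};

typedef void *(*alloc_fn_sketch)(size_t nmemb, size_t size);

/* Allocator that always fails, to exercise the error path. */
static void *failing_alloc_sketch(size_t nmemb, size_t size)
{
	(void)nmemb;
	(void)size;
	return NULL;
}

/* Returns 0 on success, -1 if the zeroed allocation failed. */
static int display_objects_sketch(alloc_fn_sketch alloc)
{
	struct object_info_sketch *info = alloc(1, sizeof(*info));

	if (!info)
		return -1; /* bail out before the first dereference */

	info->types[0]++; /* safe: info is known to be non-NULL here */
	free(info);
	return 0;
}
```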

Link: https://github.com/acpica/acpica/commit/0d5f467d
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


ACPI: EC: Fix oops when removing custom query handlers · 47ff3eba

Committed by Armin Wolf on Jul 07, 2023

stable inclusion
from stable-v4.19.284
commit 0d528a7c421b1f1772fc1d29370b3b5fc0f42b19
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF
CVE: NA

--------------------------------

[ Upstream commit e5b492c6 ]

When removing custom query handlers, the handler might still
be used inside the EC query workqueue, causing a kernel oops
if the module holding the callback function was already unloaded.

Fix this by flushing the EC query workqueue when removing
custom query handlers.

Tested on an Acer Travelmate 4002WLMi
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


lib: cpu_rmap: Fix potential use-after-free in irq_cpu_rmap_release() · bb341c43

Committed by Ben Hutchings on Jul 07, 2023

stable inclusion
from stable-v4.19.286
commit 53f16fa73f71abfcac67e71a5513653b8db28b76
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=53f16fa73f71abfcac67e71a5513653b8db28b76

--------------------------------

[ Upstream commit 7c5d4801 ]

irq_cpu_rmap_release() calls cpu_rmap_put(), which may free the rmap.
So we need to clear the pointer to our glue structure in rmap before
doing that, not after.
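The ordering requirement can be shown with a tiny reference-counted model. It is a sketch under stated assumptions: the structs and functions are hypothetical, and a `freed` flag replaces the actual kfree() so that the state stays inspectable after the last put.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct cpu_rmap. */
struct rmap_sketch {
	int refcount;
	int freed;     /* set where the real code would free the rmap */
	void *obj[4];  /* back-pointers to glue structures */
};

/* Hypothetical stand-in for struct irq_glue. */
struct glue_sketch {
	struct rmap_sketch *rmap;
	int index;
};

/* Drop one reference; the last put "frees" the rmap. */
static void rmap_put_sketch(struct rmap_sketch *rmap)
{
	if (--rmap->refcount == 0)
		rmap->freed = 1;
}

/* Correct order: clear our slot first, while the rmap is guaranteed to be
 * alive, and only then drop the reference that may free it. Doing it the
 * other way round would write into memory that may already be gone. */
static void glue_release_sketch(struct glue_sketch *glue)
{
	glue->rmap->obj[glue->index] = NULL;
	rmap_put_sketch(glue->rmap);
}
```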

Fixes: 4e0473f1 ("lib: cpu_rmap: Avoid use after free on rmap->obj array entries")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/ZHo0vwquhOy3FaXc@decadent.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


lib: cpu_rmap: Avoid use after free on rmap->obj array entries · 80c81c17

Committed by Eli Cohen on Jul 07, 2023

stable inclusion
from stable-v4.19.284
commit d1308bd0b24cb1d78fa2747d5fa3e055cc628a48
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=d1308bd0b24cb1d78fa2747d5fa3e055cc628a48

--------------------------------

[ Upstream commit 4e0473f1 ]

When calling irq_set_affinity_notifier() with NULL at the notify
argument, it will cause freeing of the glue pointer in the
corresponding array entry but will leave the pointer in the array. A
subsequent call to free_irq_cpu_rmap() will try to free this entry again
leading to possible use after free.

Fix that by setting the array entry to NULL and checking that the
array entry is non-NULL when iterating over the array in
free_irq_cpu_rmap().

The current code does not suffer from this since there are no cases
where irq_set_affinity_notifier(irq, NULL) (note the NULL passed for the
notify arg) is called, followed by a call to free_irq_cpu_rmap() so we
don't hit an issue. Subsequent patches in this series exercise this
flow, hence the required fix.

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


ext4: improve error recovery code paths in __ext4_remount() · d3d9df7c

Committed by Theodore Ts'o on Jul 07, 2023

stable inclusion
from stable-v4.19.283
commit 37302d4c2724dc92be5f90a3718eafa29834d586
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=37302d4c2724dc92be5f90a3718eafa29834d586

--------------------------------

commit 4c0b4818 upstream.

If there are failures while changing the mount options in
__ext4_remount(), we need to restore the old mount options.

This commit fixes two problems.  The first is that there is a chance that we
will free the old quota file names before a potential failure, leading
to a use-after-free.  The second problem addressed in this commit is that
if there is a failed read/write to read-only transition and the quota
has already been suspended, we need to re-enable quota handling.

Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20230506142419.984260-2-tytso@mit.edu
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Conflicts:
	fs/ext4/super.c
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


scsi: core: Improve scsi_vpd_inquiry() checks · 463aed61

Committed by Damien Le Moal on Jul 07, 2023

stable inclusion
from stable-v4.19.282
commit 5cca80d4f3a842340fd0addf9ecaf9d83589bdec
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF
CVE: NA

--------------------------------

[ Upstream commit f0aa59a3 ]

Some USB-SATA adapters have broken behavior when an unsupported VPD page is
probed: Depending on the VPD page number, a 4-byte header with a valid VPD
page number but with a 0 length is returned. Currently, scsi_vpd_inquiry()
only checks that the page number is valid to determine if the page is
valid, which results in receiving only the 4-byte header for the
non-existent page. This error manifests itself very often with page 0xb9
for the Concurrent Positioning Ranges detection done by sd_read_cpr(),
resulting in the following error message:

sd 0:0:0:0: [sda] Invalid Concurrent Positioning Ranges VPD page

Prevent this misleading error message by adding a check in
scsi_vpd_inquiry() to verify that the page length is not 0.
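The added validation can be sketched as a standalone header check. Assumptions are flagged here and in the comments: the function name and return convention are hypothetical, and the layout used (page code in byte 1, big-endian page length in bytes 2 and 3) is the standard 4-byte VPD page header.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Validate a VPD response header.
 * Returns the total page size (header + payload) or a negative errno.
 * Hypothetical helper; the kernel performs this inside scsi_vpd_inquiry(). */
static int vpd_check_header_sketch(const uint8_t *buf, size_t len, uint8_t page)
{
	unsigned int page_len;

	if (len < 4)
		return -EINVAL; /* not even a full header */

	/* Existing check: the device must echo the requested page code. */
	if (buf[1] != page)
		return -EIO;

	/* New check: a header announcing zero payload bytes means the page
	 * is not really supported, so treat it as an error. */
	page_len = ((unsigned int)buf[2] << 8) | buf[3];
	if (page_len == 0)
		return -EIO;

	return (int)page_len + 4;
}
```

With this check, a broken adapter that answers a probe for page 0xb9 with a bare header is reported as "page not available" instead of being passed on as a malformed page.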
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20230322022211.116327-1-damien.lemoal@opensource.wdc.com
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>


PCI: pciehp: Fix AB-BA deadlock between reset_lock and device_lock · 86e63999

Committed by Lukas Wunner on Jul 07, 2023

stable inclusion
from stable-v4.19.283
commit 2a226e8ca95107cd434d967ecfd83409e26a730e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7J5UF
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2a226e8ca95107cd434d967ecfd83409e26a730e

--------------------------------

commit f5eff559 upstream.

In 2013, commits

  2e35afae ("PCI: pciehp: Add reset_slot() method")
  608c3881 ("PCI: Add slot reset option to pci_dev_reset()")

amended PCIe hotplug to mask Presence Detect Changed events during a
Secondary Bus Reset.  The reset thus no longer causes gratuitous slot
bringdown and bringup.

However the commits neglected to serialize reset with code paths reading
slot registers.  For instance, a slot bringup due to an earlier hotplug
event may see the Presence Detect State bit cleared during a concurrent
Secondary Bus Reset.

In 2018, commit

  5b3f7b7d ("PCI: pciehp: Avoid slot access during reset")

retrofitted the missing locking.  It introduced a reset_lock which
serializes a Secondary Bus Reset with other parts of pciehp.

Unfortunately the locking turns out to be overzealous:  reset_lock is
held for the entire enumeration and de-enumeration of hotplugged devices,
including driver binding and unbinding.

Driver binding and unbinding acquires device_lock while the reset_lock
of the ancestral hotplug port is held.  A concurrent Secondary Bus Reset
acquires the ancestral reset_lock while already holding the device_lock.
The asymmetric locking order in the two code paths can lead to AB-BA
deadlocks.

Michael Haeuptle reports such deadlocks on simultaneous hot-removal and
vfio release (the latter implies a Secondary Bus Reset):

  pciehp_ist()                                    # down_read(reset_lock)
    pciehp_handle_presence_or_link_change()
      pciehp_disable_slot()
        __pciehp_disable_slot()
          remove_board()
            pciehp_unconfigure_device()
              pci_stop_and_remove_bus_device()
                pci_stop_bus_device()
                  pci_stop_dev()
                    device_release_driver()
                      device_release_driver_internal()
                        __device_driver_lock()    # device_lock()

  SYS_munmap()
    vfio_device_fops_release()
      vfio_device_group_close()
        vfio_device_close()
          vfio_device_last_close()
            vfio_pci_core_close_device()
              vfio_pci_core_disable()             # device_lock()
                __pci_reset_function_locked()
                  pci_reset_bus_function()
                    pci_dev_reset_slot_function()
                      pci_reset_hotplug_slot()
                        pciehp_reset_slot()       # down_write(reset_lock)

Ian May reports the same deadlock on simultaneous hot-removal and an
AER-induced Secondary Bus Reset:

  aer_recover_work_func()
    pcie_do_recovery()
      aer_root_reset()
        pci_bus_error_reset()
          pci_slot_reset()
            pci_slot_lock()                       # device_lock()
            pci_reset_hotplug_slot()
              pciehp_reset_slot()                 # down_write(reset_lock)

Fix by releasing the reset_lock during driver binding and unbinding,
thereby splitting and shrinking the critical section.

Driver binding and unbinding is protected by the device_lock() and thus
serialized with a Secondary Bus Reset.  There's no need to additionally
protect it with the reset_lock.  However, pciehp does not bind and
unbind devices directly, but rather invokes PCI core functions which
also perform certain enumeration and de-enumeration steps.

The reset_lock's purpose is to protect slot registers, not enumeration
and de-enumeration of hotplugged devices.  That would arguably be the
job of the PCI core, not the PCIe hotplug driver.  After all, an
AER-induced Secondary Bus Reset may as well happen during boot-time
enumeration of the PCI hierarchy and there's no locking to prevent that
either.

Exempting *de-enumeration* from the reset_lock is relatively harmless:
A concurrent Secondary Bus Reset may foil config space accesses such as
PME interrupt disablement.  But if the device is physically gone, those
accesses are pointless anyway.  If the device is physically present and
only logically removed through an Attention Button press or the sysfs
"power" attribute, PME interrupts as well as DMA cannot come through
because pciehp_unconfigure_device() disables INTx and Bus Master bits.
That's still protected by the reset_lock in the present commit.

Exempting *enumeration* from the reset_lock also has limited impact:
The exempted call to pci_bus_add_device() may perform device accesses
through pcibios_bus_add_device() and pci_fixup_device() which are now
no longer protected from a concurrent Secondary Bus Reset.  Otherwise
there should be no impact.

In essence, the present commit seeks to fix the AB-BA deadlocks while
still retaining a best-effort reset protection for enumeration and
de-enumeration of hotplugged devices -- until a general solution is
implemented in the PCI core.

Link: https://lore.kernel.org/linux-pci/CS1PR8401MB0728FC6FDAB8A35C22BD90EC95F10@CS1PR8401MB0728.NAMPRD84.PROD.OUTLOOK.COM
Link: https://lore.kernel.org/linux-pci/20200615143250.438252-1-ian.may@canonical.com
Link: https://lore.kernel.org/linux-pci/ce878dab-c0c4-5bd0-a725-9805a075682d@amd.com
Link: https://lore.kernel.org/linux-pci/ed831249-384a-6d35-0831-70af191e9bce@huawei.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215590
Fixes: 5b3f7b7d ("PCI: pciehp: Avoid slot access during reset")
Link: https://lore.kernel.org/r/fef2b2e9edf245c049a8c5b94743c0f74ff5008a.1681191902.git.lukas@wunner.de
Reported-by: Michael Haeuptle <michael.haeuptle@hpe.com>
Reported-by: Ian May <ian.may@canonical.com>
Reported-by: Andrey Grodzovsky <andrey2805@gmail.com>
Reported-by: Rahul Kumar <rahul.kumar1@amd.com>
Reported-by: Jialin Zhang <zhangjialin11@huawei.com>
Tested-by: Anatoli Antonovitch <Anatoli.Antonovitch@amd.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v4.19+
Cc: Dan Stein <dstein@hpe.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Alex Michon <amichon@kalrayinc.com>
Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Conflicts:
	drivers/pci/hotplug/pciehp_pci.c
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

86e63999

loop: loop_set_status_from_info() check before assignment · 91a08b45

Committed by Zhong Jinghua on Jul 07, 2023

mainline inclusion
from mainline-v6.3-rc1
commit 9f6ad5d5
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7JHOA
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9f6ad5d533d1c71e51bdd06a5712c4fbc8768dfa

--------------------------------

In loop_set_status_from_info(), lo->lo_offset and lo->lo_sizelimit should
be checked before reassignment, because if an overflow error occurs, the
original correct value will be changed to the wrong value, and it will not
be changed back.
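
The check-before-assignment idea can be sketched in userspace C. This is only an illustration under stated assumptions: `demo_loop_device`, `demo_loop_info64` and `demo_set_status_from_info()` are hypothetical stand-ins for the kernel structures and function, and `int64_t` stands in for `loff_t`.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for struct loop_device (fields are loff_t in the kernel). */
struct demo_loop_device {
    int64_t lo_offset;
    int64_t lo_sizelimit;
};

/* Hypothetical stand-in for struct loop_info64 (fields are __u64 in the UAPI). */
struct demo_loop_info64 {
    uint64_t lo_offset;
    uint64_t lo_sizelimit;
};

/* Validate every input first; touch the device only after all checks pass. */
static int demo_set_status_from_info(struct demo_loop_device *lo,
                                     const struct demo_loop_info64 *info)
{
    if (info->lo_offset > INT64_MAX || info->lo_sizelimit > INT64_MAX)
        return -EOVERFLOW;

    lo->lo_offset = (int64_t)info->lo_offset;
    lo->lo_sizelimit = (int64_t)info->lo_sizelimit;
    return 0;
}
```

With this ordering a rejected request leaves the previously configured offset and size limit untouched, so later I/O never sees a half-applied value.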

Moreover, the original patch did not solve the problem: the value was set
and the ioctl returned an error, but subsequent I/O still used that value
in the loop driver, which still triggered a warning:

loop_handle_cmd
 do_req_filebacked
  loff_t pos = ((loff_t) blk_rq_pos(rq) << 9) + lo->lo_offset;
  lo_rw_aio
   cmd->iocb.ki_pos = pos
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20230221095027.3656193-1-zhongjinghua@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

91a08b45

loop: Check for overflow while configuring loop · 1472bb0a

Committed by Siddh Raman Pant on Jul 07, 2023

mainline inclusion
from mainline-v6.0-rc3
commit c490a0b5
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7JHOA
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c490a0b5a4f36da3918181a8acdc6991d967c5f3

----------------------------------------

The userspace can configure a loop using an ioctl call, wherein
a configuration of type loop_config is passed (see lo_ioctl()'s
case on line 1550 of drivers/block/loop.c). This proceeds to call
loop_configure() which in turn calls loop_set_status_from_info()
(see line 1050 of loop.c), passing &config->info which is of type
loop_info64*. This function then sets the appropriate values, like
the offset.

loop_device has lo_offset of type loff_t (see line 52 of loop.c),
which is typedef-chained to long long, whereas loop_info64 has
lo_offset of type __u64 (see line 56 of include/uapi/linux/loop.h).

The function directly copies offset from info to the device as
follows (See line 980 of loop.c):
	lo->lo_offset = info->lo_offset;

This results in an overflow, which triggers a warning in iomap_iter()
due to a call to iomap_iter_done() which has:
	WARN_ON_ONCE(iter->iomap.offset > iter->pos);

Thus, check for negative value during loop_set_status_from_info().
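
As a rough illustration of why the copied value can turn negative, the sketch below narrows an unsigned 64-bit offset to a signed one. `demo_offset_wraps_negative()` is a hypothetical helper, `int64_t` stands in for `loff_t`, and the out-of-range conversion is implementation-defined in C (GCC and Clang wrap modulo 2^64).

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when a __u64-style offset becomes negative once it is
 * stored in a signed 64-bit field, as happens with lo->lo_offset. */
static bool demo_offset_wraps_negative(uint64_t user_offset)
{
    /* Implementation-defined for values above INT64_MAX; common
     * compilers reinterpret the bit pattern as two's complement. */
    int64_t stored = (int64_t)user_offset;

    return stored < 0;
}
```

Any offset above `INT64_MAX` ends up negative after the copy, which is the condition the commit tests for.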

Bug report: https://syzkaller.appspot.com/bug?id=c620fe14aac810396d3c3edc9ad73848bf69a29e

Reported-and-tested-by: syzbot+a8e049cd3abd342936b6@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Siddh Raman Pant <code@siddh.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220823160810.181275-1-code@siddh.me
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

1472bb0a

Revert "loop: Check for overflow while configuring loop" · 2d627bc9

Committed by Zhong Jinghua on Jul 07, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7JHOA
CVE: NA

----------------------------------------

This reverts commit ba3f75d0.

The code was merged at the wrong location; roll back and start again.
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

2d627bc9

block: don't set GD_NEED_PART_SCAN if scan partition failed · 75715c6a

Committed by Yu Kuai on Jul 07, 2023

mainline inclusion
from mainline-v6.3-rc6
commit 3723091e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7F3M1
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.4-rc7&id=3723091ea1884d599cc8b8bf719d6f42e8d4d8b1

--------------------------------

Currently, if disk_scan_partitions() fails, GD_NEED_PART_SCAN stays set,
and the partition scan will be run again when blkdev_get_by_dev() is
called. However, this causes a problem: re-assembling a partitioned
raid device will create partitions for the underlying disk.
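
A simplified model of the intended behaviour, not the kernel code: `demo_gendisk`, `DEMO_GD_NEED_PART_SCAN` and `demo_disk_scan_partitions()` are hypothetical names, the `-EBUSY` return is an assumption, and a successful open is modelled as performing the scan and consuming the flag.

```c
#include <errno.h>
#include <stdbool.h>

#define DEMO_GD_NEED_PART_SCAN (1UL << 0)

/* Hypothetical stand-in for struct gendisk. */
struct demo_gendisk {
    unsigned long state;
    int scans_done;
};

static int demo_disk_scan_partitions(struct demo_gendisk *disk, bool open_fails)
{
    /* Request a scan on the next open of the whole device. */
    disk->state |= DEMO_GD_NEED_PART_SCAN;

    if (open_fails) {
        /* Do not leave the request pending: a later, unrelated open
         * must not trigger a partition scan on this disk. */
        disk->state &= ~DEMO_GD_NEED_PART_SCAN;
        return -EBUSY;
    }

    /* Stand-in for the open path running the scan and clearing the flag. */
    disk->scans_done++;
    disk->state &= ~DEMO_GD_NEED_PART_SCAN;
    return 0;
}
```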

Test procedure:

mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0
sgdisk -n 0:0:+100MiB /dev/md0
blockdev --rereadpt /dev/sda
blockdev --rereadpt /dev/sdb
mdadm -S /dev/md0
mdadm -A /dev/md0 /dev/sda /dev/sdb

Test result: underlying disk partition and raid partition can be
observed at the same time

Note that this can still happen in some corner cases where
GD_NEED_PART_SCAN is set for the underlying disk while re-assembling the
raid device.

Fixes: e5cfefa9 ("block: fix scan partition for exclusively open device again")
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
  block/genhd.c
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

75715c6a

block: return -EBUSY when there are open partitions in blkdev_reread_part · 98f62c2f

Committed by Christoph Hellwig on Jul 07, 2023

mainline inclusion
from mainline-v5.12~10
commit 68e6582e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7F3M1
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.4-rc7&id=68e6582e8f2dc32fd2458b9926564faa1fb4560e

----------------------------------------------

The switch to go through blkdev_get_by_dev means we now ignore the
return value from bdev_disk_changed in __blkdev_get.  Add a manual
check to restore the old semantics.

Fixes: 4601b4b1 ("block: reopen the device in blkdev_reread_part")
Reported-by: Karel Zak <kzak@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210421160502.447418-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
  block/ioctl.c
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

98f62c2f

blk-wbt: make enable_state more accurate · 8c91fb1a

Committed by Yu Kuai on Jul 07, 2023

mainline inclusion
from mainline-v6.2-rc1
commit a9a236d2
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6Z1UG
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.3&id=a9a236d238a5e8ab2e74ca62c2c7ba5dd435af77

----------------------------------------

Currently, if the user disables wbt through sysfs, 'enable_state' will be
'WBT_STATE_ON_MANUAL', which is confusing. Add a new state
'WBT_STATE_OFF_MANUAL' to cover that case.
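
A minimal sketch of the resulting state model, assuming a latency value of zero written through sysfs means "disable". The `demo_` names and the numeric enum values are illustrative and are not taken from block/blk-wbt.c.

```c
/* Illustrative mirror of the four enable states described above. */
enum demo_wbt_state {
    DEMO_WBT_STATE_ON_DEFAULT = 1,  /* on by default */
    DEMO_WBT_STATE_ON_MANUAL,       /* turned on through sysfs */
    DEMO_WBT_STATE_OFF_DEFAULT,     /* off by default */
    DEMO_WBT_STATE_OFF_MANUAL,      /* turned off through sysfs */
};

/* Hypothetical helper: the state recorded after a sysfs write. */
static enum demo_wbt_state demo_state_after_sysfs_write(unsigned long long min_lat_nsec)
{
    return min_lat_nsec ? DEMO_WBT_STATE_ON_MANUAL
                        : DEMO_WBT_STATE_OFF_MANUAL;
}
```
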
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221019121518.3865235-4-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflict:
  block/blk-wbt.c
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

8c91fb1a

block: Limit number of items taken from the I/O scheduler in one go · c1ea82c0

Committed by Salman Qazi on Jul 07, 2023

mainline inclusion
from mainline-v5.8-rc1
commit 28d65729
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6RQVT
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.3-rc4&id=28d65729b050977d8a9125e6726871e83bd22124

--------------------------------

Flushes bypass the I/O scheduler and get added to hctx->dispatch
in blk_mq_sched_bypass_insert.  This can happen while a kworker is running
hctx->run_work work item and is past the point in
blk_mq_sched_dispatch_requests where hctx->dispatch is checked.

The blk_mq_do_dispatch_sched call is not guaranteed to end in bounded time,
because the I/O scheduler can feed an arbitrary number of commands.

Since we have only one hctx->run_work, the commands waiting in
hctx->dispatch will wait an arbitrary length of time for run_work to be
rerun.

A similar phenomenon exists with dispatches from the software queue.

The solution is to poll hctx->dispatch in blk_mq_do_dispatch_sched and
blk_mq_do_dispatch_ctx and return from the run_work handler and let it
rerun.
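
The polling idea can be modelled with a toy queue. Everything here is hypothetical (`demo_hctx`, `demo_do_dispatch_sched()`, the `flush_arrives_after` knob that simulates a concurrent flush), and `-EAGAIN` is used as the "please rerun" signal only for illustration.

```c
#include <errno.h>

/* Toy model of a hardware queue context. */
struct demo_hctx {
    int sched_pending;     /* requests still held by the I/O scheduler */
    int dispatch_pending;  /* requests parked on the dispatch list */
    int dispatched;        /* requests handed to the driver so far */
};

/* Drain the scheduler, but bail out as soon as the dispatch list is
 * non-empty so that the caller can rerun and serve it first. */
static int demo_do_dispatch_sched(struct demo_hctx *hctx, int flush_arrives_after)
{
    while (hctx->sched_pending > 0) {
        if (hctx->dispatch_pending > 0)
            return -EAGAIN;

        hctx->sched_pending--;
        hctx->dispatched++;

        /* Simulate a flush bypassing the scheduler mid-loop. */
        if (hctx->dispatched == flush_arrives_after)
            hctx->dispatch_pending++;
    }
    return 0;
}
```

The loop still has no fixed upper bound when nothing lands on the dispatch list, but a parked request now waits for at most one more scheduler item instead of the whole backlog.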
Signed-off-by: Salman Qazi <sqazi@google.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Conflicts:
  block/blk-mq-sched.c
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

c1ea82c0

crypto: cryptd - Protect per-CPU resource by disabling BH. · 7d2380ec

Committed by Sebastian Andrzej Siewior on Jul 07, 2023

mainline inclusion
from mainline-v5.19-rc1
commit 91e8bcd7
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7JJ9V
CVE: NA

--------------------------------

The access to cryptd_queue::cpu_queue is synchronized by disabling
preemption in cryptd_enqueue_request() and disabling BH in
cryptd_queue_worker(). This implies that access is allowed from BH.

If cryptd_enqueue_request() is invoked from preemptible context _and_
soft interrupt then this can lead to list corruption since
cryptd_enqueue_request() is not protected against access from
soft interrupt.

Replace get_cpu() in cryptd_enqueue_request() with local_bh_disable()
to ensure BH is always disabled.
Remove preempt_disable() from cryptd_queue_worker() since it is not
needed because local_bh_disable() ensures synchronisation.

Fixes: 254eff77 ("crypto: cryptd - Per-CPU thread implementation...")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Conflicts:
	crypto/cryptd.c
Signed-off-by: GUO Zihua <guozihua@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

7d2380ec

random: fix data race on crng_node_pool · 4dbe47bc

Committed by Eric Biggers on Jul 07, 2023

stable inclusion
from stable-v4.19.226
commit a6f8ba674655f511bbf0e55d4f16c581f7a8a35a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U
CVE: NA

----------------------------------------

commit 5d73d1e3 upstream.

extract_crng() and crng_backtrack_protect() load crng_node_pool with a
plain load, which causes undefined behavior if do_numa_crng_init()
modifies it concurrently.

Fix this by using READ_ONCE().  Note: as per the previous discussion
https://lore.kernel.org/lkml/20211219025139.31085-1-ebiggers@kernel.org/T/#u,
READ_ONCE() is believed to be sufficient here, and it was requested that
it be used here instead of smp_load_acquire().

Also change do_numa_crng_init() to set crng_node_pool using
cmpxchg_release() instead of mb() + cmpxchg(), as the former is
sufficient here but is more lightweight.
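
A userspace analogy using C11 atomics may help picture the publish-once pattern. This is not the kernel code: `demo_pool`, `demo_pool_init()` and `demo_pool_get()` are hypothetical, and the reader uses an acquire load because C11 has no direct equivalent of READ_ONCE() with address-dependency ordering.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct demo_pool {
    int seed;
};

/* Shared pointer, NULL until some thread publishes a pool. */
static _Atomic(struct demo_pool *) demo_node_pool;

/* Reader side: a single marked load instead of a plain load. */
static struct demo_pool *demo_pool_get(void)
{
    return atomic_load_explicit(&demo_node_pool, memory_order_acquire);
}

/* Writer side: fully initialise the pool, then publish it with a
 * release compare-and-swap. A thread that loses the race frees its
 * copy and uses the pool that was published first. */
static struct demo_pool *demo_pool_init(int seed)
{
    struct demo_pool *expected = NULL;
    struct demo_pool *pool = malloc(sizeof(*pool));

    if (!pool)
        return NULL;
    pool->seed = seed;

    if (!atomic_compare_exchange_strong_explicit(&demo_node_pool, &expected,
                                                 pool, memory_order_release,
                                                 memory_order_relaxed)) {
        free(pool);
        return demo_pool_get();
    }
    return pool;
}
```

The release ordering on the successful swap is what makes the initialised contents visible to any reader that observes the non-NULL pointer.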

Fixes: 1e7f583a ("random: make /dev/urandom scalable for silly userspace programs")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: GONG, Ruiqi <gongruiqi@huaweicloud.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

4dbe47bc

x86/kprobes: Fix the error judgment for debug exceptions · 70447fb8

Committed by Li Huafei on Jul 07, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7EU4Q
CVE: NA

--------------------------------

We get the following crash caused by a null pointer access:

 <SNIP>
 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP PTI
 CPU: 54 PID: 2469325 Comm: ftracetest Kdump: loaded Tainted: GFS         OE     5.10.0+ #12
 Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 3.35 10/20/2016
 RIP: 0010:resume_execution+0x35/0x190
 Code: 41 54 55 48 89 fd 53 48 89 f3 48 83 ec 08 4c 8b 6f 60 4c 8b b6 98 00 00 00 48 89 14 24 4c 8b 67 28 4d 89 ef eb 04 49 83 c5 01 <41> 0f b6 7d 00 e8 f1 12 57 00 83 e0 0f 8d 48 ff 83 f9 0a 76 e7 83
 RSP: 0018:fffffe16118acec0 EFLAGS: 00010086
 RAX: 0000000000000000 RBX: fffffe16118acf58 RCX: 00000000eefdca76
 RDX: ffff8f8cffd1f400 RSI: fffffe16118acf58 RDI: ffff8f5500d3c000
 RBP: ffff8f5500d3c000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff970e39c0
 R13: 0000000000000000 R14: ffffb55ba0bd7c60 R15: 0000000000000000
 FS:  00002aaaaad22740(0000) GS:ffff8f8cffd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 0000003ac202c003 CR4: 00000000003706e0
 DR0: ffffffff98000160 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Call Trace:
  <#DB>
  kprobe_debug_handler+0x41/0xd0
  exc_debug+0xe5/0x1b0
  asm_exc_debug+0x19/0x30
 RIP: 0010:copy_from_kernel_nofault.part.0+0x55/0xc0
 Code: 85 c0 74 e1 65 48 8b 04 25 40 f0 01 00 83 a8 78 14 00 00 01 48 c7 c0 f2 ff ff ff c3 cc cc cc cc 48 83 fa 03 76 16 31 c0 8b 0e <89> 0f 85 c0 75 d4 48 83 c7 04 48 83 c6 04 48 83 ea 04 48 83 fa 01
 RSP: 0018:ffffb55ba0bd7c60 EFLAGS: 00000246
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000076207325
 RDX: 0000000000000004 RSI: ffffffff98000160 RDI: ffff8f5500c0902c
 RBP: ffff8f5500c0902c R08: ffffb55ba0bd7bdc R09: 0000000000000001
 R10: ffff8f5548a0faf8 R11: 0000000000000000 R12: ffffffff98000160
 R13: 0000000000000000 R14: ffff8f560da128c0 R15: 0000000000000000
  </#DB>
  process_fetch_insn+0xfb/0x720
  kprobe_trace_func+0x199/0x2c0
  ? kernel_clone+0x5/0x2f0
  kprobe_dispatcher+0x3d/0x60
  aggr_pre_handler+0x40/0x80
  ? kernel_clone+0x1/0x2f0
  kprobe_ftrace_handler+0x82/0xf0
  ? __se_sys_clone+0x65/0x90
  ftrace_ops_assist_func+0x86/0x110
  ? rcu_nocb_try_bypass+0x1f3/0x370
  0xffffffffc07e60c8
  ? kernel_clone+0x1/0x2f0
  kernel_clone+0x5/0x2f0
 <SNIP>

The analysis reveals that kprobe and hardware breakpoints conflict in
the use of debug exceptions.

If a hardware breakpoint is set on a memory address and a kprobe event
also reads memory at that address, then when the kprobe fires it reads
that memory and triggers the hardware breakpoint. Because kprobe handles
debug exceptions earlier than the hardware breakpoint code does, kprobe
incorrectly treats this exception as one triggered by kprobe.

kprobe changes the status from KPROBE_HIT_ACTIVE to KPROBE_HIT_SS or
KPROBE_HIT_SSDONE before single-step execution, so if the current status
is KPROBE_HIT_ACTIVE, it is not a debug exception triggered by kprobe.
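
The status check described above can be sketched as a small predicate. The `demo_` enum, its numeric values and `demo_debug_exception_is_kprobe()` are illustrative assumptions, not the patched kernel function.

```c
#include <stdbool.h>

/* Illustrative mirror of the kprobe status values named above. */
enum demo_kprobe_status {
    DEMO_KPROBE_HIT_ACTIVE = 1,
    DEMO_KPROBE_HIT_SS     = 2,
    DEMO_KPROBE_REENTER    = 4,
    DEMO_KPROBE_HIT_SSDONE = 8,
};

/* Decide whether a debug exception belongs to kprobe single-stepping. */
static bool demo_debug_exception_is_kprobe(bool has_current_kprobe,
                                           enum demo_kprobe_status status)
{
    if (!has_current_kprobe)
        return false;

    /* Still in the pre-handler: single-stepping has not started, so
     * the trap must come from something else, e.g. a hardware
     * breakpoint hit while the handler read memory. */
    if (status == DEMO_KPROBE_HIT_ACTIVE)
        return false;

    return true;
}
```
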
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Reviewed-by: Ye Weihua <yeweihua4@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

70447fb8

ext4: turning quotas off if mount failed after enable quotas · 8400bff7

Committed by Baokun Li on Jul 07, 2023

mainline inclusion
from mainline-v6.5
commit d13f99632748462c32fc95d729f5e754bab06064
category: bugfix
bugzilla: 188906, https://gitee.com/openeuler/kernel/issues/I7E9M5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d13f99632748462c32fc95d729f5e754bab06064

--------------------------------

Yi found during a review of the patch "ext4: don't BUG on inconsistent
journal feature" that when ext4_mark_recovery_complete() returns an error
value, the error handling path does not turn off the enabled quotas,
which triggers the following kmemleak:

================================================================
unreferenced object 0xffff8cf68678e7c0 (size 64):
comm "mount", pid 746, jiffies 4294871231 (age 11.540s)
hex dump (first 32 bytes):
00 90 ef 82 f6 8c ff ff 00 00 00 00 41 01 00 00  ............A...
c7 00 00 00 bd 00 00 00 0a 00 00 00 48 00 00 00  ............H...
backtrace:
[<00000000c561ef24>] __kmem_cache_alloc_node+0x4d4/0x880
[<00000000d4e621d7>] kmalloc_trace+0x39/0x140
[<00000000837eee74>] v2_read_file_info+0x18a/0x3a0
[<0000000088f6c877>] dquot_load_quota_sb+0x2ed/0x770
[<00000000340a4782>] dquot_load_quota_inode+0xc6/0x1c0
[<0000000089a18bd5>] ext4_enable_quotas+0x17e/0x3a0 [ext4]
[<000000003a0268fa>] __ext4_fill_super+0x3448/0x3910 [ext4]
[<00000000b0f2a8a8>] ext4_fill_super+0x13d/0x340 [ext4]
[<000000004a9489c4>] get_tree_bdev+0x1dc/0x370
[<000000006e723bf1>] ext4_get_tree+0x1d/0x30 [ext4]
[<00000000c7cb663d>] vfs_get_tree+0x31/0x160
[<00000000320e1bed>] do_new_mount+0x1d5/0x480
[<00000000c074654c>] path_mount+0x22e/0xbe0
[<0000000003e97a8e>] do_mount+0x95/0xc0
[<000000002f3d3736>] __x64_sys_mount+0xc4/0x160
[<0000000027d2140c>] do_syscall_64+0x3f/0x90
================================================================

To solve this problem, we add a "failed_mount10" label, and call
ext4_quota_off_umount() under this label to release the enabled quotas.
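
The shape of this fix is the usual goto-based unwinding. In the sketch below `demo_sb` and `demo_fill_super()` are hypothetical, the boolean fields stand in for real resources, and `-EIO` is an arbitrary error chosen for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the superblock state touched during mount. */
struct demo_sb {
    bool journal_loaded;
    bool quotas_enabled;
};

static int demo_fill_super(struct demo_sb *sb, bool recovery_fails)
{
    int err;

    sb->journal_loaded = true;
    sb->quotas_enabled = true;   /* stands in for ext4_enable_quotas() */

    if (recovery_fails) {        /* ext4_mark_recovery_complete() failed */
        err = -EIO;
        goto failed_mount10;
    }
    return 0;

failed_mount10:
    /* Undo the step that older error labels did not cover. */
    sb->quotas_enabled = false;  /* stands in for ext4_quota_off_umount() */
    sb->journal_loaded = false;  /* earlier labels would continue from here */
    return err;
}
```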

Fixes: 11215630 ("ext4: don't BUG on inconsistent journal feature")
Cc: stable@kernel.org
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230327141630.156875-2-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Conflicts:
	fs/ext4/super.c
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

8400bff7

ext4: forbid commit inconsistent quota data when errors=remount-ro · 505ffd27

Committed by Ye Bin on Jul 07, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7D7ZL?from=project-issue

--------------------------------

The following issue occurs during I/O fault injection testing:
Quota error (device dm-3): find_block_dqentry: Quota for id 101 referenced but not present
Quota error (device dm-3): qtree_read_dquot: Can't read quota structure for id 101
Quota error (device dm-3): do_check_range: Getting block 2021161007 out of range 1-186
Quota error (device dm-3): qtree_read_dquot: Can't read quota structure for id 661

Currently, ext4_write_dquot()/ext4_acquire_dquot()/ext4_release_dquot() may
commit inconsistent quota data even if the operation failed. This may lead to
filesystem corruption.
To keep the filesystem consistent when errors=remount-ro is set, call
ext4_handle_error() to abort the journal.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

505ffd27

quota: fixup *_write_file_info() to return proper error code · 4445b4dc

Committed by Yangtao Li on Jul 07, 2023

mainline inclusion
from mainline-v6.4-rc1
commit f8107c99
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7D7ZL?from=project-issue
CVE: NA

--------------------------------

In v1_write_file_info(), when quota_write() returns 0,
it should be treated as an EIO error. In v2_write_file_info(),
return a proper error code instead of a raw number.
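
The return-value mapping can be sketched as a pure function. `demo_write_result_to_errno()` is a hypothetical helper; it assumes the write routine returns the number of bytes written or a negative errno.

```c
#include <errno.h>
#include <sys/types.h>

/* Map the result of a quota_write()-style call to 0 or a negative errno. */
static int demo_write_result_to_errno(ssize_t written, size_t expected)
{
    if (written < 0)
        return (int)written;   /* pass the underlying error through */
    if ((size_t)written == expected)
        return 0;              /* the whole structure was written */
    return -EIO;               /* short write, including zero bytes */
}
```

The zero-byte case is the one the commit message calls out: without it, a write that stored nothing would be reported as success.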
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20230227120216.31306-1-frank.li@vivo.com>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

4445b4dc

06 Jul, 2023 5 commits

ipmi_si: fix a memleak in try_smi_init() · 18b2bde9

Committed by Yi Yang on Jul 06, 2023

maillist inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7GMNK
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=6cf1a126de2992b4efe1c3c4d398f8de4aed6e3f

----------------------------------------

Kmemleak reported the following leak info in try_smi_init():

unreferenced object 0xffff00018ecf9400 (size 1024):
  comm "modprobe", pid 2707763, jiffies 4300851415 (age 773.308s)
  backtrace:
    [<000000004ca5b312>] __kmalloc+0x4b8/0x7b0
    [<00000000953b1072>] try_smi_init+0x148/0x5dc [ipmi_si]
    [<000000006460d325>] 0xffff800081b10148
    [<0000000039206ea5>] do_one_initcall+0x64/0x2a4
    [<00000000601399ce>] do_init_module+0x50/0x300
    [<000000003c12ba3c>] load_module+0x7a8/0x9e0
    [<00000000c246fffe>] __se_sys_init_module+0x104/0x180
    [<00000000eea99093>] __arm64_sys_init_module+0x24/0x30
    [<0000000021b1ef87>] el0_svc_common.constprop.0+0x94/0x250
    [<0000000070f4f8b7>] do_el0_svc+0x48/0xe0
    [<000000005a05337f>] el0_svc+0x24/0x3c
    [<000000005eb248d6>] el0_sync_handler+0x160/0x164
    [<0000000030a59039>] el0_sync+0x160/0x180

The problem was that when an error occurred before handlers registration
and after allocating `new_smi->si_sm`, the variable wouldn't be freed in
the error handling afterwards since `shutdown_smi()` hadn't been
registered yet. Fix it by adding a `kfree()` in the error handling path
in `try_smi_init()`.
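
A minimal userspace model of the leak and its fix, assuming the buffer is only freed by a shutdown handler that is registered late in initialisation. `demo_smi`, `demo_try_init()` and `demo_shutdown()` are hypothetical, and `-ENODEV` is an arbitrary error code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_smi {
    void *si_sm;               /* state machine data, allocated early */
    bool shutdown_registered;  /* true once the shutdown handler exists */
};

/* Normal teardown path, reachable only after registration. */
static void demo_shutdown(struct demo_smi *smi)
{
    free(smi->si_sm);
    smi->si_sm = NULL;
    smi->shutdown_registered = false;
}

static int demo_try_init(struct demo_smi *smi, bool fail_before_register)
{
    smi->si_sm = malloc(1024);
    if (!smi->si_sm)
        return -ENOMEM;

    if (fail_before_register)
        goto out_err;

    smi->shutdown_registered = true;
    return 0;

out_err:
    /* No shutdown handler exists yet, so nothing else would free this. */
    free(smi->si_sm);
    smi->si_sm = NULL;
    return -ENODEV;
}
```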

Cc: stable@vger.kernel.org # 4.19+
Fixes: 7960f18a ("ipmi_si: Convert over to a shutdown handler")
Signed-off-by: Yi Yang <yiyang13@huawei.com>
Co-developed-by: GONG, Ruiqi <gongruiqi@huaweicloud.com>
Signed-off-by: GONG, Ruiqi <gongruiqi@huaweicloud.com>
Message-Id: <20230629123328.2402075-1-gongruiqi@huaweicloud.com>
Signed-off-by: Corey Minyard <minyard@acm.org>

conflict:
	drivers/char/ipmi/ipmi_si_intf.c
Signed-off-by: Yi Yang <yiyang13@huawei.com>
Reviewed-by: GUO Zihua <guozihua@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

18b2bde9

net: add vlan_get_protocol_and_depth() helper · b2202edd

Committed by Eric Dumazet on Jul 06, 2023

mainline inclusion
from mainline-v6.4-rc2
commit 4063384e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7IIRH
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4063384ef762cc5946fc7a3f89879e76c6ec51e2

---------------------------

Before blamed commit, pskb_may_pull() was used instead
of skb_header_pointer() in __vlan_get_protocol() and friends.

Few callers depended on skb->head being populated with MAC header,
syzbot caught one of them (skb_mac_gso_segment())

Add vlan_get_protocol_and_depth() to make the intent clearer
and use it where sensible.

This is a more generic fix than commit e9d3f809
("net/af_packet: make sure to pull mac header") which was
dealing with a similar issue.
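
To make "protocol and depth" concrete, here is a userspace sketch that walks stacked VLAN tags in a raw Ethernet frame. It is not the kernel helper: it works on a flat byte buffer instead of an skb, its bounds check merely stands in for making the header bytes available in linear data, and all `demo_`/`DEMO_` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_HLEN       14      /* dst MAC + src MAC + EtherType */
#define DEMO_VLAN_HLEN      4       /* TCI + encapsulated EtherType */
#define DEMO_ETH_P_8021Q    0x8100
#define DEMO_ETH_P_8021AD   0x88A8
#define DEMO_VLAN_MAX_DEPTH 8

static uint16_t demo_read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns the innermost EtherType in host order and stores the total
 * header length (MAC header plus all VLAN tags) in *depth.
 * Returns 0 when the frame is truncated or nested too deeply. */
static uint16_t demo_vlan_get_protocol_and_depth(const uint8_t *frame,
                                                 size_t len, size_t *depth)
{
    size_t off = DEMO_ETH_HLEN;
    uint16_t proto;
    int tags = 0;

    if (len < DEMO_ETH_HLEN)
        return 0;
    proto = demo_read_be16(frame + 12);

    while (proto == DEMO_ETH_P_8021Q || proto == DEMO_ETH_P_8021AD) {
        if (++tags > DEMO_VLAN_MAX_DEPTH || len < off + DEMO_VLAN_HLEN)
            return 0;
        /* Skip the 2-byte TCI, then read the encapsulated EtherType. */
        proto = demo_read_be16(frame + off + 2);
        off += DEMO_VLAN_HLEN;
    }

    *depth = off;
    return proto;
}
```

A caller that later pulls `depth` bytes from the packet needs the guarantee that those bytes are really present, which is the role the bounds check plays in this model.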

kernel BUG at include/linux/skbuff.h:2655 !
invalid opcode: 0000 [#1] SMP KASAN
CPU: 0 PID: 1441 Comm: syz-executor199 Not tainted 6.1.24-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/14/2023
RIP: 0010:__skb_pull include/linux/skbuff.h:2655 [inline]
RIP: 0010:skb_mac_gso_segment+0x68f/0x6a0 net/core/gro.c:136
Code: fd 48 8b 5c 24 10 44 89 6b 70 48 c7 c7 c0 ae 0d 86 44 89 e6 e8 a1 91 d0 00 48 c7 c7 00 af 0d 86 48 89 de 31 d2 e8 d1 4a e9 ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41
RSP: 0018:ffffc90001bd7520 EFLAGS: 00010286
RAX: ffffffff8469736a RBX: ffff88810f31dac0 RCX: ffff888115a18b00
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc90001bd75e8 R08: ffffffff84697183 R09: fffff5200037adf9
R10: 0000000000000000 R11: dffffc0000000001 R12: 0000000000000012
R13: 000000000000fee5 R14: 0000000000005865 R15: 000000000000fed7
FS: 000055555633f300(0000) GS:ffff8881f6a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000000 CR3: 0000000116fea000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
[<ffffffff847018dd>] __skb_gso_segment+0x32d/0x4c0 net/core/dev.c:3419
[<ffffffff8470398a>] skb_gso_segment include/linux/netdevice.h:4819 [inline]
[<ffffffff8470398a>] validate_xmit_skb+0x3aa/0xee0 net/core/dev.c:3725
[<ffffffff84707042>] __dev_queue_xmit+0x1332/0x3300 net/core/dev.c:4313
[<ffffffff851a9ec7>] dev_queue_xmit+0x17/0x20 include/linux/netdevice.h:3029
[<ffffffff851b4a82>] packet_snd net/packet/af_packet.c:3111 [inline]
[<ffffffff851b4a82>] packet_sendmsg+0x49d2/0x6470 net/packet/af_packet.c:3142
[<ffffffff84669a12>] sock_sendmsg_nosec net/socket.c:716 [inline]
[<ffffffff84669a12>] sock_sendmsg net/socket.c:736 [inline]
[<ffffffff84669a12>] __sys_sendto+0x472/0x5f0 net/socket.c:2139
[<ffffffff84669c75>] __do_sys_sendto net/socket.c:2151 [inline]
[<ffffffff84669c75>] __se_sys_sendto net/socket.c:2147 [inline]
[<ffffffff84669c75>] __x64_sys_sendto+0xe5/0x100 net/socket.c:2147
[<ffffffff8551d40f>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<ffffffff8551d40f>] do_syscall_64+0x2f/0x50 arch/x86/entry/common.c:80
[<ffffffff85600087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: 469acedd ("vlan: consolidate VLAN parsing code and limit max parsing depth")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Liu Jian <liujian56@huawei.com>

Conflicts:
	drivers/net/tap.c
	net/packet/af_packet.c
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

b2202edd

net: tap: check vlan with eth_type_vlan() method · 119d5f3d

Committed by Menglong Dong on Jul 06, 2023

mainline inclusion
from mainline-v5.12-rc1-dontuse
commit b69df260
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7IIRH
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b69df2608281b71575fbb3b9f426dbcc4be8a700

---------------------------

Replace some checks for ETH_P_8021Q and ETH_P_8021AD in
drivers/net/tap.c with eth_type_vlan.
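
For readers unfamiliar with the helper, a userspace approximation of such a check is shown below. `demo_eth_type_vlan()` is a hypothetical name; unlike the kernel helper it takes the EtherType in host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* True for the two VLAN tag protocol identifiers named above. */
static bool demo_eth_type_vlan(uint16_t ethertype_host_order)
{
    switch (ethertype_host_order) {
    case 0x8100:  /* ETH_P_8021Q */
    case 0x88A8:  /* ETH_P_8021AD */
        return true;
    default:
        return false;
    }
}
```
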
Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Link: https://lore.kernel.org/r/20210115023238.4681-1-dong.menglong@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Liu Jian <liujian56@huawei.com>

Conflicts:
	drivers/net/tap.c
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>

119d5f3d

!1317 ext4: Stop trying writing pages if no free blocks generated · 92ea9f89

Committed by openeuler-ci-bot on Jul 06, 2023

Merge Pull Request from: @ci-robot 
 
PR sync from: Zhihao Cheng <chengzhihao1@huawei.com>
https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/Q5W2A2OT4AR2WQL2U5INIGRMTPG2BNC2/ 
 
 
Link:https://gitee.com/openeuler/kernel/pulls/1317 

Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> 
Signed-off-by: Liu YongQiang <liuyongqiang13@huawei.com>

92ea9f89

!1323 jbd2: fix several checkpoint · e9ee7895

Committed by openeuler-ci-bot on Jul 06, 2023

Merge Pull Request from: @ci-robot 
 
PR sync from: Zhihao Cheng <chengzhihao1@huawei.com>
https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/4U4YTLJOOUUPXWOECKE354DTL4PZICPI/ 
Zhang Yi (4):
  jbd2: remove journal_clean_one_cp_list()
  jbd2: fix a race when checking checkpoint buffer busy
  jbd2: remove __journal_try_to_free_buffer()
  jbd2: fix checkpoint cleanup performance regression

Zhihao Cheng (1):
  jbd2: Fix wrongly judgement for buffer head removing while doing
    checkpoint


-- 
2.31.1
 
 
Link:https://gitee.com/openeuler/kernel/pulls/1323 

Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> 
Signed-off-by: Liu YongQiang <liuyongqiang13@huawei.com>

e9ee7895

05 Jul, 2023 1 commit

jbd2: fix checkpoint cleanup performance regression · cdbe929a

Committed by Zhang Yi on Jul 05, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7IO1D
CVE: NA

--------------------------------

journal_clean_one_cp_list() has been merged into
journal_shrink_one_cp_list(), but checkpoint buffer cleanup from the
committing process is only a best effort: it should stop scanning once it
meets a busy buffer, or else it performs a lot of useless buffer scans
and checks. We caught a performance regression when running the fs_mark
test below.

Test cmd:
 ./fs_mark  -d  scratch  -s  1024  -n  10000  -t  1  -D  100  -N  100

Before merging checkpoint buffer cleanup:
 FSUse%        Count         Size    Files/sec     App Overhead
     95        10000         1024       8304.9            49033

After merging checkpoint buffer cleanup:
 FSUse%        Count         Size    Files/sec     App Overhead
     95        10000         1024       7649.0            50012
 FSUse%        Count         Size    Files/sec     App Overhead
     95        10000         1024       2107.1            50871

After merging checkpoint buffer cleanup, the total loop count in
journal_shrink_one_cp_list() could be up to 6,261,600+ (50,000+ ~
100,000+ in general), and most of the iterations are useless. This patch
fixes it by passing 'shrink_type' into journal_shrink_one_cp_list() and
adding a new 'SHRINK_BUSY_STOP' type to indicate that the scan should stop
once it meets a busy buffer. After the fix, the loop count drops back to
10,000+.
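
The effect of the new shrink type can be modelled on an array of busy flags. This is only a sketch: `demo_shrink_one_cp_list()` and the `DEMO_` enum are hypothetical, a destroy type that frees everything is assumed, and the real function walks a list of journal heads rather than an array.

```c
#include <stdbool.h>
#include <stddef.h>

enum demo_shrink_type {
    DEMO_SHRINK_DESTROY,    /* free everything, busy or not */
    DEMO_SHRINK_BUSY_STOP,  /* stop at the first busy buffer */
    DEMO_SHRINK_BUSY_SKIP,  /* skip busy buffers and keep scanning */
};

/* Returns the number of buffers freed; *scanned counts loop iterations. */
static size_t demo_shrink_one_cp_list(const bool *busy, size_t n,
                                      enum demo_shrink_type type,
                                      size_t *scanned)
{
    size_t freed = 0;

    *scanned = 0;
    for (size_t i = 0; i < n; i++) {
        (*scanned)++;
        if (busy[i] && type != DEMO_SHRINK_DESTROY) {
            if (type == DEMO_SHRINK_BUSY_STOP)
                break;      /* best effort: give up on this list */
            continue;
        }
        freed++;
    }
    return freed;
}
```

With `DEMO_SHRINK_BUSY_STOP` the iteration count is bounded by the position of the first busy buffer, which corresponds to the drop in loop count reported above.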

After this fix:
 FSUse%        Count         Size    Files/sec     App Overhead
     95        10000         1024       8558.4            49109
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>

cdbe929a
