Commits · 8653cd5ec14805d843d329ecb16274fd35967bd5 · openeuler / Kernel

21 Mar, 2022 1 commit

blk-mq: add exception handling when srcu->sda alloc failed · 8653cd5e

Committed by Laibin Qiu on Mar 21, 2022

hulk inclusion
category: bugfix
bugzilla: 186352, https://gitee.com/openeuler/kernel/issues/I4YADX
CVE: NA

--------------------------------

In the BLK_MQ_F_BLOCKING case, a per-hctx srcu is used to protect the dispatch
critical section. However, blk_mq_alloc_hctx does not currently notice when the
srcu memory allocation fails, which leads to an illegal address BUG. Validate
the return value to avoid this problem.
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

8653cd5e
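
The return-value validation described in the blk-mq commit above can be sketched in user space roughly as follows. This is a minimal model, not the kernel code: `toy_srcu`, `toy_hctx`, `toy_init_srcu()` and `toy_alloc_hctx()` are hypothetical stand-ins, and the `simulate_oom` flag exists only to force the failure path.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the SRCU state and the hardware context. */
struct toy_srcu { int *per_cpu_data; };
struct toy_hctx { bool blocking; struct toy_srcu srcu; };

/* Models an SRCU initializer that can fail: 0 on success, -1 on failure. */
static int toy_init_srcu(struct toy_srcu *s, bool simulate_oom)
{
    s->per_cpu_data = simulate_oom ? NULL : calloc(4, sizeof(int));
    return s->per_cpu_data ? 0 : -1;
}

/* Allocates a context; the SRCU init result is checked instead of ignored. */
static struct toy_hctx *toy_alloc_hctx(bool blocking, bool simulate_oom)
{
    struct toy_hctx *hctx = calloc(1, sizeof(*hctx));

    if (!hctx)
        return NULL;
    hctx->blocking = blocking;
    if (blocking && toy_init_srcu(&hctx->srcu, simulate_oom) != 0) {
        /* Unwind and report failure rather than returning a half-built object. */
        free(hctx);
        return NULL;
    }
    return hctx;
}

static void toy_free_hctx(struct toy_hctx *hctx)
{
    if (!hctx)
        return;
    free(hctx->srcu.per_cpu_data);
    free(hctx);
}
```

Callers then see a NULL context on failure and never dereference uninitialized SRCU state.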

20 Mar, 2022 9 commits

mm/dynamic_hugetlb: initialize subpages before merging · 78774718

Committed by Liu Shixin on Mar 20, 2022

hulk inclusion
category: bugfix
bugzilla: 46904 https://gitee.com/openeuler/kernel/issues/I4Y0XO

--------------------------------

Patch ("hugetlb: address ref count racing in prep_compound_gigantic_page") added
a check of the ref count in prep_compound_gigantic_page. The dynamic hugetlb
feature calls this function too, so we should initialize subpages before calling
prep_compound_gigantic_page to satisfy the change.
Further, the input of prep_compound_gigantic_page should be a group of pages
rather than a compound page, so clear the properties related to compound pages.
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

78774718

mm/dynamic_hugetlb: set/clear HPageFreed · fec7b436

Committed by Liu Shixin on Mar 20, 2022

hulk inclusion
category: bugfix
bugzilla: 46904 https://gitee.com/openeuler/kernel/issues/I4Y0XO

--------------------------------

Patch ("mm: hugetlb: fix a race between freeing and dissolving the page") added
PageHugeFreed to check whether a page is freed in hugetlb.
Patch ("hugetlb: convert PageHugeFreed to HPageFreed flag") converted it to
HPageFreed. We need to clear it when allocating a hugepage from hugetlb and set
it when freeing a hugepage back to hugetlb.
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

fec7b436

mm/dynamic_hugetlb: only support to merge 2M dynamicly · 3f3eaf96

Committed by Liu Shixin on Mar 20, 2022

hulk inclusion
category: bugfix
bugzilla: 46904 https://gitee.com/openeuler/kernel/issues/I4Y0XO

--------------------------------

We do not support merging 1G hugepages dynamically, as this can result in a
significant performance loss. We suggest configuring the number of hugepages
immediately after creating a dynamic hugetlb pool rather than modifying it
dynamically while processes are running.
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

3f3eaf96

mm/dynamic_hugetlb: hold the lock until pages back to hugetlb · 687fb2b1

Committed by Liu Shixin on Mar 20, 2022

hulk inclusion
category: bugfix
bugzilla: 46904 https://gitee.com/openeuler/kernel/issues/I4Y0XO

--------------------------------

Do not release the lock after merging all pages, otherwise some other process
may allocate the pages, and then some pages can't be put back to hugetlb.
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

687fb2b1

mm/dynamic_hugetlb: use mem_cgroup_force_empty to reclaim pages · 94c749a3

Committed by Liu Shixin on Mar 20, 2022

hulk inclusion
category: bugfix
bugzilla: 46904 https://gitee.com/openeuler/kernel/issues/I4Y0XO

--------------------------------

When all processes in the memory cgroup have finished, some memory, such as
file cache, may still be occupied. Use mem_cgroup_force_empty to reclaim the
pages that are charged to the memory cgroup before merging all pages.
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

94c749a3

mm/dynamic_hugetlb: check page using check_new_page · c3aad494

Committed by Liu Shixin on Mar 20, 2022

hulk inclusion
category: bugfix
bugzilla: 46904 https://gitee.com/openeuler/kernel/issues/I4Y0XO

--------------------------------

Use check_new_page to check the page to be allocated.
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

c3aad494

mm/dynamic_hugetlb: use pfn to traverse subpages · d0c9c735

Committed by Liu Shixin on Mar 20, 2022

hulk inclusion
category: bugfix
bugzilla: 46904 https://gitee.com/openeuler/kernel/issues/I4Y0XO

--------------------------------

For 1G huge pages, the struct pages of the subpages may be discontiguous, but
the pfns must be contiguous, so it's better to traverse subpages using pfn
rather than struct page.
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

d0c9c735
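
The pfn-based traversal in the commit above can be illustrated with a toy memory model. This is a sketch under stated assumptions: the two separately allocated arrays stand in for per-section struct page storage, and `toy_page`, `toy_pfn_to_page()` and `toy_mark_subpages()` are hypothetical names rather than kernel APIs.

```c
#include <stddef.h>

#define PAGES_PER_SECTION 4  /* toy value; real sections are far larger */
#define NR_SECTIONS 2

struct toy_page { unsigned long pfn; };

/* Each section has its own array, so "page + 1" is only valid within one section. */
static struct toy_page section_a[PAGES_PER_SECTION];
static struct toy_page section_b[PAGES_PER_SECTION];
static struct toy_page *sections[NR_SECTIONS] = { section_a, section_b };

/* Translate a pfn to its page descriptor through the section table. */
static struct toy_page *toy_pfn_to_page(unsigned long pfn)
{
    return &sections[pfn / PAGES_PER_SECTION][pfn % PAGES_PER_SECTION];
}

/* Walk nr subpages by pfn, which stays correct across section boundaries. */
static void toy_mark_subpages(unsigned long start_pfn, unsigned long nr)
{
    unsigned long i;

    for (i = 0; i < nr; i++)
        toy_pfn_to_page(start_pfn + i)->pfn = start_pfn + i;
}
```

Walking from pfn 2 for 4 pages touches the last two entries of the first array and the first two of the second, which pointer arithmetic on a single `struct toy_page *` could not do safely.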

mm/dynamic_hugetlb: improve the initialization of huge pages · 9a60fa50

Committed by Liu Shixin on Mar 20, 2022

hulk inclusion
category: bugfix
bugzilla: 46904 https://gitee.com/openeuler/kernel/issues/I4Y0XO

--------------------------------

Referring to alloc_buddy_huge_page function, replace prep_compound_page
with prep_new_page which is more appropriate because it's the opposite of
free_pages_prepare.
And initialize page->mapping for huge pages as they are initialized in
free_huge_page too.
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

9a60fa50

mm/dynamic_hugetlb: check free_pages_prepares when split pages · aa1306e1

Committed by Liu Shixin on Mar 20, 2022

hulk inclusion
category: bugfix
bugzilla: 46904 https://gitee.com/openeuler/kernel/issues/I4Y0XO

--------------------------------

Hugepages may still carry the PG_uptodate flag when freed. When splitting a
hugepage into pages, the flag is not cleared. This causes pages to be allocated
with the PG_uptodate flag set, and users may read incorrect data.

To solve this and similar problems, add free_pages_prepares()
to clear the pages when splitting them into the small pool.
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

aa1306e1

17 Mar, 2022 10 commits

irqchip/gic-phytium-2500: Fix issue that interrupts are concentrated in one cpu · 0a1bc196

Committed by Mao HongBo on Mar 17, 2022

Phytium inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I41AUQ
CVE: NA

-------------------------------------------------

Fix the issue that interrupts are concentrated in one cpu
for Phytium S2500 server.
Signed-off-by: Mao HongBo <maohongbo@phytium.com.cn>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

0a1bc196

scsi: ses: Fix crash caused by kfree an invalid pointer · 74885845

Committed by Ding Hui on Mar 17, 2022

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I3BNT6
CVE: NA

-----------------------------------------------

We can get a crash when disconnecting the iSCSI session;
the call trace looks like this:

  [ffff00002a00fb70] kfree at ffff00000830e224
  [ffff00002a00fba0] ses_intf_remove at ffff000001f200e4
  [ffff00002a00fbd0] device_del at ffff0000086b6a98
  [ffff00002a00fc50] device_unregister at ffff0000086b6d58
  [ffff00002a00fc70] __scsi_remove_device at ffff00000870608c
  [ffff00002a00fca0] scsi_remove_device at ffff000008706134
  [ffff00002a00fcc0] __scsi_remove_target at ffff0000087062e4
  [ffff00002a00fd10] scsi_remove_target at ffff0000087064c0
  [ffff00002a00fd70] __iscsi_unbind_session at ffff000001c872c4
  [ffff00002a00fdb0] process_one_work at ffff00000810f35c
  [ffff00002a00fe00] worker_thread at ffff00000810f648
  [ffff00002a00fe70] kthread at ffff000008116e98

In ses_intf_add, the components count could be 0, in which case kcalloc
allocates a zero-size scomp that is not saved in edev->component[i].scratch.

In this situation, edev->component[0].scratch is an invalid pointer, and
kfree-ing it in ses_intf_remove_enclosure causes a crash like the one above.
The call trace could also show other random cases when kfree cannot catch
the invalid pointer.

We should not use the edev->component[] array when the components count is 0.
We also need to check the index when using the edev->component[] array in
ses_enclosure_data_process.

Another possible fix is to report an error and not attach in ses_intf_add if
we meet a zero-component enclosure.
Tested-by: Zeng Zhicong <timmyzeng@163.com>
Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

74885845

arm64: kexec: Fix missing error code 'ret' warning in load_other_segments() · 28081b79

Committed by Lakshmi Ramasubramanian on Mar 17, 2022

mainline inclusion
from mainline-v5.16-rc6
commit 9c5d89bc
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4Y3UC

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9c5d89bc10551f1aecd768b00fca3339a7b8c8ee

--------------------------------

Since commit ac10be5c ("arm64: Use common
of_kexec_alloc_and_setup_fdt()"), smatch reports the following warning:

  arch/arm64/kernel/machine_kexec_file.c:152 load_other_segments()
  warn: missing error code 'ret'

Return code is not set to an error code in load_other_segments() when
of_kexec_alloc_and_setup_fdt() call returns a NULL dtb. This results
in status success (return code set to 0) being returned from
load_other_segments().

Set return code to -EINVAL if of_kexec_alloc_and_setup_fdt() returns
NULL dtb.
Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: ac10be5c ("arm64: Use common of_kexec_alloc_and_setup_fdt()")
Link: https://lore.kernel.org/r/20211210010121.101823-1-nramas@linux.microsoft.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Reviewed-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

28081b79
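
The missing-error-code pattern fixed by the kexec commit above can be sketched as follows. This is an illustrative user-space model only: `toy_setup_fdt()` and `toy_load_other_segments()` are hypothetical stand-ins for of_kexec_alloc_and_setup_fdt() and load_other_segments(), and the `fdt_fails` flag merely forces the NULL path.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an FDT setup helper that returns NULL on failure. */
static void *toy_setup_fdt(int fail)
{
    static char blob[16];

    return fail ? NULL : blob;
}

/* Models the fixed control flow: a NULL dtb must set an error code. */
static int toy_load_other_segments(int fdt_fails)
{
    int ret = 0;
    void *dtb = toy_setup_fdt(fdt_fails);

    if (!dtb) {
        /* Without this assignment, ret would still be 0 (success) here. */
        ret = -EINVAL;
        goto out_err;
    }
    return 0;

out_err:
    return ret;
}
```

The key point is that a shared error label returns whatever `ret` holds, so every branch that jumps to it has to assign a negative code first.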

ovl: fix incorrect extent info in metacopy case · 677ea7d5

Committed by Chengguang Xu on Mar 17, 2022

mainline inclusion
from mainline-v5.11-rc1
commit c11faf32
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4Y3CE?from=project-issue
CVE: NA

--------------------------------

In the metacopy case, we should use ovl_inode_realdata() instead of
ovl_inode_real() to get the real inode that has data, so that
we can get correct extent information in the ->fiemap operation.
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Zheng Liang <zhengliang6@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

677ea7d5

perf sched: Cast PTHREAD_STACK_MIN to int as it may turn into sysconf(__SC_THREAD_STACK_MIN_VALUE) · ee5130c7

Committed by Arnaldo Carvalho de Melo on Mar 17, 2022

mainline inclusion
from mainline-v5.14-rc2
commit d08c84e0
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4Y3D0

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d08c84e01afa7a7eee6badab25d5420fa847f783

----------------------------

In fedora rawhide the PTHREAD_STACK_MIN define may end up expanded to a
sysconf() call, and that will return 'long int', breaking the build:

    45 fedora:rawhide                : FAIL gcc version 11.1.1 20210623 (Red Hat 11.1.1-6) (GCC)
      builtin-sched.c: In function 'create_tasks':
      /git/perf-5.14.0-rc1/tools/include/linux/kernel.h:43:24: error: comparison of distinct pointer types lacks a cast [-Werror]
         43 |         (void) (&_max1 == &_max2);              \
            |                        ^~
      builtin-sched.c:673:34: note: in expansion of macro 'max'
        673 |                         (size_t) max(16 * 1024, PTHREAD_STACK_MIN));
            |                                  ^~~
      cc1: all warnings being treated as errors

  $ grep __sysconf /usr/include/*/*.h
  /usr/include/bits/pthread_stack_min-dynamic.h:extern long int __sysconf (int __name) __THROW;
  /usr/include/bits/pthread_stack_min-dynamic.h:#   define PTHREAD_STACK_MIN __sysconf (__SC_THREAD_STACK_MIN_VALUE)
  /usr/include/bits/time.h:extern long int __sysconf (int);
  /usr/include/bits/time.h:# define CLK_TCK ((__clock_t) __sysconf (2))	/* 2 is _SC_CLK_TCK */
  $

So cast it to int to cope with that.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Reviewed-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

ee5130c7

arm64: remove page granularity limitation from KFENCE · d851728c

Committed by Jisheng Zhang on Mar 17, 2022

maillist inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4XWBS

Reference: https://lore.kernel.org/all/20210524172433.015b3b6b@xhacker.debian/

--------------------------------

Jisheng Zhang has another way of saving memory. We combined his two patches
into one and made some adaptations for dynamic kfence objects.

Description of the original patch:

Some architectures may want to allocate the __kfence_pool differently
for example, allocate the __kfence_pool earlier before paging_init().
We also delay the memset() to kfence_init_pool().

KFENCE requires linear map to be mapped at page granularity, so that
it is possible to protect/unprotect single pages in the KFENCE pool.
Currently, if KFENCE is enabled, arm64 maps all pages at page
granularity, which seems like overkill. In fact, we only need to map the
pages in the KFENCE pool itself at page granularity. We achieve this goal
by allocating the KFENCE pool before paging_init() so we know the KFENCE
pool address, then we take care to map the pool at page granularity
during map_mem().
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

d851728c

Revert "arm64: remove page granularity limitation from KFENCE" · e21f213b

Committed by Liu Shixin on Mar 17, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4XWBS

--------------------------------

This reverts commit d3d0ca13.

We found that this patch may lead to a TLB conflict abort. This may result from
the changes between block and table mappings (block<->table). The problem may be
related to the Break-Before-Make sequence rule, but the cause is not yet clear.
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

e21f213b

kfence: Fix wrong size of alloc_covered when enable dynamic · 4a46412a

Committed by Peng Liu on Mar 17, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V388
CVE: NA

--------------------------------

Patch "kfence: Add a module parameter to adjust kfence objects"
enables dynamic configuration of the number of KFENCE guarded
objects, but the size of alloc_covered is not the same as in the
original kfence. This is because const_ilog2 is only valid for
a constant, and KFENCE_NR_OBJECTS is not a constant when dynamic
configuration is enabled.

This difference from the original kfence leads to confused
logic when skipping covered paths. On an arm64 machine,
the following panic is observed.

  Call trace:
   __kfence_alloc+0x378/0x780
   kmem_cache_alloc+0x204/0x614
   getname_kernel+0x38/0xf4
   filp_open+0x2c/0x6c
   populate_rootfs+0xcc/0x174
   do_one_initcall+0xac/0x20c
   kernel_init_freeable+0x380/0x3c8
   kernel_init+0x18/0xf0
   ret_from_fork+0x10/0x18
  Code: 54000080 a9400381 f9000420 f9000001 (f900039c)
  ---[ end trace 814fe40d608e1b74 ]---
  Kernel panic - not syncing: TLB conflict abort: Fatal exception

To fix this, ilog2 is used instead of const_ilog2 when dynamic
configuration of KFENCE guarded objects is enabled.

Fixes: 901b983c ("kfence: Add a module parameter to adjust kfence objects")
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

4a46412a
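
The difference between a compile-time and a runtime logarithm in the kfence commit above can be sketched as follows. This is a hedged illustration: `toy_ilog2()` and `toy_covered_size()` are hypothetical helpers, and the "+ 2" in the order is an assumption about how the table is sized relative to the object count, not something stated in the commit message.

```c
/* Runtime integer log2: index of the highest set bit. n must be non-zero. */
static unsigned int toy_ilog2(unsigned long n)
{
    unsigned int log = 0;

    while (n >>= 1)
        log++;
    return log;
}

/*
 * Size a lookup table from an object count known only at run time.
 * The extra order of 2 is an illustrative assumption.
 */
static unsigned long toy_covered_size(unsigned long nr_objects)
{
    return 1UL << (toy_ilog2(nr_objects) + 2);
}
```

Because the object count is read at run time, the table order has to be computed by a function like this rather than folded by the preprocessor.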

audit: improve audit queue handling when "audit=1" on cmdline · c947ee58

Committed by Paul Moore on Mar 17, 2022

mainline inclusion
from mainline-v5.17-rc3
commit f26d0433
category: bugfix
bugzilla: 186383 https://gitee.com/openeuler/kernel/issues/I4X1AI?from=project-issue
CVE: NA

--------------------------------

When an admin enables audit at early boot via the "audit=1" kernel
command line the audit queue behavior is slightly different; the
audit subsystem goes to greater lengths to avoid dropping records,
which unfortunately can result in problems when the audit daemon is
forcibly stopped for an extended period of time.

This patch makes a number of changes designed to improve the audit
queuing behavior so that leaving the audit daemon in a stopped state
for an extended period does not cause a significant impact to the
system.

- kauditd_send_queue() is now limited to looping through the
  passed queue only once per call.  This not only prevents the
  function from looping indefinitely when records are returned
  to the current queue, it also allows any recovery handling in
  kauditd_thread() to take place when kauditd_send_queue()
  returns.

- Transient netlink send errors seen as -EAGAIN now cause the
  record to be returned to the retry queue instead of going to
  the hold queue.  The intention of the hold queue is to store,
  perhaps for an extended period of time, the events which led
  up to the audit daemon going offline.  The retry queue remains
  a temporary queue intended to protect against transient issues
  between the kernel and the audit daemon.

- The retry queue is now limited by the audit_backlog_limit
  setting, the same as the other queues.  This allows admins
  to bound the size of all of the audit queues on the system.

- kauditd_rehold_skb() now returns records to the end of the
  hold queue to ensure ordering is preserved in the face of
  recent changes to kauditd_send_queue().

Cc: stable@vger.kernel.org
Fixes: 5b52330b ("audit: fix auditd/kernel connection state tracking")
Fixes: f4b3ee3c ("audit: improve robustness of the audit queue handling")
Reported-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Tested-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

c947ee58
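
The first point of the audit commit above, processing a queue only once per call even when failed records are put back on it, can be sketched with a small ring buffer. This is a swapped-in illustration using a length budget, not the kernel's skb-based implementation; `toy_queue`, `toy_send()` and `toy_send_queue_once()` are hypothetical names, and negative records are assumed to fail only to exercise the requeue path.

```c
#include <stddef.h>

#define TOY_QUEUE_CAP 8

struct toy_queue { int items[TOY_QUEUE_CAP]; size_t head, len; };

static int toy_push(struct toy_queue *q, int v)
{
    if (q->len == TOY_QUEUE_CAP)
        return -1;
    q->items[(q->head + q->len) % TOY_QUEUE_CAP] = v;
    q->len++;
    return 0;
}

static int toy_pop(struct toy_queue *q, int *v)
{
    if (q->len == 0)
        return -1;
    *v = q->items[q->head];
    q->head = (q->head + 1) % TOY_QUEUE_CAP;
    q->len--;
    return 0;
}

/* Toy transport: negative records model a transient send failure. */
static int toy_send(int rec)
{
    return rec < 0 ? -1 : 0;
}

/* Visit each record present at entry exactly once; returns the number sent. */
static size_t toy_send_queue_once(struct toy_queue *q)
{
    size_t budget = q->len, sent = 0;
    int rec;

    while (budget-- > 0 && toy_pop(q, &rec) == 0) {
        if (toy_send(rec) == 0)
            sent++;
        else
            toy_push(q, rec); /* requeued at the tail, not retried in this call */
    }
    return sent;
}
```

Bounding the loop by the length seen at entry guarantees termination even if every send fails, and lets the caller run its recovery logic afterwards.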

Revert "audit: bugfix for infinite loop when flush the hold queue" · d9d06865

Committed by Cui GaoSheng on Mar 17, 2022

hulk inclusion
category: bugfix
bugzilla: 186383 https://gitee.com/openeuler/kernel/issues/I4X1AI?from=project-issue
CVE: NA

--------------------------------

This reverts commit fcfdde9c.
Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

d9d06865

15 Mar, 2022 1 commit

arm/arm64: paravirt: Remove GPL from pv_ops export · 38134bb6

Committed by Zengruan Ye on Mar 15, 2022

virt inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4VZPC
CVE: NA

--------------------------------

Commit 63042c58 ("KVM: arm64: Add interface to support vCPU
preempted check") introduced paravirt spinlock operations, as pv_lock_ops
was exported via EXPORT_SYMBOL(), while the pv_ops structure containing
the pv lock operations is exported via EXPORT_SYMBOL_GPL().

Change that by using EXPORT_SYMBOL(pv_ops) for arm/arm64, following the
equivalent change made for the x86 architecture:
https://lore.kernel.org/all/20181029150116.25372-1-jgross@suse.com/T/#u

Fixes: 63042c58 ("KVM: arm64: Add interface to support vCPU preempted
check")
Signed-off-by: yezengruan <yezengruan@huawei.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Acked-by: Xie Xiuqi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

38134bb6

11 Mar, 2022 4 commits

ima: bugfix for digest lists importing · 8dbf8e5f

Committed by shenxiangwei on Mar 11, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4XHBM
CVE: NA

-------------

The check for control characters shouldn't be applied when importing a
binary digest list.
Signed-off-by: shenxiangwei <shenxiangwei1@huawei.com>
Reviewed-by: Lu Huaxin <luhuaxin1@huawei.com>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

8dbf8e5f

net/hinic: Fix call trace when the rx_buff module parameter is grater than 2 · 04edfc4f

Committed by Chiqijun on Mar 11, 2022

driver inclusion
category: bugfix
bugzilla: 4472 https://gitee.com/openeuler/kernel/issues/I4O2ZZ

-----------------------------------------------------------------------

When rx_buff is greater than 2, the driver allocates more than 1
page of memory for network rx, but the __GFP_COMP gfp flag is not set,
resulting in the following call trace:

CPU: 3 PID: 494041 Comm: ping Kdump: loaded Tainted: G        W  OE     4.19.90-2106.3.0.0095.oe1.x86_64 #1
Hardware name: Huawei Technologies Co., Ltd. RH2288H V3/BC11HGSA0, BIOS 5.15 05/21/2019
RIP: 0010:copy_page_to_iter+0x154/0x310
Code: 31 b8 00 10 00 00 f7 c6 00 80 00 00 74 07 0f b6 49 51 48 d3 e0 48 39 c2 0f 86 ed fe ff ff 48 c7 c7 30
RSP: 0018:ffffbd6907d03bd8 EFLAGS: 00010286
RAX: 0000000000000024 RBX: ffffe0ffee5b3000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff9edbbfcd6858 RDI: ffff9edbbfcd6858
RBP: 0000000000000001 R08: 000000000001574a R09: 0000000000000004
R10: 000000000000004e R11: 0000000000000001 R12: ffffbd6907d03ed0
R13: 0000000000002100 R14: 0000000000000030 R15: 0000000000000000
FS:  00007f9d37244dc0(0000) GS:ffff9edbbfcc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffe0e715f80 CR3: 000000203c018005 CR4: 00000000001606e0
Call Trace:
 skb_copy_datagram_iter+0x16c/0x2a0
 raw_recvmsg+0xd0/0x1f0
 inet_recvmsg+0x5b/0xd0
 ____sys_recvmsg+0x95/0x160
 ? import_iovec+0x37/0xd0
 ? copy_msghdr_from_user+0x5c/0x90
 ___sys_recvmsg+0x8c/0xd0
 ? __audit_syscall_exit+0x228/0x290
 ? kretprobe_trampoline+0x25/0x50
 ? __sys_recvmsg+0x5b/0xa0
 __sys_recvmsg+0x5b/0xa0
 do_syscall_64+0x5f/0x240
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Use 'dev_alloc_pages' instead of calling 'alloc_pages_node' directly.
Signed-off-by: Chiqijun <chiqijun@huawei.com>
Reviewed-by: Wangxiaoyun <cloud.wangxiaoyun@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

04edfc4f

net/hinic: Fix null pointer dereference in hinic_physical_port_id · 98541cfd

Committed by Chiqijun on Mar 11, 2022

driver inclusion
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I4XF98
CVE: NA

-----------------------------------------------------------------------

The hinic driver currently generates a NULL pointer dereference
when a hinicadm tool command is run during device probe.
This is because the hinicadm process accesses the NULL hwif
pointer in the hwdev, which has not yet been allocated in probe.

Fix this by checking the initialization state of the device before
accessing it.
Signed-off-by: Chiqijun <chiqijun@huawei.com>
Reviewed-by: Wangxiaoyun <cloud.wangxiaoyun@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

98541cfd

net/hinic: Fix double free issue · a8052325

Committed by Chiqijun on Mar 10, 2022

driver inclusion
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I4WWH4
CVE: NA

-----------------------------------------------------------------------

When hinic_remove is executed concurrently, chip_node is double freed.
Signed-off-by: Chiqijun <chiqijun@huawei.com>
Reviewed-by: Wangxiaoyun <cloud.wangxiaoyun@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

a8052325

09 Mar, 2022 15 commits

eulerfs: remove redundant calculations · 164d0706

Committed by Gou Hao on Mar 09, 2022

uniontech inclusion
category: cleanup
bugzilla: https://gitee.com/openeuler/kernel/issues/I4X47D?from=project-issue
CVE: NA

--------------------------------

'left' is always 0 at this point.
If it were not 0, the previous if check would already have taken 'goto out;'.
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

164d0706

scsi: spfc: Remove redundant mask and spinlock · a8f304a8

Committed by Yanling Song on Mar 09, 2022

Ramaxel inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4UA67
CVE: NA

----------------------------------------------

Fix:
1. Remove UNF_ORIGIN_HOTTAG_MASK and UNF_HOTTAG_FLAG
2. Update some output strings
3. Remove the spinlock protection in free_parent_sq() because the
   caller function free_parent_queue_info() already provides it
Signed-off-by: Yanling Song <songyl@ramaxel.com>
Reviewed-by: Yun Xu <xuyun@ramaxel.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

a8f304a8

xfs: order CIL checkpoint start records · ddb0a54d

Committed by Dave Chinner on Mar 09, 2022

mainline-inclusion
from mainline-v5.14-rc4
commit 68a74dca
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V7IK
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=68a74dcae6737c27b524b680e070fe41f0cad43a

-------------------------------------------------

Because log recovery depends on strictly ordered start records as
well as strictly ordered commit records.

This is a zero day bug in the way XFS writes pipelined transactions
to the journal which is exposed by fixing the zero day bug that
prevents the CIL from pipelining checkpoints. This re-introduces
explicit concurrent commits back into the on-disk journal and hence
out of order start records.

The XFS journal commit code has never ordered start records and we
have relied on strict commit record ordering for correct recovery
ordering of concurrently written transactions. Unfortunately, root
cause analysis uncovered the fact that log recovery uses the LSN of
the start record for transaction commit processing. Hence, whilst
the commits are processed in strict order by recovery, the LSNs
associated with the commits can be out of order and so recovery may
stamp incorrect LSNs into objects and/or misorder intents in the AIL
for later processing. This can result in log recovery failures
and/or on disk corruption, sometimes silent.

Because this is a long standing log recovery issue, we can't just
fix log recovery and call it good. This still leaves older kernels
susceptible to recovery failures and corruption when replaying a log
from a kernel that pipelines checkpoints. There is also the issue
that in-memory ordering for AIL pushing and data integrity
operations are based on checkpoint start LSNs, and if the start LSN
is incorrect in the journal, it is also incorrect in memory.

Hence there's really only one choice for fixing this zero-day bug:
we need to strictly order checkpoint start records in ascending
sequence order in the log, the same way we already strictly order
commit records.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: guoxuenan <guoxuenan@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

ddb0a54d

xfs: attach iclog callbacks in xlog_cil_set_ctx_write_state() · f805da35

Committed by Dave Chinner on Mar 09, 2022

mainline-inclusion
from mainline-v5.14-rc4
commit caa80090
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V7IK
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=caa80090d17c89d0caca1dcb4c8a9cdef5335e71

-------------------------------------------------

Now that we have a mechanism to guarantee that the callbacks
attached to an iclog are owned by the context that attaches them
until they drop their reference to the iclog via
xlog_state_release_iclog(), we can attach callbacks to the iclog at
any time we have an active reference to the iclog.

xlog_state_get_iclog_space() always guarantees that the commit
record will fit in the iclog it returns, so we can move this IO
callback setting to xlog_cil_set_ctx_write_state(), record the
commit iclog in the context and remove the need for the commit iclog
to be returned by xlog_write() altogether.

This, in turn, allows us to move the wakeup for ordered commit
record writes up into xlog_cil_set_ctx_write_state(), too, because
we have been guaranteed that this commit record will be physically
located in the iclog before any waiting commit record at a higher
sequence number will be granted iclog space.

This further cleans up the post commit record write processing in
the CIL push code, especially as xlog_state_release_iclog() will now
clean up the context when shutdown errors occur.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: guoxuenan <guoxuenan@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

f805da35

xfs: factor out log write ordering from xlog_cil_push_work() · fcf905e9

Committed by Dave Chinner on Mar 09, 2022

mainline-inclusion
from mainline-v5.14-rc4
commit bf034bc8
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V7IK
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bf034bc827807ac15affa051e6a94b03f93b1a03

-------------------------------------------------

So we can use it for start record ordering as well as commit record
ordering in future.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: guoxuenan <guoxuenan@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

fcf905e9

xfs: pass a CIL context to xlog_write() · 2eede308

Committed by Dave Chinner on Mar 09, 2022

mainline-inclusion
from mainline-v5.14-rc4
commit c45aba40
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V7IK
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c45aba40cf5b2988c0bebee8c9b846c88aa651eb

-------------------------------------------------

Pass the CIL context to xlog_write() rather than a pointer to a LSN
variable. Only the CIL checkpoint calls to xlog_write() need to know
about the start LSN of the writes, so rework xlog_write to directly
write the LSNs into the CIL context structure.

This removes the commit_lsn variable from xlog_cil_push_work(), so
now we only have to issue the commit record ordering wakeup from
there.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: guoxuenan <guoxuenan@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

2eede308
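To illustrate the interface change described in commit 2eede308 above, here is a minimal user-space sketch in C. It is not the kernel code: `demo_log_write()`, `struct demo_cil_ctx`, the simple LSN counter, and the convention that 0 means "unset" are all assumptions made for illustration. The point shown is that the writer stores the LSNs directly in the checkpoint context instead of handing them back through a pointer to a caller's local variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t demo_lsn_t;          /* hypothetical LSN type */

/* Hypothetical checkpoint context; 0 means "not recorded yet". */
struct demo_cil_ctx {
    demo_lsn_t start_lsn;            /* LSN of the first write of the checkpoint */
    demo_lsn_t commit_lsn;           /* LSN of the commit record */
};

/* Hypothetical log: hands out monotonically increasing LSNs. */
struct demo_log {
    demo_lsn_t next_lsn;
};

/*
 * Write one record. Callers that are not checkpoints pass ctx == NULL
 * and never see an LSN; checkpoint callers get the LSNs stored in
 * their context.
 */
static void demo_log_write(struct demo_log *log, struct demo_cil_ctx *ctx,
                           int is_commit_record)
{
    demo_lsn_t lsn = log->next_lsn++;

    if (ctx == NULL)
        return;
    if (ctx->start_lsn == 0)
        ctx->start_lsn = lsn;        /* first write of this checkpoint */
    if (is_commit_record)
        ctx->commit_lsn = lsn;
}
```

In this sketch the caller no longer needs a local commit LSN variable: after the commit record is written it reads `ctx->commit_lsn` from the context it already owns.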

xfs: fix the forward progress assertion in xfs_iwalk_run_callbacks · 33b947b8

Committed by Darrick J. Wong on Mar 09, 2022

mainline-inclusion
from mainline-v5.10-rc5
commit a5336d6b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V7IK
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a5336d6bb2d02d0e9d4d3c8be04b80b8b68d56c8

-------------------------------------------------

In commit 27c14b5d we started tracking the last inode seen during an
inode walk to avoid infinite loops if a corrupt inobt record happens to
have a lower ir_startino than the record preceding it.  Unfortunately,
the assertion trips over the case where there are completely empty inobt
records (which can happen quite easily on 64k page filesystems) because
we advance the tracking cursor without actually putting the empty record
into the processing buffer.  Fix the assert to allow for this case.

Reported-by: zlang@redhat.com
Fixes: 27c14b5d ("xfs: ensure inobt record walks always make forward progress")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Zorro Lang <zlang@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: guoxuenan <guoxuenan@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

33b947b8

xfs: move xlog_commit_record to xfs_log_cil.c · 5977c106

Committed by Dave Chinner on Mar 09, 2022

mainline-inclusion
from mainline-v5.14-rc4
commit 2ce82b72
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V7IK
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ce82b722de980deef809438603b7e95156d3818

-------------------------------------------------

It is only used by the CIL checkpoints, and is the counterpart to
start record formatting and writing that is already local to
xfs_log_cil.c.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: guoxuenan <guoxuenan@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

5977c106

xfs: log head and tail aren't reliable during shutdown · 41769f8e

Committed by Dave Chinner on Mar 09, 2022

mainline-inclusion
from mainline-v5.14-rc4
commit 2562c322
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V7IK
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2562c322404d81ee5fa82f3cf601a2e27393ab57

-------------------------------------------------

I'm seeing assert failures from xlog_space_left() after a shutdown
has begun that look like:

XFS (dm-0): log I/O error -5
XFS (dm-0): xfs_do_force_shutdown(0x2) called from line 1338 of file fs/xfs/xfs_log.c. Return address = xlog_ioend_work+0x64/0xc0
XFS (dm-0): Log I/O Error Detected.
XFS (dm-0): Shutting down filesystem. Please unmount the filesystem and rectify the problem(s)
XFS (dm-0): xlog_space_left: head behind tail
XFS (dm-0):   tail_cycle = 6, tail_bytes = 2706944
XFS (dm-0):   GH   cycle = 6, GH   bytes = 1633867
XFS: Assertion failed: 0, file: fs/xfs/xfs_log.c, line: 1310
------------[ cut here ]------------
Call Trace:
 xlog_space_left+0xc3/0x110
 xlog_grant_push_threshold+0x3f/0xf0
 xlog_grant_push_ail+0x12/0x40
 xfs_log_reserve+0xd2/0x270
 ? __might_sleep+0x4b/0x80
 xfs_trans_reserve+0x18b/0x260
.....

There are two things here. Firstly, after a shutdown, the log head
and tail can be out of whack as things abort and release (or don't
release) resources, so checking them for sanity doesn't make much
sense. Secondly, xfs_log_reserve() can race with shutdown and so it
can still fail like this even though it has already checked for a
log shutdown before calling xlog_grant_push_ail().

So, before ASSERT failing in xlog_space_left(), make sure we haven't
already shut down....

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: guoxuenan <guoxuenan@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

41769f8e
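The guard described in commit 41769f8e above can be sketched in user-space C as follows. This is a simplified illustration, not the kernel's xlog_space_left(): `struct demo_log`, `demo_space_left()`, the field names, and the choice to report the whole log as free in the inconsistent case are assumptions. What it shows is the ordering of the checks: an inconsistent head/tail pair only trips the assertion when the log has not been shut down.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the log state. */
struct demo_log {
    bool shutdown;                 /* set once a forced shutdown has begun */
    int  size;                     /* log size in bytes */
    int  head_cycle, head_bytes;   /* grant head position */
    int  tail_cycle, tail_bytes;   /* log tail position */
};

/* Return the free space in bytes between head and tail. */
static int demo_space_left(const struct demo_log *log)
{
    /* Head and tail in the same cycle, head at or ahead of tail. */
    if (log->tail_cycle == log->head_cycle &&
        log->head_bytes >= log->tail_bytes)
        return log->size - (log->head_bytes - log->tail_bytes);

    /* Head has wrapped exactly once past the tail's cycle. */
    if (log->tail_cycle + 1 == log->head_cycle &&
        log->head_bytes <= log->tail_bytes)
        return log->tail_bytes - log->head_bytes;

    /*
     * Anything else is inconsistent. After a shutdown that is
     * expected, so only treat it as a bug on a live log.
     */
    if (!log->shutdown)
        assert(!"head behind tail on a live log");

    return log->size;              /* assumption: report the log as fully free */
}
```

With the values from the log excerpt above (head and tail both in cycle 6, head bytes below tail bytes) and `shutdown` set, the function returns without asserting.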

xfs: don't run shutdown callbacks on active iclogs · 2654e8f6

Committed by Dave Chinner on Mar 09, 2022

mainline-inclusion
from mainline-v5.14-rc4
commit 502a01fa
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V7IK
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=502a01fac0983406e7c312764d7a03e06b3d0748

-------------------------------------------------

When the log is shut down, it currently walks all the iclogs and runs
callbacks that are attached to the iclogs, regardless of whether the
iclog is queued for IO completion or not. This creates a problem for
contexts attaching callbacks to iclogs in that a racing shutdown can
run the callbacks even before the attaching context has finished
processing the iclog and releasing it for IO submission.

If the callback processing of the iclog frees the structure that is
attached to the iclog, then this leads to a UAF scenario that can
only be protected against by holding the icloglock from the point
callbacks are attached through to the release of the iclog. While we
currently do this, it is not practical or sustainable.

Hence we need to make shutdown processing the responsibility of the
context that holds active references to the iclog. We know that the
contexts attaching callbacks to the iclog must have active
references to the iclog, and that means they must be in either
ACTIVE or WANT_SYNC states. xlog_state_do_callback() will skip over
iclogs in these states -except- when the log is shut down.

xlog_state_do_callback() checks the state of the iclogs while
holding the icloglock, therefore the reference count/state change
that occurs in xlog_state_release_iclog() after the callbacks are
attached is atomic w.r.t. shutdown processing.

We can't push the responsibility of callback cleanup onto the CIL
context because we can have ACTIVE iclogs that have callbacks
attached that have already been released. Hence we really need to
internalise the cleanup of callbacks into xlog_state_release_iclog()
processing.

Indeed, we already have that internalisation via:

	xlog_state_release_iclog
	  drop last reference
	    ->SYNCING
	  xlog_sync
	    xlog_write_iclog
	      if (log_is_shutdown)
	        xlog_state_done_syncing()
	          xlog_state_do_callback()
	            <process shutdown on iclog that is now in SYNCING state>

The problem is that xlog_state_release_iclog() aborts before doing
anything if the log is already shut down. It assumes that the
callbacks have already been cleaned up, and it doesn't need to do
any cleanup.

Hence the fix is to remove the xlog_is_shutdown() check from
xlog_state_release_iclog() so that reference counts are correctly
released from the iclogs, and when the reference count is zero we
always transition to SYNCING if the log is shut down. Hence we'll
always enter the xlog_sync() path in a shutdown and eventually end
up erroring out the iclog IO and running xlog_state_do_callback() to
process the callbacks attached to the iclog.

This allows us to stop processing referenced ACTIVE/WANT_SYNC iclogs
directly in the shutdown code, and in doing so gets rid of the UAF
vector that currently exists. This then decouples the adding of
callbacks to the iclogs from xlog_state_release_iclog() as we
guarantee that xlog_state_release_iclog() will process the callbacks
if the log has been shut down before xlog_state_release_iclog() has
been called.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: guoxuenan <guoxuenan@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

2654e8f6
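The release path described in commit 2654e8f6 above can be modelled with a small user-space sketch in C. It is an illustration under stated assumptions, not the kernel implementation: all `demo_*` names, the single callback slot, and the collapsed "sync then error out" step are hypothetical. The behaviour it demonstrates is that callbacks are never run while other references remain, and that the last reference drop on a shut down log always moves the iclog into the sync path, where the callback is run as aborted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_state { DEMO_ACTIVE, DEMO_WANT_SYNC, DEMO_SYNCING, DEMO_CALLBACK_DONE };

/* Hypothetical iclog with a single attached callback. */
struct demo_iclog {
    int             refcount;
    enum demo_state state;
    void          (*callback)(void *arg, bool aborted);
    void           *cb_arg;
};

struct demo_log {
    bool shutdown;
};

/* Example callback: records whether it was run as aborted (-1) or not (1). */
static void demo_note_abort(void *arg, bool aborted)
{
    *(int *)arg = aborted ? -1 : 1;
}

/* Run and detach the callback, then mark the iclog as processed. */
static void demo_run_callback(struct demo_iclog *ic, bool aborted)
{
    if (ic->callback != NULL) {
        ic->callback(ic->cb_arg, aborted);
        ic->callback = NULL;
    }
    ic->state = DEMO_CALLBACK_DONE;
}

/*
 * Drop one reference. Returns true if this was the last reference and
 * the iclog entered the sync path.
 */
static bool demo_release_iclog(struct demo_log *log, struct demo_iclog *ic)
{
    assert(ic->refcount > 0);
    if (--ic->refcount > 0)
        return false;              /* still referenced: leave callbacks alone */

    if (!log->shutdown && ic->state != DEMO_WANT_SYNC)
        return false;              /* live log, iclog stays ACTIVE for more writes */

    ic->state = DEMO_SYNCING;
    if (log->shutdown)
        demo_run_callback(ic, true);   /* the "I/O" errors out, callback sees the abort */
    /* On a live log, real code would submit I/O here and run callbacks on completion. */
    return true;
}
```

Because the shutdown handling hangs off the final reference drop, a context that still holds a reference can keep attaching callbacks without racing against the shutdown code.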

xfs: separate out log shutdown callback processing · 6cc9892d

Committed by Dave Chinner on Mar 09, 2022

mainline-inclusion
from mainline-v5.14-rc4
commit aad7272a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V7IK
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=aad7272a920869b950d937b87562e494af72523c

-------------------------------------------------

The iclog callback processing done during a forced log shutdown has
different logic to normal runtime IO completion callback processing.
Separate out the shutdown callbacks into their own function and call
that from the shutdown code instead.

We don't need this shutdown specific logic in the normal runtime
completion code - we'll always run the shutdown version on shutdown,
and it will do what shutdown needs regardless of whether there are
racing IO completion callbacks scheduled or in progress. Hence we
can also simplify the normal IO completion callpath and only abort
if shutdown occurred while we actively were processing callbacks.

Further, separating out the IO completion logic from the shutdown
logic avoids callback race conditions from being triggered by log IO
completion after a shutdown. IO completion will now only run
callbacks on iclogs that are in the correct state for a callback to
be run, avoiding the possibility of running callbacks on a
referenced iclog that hasn't yet been submitted for IO.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: guoxuenan <guoxuenan@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

6cc9892d

xfs: rework xlog_state_do_callback() · e66c6279

Committed by Dave Chinner on Mar 09, 2022

mainline-inclusion
from mainline-v5.14-rc4
commit 8bb92005
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V7IK
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8bb92005b0e4682a6e5dad131c5f3636c7d56dc1

-------------------------------------------------

Clean it up a bit by factoring and rearranging some of the code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: guoxuenan <guoxuenan@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

e66c6279

xfs: make forced shutdown processing atomic · e4976011

Committed by Dave Chinner on Mar 09, 2022

mainline-inclusion
from mainline-v5.14-rc4
commit b36d4651
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V7IK
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b36d4651e1650082d27fa477318183c4a7210e30

-------------------------------------------------

The running of a forced shutdown is a bit of a mess. It does racy
checks for XFS_MOUNT_SHUTDOWN in xfs_do_force_shutdown(), then
does more racy checks in xfs_log_force_unmount() before finally
setting XFS_MOUNT_SHUTDOWN and XLOG_IO_ERROR under the
log->icloglock.

Move the checking and setting of XFS_MOUNT_SHUTDOWN into
xfs_do_force_shutdown() so we only process a shutdown once and once
only. Serialise this with the mp->m_sb_lock spinlock so that the
state change is atomic and won't race. Move all the mount specific
shutdown state changes from xfs_log_force_unmount() to
xfs_do_force_shutdown() so they are done atomically with setting
XFS_MOUNT_SHUTDOWN.

Then get rid of the racy xlog_is_shutdown() check from
xlog_force_shutdown(), and gate the log shutdown on the
test_and_set_bit(XLOG_IO_ERROR) test under the icloglock. This
means that the log is shut down once and once only, and code that
needs to prevent races with shutdown can do so by holding the
icloglock and checking the return value of xlog_is_shutdown().

This results in a predictable shutdown execution process - we set the
shutdown flags once and process the shutdown once rather than the
current "as many concurrent shutdowns as can race to the flag
setting" situation we have now.

Also, now that shutdown is atomic, always emit a stack trace when the
error level for the filesystem is high enough. This means that we
always get a stack trace when trying to diagnose the cause of
shutdowns in the field, rather than just for SHUTDOWN_CORRUPT_INCORE
cases.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

e4976011
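The "check and set under one lock" pattern from commit e4976011 above can be sketched in user-space C. This is a hedged illustration only: `struct demo_mount`, the `atomic_flag` used as a toy spinlock in place of mp->m_sb_lock, and the `shutdowns_processed` counter are assumptions. It shows that when the flag is tested and set inside the same critical section, exactly one caller goes on to process the shutdown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical mount structure. */
struct demo_mount {
    atomic_flag sb_lock;               /* toy spinlock standing in for m_sb_lock */
    bool        shutdown;              /* stand-in for XFS_MOUNT_SHUTDOWN */
    int         shutdowns_processed;   /* how many times shutdown work ran */
};

static void demo_lock(atomic_flag *lock)
{
    /* Spin until the previous value was clear, i.e. we took the lock. */
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ;
}

static void demo_unlock(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Returns true only for the single caller that performs the shutdown. */
static bool demo_force_shutdown(struct demo_mount *mp)
{
    demo_lock(&mp->sb_lock);
    if (mp->shutdown) {
        demo_unlock(&mp->sb_lock);
        return false;                  /* someone else already shut us down */
    }
    mp->shutdown = true;               /* test and set in one critical section */
    demo_unlock(&mp->sb_lock);

    mp->shutdowns_processed++;         /* shutdown work, done by the one winner */
    return true;
}
```

A second call, whether concurrent or later, sees the flag already set and returns without repeating the shutdown work.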

xfs: convert log flags to an operational state field · d5ca9d5d

Committed by Dave Chinner on Mar 09, 2022

mainline-inclusion
from mainline-v5.14-rc4
commit e1d06e5f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V7IK
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e1d06e5f668a403f48538f0d6b163edfd4342adf

-------------------------------------------------

log->l_flags doesn't actually contain "flags" as such, it contains
operational state information that can change at runtime. For the
shutdown state, this at least should be an atomic bit because
it is read without holding locks in many places and so using atomic
bitops for the state field modifications makes sense.

This allows us to use things like test_and_set_bit() on state
changes (e.g. setting XLOG_TAIL_WARN) to avoid races in setting the
state when we aren't holding locks.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: guoxuenan <guoxuenan@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

d5ca9d5d
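For readers unfamiliar with atomic bit operations, the lockless state update described in commit d5ca9d5d above can be approximated in user-space C11. The sketch below is not the kernel API: `demo_test_and_set_bit()` is a hypothetical analogue of the kernel's test_and_set_bit() built on `atomic_fetch_or()`, and the bit numbers and `struct demo_log` are assumptions. It shows how a "warn once" state bit can be set without holding a lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers within the operational state word. */
enum { DEMO_IO_ERROR = 0, DEMO_TAIL_WARN = 1 };

struct demo_log {
    atomic_ulong opstate;              /* operational state bits */
};

/* Atomically set bit nr and return whether it was already set. */
static bool demo_test_and_set_bit(int nr, atomic_ulong *word)
{
    unsigned long mask = 1UL << nr;

    return (atomic_fetch_or(word, mask) & mask) != 0;
}

/* Lockless read of a single state bit. */
static bool demo_test_bit(int nr, atomic_ulong *word)
{
    return ((atomic_load(word) >> nr) & 1UL) != 0;
}

/* Returns true only for the first caller, so the warning is emitted once. */
static bool demo_warn_tail_once(struct demo_log *log)
{
    return !demo_test_and_set_bit(DEMO_TAIL_WARN, &log->opstate);
}
```

Because the read-modify-write is a single atomic operation, two racing callers cannot both observe the bit as clear.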

xfs: move recovery needed state updates to xfs_log_mount_finish · 32a8357e

Committed by Dave Chinner on Mar 09, 2022

mainline-inclusion
from mainline-v5.14-rc4
commit fd67d8a0
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V7IK
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fd67d8a07208ab06560287b7b9334c2d50b7d6d7

-------------------------------------------------

xfs_log_mount_finish() needs to know if recovery is needed or not to
make decisions on whether to flush the log and AIL.  Move the
handling of the NEED_RECOVERY state out to this function rather than
needing a temporary variable to store this state over the call to
xlog_recover_finish().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: guoxuenan <guoxuenan@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

32a8357e

openeuler / Kernel: synced successfully more than 1 year ago