Commits · 88f04d8bd2d27d3140a26ef70a996000c7379e5e · openeuler / Kernel

27 Jan, 2022 1 commit

ipmi_si: Phytium S2500 workaround for MMIO-based IPMI · 88f04d8b

Committed by Laibin Qiu on Jan 26, 2022

phytium inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4RK58
CVE: NA

--------------------------------

The system hangs when the Phytium S2500 communicates with some BMCs
after several rounds of transactions, unless we reset the controller
timeout counter manually by calling firmware through SMC.

Signed-off-by: Wang Yinfeng <wangyinfeng@phytium.com.cn>
Signed-off-by: Chen Baozi <chenbaozi@phytium.com.cn> #openEuler_contributor
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

88f04d8b

26 Jan, 2022 3 commits

etmem: Add a scan flag to support specified page swap-out · 353db299

Committed by liubo on Jan 26, 2022

euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4QVXW
CVE: NA

-------------------------------------------------
etmem, the memory vertical expansion technology,

By default, the existing memory expansion tool etmem swaps out every
swappable page of a process, unless the page is marked with the lock
flag.

This patch adds support for swapping out only specified pages. The
process sets the VM_SWAPFLAG flag on the pages to be swapped out, and
etmem adds a filter to its scanning module so that only these pages are
swapped out.

Signed-off-by: liubo <liubo254@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

353db299

etmem: add swapcache reclaim to etmem · d2869c60

Committed by liubo on Jan 26, 2022

euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4QVXW
CVE: NA

-------------------------------------------------
etmem, the memory vertical expansion technology,

In the current etmem process, memory page swapping is implemented by
invoking shrink_page_list. When this interface is invoked for the first
time, pages are added to the swap cache and written to disk. A swap
cache page is reclaimed only when this interface is invoked a second
time and no process accesses the page. However, in the etmem process,
user mode scans pages that have been accessed, and the migration is not
delivered to pages that are not accessed by processes. Therefore, the
swap cache may stay occupied indefinitely.

To solve this problem, add logic for actively reclaiming the swap
cache. When the swap cache occupies a large amount of memory, the
system proactively scans the LRU list and reclaims swap cache pages,
keeping memory usage within the specified range.

Signed-off-by: liubo <liubo254@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

d2869c60

etmem: add original kernel swap enabled options · 44983705

Committed by liubo on Jan 26, 2022

euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4QVXW
CVE: NA

-------------------------------------------------

etmem, the memory vertical expansion technology, uses DRAM and new
high-performance storage media to form multi-level memory storage.
By grading the stored data, etmem migrates the data classified as cold
from the storage medium to the high-performance storage medium, so as
to expand memory capacity and reduce memory cost.

When the memory expansion function etmem is running, the native swap
function of the kernel needs to be disabled in certain scenarios to
avoid interference from kernel swap.

This patch adds a switch for that purpose.

The /sys/kernel/mm/swap/ directory provides the kernel_swap_enable
sysfs interface to enable or disable the native swap function of the
kernel.

The default value of /sys/kernel/mm/swap/kernel_swap_enable is true,
that is, kernel swap is enabled by default.

Turn on kernel swap:
	echo true > /sys/kernel/mm/swap/kernel_swap_enable

Turn off kernel swap:
	echo false > /sys/kernel/mm/swap/kernel_swap_enable

Signed-off-by: liubo <liubo254@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

44983705

25 Jan, 2022 1 commit

net: bridge: clear bridge's private skb space on xmit · 724c2ccf

Committed by Nikolay Aleksandrov on Jan 24, 2022

mainline inclusion
from mainline-v5.9-rc1
commit fd65e5a9
category: bugfix
bugzilla: 186114
CVE: NA

--------------------------------

We need to clear all of the bridge private skb variables as they can be
stale due to the packet being recirculated through the stack and then
transmitted through the bridge device. Similar memset is already done on
bridge's input. We've seen cases where proxyarp_replied was 1 on routed
multicast packets transmitted through the bridge to ports with neigh
suppress which were getting dropped. Same thing can in theory happen with
the port isolation bit as well.
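
The stale-state problem can be illustrated with a small user-space model in Python. This is only a sketch, not kernel code: `SkbModel`, `bridge_input`, `bridge_xmit`, and the dict field names are hypothetical stand-ins for the bridge's private skb area and the two bits mentioned above.

```python
# Hypothetical model of the bridge's private per-packet state.
CLEAN_CB = {"proxyarp_replied": 0, "port_isolated": 0}


class SkbModel:
    """Stand-in for a packet buffer that carries bridge-private state."""

    def __init__(self):
        self.cb = dict(CLEAN_CB)


def bridge_input(skb, proxyarp_replied):
    """Input path: reset the private state, then record what input learned."""
    skb.cb = dict(CLEAN_CB)
    skb.cb["proxyarp_replied"] = proxyarp_replied


def bridge_xmit(skb, clear_cb=True):
    """Transmit path towards a neigh-suppress port.

    Returns True if the packet would be forwarded, False if dropped.
    """
    if clear_cb:
        # The fix: start from clean private state on transmit as well.
        skb.cb = dict(CLEAN_CB)
    return skb.cb["proxyarp_replied"] == 0
```

With `clear_cb=False` a recirculated packet keeps the value left behind by an earlier pass and is dropped; with the reset in place the same packet is forwarded.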

Fixes: 821f1b21 ("bridge: add new BR_NEIGH_SUPPRESS port flag to suppress arp and nd flood")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Huang Guobin <huangguobin4@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

724c2ccf

20 Jan, 2022 1 commit

audit: bugfix for infinite loop when flush the hold queue · 67ab712f

Committed by Cui GaoSheng on Jan 20, 2022

hulk inclusion
category: bugfix
bugzilla: 186105, https://gitee.com/openeuler/kernel/issues/I4RGWS?from=project-issue
CVE: NA

-----------------------------------------------------------------

When "audit=1" is added to the kernel command line and the
audit_hold_queue stays non-empty, flushing the hold queue falls into an
infinite loop. Fix this by stopping the flush of the hold queue when
netlink is abnormal.
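
The idea of the fix can be sketched as a user-space model in Python. This is an illustration under assumptions, not the kernel implementation: `flush_hold_queue` and the `send` callable are hypothetical names, and the hold queue is modeled as a `deque`.

```python
from collections import deque


def flush_hold_queue(hold_queue, send):
    """Send held records in order and stop at the first failure.

    `send` is a hypothetical callable that returns True on success.
    Returns the number of records delivered.
    """
    delivered = 0
    while hold_queue:
        record = hold_queue.popleft()
        if not send(record):
            # Put the record back and stop. Retrying immediately would
            # pop the same record again and the loop would never end.
            hold_queue.appendleft(record)
            break
        delivered += 1
    return delivered
```

A flush that re-queues the failed record and keeps looping is the infinite loop described above; breaking out leaves the remaining records held for a later attempt.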

Fixes: 3413ddc9 ("audit: improve robustness of the audit queue handling")
Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: weiyang wang <wangweiyang2@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

67ab712f

19 Jan, 2022 2 commits

blk-throttle: enable hierarchical throttle in cgroup v1 · a17cb07b

Committed by Yu Kuai on Jan 19, 2022

hulk inclusion
category: feature
bugzilla: 186072, https://gitee.com/openeuler/kernel/issues/I4RH0V
CVE: NA

-----------------------------------------------

The blkio subsystem is not under the default hierarchy in cgroup v1 by
default, which means io throttle configurations are only effective on
the current cgroup.

This patch introduces a new feature that enables the default hierarchy
for io throttle, which means configurations are also effective on child
cgroups. The feature is disabled by default, and can be enabled by
adding "blkcg_global_limit=1", "blkcg_global_limit=Y" or
"blkcg_global_limit=y" to the boot command line.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

a17cb07b

xfs: map unwritten blocks in XFS_IOC_{ALLOC,FREE}SP just like fallocate · 2d646063

Committed by Darrick J. Wong on Jan 19, 2022

mainline inclusion
from mainline-v5.16-rc5
commit 983d8e60
category: bugfix
bugzilla: 186083
CVE: CVE-2021-4155

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=983d8e60f50806f90534cc5373d0ce867e5aaf79

--------------------------------

The old ALLOCSP/FREESP ioctls in XFS can be used to preallocate space at
the end of files, just like fallocate and RESVSP.  Make the behavior
consistent with the other ioctls.

Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

2d646063

18 Jan, 2022 1 commit

ip_gre: validate csum_start only on pull · a68f6a67

Committed by Willem de Bruijn on Jan 18, 2022

mainline inclusion
from mainline-v5.14
commit 8a0ed250
category: bugfix
bugzilla: NA
CVE: CVE-2021-39633

-------------------------------------------------

The GRE tunnel device can pull existing outer headers in ipgre_xmit.
This is a rare path, apparently unique to this device. The below
commit ensured that pulling does not move skb->data beyond csum_start.

But it has a false positive if ip_summed is not CHECKSUM_PARTIAL and
thus csum_start is irrelevant.

Refine to exclude this. At the same time simplify and strengthen the
test.

Simplify, by moving the check next to the offending pull, making it
more self documenting and removing an unnecessary branch from other
code paths.

Strengthen, by also ensuring that the transport header is correct and
therefore the inner headers will be after skb_reset_inner_headers.
The transport header is set to csum_start in skb_partial_csum_set.

Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/
Fixes: 1d011c48 ("ip_gre: add validation for csum_start")
Reported-by: Ido Schimmel <idosch@idosch.org>
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Huang Guobin <huangguobin4@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

a68f6a67

17 Jan, 2022 20 commits

hugetlbfs: fix issue of preallocation of gigantic pages can't work · d168c42d

Committed by Zhenguo Yao on Jan 17, 2022

mainline inclusion
from mainline-v5.16-rc5
commit 4178158e
category: bugfix
bugzilla: 186043
CVE: NA

--------------------------------

Preallocation of gigantic pages can't work because of commit
b5389086 ("hugetlbfs: extend the definition of hugepages parameter
to support node allocation").  When nid is NUMA_NO_NODE (-1),
alloc_bootmem_huge_page will always return without doing allocation.
Fix this by adding more checks.

Link: https://lkml.kernel.org/r/20211129133803.15653-1-yaozhenguo1@gmail.com
Fixes: b5389086 ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
Signed-off-by: Zhenguo Yao <yaozhenguo1@gmail.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

d168c42d

hugetlbfs: extend the definition of hugepages parameter to support node allocation · b0750f70

Committed by Zhenguo Yao on Jan 17, 2022

mainline inclusion
from mainline-v5.16-rc1
commit b5389086
category: feature
bugzilla: 186043
CVE: NA

--------------------------------

We can specify the number of hugepages to allocate at boot.  But at
present the hugepages are balanced across all nodes.  In some
scenarios, we only need hugepages in one node.  For example: DPDK needs
hugepages which are in the same node as the NIC.

If DPDK needs four hugepages of 1G size in node1 and the system has 16
NUMA nodes, we must reserve 64 hugepages on the kernel cmdline.  But
only four hugepages are used.  The others should be freed after boot.
If the system memory is low (for example: 64G), it will be an
impossible task.

So extend the hugepages parameter to support specifying hugepages on a
specific node.  For example add following parameter:

  hugepagesz=1G hugepages=0:1,1:3

It will allocate 1 hugepage in node0 and 3 hugepages in node1.
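
The per-node syntax can be made concrete with a small parser. This is a minimal Python sketch of the format described above, not the kernel's parsing code; the function name `parse_hugepages_param` is hypothetical, and error handling is left to `int()`.

```python
def parse_hugepages_param(value):
    """Parse the value of a 'hugepages=' boot parameter.

    Returns an int for the plain form ("64") and a dict mapping
    node id to page count for the per-node form ("0:1,1:3").
    """
    value = value.strip()
    if ":" not in value:
        return int(value)
    per_node = {}
    for item in value.split(","):
        node, count = item.split(":")
        per_node[int(node)] = int(count)
    return per_node
```

For the example above, `parse_hugepages_param("0:1,1:3")` yields `{0: 1, 1: 3}`. In the DPDK scenario, `hugepages=1:4` would reserve only the four pages that are needed, instead of 64 pages spread over 16 nodes.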

Link: https://lkml.kernel.org/r/20211005054729.86457-1-yaozhenguo1@gmail.com
Signed-off-by: Zhenguo Yao <yaozhenguo1@gmail.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zhenguo Yao <yaozhenguo1@gmail.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
	Documentation/admin-guide/kernel-parameters.txt
	Documentation/admin-guide/mm/hugetlbpage.rst
	arch/powerpc/mm/hugetlbpage.c
	include/linux/hugetlb.h
	mm/hugetlb.c
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

b0750f70

mm: remove sharepool sp_unshare_uva current->mm NULL check · 4ec99782

Committed by Guo Mengqi on Jan 17, 2022

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4ODJ6
CVE: NA

---------------------------

Remove the unnecessary current->mm NULL check in sp_unshare_uva, and
allow a process to unshare kernel mapped addresses in do_exit().

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

4ec99782

share pool: use rwsem to protect sp group exit · 3aa4f0a7

Committed by Guo Mengqi on Jan 17, 2022

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4ODMN
CVE: NA

-------------------------------------------------
Fix the following situation:

The last process in a group exits while a second process tries to join
the group.

The second process may get an invalid spg. However, the group's
use_count has been increased by 1, so the first process fails to free
the group when it exits. The second process then calls
sp_group_drop --> free_sp_group, which causes a double request of the
rwsem.

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

3aa4f0a7

Add new module parameters:time out · a819b461

Committed by Yanling Song on Jan 17, 2022

Ramaxel inclusion
category: features
bugzilla: https://gitee.com/openeuler/kernel/issues/I4ON8F
CVE: NA

Changes:
1. Split scmd_tmout_nonpt into two parameters:
   scmd_tmout_vd/scmd_tmout_rawdisk
2. Return -ETIME instead of -EINVAL when a command times out.
3. Add one module parameter: max_io_force.

Signed-off-by: Yanling Song <songyl@ramaxel.com>
Reviewed-by: Jiang Yu <yujiang@ramaxel.com>
Reviewed-by: Zhang Lei <zhanglei48@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

a819b461

virtio-blk: validate num_queues during probe · 9e4cd940

Committed by Jason Wang on Jan 17, 2022

mainline inclusion
from mainline-5.16
commit 6ae6ff6f
category: bugfix
bugzilla: NA
CVE: NA

-------------------------------------------------

If an untrusted device negotiates BLK_F_MQ but advertises a zero
num_queues, the driver may end up trying to allocate zero-size
buffers, for which ZERO_SIZE_PTR is returned; that pointer passes the
check against NULL. This will lead to unexpected results.

Fix this by failing the probe in this case.
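
The probe-time decision can be sketched as a small user-space model in Python. It is an illustration only: `resolve_num_queues` is a hypothetical name, the fallback to a single queue when the feature is not negotiated is an assumption, and so is the choice of `-EINVAL` as the error code.

```python
import errno


def resolve_num_queues(mq_negotiated, advertised_num_queues):
    """Return the queue count to use, or a negative errno to fail probe."""
    if not mq_negotiated:
        # Assumption: without the multi-queue feature, use a single queue.
        return 1
    if advertised_num_queues == 0:
        # Zero queues would lead to zero-size allocations; refuse the device.
        return -errno.EINVAL
    return advertised_num_queues
```

Rejecting the device up front avoids relying on a NULL check, which a zero-size allocation would pass.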

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211019070152.8236-2-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Wenchao Hao <haowenchao@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

9e4cd940

virtio-blk: Use blk_validate_block_size() to validate block size · 8910dce8

Committed by Xie Yongji on Jan 17, 2022

mainline inclusion
from mainline-5.16
commit 57a13a5b
category: bugfix
bugzilla: NA
CVE: NA

-------------------------------------------------

The block layer can't support a block size larger than the page size
yet. A block size that's too small or not a power of two won't work
either. If a misconfigured device presents an invalid block size in
configuration space, the kernel crashes with something like the
following:

[  506.154324] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  506.160416] RIP: 0010:create_empty_buffers+0x24/0x100
[  506.174302] Call Trace:
[  506.174651]  create_page_buffers+0x4d/0x60
[  506.175207]  block_read_full_page+0x50/0x380
[  506.175798]  ? __mod_lruvec_page_state+0x60/0xa0
[  506.176412]  ? __add_to_page_cache_locked+0x1b2/0x390
[  506.177085]  ? blkdev_direct_IO+0x4a0/0x4a0
[  506.177644]  ? scan_shadow_nodes+0x30/0x30
[  506.178206]  ? lru_cache_add+0x42/0x60
[  506.178716]  do_read_cache_page+0x695/0x740
[  506.179278]  ? read_part_sector+0xe0/0xe0
[  506.179821]  read_part_sector+0x36/0xe0
[  506.180337]  adfspart_check_ICS+0x32/0x320
[  506.180890]  ? snprintf+0x45/0x70
[  506.181350]  ? read_part_sector+0xe0/0xe0
[  506.181906]  bdev_disk_changed+0x229/0x5c0
[  506.182483]  blkdev_get_whole+0x6d/0x90
[  506.183013]  blkdev_get_by_dev+0x122/0x2d0
[  506.183562]  device_add_disk+0x39e/0x3c0
[  506.184472]  virtblk_probe+0x3f8/0x79b [virtio_blk]
[  506.185461]  virtio_dev_probe+0x15e/0x1d0 [virtio]

So let's use a block layer helper to validate the block size.
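
The three constraints named above (not too small, not larger than a page, power of two) can be expressed as a short check. The following Python sketch is a user-space model, not the kernel helper itself; the 512-byte lower bound and the 4096-byte default page size are assumptions made for illustration.

```python
import errno

SECTOR_SIZE = 512  # assumed lower bound for "too small"


def validate_block_size(bsize, page_size=4096):
    """Return 0 if bsize is usable, otherwise -EINVAL."""
    is_power_of_two = bsize > 0 and (bsize & (bsize - 1)) == 0
    if bsize < SECTOR_SIZE or bsize > page_size or not is_power_of_two:
        return -errno.EINVAL
    return 0
```

A driver that runs such a check before registering the disk can fail the probe cleanly instead of reaching the crash shown in the trace.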

Conflict: the original patch used blk_cleanup_disk(), which was
introduced in f525464a (block: add blk_alloc_disk and blk_cleanup_disk
APIs), to clean up resources; this patch just calls blk_cleanup_queue()
to perform the same operations.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20211026144015.188-5-xieyongji@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Wenchao Hao <haowenchao@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

8910dce8

block: Add a helper to validate the block size · 996af2e0

Committed by Xie Yongji on Jan 17, 2022

mainline inclusion
from mainline-5.16
commit 570b1cac
category: bugfix
bugzilla: NA
CVE: NA

-------------------------------------------------

There is some duplicated code that validates the block size in block
drivers. This limitation actually comes from the block layer, so this
patch adds a new block layer helper for that.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20211026144015.188-2-xieyongji@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Wenchao Hao <haowenchao@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

996af2e0

Revert "virtio-blk: Add validation for block size in config space" · 0df07d96

Committed by Michael S. Tsirkin on Jan 17, 2022

mainline inclusion
from mainline-5.15
commit ff631988
category: bugfix
bugzilla: NA
CVE: NA

-------------------------------------------------

It turns out that access to config space before completing the feature
negotiation is broken for big endian guests at least with QEMU hosts up
to 6.1 inclusive.  This affects any device that accesses config space in
the validate callback: at the moment that is virtio-net with
VIRTIO_NET_F_MTU but since 82e89ea0 ("virtio-blk: Add validation for
block size in config space") that also started affecting virtio-blk with
VIRTIO_BLK_F_BLK_SIZE. Further, unlike VIRTIO_NET_F_MTU which is off by
default on QEMU, VIRTIO_BLK_F_BLK_SIZE is on by default, which resulted
in lots of people not being able to boot VMs on BE.

The spec is very clear that what we are doing is legal so QEMU needs to
be fixed, but given it's been broken for so many years and no one
noticed, we need to give QEMU a bit more time before applying this.

Further, this patch is incomplete (does not check blk size is a power
of two) and it duplicates the logic from nbd.

Revert for now, and we'll reapply a cleaner logic in the next release.

Cc: stable@vger.kernel.org
Fixes: 82e89ea0 ("virtio-blk: Add validation for block size in config space")
Cc: Xie Yongji <xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Wenchao Hao <haowenchao@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

0df07d96

scsi: virtio_scsi: Rescan the entire target on transport reset when LUN is 0 · f3728a9d

Committed by Matej Genci on Jan 17, 2022

mainline inclusion
from mainline-5.10
commit beef6fd0
category: bugfix
bugzilla: NA
CVE: NA

-------------------------------------------------

VirtIO 1.0 spec says:

    The removed and rescan events ... when sent for LUN 0, they MAY
    apply to the entire target so the driver can ask the initiator
    to rescan the target to detect this.

This change introduces the behaviour described above by scanning the entire
SCSI target when LUN is set to 0. This is both a functional and a
performance fix. It aligns the driver with the spec and allows control
planes to hotplug targets with large numbers of LUNs without having to
request a RESCAN for each one of them.

Link: https://lore.kernel.org/r/CY4PR02MB33354370E0A81E75DD9DFE74FB520@CY4PR02MB3335.namprd02.prod.outlook.com
Suggested-by: Felipe Franciosi <felipe@nutanix.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Matej Genci <matej.genci@nutanix.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Wenchao Hao <haowenchao@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

f3728a9d

Revert "svm: Add support to get svm mpam configuration" · c3eecacf

Committed by Xingang Wang on Jan 17, 2022

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I49RB2
CVE: NA

-------------------------------------------------

This reverts commit 0cc88dd8.
The commit "svm: Add support to get svm mpam configuration" adds and
exports an interface in the svm module, which makes mpam depend on the
svm module. Revert it to avoid this coupling.

Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c3eecacf

Revert "svm: Add support to set svm mpam configuration" · 9f47aa00

Committed by Xingang Wang on Jan 17, 2022

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I49RB2
CVE: NA

-------------------------------------------------

This reverts commit 464f6990.
The commit "svm: Add support to set svm mpam configuration" adds and
exports an interface in the svm module, which makes mpam depend on the
svm module. Revert it to avoid this coupling.

Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

9f47aa00

Revert "svm: Add svm_set_user_mpam_en to enable/disable mpam for smmu" · 63518bb5

Committed by Xingang Wang on Jan 17, 2022

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I49RB2
CVE: NA

-------------------------------------------------

This reverts commit c50ad40d.
The commit "svm: Add svm_set_user_mpam_en to enable/disable mpam for
smmu" adds and exports an interface in the svm module, which makes mpam
depend on the svm module. Revert it to avoid this coupling.

Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

63518bb5

cgroup: Use open-time cgroup namespace for process migration perm checks · 6b6c3fb5

Committed by Tejun Heo on Jan 17, 2022

mainline inclusion
from mainline-v5.16
commit e5745764
category: bugfix
bugzilla: NA
CVE: CVE-2021-4197

------------------------------------------------------------------------

cgroup process migration permission checks are performed at write time as
whether a given operation is allowed or not is dependent on the content of
the write - the PID. This currently uses current's cgroup namespace which is
a potential security weakness as it may allow scenarios where a less
privileged process tricks a more privileged one into writing into a fd that
it created.

This patch makes cgroup remember the cgroup namespace at the time of open
and uses it for migration permission checks instead of current's. Note that
this only applies to cgroup2 as cgroup1 doesn't have namespace support.

This also fixes a use-after-free bug on cgroupns reported in

 https://lore.kernel.org/r/00000000000048c15c05d0083397@google.com

Note that backporting this fix also requires the preceding patch.

Reported-by: "Eric W. Biederman" <ebiederm@xmission.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Reported-by: syzbot+50f5cf33a284ce738b62@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/00000000000048c15c05d0083397@google.com
Fixes: 5136f636 ("cgroup: implement "nsdelegate" mount option")
Signed-off-by: Tejun Heo <tj@kernel.org>
Conflicts:
	kernel/cgroup/cgroup-internal.h
	kernel/cgroup/cgroup.c
Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Reviewed-by: weiyang wang <wangweiyang2@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

6b6c3fb5

cgroup: Allocate cgroup_file_ctx for kernfs_open_file->priv · 75acfe71

Committed by Tejun Heo on Jan 17, 2022

mainline inclusion
from mainline-v5.16
commit 0d2b5955
category: bugfix
bugzilla: NA
CVE: CVE-2021-4197

-------------------------------------------------------------------------

of->priv is currently used by each interface file implementation to store
private information. This patch collects the current two private data usages
into struct cgroup_file_ctx which is allocated and freed by the common path.
This allows generic private data which applies to multiple files, which will
be used in the following patch.

Note that cgroup_procs iterator is now embedded as procs.iter in the new
cgroup_file_ctx so that it doesn't need to be allocated and freed
separately.

v2: union dropped from cgroup_file_ctx and the procs iterator is embedded in
    cgroup_file_ctx as suggested by Linus.

v3: Michal pointed out that cgroup1's procs pidlist uses of->priv too.
    Converted. Didn't change to embedded allocation as cgroup1 pidlists get
    stored for caching.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Conflicts:
	kernel/cgroup/cgroup-internal.h
	kernel/cgroup/cgroup.c
Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Reviewed-by: weiyang wang <wangweiyang2@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

75acfe71

cgroup: Use open-time credentials for process migraton perm checks · d1bd89d1

Committed by Tejun Heo on Jan 17, 2022

mainline inclusion
from mainline-v5.16
commit 1756d799
category: bugfix
bugzilla: NA
CVE: CVE-2021-4197

---------------------------------------------------

cgroup process migration permission checks are performed at write time as
whether a given operation is allowed or not is dependent on the content of
the write - the PID. This currently uses current's credentials which is a
potential security weakness as it may allow scenarios where a less
privileged process tricks a more privileged one into writing into a fd that
it created.

This patch makes both the cgroup2 and cgroup1 process migration
interfaces use the credentials saved at the time of open (file->f_cred)
instead of current's.
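
The difference between write-time and open-time checks can be shown with a toy model in Python. This is a sketch under assumptions, not the cgroup code: `Cred`, `ProcsFile`, and the `may_migrate` field are hypothetical names that only mirror the idea of saving the opener's credentials on the file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cred:
    """Hypothetical stand-in for a task's credentials."""
    name: str
    may_migrate: bool


class ProcsFile:
    """Models an opened process-migration file that remembers its opener."""

    def __init__(self, opener_cred):
        self.f_cred = opener_cred  # saved once, at open time

    def write_pid(self, writer_cred, pid, use_open_time_cred=True):
        """Check permission, then 'migrate' pid; returns pid on success."""
        cred = self.f_cred if use_open_time_cred else writer_cred
        if not cred.may_migrate:
            raise PermissionError(f"{cred.name} may not migrate pid {pid}")
        return pid
```

If an unprivileged task opens the file and hands the fd to a privileged one, the write-time check (`use_open_time_cred=False`) lets the write through, while the open-time check rejects it because the opener lacked the permission.
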
Reported-by: "Eric W. Biederman" <ebiederm@xmission.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Fixes: 187fe840 ("cgroup: require write perm on common ancestor when moving processes on the default hierarchy")
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Conflicts:
	kernel/cgroup/cgroup.c
Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

d1bd89d1

NFC: add necessary privilege flags in netlink layer · ee2ad765

Committed by Lin Ma on Jan 17, 2022

mainline inclusion
from mainline-v5.16-rc1
commit aedddb4e
category: bugfix
bugzilla: NA
CVE: CVE-2021-4202

--------------------------------

The CAP_NET_ADMIN checks are needed to prevent attackers from faking a
device under NCIUARTSETDRIVER and exploiting privileged commands.

This patch adds the GENL_ADMIN_PERM flag to genl_ops to enforce the
check, except for commands like NFC_CMD_GET_DEVICE, NFC_CMD_GET_TARGET,
NFC_CMD_LLC_GET_PARAMS, and NFC_CMD_GET_SE, which are mainly
information-read operations.

Signed-off-by: Lin Ma <linma@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
  net/nfc/netlink.c
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

ee2ad765

NFC: add NCI_UNREG flag to eliminate the race · b77d2ff8

Committed by Lin Ma on Jan 17, 2022

stable inclusion
from linux-4.19.219
commit 2350cffd71e74bf81dedc989fdec12aebe89a4a5
CVE: CVE-2021-4202

--------------------------------

commit 48b71a9e upstream.

There are two sites that call queue_work() after
destroy_workqueue(), which leads to a possible UAF.

The first site is nci_send_cmd(), which can happen after
nci_close_device as shown below:

nfcmrvl_nci_unregister_dev   |  nfc_genl_dev_up
  nci_close_device           |
    flush_workqueue          |
    del_timer_sync           |
  nci_unregister_device      |    nfc_get_device
    destroy_workqueue        |    nfc_dev_up
    nfc_unregister_device    |      nci_dev_up
      device_del             |        nci_open_device
                             |          __nci_request
                             |            nci_send_cmd
                             |              queue_work !!!

The other site is nci_cmd_timer, which is armed by nci_cmd_work, itself
queued from nci_send_cmd.

  ...                        |  ...
  nci_unregister_device      |  queue_work
    destroy_workqueue        |
    nfc_unregister_device    |  ...
      device_del             |  nci_cmd_work
                             |  mod_timer
                             |  ...
                             |  nci_cmd_timer
                             |    queue_work !!!

For both UAFs above, the root cause is that nfc_dev_up can race with
the nci_unregister_device routine. Therefore, this patch introduces the
NCI_UNREG flag to eliminate the possible race. In addition, the
mutex_lock in nci_close_device can act as a barrier.
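
The flag-plus-lock pattern can be sketched as a user-space model in Python. This is an illustration only, not the NCI code: `NciDeviceModel`, its attributes, and the `-ENODEV` return value are assumptions, and a plain list stands in for the workqueue.

```python
import errno
import threading


class NciDeviceModel:
    """Toy model of a device whose workqueue is destroyed on unregister."""

    def __init__(self):
        self.req_lock = threading.Lock()
        self.unreg = False      # models the NCI_UNREG flag
        self.workqueue = []     # becomes None once destroyed

    def open_device(self):
        """Bring the device up; refuse once unregistration has started."""
        with self.req_lock:
            if self.unreg:
                return -errno.ENODEV
            self.workqueue.append("cmd_work")  # models queue_work()
            return 0

    def unregister_device(self):
        """Mark the device as going away, then destroy the workqueue."""
        self.unreg = True
        with self.req_lock:
            # Acts as a barrier: wait for any in-flight open_device().
            pass
        self.workqueue = None   # models destroy_workqueue()
```

Without the flag check, an `open_device()` call after `unregister_device()` would try to append to the destroyed queue, which is the model's analogue of the use-after-free in the diagrams above.
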
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Fixes: 6a2968aa ("NFC: basic NCI protocol implementation")
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Link: https://lore.kernel.org/r/20211116152732.19238-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
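
The flag-based approach described in this commit message can be illustrated with a small userspace sketch. This is not the kernel patch: `struct fake_dev`, `fake_init()`, `fake_open()`, and `fake_unregister()` are hypothetical names, a pthread mutex stands in for the request lock, and a boolean stands in for the NCI_UNREG bit. The point is only that the open path checks an "unregistering" flag under the lock before it queues any work.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the NCI device; not the kernel structure. */
struct fake_dev {
    pthread_mutex_t req_lock;  /* plays the role of the request lock */
    bool unreg;                /* plays the role of the NCI_UNREG flag */
    bool wq_alive;             /* false once the workqueue is destroyed */
    int queued;                /* number of work items queued so far */
};

static void fake_init(struct fake_dev *d)
{
    pthread_mutex_init(&d->req_lock, NULL);
    d->unreg = false;
    d->wq_alive = true;
    d->queued = 0;
}

/* Open path: refuse to queue work once unregistration has begun. */
static int fake_open(struct fake_dev *d)
{
    int ret = 0;

    pthread_mutex_lock(&d->req_lock);
    if (d->unreg) {
        ret = -ENODEV;
    } else {
        assert(d->wq_alive);   /* queueing here would otherwise be a UAF */
        d->queued++;
    }
    pthread_mutex_unlock(&d->req_lock);
    return ret;
}

/* Unregister path: set the flag under the lock, then tear down. */
static void fake_unregister(struct fake_dev *d)
{
    pthread_mutex_lock(&d->req_lock);
    d->unreg = true;
    pthread_mutex_unlock(&d->req_lock);
    d->wq_alive = false;       /* stands in for destroy_workqueue() */
}
```

In this sketch, a call to `fake_open()` that arrives after `fake_unregister()` returns `-ENODEV` instead of touching the destroyed queue.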

b77d2ff8

NFC: reorder the logic in nfc_{un,}register_device · f985b762

Committed by Lin Ma on Jan 17, 2022

stable inclusion
from linux-4.19.218
commit c45cea83e13699bdfd47842e04d09dd43af4c371
CVE: CVE-2021-4202

--------------------------------

[ Upstream commit 3e3b5dfc ]

There is a potential UAF between the unregistration routine and the NFC
netlink operations.

The race that causes the UAF is shown below:

 (FREE)                      |  (USE)
nfcmrvl_nci_unregister_dev   |  nfc_genl_dev_up
  nci_close_device           |
  nci_unregister_device      |    nfc_get_device
    nfc_unregister_device    |    nfc_dev_up
      rfkill_destroy         |
      device_del             |      rfkill_blocked
  ...                        |    ...

The root causes of this race are:
1. The rfkill_blocked (USE) in nfc_dev_up is supposed to be placed after
the device_is_registered check.
2. Since netlink operations are possible right after the device_add
in nfc_register_device, nfc_dev_up() can happen at any point during the
rfkill creation process, which leads to a data race.

This patch reorders these actions so that:
1. Once device_del has finished, nfc_dev_up cannot dereference the
rfkill object.
2. rfkill_register still comes after the device_add of nfc_dev,
because the parent device needs to be created first. The patch keeps
that order but adds device_lock to prevent the data race.
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Fixes: be055b2f ("NFC: RFKILL support")
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Link: https://lore.kernel.org/r/20211116152652.19217-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

f985b762

NFC: reorganize the functions in nci_request · b302dc7f

Committed by Lin Ma on Jan 17, 2022

stable inclusion
from linux-4.19.218
commit 62be2b1e7914b7340281f09412a7bbb62e6c8b67
CVE: CVE-2021-4202

--------------------------------

[ Upstream commit 86cdf8e3 ]

There is a possible data race as shown below:

thread-A in nci_request()       | thread-B in nci_close_device()
                                | mutex_lock(&ndev->req_lock);
test_bit(NCI_UP, &ndev->flags); |
...                             | test_and_clear_bit(NCI_UP, &ndev->flags)
mutex_lock(&ndev->req_lock);    |
                                |

This race allows __nci_request() to be woken while the device is
being removed.

Similar to commit e2cb6b89 ("bluetooth: eliminate the potential race
condition when removing the HCI controller"), this patch alters the
function sequence in nci_request() to prevent data races with
nci_close_device().
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Fixes: 6a2968aa ("NFC: basic NCI protocol implementation")
Link: https://lore.kernel.org/r/20211115145600.8320-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
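
The reordering described in this commit message amounts to testing the "up" flag only after the lock is held, so the close path cannot clear it in between. A minimal userspace sketch of that ordering follows. It is an illustration under assumptions, not the kernel code: `struct fake_ndev`, `fake_request()`, `fake_close()`, and `fake_noop_request()` are hypothetical, a pthread mutex stands in for `req_lock`, and the `-ENETDOWN` return value is chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the NCI device; not the kernel structure. */
struct fake_ndev {
    pthread_mutex_t req_lock;  /* plays the role of ndev->req_lock */
    bool up;                   /* plays the role of the NCI_UP bit */
    int requests;              /* number of requests that actually ran */
};

/* Trivial request body used only for illustration. */
static int fake_noop_request(struct fake_ndev *d)
{
    d->requests++;
    return 0;
}

/* Fixed ordering: take the lock first, then test the flag. */
static int fake_request(struct fake_ndev *d, int (*req)(struct fake_ndev *))
{
    int rc;

    assert(d != NULL && req != NULL);
    pthread_mutex_lock(&d->req_lock);
    rc = d->up ? req(d) : -ENETDOWN;
    pthread_mutex_unlock(&d->req_lock);
    return rc;
}

/* Close path: clears the flag while holding the same lock. */
static void fake_close(struct fake_ndev *d)
{
    pthread_mutex_lock(&d->req_lock);
    d->up = false;
    pthread_mutex_unlock(&d->req_lock);
}
```

Because both paths serialize on the same lock, a request either runs entirely before the close or observes the cleared flag and returns an error.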

b302dc7f

13 Jan, 2022 3 commits

ext4: Fix BUG_ON in ext4_bread when write quota data · c113ae0d

Committed by Ye Bin on Jan 13, 2022

mainline inclusion
from mainline-v5.17
commit ce85548ab4295234b4f8e63a0eea0c157d2f6b25
category: bugfix
bugzilla: 185930
CVE: NA

-----------------------------------------------

We hit the following issue when running syzkaller:
[  167.936972] EXT4-fs error (device loop0): __ext4_remount:6314: comm rep: Abort forced by user
[  167.938306] EXT4-fs (loop0): Remounting filesystem read-only
[  167.981637] Assertion failure in ext4_getblk() at fs/ext4/inode.c:847: '(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) || handle != NULL || create == 0'
[  167.983601] ------------[ cut here ]------------
[  167.984245] kernel BUG at fs/ext4/inode.c:847!
[  167.984882] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[  167.985624] CPU: 7 PID: 2290 Comm: rep Tainted: G    B             5.16.0-rc5-next-20211217+ #123
[  167.986823] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[  167.988590] RIP: 0010:ext4_getblk+0x17e/0x504
[  167.989189] Code: c6 01 74 28 49 c7 c0 a0 a3 5c 9b b9 4f 03 00 00 48 c7 c2 80 9c 5c 9b 48 c7 c6 40 b6 5c 9b 48 c7 c7 20 a4 5c 9b e8 77 e3 fd ff <0f> 0b 8b 04 244
[  167.991679] RSP: 0018:ffff8881736f7398 EFLAGS: 00010282
[  167.992385] RAX: 0000000000000094 RBX: 1ffff1102e6dee75 RCX: 0000000000000000
[  167.993337] RDX: 0000000000000001 RSI: ffffffff9b6e29e0 RDI: ffffed102e6dee66
[  167.994292] RBP: ffff88816a076210 R08: 0000000000000094 R09: ffffed107363fa09
[  167.995252] R10: ffff88839b1fd047 R11: ffffed107363fa08 R12: ffff88816a0761e8
[  167.996205] R13: 0000000000000000 R14: 0000000000000021 R15: 0000000000000001
[  167.997158] FS:  00007f6a1428c740(0000) GS:ffff88839b000000(0000) knlGS:0000000000000000
[  167.998238] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  167.999025] CR2: 00007f6a140716c8 CR3: 0000000133216000 CR4: 00000000000006e0
[  167.999987] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  168.000944] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  168.001899] Call Trace:
[  168.002235]  <TASK>
[  168.007167]  ext4_bread+0xd/0x53
[  168.007612]  ext4_quota_write+0x20c/0x5c0
[  168.010457]  write_blk+0x100/0x220
[  168.010944]  remove_free_dqentry+0x1c6/0x440
[  168.011525]  free_dqentry.isra.0+0x565/0x830
[  168.012133]  remove_tree+0x318/0x6d0
[  168.014744]  remove_tree+0x1eb/0x6d0
[  168.017346]  remove_tree+0x1eb/0x6d0
[  168.019969]  remove_tree+0x1eb/0x6d0
[  168.022128]  qtree_release_dquot+0x291/0x340
[  168.023297]  v2_release_dquot+0xce/0x120
[  168.023847]  dquot_release+0x197/0x3e0
[  168.024358]  ext4_release_dquot+0x22a/0x2d0
[  168.024932]  dqput.part.0+0x1c9/0x900
[  168.025430]  __dquot_drop+0x120/0x190
[  168.025942]  ext4_clear_inode+0x86/0x220
[  168.026472]  ext4_evict_inode+0x9e8/0xa22
[  168.028200]  evict+0x29e/0x4f0
[  168.028625]  dispose_list+0x102/0x1f0
[  168.029148]  evict_inodes+0x2c1/0x3e0
[  168.030188]  generic_shutdown_super+0xa4/0x3b0
[  168.030817]  kill_block_super+0x95/0xd0
[  168.031360]  deactivate_locked_super+0x85/0xd0
[  168.031977]  cleanup_mnt+0x2bc/0x480
[  168.033062]  task_work_run+0xd1/0x170
[  168.033565]  do_exit+0xa4f/0x2b50
[  168.037155]  do_group_exit+0xef/0x2d0
[  168.037666]  __x64_sys_exit_group+0x3a/0x50
[  168.038237]  do_syscall_64+0x3b/0x90
[  168.038751]  entry_SYSCALL_64_after_hwframe+0x44/0xae

In order to reproduce this problem, the following conditions need to be met:
1. Ext4 filesystem with no journal;
2. Filesystem image with incorrect quota data;
3. Abort filesystem forced by user;
4. umount filesystem;

As in ext4_quota_write:
...
         if (EXT4_SB(sb)->s_journal && !handle) {
                 ext4_msg(sb, KERN_WARNING, "Quota write (off=%llu, len=%llu)"
                         " cancelled because transaction is not started",
                         (unsigned long long)off, (unsigned long long)len);
                 return -EIO;
         }
...
The handle is checked for NULL only when the filesystem has a journal.
It also needs to be checked when the filesystem has no journal.
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20211223015506.297766-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
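
The difference between the check quoted in this commit message and the check it asks for can be shown with a small userspace sketch. The names `struct fake_sb`, `precheck_before()`, and `precheck_after()` are hypothetical; the sketch models only the condition, not the ext4 code around it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the superblock state used by the check. */
struct fake_sb {
    bool has_journal;
};

/* Check as quoted in the message: a missing handle is rejected only
 * when the filesystem has a journal. */
static int precheck_before(const struct fake_sb *sb, const void *handle)
{
    assert(sb != NULL);
    if (sb->has_journal && handle == NULL)
        return -EIO;
    return 0;
}

/* Check as the message describes the fix: a missing handle is rejected
 * regardless of whether a journal exists. */
static int precheck_after(const struct fake_sb *sb, const void *handle)
{
    assert(sb != NULL);
    if (handle == NULL)
        return -EIO;
    return 0;
}
```

With no journal and a NULL handle, `precheck_before()` lets the write proceed, which corresponds to the path that reaches the assertion in the trace, while `precheck_after()` returns `-EIO`.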

c113ae0d

PM: hibernate: use correct mode for swsusp_close() · 2ae13e7a

Committed by Thomas Zeitlhofer on Jan 13, 2022

stable inclusion
from linux-v4.19.219
commit 68945e943519df1532e598fafab16ac54488933f

---------------------------------------------------

[ Upstream commit cefcf24b ]

Commit 39fbef4b ("PM: hibernate: Get block device exclusively in
swsusp_check()") changed the opening mode of the block device to
(FMODE_READ | FMODE_EXCL).

In the corresponding calls to swsusp_close(), the mode is still just
FMODE_READ which triggers the warning in blkdev_flush_mapping() on
resume from hibernate.

So, use the mode (FMODE_READ | FMODE_EXCL) also when closing the
device.

Fixes: 39fbef4b ("PM: hibernate: Get block device exclusively in swsusp_check()")
Signed-off-by: Thomas Zeitlhofer <thomas.zeitlhofer+lkml@ze-it.at>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

2ae13e7a

Revert "watchdog: Fix check_preemption_disabled() error" · c8f15bf5

Committed by Yang Yingliang on Jan 13, 2022

hulk inclusion
category: bugfix
bugzilla: 173968, https://gitee.com/openeuler/kernel/issues/I3J87Y
CVE: NA

---------------------------

This reverts commit b2e484e9.

When CONFIG_LOCKDEP and CONFIG_DEBUG_LOCKDEP are enabled, it detects the following error:

[   10.145007] BUG: sleeping function called from invalid context at mm/slab.h:418
[   10.145394] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0
[   10.145765] Preemption disabled at:
[   10.145978] [<ffff000008f8e7b4>] hardlockup_detector_perf_init+0x20/0x100
[   10.146770] CPU: 6 PID: 1 Comm: swapper/0 Not tainted 4.19.90+ #3
[   10.148242] Hardware name: linux,dummy-virt (DT)
[   10.148572] Call trace:
[   10.148667]  dump_backtrace+0x0/0x190
[   10.148765]  show_stack+0x24/0x30
[   10.148875]  dump_stack+0xa4/0xf8
[   10.148964]  ___might_sleep+0x150/0x180
[   10.149065]  __might_sleep+0x58/0x90
[   10.149199]  kmem_cache_alloc_trace+0x244/0x2b0
[   10.149308]  perf_event_alloc+0x74/0x680
[   10.149402]  perf_event_create_kernel_counter+0x2c/0x190
[   10.149516]  arch_probe_cpu_freq+0x84/0x1ac
[   10.149611]  hw_nmi_get_sample_period+0xb8/0x180
[   10.149713]  hardlockup_detector_event_create+0x28/0xfc
[   10.149827]  hardlockup_detector_perf_init+0x24/0x100
[   10.149943]  watchdog_nmi_probe+0x14/0x1c
[   10.150037]  lockup_detector_init+0x58/0x98
[   10.150173]  kernel_init_freeable+0x10c/0x1c4
[   10.150298]  kernel_init+0x18/0x110
[   10.150422]  ret_from_fork+0x10/0x18

In 'b2e484e9 ("watchdog: Fix check_preemption_disabled() error")', we
tried to fix check_preemption_disabled() error by disabling preemption in
hardlockup_detector_perf_init(), but missed that function
perf_event_create_kernel_counter() may sleep.

Preemption is always disabled, so the problem that the commit tried to
fix does not exist; just revert it.

Fixes: b2e484e9 ("watchdog: Fix check_preemption_disabled() error")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c8f15bf5

06 Jan, 2022 2 commits

arm64/mpam: fix mpam dts init arm_mpam_of_device_ids error · 7ea0c3fe

Committed by Xingang Wang on Jan 06, 2022

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I49RB2
CVE: NA

---------------------------------------------------

[    0.596145] BUG: KASAN: global-out-of-bounds in __of_match_node.part.0+0xe0/0x110
[    0.596731] Read of size 1 at addr ffff2000099a8288 by task swapper/0/1
[    0.597247]
[    0.597372] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.90+ #34
[    0.597858] Hardware name: linux,dummy-virt (DT)
[    0.598243] Call trace:
[    0.598443]  dump_backtrace+0x0/0x360
[    0.598734]  show_stack+0x24/0x30
[    0.599004]  dump_stack+0xdc/0x128
[    0.599323]  print_address_description+0x184/0x278
[    0.599771]  kasan_report+0x204/0x330
[    0.600117]  __asan_report_load1_noabort+0x30/0x40
[    0.600566]  __of_match_node.part.0+0xe0/0x110
[    0.600980]  of_match_node+0x6c/0xa8
[    0.601316]  of_match_device+0x48/0x70
[    0.601669]  platform_match+0xa4/0x260
[    0.602037]  __driver_attach+0x68/0x128
[    0.602397]  bus_for_each_dev+0x118/0x198
[    0.602773]  driver_attach+0x48/0x60
[    0.603112]  bus_add_driver+0x330/0x658
[    0.603472]  driver_register+0x148/0x398
[    0.603839]  __platform_driver_register+0xd4/0x108
[    0.604288]  arm_mpam_driver_init+0x64/0x78
[    0.604680]  do_one_initcall+0xbc/0x488
[    0.605039]  kernel_init_freeable+0x604/0x6f8
[    0.605447]  kernel_init+0x18/0x130
[    0.605775]  ret_from_fork+0x10/0x18
[    0.606130]
[    0.606274] The buggy address belongs to the variable:
[    0.606754]  arm_mpam_of_device_ids+0xc8/0x380
[    0.607168]
[    0.607314] Memory state around the buggy address:
[    0.607762]  ffff2000099a8180: 00 00 00 fa fa fa fa fa 00 00 00 00 00 00 00 00
[    0.608429]  ffff2000099a8200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    0.609095] >ffff2000099a8280: 00 fa fa fa fa fa fa fa 05 fa fa fa fa fa fa fa
[    0.609760]                       ^
[    0.610101]  ffff2000099a8300: 00 00 07 fa fa fa fa fa 00 04 fa fa fa fa fa fa
[    0.610771]  ffff2000099a8380: 00 00 00 06 fa fa fa fa 00 01 fa fa fa fa fa fa

The arm_mpam_of_device_ids array has no end item, so accesses to the
array may go out of bounds. With KASAN enabled, the out-of-bounds call
trace above occurs. Add an empty end item to the arm_mpam_of_device_ids
array to fix this issue.

Fixes: b45bdb5a ("arm64/mpam: add device tree support for mpam initialization")
Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
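
Match tables of this kind are walked until an empty entry is found, which is why a missing end item lets the walk run past the array. The following userspace sketch shows the pattern with a hypothetical `struct fake_device_id`, table `fake_ids`, and lookup `fake_match()`; it does not reproduce the kernel's `of_device_id` handling, and the compatible strings are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified match entry; not struct of_device_id. */
struct fake_device_id {
    char compatible[32];
};

/* The last, empty entry is the sentinel that stops the walk. */
static const struct fake_device_id fake_ids[] = {
    { .compatible = "vendor,example-a" },
    { .compatible = "vendor,example-b" },
    { .compatible = "" },  /* end item */
};

/* Walk the table until the sentinel; return the matching entry or NULL. */
static const struct fake_device_id *
fake_match(const struct fake_device_id *table, const char *compat)
{
    assert(table != NULL && compat != NULL);
    for (; table->compatible[0] != '\0'; table++) {
        if (strcmp(table->compatible, compat) == 0)
            return table;
    }
    return NULL;
}
```

Without the end item, the loop in `fake_match()` would keep reading whatever memory follows the array when no entry matches.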

7ea0c3fe

arm64/mpam: fix mpam probe error for wrong init order · 82e2f45f

Committed by Xingang Wang on Jan 06, 2022

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I49RB2
CVE: NA

---------------------------------------------------

The mpam init procedure fails when probing with ACPI:
[    1.148657 ] ACPI MPAM: No CPU has cache with PPTT reference 0x72
[    1.148658 ] ACPI MPAM: All CPUs must be online to probe mpam.
[    1.148660 ] ACPI MPAM: discovery failed: -19

This is because mpam needs to be probed after all CPUs are online, and
arm_mpam_driver_init must be called after cacheinfo_sysfs_init, so
device_initcall should be replaced with device_initcall_sync.
Fixes: b45bdb5a ("arm64/mpam: add device tree support for mpam initialization")
Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
Reviewed-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

82e2f45f

31 Dec, 2021 6 commits

mm: export collect_procs() · bb784b81

Committed by Zhang Jian on Dec 31, 2021

ascend inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OXH9
CVE: NA

-------------------------------------------------

Collect the processes that have the page mapped via collect_procs().

@page: if the page is part of a hugepage/compound page, the caller must
use compound_head() to find its head page to prevent a kernel panic,
and the page must be locked.

@to_kill: the function returns a linked list; once the caller has
finished with this list, it must kfree the list.

@force_early: to find all processes, set it to true. If it is false,
the function only returns the processes that have the PF_MCE_PROCESS
or PF_MCE_EARLY mark.

Limits: if force_early is true, sysctl_memory_failure_early_kill has no
effect. If it is false, no process has the PF_MCE_PROCESS and
PF_MCE_EARLY flags, and sysctl_memory_failure_early_kill is enabled, the
function returns all tasks regardless of whether they have the
PF_MCE_PROCESS and PF_MCE_EARLY flags.
Signed-off-by: Zhang Jian <zhangjian210@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

bb784b81

net: hns: update hns version to 21.12.1 · 5dd1df36

Committed by Yonglong Liu on Dec 31, 2021

driver inclusion
category: other
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OSUK
CVE: NA

----------------------------
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Reviewed-by: Kangfenglong <kangfenglong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

5dd1df36

net: hns: fix bug when two ports opened promisc mode both · 12601a9b

Committed by Yonglong Liu on Dec 31, 2021

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OSUK
CVE: NA

----------------------------

When only eth1 is added to an OVS network and both eth1 and eth0 are in
promisc mode, the icmp6 neighbor solicitation packets from OVS to eth1
are sent back to the OVS network, causing incorrect learning of arp.

The hns driver uses a TCAM table to handle the promisc settings.
When setting the TCAM table, the port mask of multicast should be
'0xf' (exact match), not the port number (fuzzy match). So when both
ports have the wrong port mask, the icmp6 neighbor solicitation
packets are incorrectly sent back to eth1.

This patch adds a mac_key to record the actual port number and
uses mask_key to record the 'exact match' port number to fix the
bug.

Fixes: a6c8c2c9a089 ("net: hns: fix non-promiscuous mode does not take effect problem")
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Reviewed-by: Kangfenglong <kangfenglong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
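
The difference between an exact and a fuzzy port match can be sketched with a generic masked compare. This is only an illustration of the masking idea: the real hns TCAM entry layout is not described in the commit message, so `tcam_port_match()`, `PORT_FIELD_MASK`, and the 4-bit port field are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PORT_FIELD_MASK 0xfu  /* assumption: the port field is 4 bits wide */

/* Masked compare: only the bits set in mask take part in the match. */
static bool tcam_port_match(uint8_t key, uint8_t entry, uint8_t mask)
{
    assert((mask & ~PORT_FIELD_MASK) == 0);
    return (key & mask) == (entry & mask);
}
```

With an entry for port 1 and a mask of `0xf`, only port 1 matches. If the port number itself (`0x1`) is used as the mask, only bit 0 is compared, so port 3 matches the port 1 entry as well.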

12601a9b

net: hns3: update hns3 version to 21.12.4 · cf7dfd77

Committed by Yonglong Liu on Dec 31, 2021

driver inclusion
category: other
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OSRU
CVE: NA

----------------------------
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Reviewed-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

cf7dfd77

net: hns3: fix the concurrency between functions reading debugfs · aae46585

Committed by Yufeng Mo on Dec 31, 2021

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OSRU
CVE: NA

----------------------------

[1298504.847848] Call trace:
[1298504.847859] [<ffff000008089e14>] dump_backtrace+0x0/0x23c
[1298504.847865] [<ffff00000808a074>] show_stack+0x24/0x2c
[1298504.847870] [<ffff0000088568a8>] dump_stack+0x84/0xa8
[1298504.847878] [<ffff0000082122fc>] bad_page+0xec/0x14c
[1298504.847883] [<ffff000008219384>] free_pages_check_bad+0x90/0x9c
[1298504.847888] [<ffff00000821307c>] __free_pages_ok+0x2b8/0x2ec
[1298504.847894] [<ffff0000082153ec>] __free_pages+0x44/0x64
[1298504.847900] [<ffff000008288788>] kfree+0x198/0x1a0
[1298504.847905] [<ffff00000823432c>] kvfree+0x3c/0x58
[1298504.847937] [<ffff0000014fabf4>] hns3_dbg_read+0xf4/0x278 [hns3]
[1298504.847944] [<ffff000008359550>] full_proxy_read+0x60/0x90
[1298504.847949] [<ffff0000082b22a4>] __vfs_read+0x58/0x178
[1298504.847952] [<ffff0000082b2454>] vfs_read+0x90/0x14c
[1298504.847956] [<ffff0000082b2b70>] SyS_read+0x60/0xc0

When different functions read the same debugfs node, a double free
can occur, because the functions share the same node buffer.

This patch gives each function its own buffer to fix the problem.

Fixes: 319ba0a4 ("net: hns3: fix race condition in debugfs")
Fixes: c91910ef ("net: hns3: refactor the debugfs process")
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Reviewed-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
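
The ownership rule behind this fix, one buffer per function instead of one buffer shared by all, can be sketched in userspace as follows. The names `struct fake_handle`, `fake_get_buf()`, `fake_put_buf()`, and `FAKE_BUF_LEN` are hypothetical, and the sketch does not model the hns3 debugfs code or its locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define FAKE_BUF_LEN 128

/* Hypothetical per-function handle; each one owns its own buffer. */
struct fake_handle {
    char *dbg_buf;
};

/* Allocate the buffer on first use; it belongs to this handle only. */
static char *fake_get_buf(struct fake_handle *h)
{
    assert(h != NULL);
    if (!h->dbg_buf)
        h->dbg_buf = calloc(1, FAKE_BUF_LEN);
    return h->dbg_buf;
}

/* Free the buffer and clear the pointer so a repeated call is harmless. */
static void fake_put_buf(struct fake_handle *h)
{
    assert(h != NULL);
    free(h->dbg_buf);
    h->dbg_buf = NULL;
}
```

Since no two handles ever hold the same pointer, releasing one handle's buffer cannot free memory that another reader is still using.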

aae46585

f2fs: fix to do sanity check on last xattr entry in __f2fs_setxattr() · 17bc8efe

Committed by Chao Yu on Dec 31, 2021

mainline inclusion
from mainline-v5.17
commit 5598b24efaf4892741c798b425d543e4bed357a1
category: bugfix
CVE: CVE-2021-45469

--------------------------------

As Wenqing Liu reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=215235

- Overview
page fault in f2fs_setxattr() when mount and operate on corrupted image

- Reproduce
tested on kernel 5.16-rc3, 5.15.X under root

1. unzip tmp7.zip
2. ./single.sh f2fs 7

Sometimes need to run the script several times

- Kernel dump
loop0: detected capacity change from 0 to 131072
F2FS-fs (loop0): Found nat_bits in checkpoint
F2FS-fs (loop0): Mounted with checkpoint version = 7548c2ee
BUG: unable to handle page fault for address: ffffe47bc7123f48
RIP: 0010:kfree+0x66/0x320
Call Trace:
 __f2fs_setxattr+0x2aa/0xc00 [f2fs]
 f2fs_setxattr+0xfa/0x480 [f2fs]
 __f2fs_set_acl+0x19b/0x330 [f2fs]
 __vfs_removexattr+0x52/0x70
 __vfs_removexattr_locked+0xb1/0x140
 vfs_removexattr+0x56/0x100
 removexattr+0x57/0x80
 path_removexattr+0xa3/0xc0
 __x64_sys_removexattr+0x17/0x20
 do_syscall_64+0x37/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xae

The root cause is that __f2fs_setxattr() does not sanity-check the
last xattr entry, which results in an out-of-bounds memory access when
updating inconsistent xattr data of the target inode.

After the fix, it can detect such xattr inconsistency as below:

F2FS-fs (loop11): inode (7) has invalid last xattr entry, entry_size: 60676
F2FS-fs (loop11): inode (8) has corrupted xattr
F2FS-fs (loop11): inode (8) has corrupted xattr
F2FS-fs (loop11): inode (8) has invalid last xattr entry, entry_size: 47736

Cc: stable@vger.kernel.org
Reported-by: Wenqing Liu <wenqingliu0120@gmail.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Conflicts:
  fs/f2fs/xattr.c
[yyl: replace f2fs_err() with f2fs_msg(KERN_ERR)]
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: fang wei <fangwei1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
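
The kind of sanity check this commit message describes, verifying that variable-length entries stay inside their buffer before anything is updated, can be sketched as follows. The entry layout here is invented for the example (one byte of name length, one byte of value length, then the payload, with a zero name length as terminator) and is not the f2fs on-disk xattr format; `fake_entries_in_bounds()` and `FAKE_HDR_LEN` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_HDR_LEN 2  /* hypothetical header: name length, value length */

/* Return true only if every entry, and the terminator, lies inside buf. */
static bool fake_entries_in_bounds(const uint8_t *buf, size_t len)
{
    size_t off = 0;

    assert(buf != NULL);
    while (off < len && buf[off] != 0) {
        size_t remaining = len - off;
        size_t entry_size;

        if (remaining < FAKE_HDR_LEN)
            return false;              /* header itself is truncated */
        entry_size = FAKE_HDR_LEN + (size_t)buf[off] + (size_t)buf[off + 1];
        if (entry_size > remaining)
            return false;              /* entry runs past the buffer */
        off += entry_size;
    }
    return off < len;                  /* terminator must be in bounds */
}
```

A corrupted length field makes the check fail up front, so the caller can report the inconsistency instead of reading or freeing memory outside the buffer.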

17bc8efe

openeuler / Kernel synced successfully about 1 year ago
