Commits · 06f66958fd4d359a6bf7a7dcf03861bae7e462eb · openeuler / Kernel

29 Jan, 2022 · 7 commits

shm: extend forced shm destroy to support objects from several IPC nses · 06f66958

Committed by Alexander Mikhalitsyn on Jan 27, 2022

stable inclusion
from linux-4.19.220
commit 4e91adc73764fd0cde00be381461e52fa641fadd

--------------------------------

commit 85b6d246 upstream.

Currently, the exit_shm() function is not designed to work properly when
task->sysvshm.shm_clist holds shm objects from different IPC namespaces.

This is a real pain when sysctl kernel.shm_rmid_forced = 1, because it
leads to use-after-free (reproducer exists).

This is an attempt to fix the problem by extending the exit_shm mechanism
to handle destruction of shm objects from several IPC namespaces.

To achieve that we do several things:

1. add a namespace (non-refcounted) pointer to the struct shmid_kernel

2. during new shm object creation (newseg()/shmget syscall) we
   initialize this pointer with the current task's IPC ns

3. exit_shm() is fully reworked so that it traverses all shp's in
   task->sysvshm.shm_clist and gets the IPC namespace not from the
   current task, as it did before, but from the shp object itself, then
   calls shm_destroy(shp, ns).

Note: We need to be really careful here because, as said in (1), our
pointer to the IPC ns is not refcounted.  To be on the safe side we use
the special helper get_ipc_ns_not_zero(), which takes a reference on the
IPC ns only if the IPC ns is not in the "state of destruction".

Q/A

Q: Why can we access shp->ns memory using non-refcounted pointer?
A: Because the shp object lifetime is always shorter than the IPC
   namespace lifetime, so if we get the shp object from
   task->sysvshm.shm_clist while holding task_lock(task), nobody can
   steal our namespace.

Q: Does this patch change semantics of unshare/setns/clone syscalls?
A: No. It just fixes the uncovered case where a process may leave an IPC
   namespace without getting its task->sysvshm.shm_clist list cleaned up.
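
The "take a reference only if the object is not already being destroyed"
idea behind get_ipc_ns_not_zero() can be illustrated in user space with C11
atomics. This is only a simplified sketch, not the kernel implementation:
struct ns, ns_get_not_zero() and ns_put() are hypothetical names, and the
real helper operates on the IPC namespace's own reference counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an IPC namespace with a reference count. */
struct ns {
	atomic_int count;	/* 0 means the namespace is being destroyed */
};

/*
 * Take a reference only if the count is still non-zero.
 * Returns true on success, false if the object is already dying.
 */
static bool ns_get_not_zero(struct ns *ns)
{
	int old;

	assert(ns != NULL);
	old = atomic_load(&ns->count);
	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&ns->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller released the last one. */
static bool ns_put(struct ns *ns)
{
	assert(ns != NULL);
	return atomic_fetch_sub(&ns->count, 1) == 1;
}
```

A caller that finds an object through a non-refcounted pointer would first
call ns_get_not_zero() and skip the object when it returns false, since in
that case the namespace is already on its way out.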

Link: https://lkml.kernel.org/r/67bb03e5-f79c-1815-e2bf-949c67047418@colorfullife.com
Link: https://lkml.kernel.org/r/20211109151501.4921-1-manfred@colorfullife.com
Fixes: ab602f79 ("shm: make exit_shm work proportional to task activity")
Co-developed-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Vasily Averin <vvs@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

06f66958

fuse: release pipe buf after last use · 50d41d75

Committed by Miklos Szeredi on Jan 27, 2022

stable inclusion
from linux-4.19.219
commit 22b814fdce55caf8db1cb550a32a64159e5b83a8

--------------------------------

commit 47344172 upstream.

Checking buf->flags should be done before the pipe_buf_release() is called
on the pipe buffer, since releasing the buffer might modify the flags.

This is exactly what page_cache_pipe_buf_release() does, and which results
in the same VM_BUG_ON_PAGE(PageLRU(page)) that the original patch was
trying to fix.
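
The ordering issue can be shown with a small user-space sketch. The types
and names below (struct buf, BUF_FLAG_LRU, buf_release()) are hypothetical
stand-ins and not the fuse or pipe code; the only assumption is that the
release step may modify the flags, as the message above states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BUF_FLAG_LRU 0x01u	/* hypothetical flag for this sketch */

struct buf {
	unsigned int flags;
};

/* Release may modify the buffer; here it clears the flags. */
static void buf_release(struct buf *b)
{
	assert(b != NULL);
	b->flags = 0;
}

/* Buggy order: the flag is read after release, so it is always lost. */
static bool lru_checked_after_release(struct buf *b)
{
	buf_release(b);
	return (b->flags & BUF_FLAG_LRU) != 0;
}

/* Fixed order: sample the flag first, then release the buffer. */
static bool lru_checked_before_release(struct buf *b)
{
	bool lru = (b->flags & BUF_FLAG_LRU) != 0;

	buf_release(b);
	return lru;
}
```

With the flag set on entry, the first variant reports it as clear while the
second reports it correctly, which is the difference the fix relies on.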

Reported-by: Justin Forbes <jmforbes@linuxtx.org>
Fixes: 712a9510 ("fuse: fix page stealing")
Cc: <stable@vger.kernel.org> # v2.6.35
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

50d41d75

tracing: Check pid filtering when creating events · c899c69f

Committed by Steven Rostedt (VMware) on Jan 27, 2022

stable inclusion
from linux-4.19.219
commit 2692931d92d83afaa636703e95203d53629ca7c1

--------------------------------

commit 6cb20650 upstream.

When pid filtering is activated in an instance, all of the event trace
files for that instance have the PID_FILTER flag set. This determines
whether or not pid filtering needs to be done on the event, otherwise the
event is executed as normal.

If pid filtering is enabled when an event is created (via a dynamic event
or modules), its flag is not updated to reflect the current state, and the
events are not filtered properly.

Cc: stable@vger.kernel.org
Fixes: 3fdaf80f ("tracing: Implement event pid filtering")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c899c69f

ipv6: fix typos in __ip6_finish_output() · aa79869e

Committed by Eric Dumazet on Jan 27, 2022

stable inclusion
from linux-4.19.219
commit 3b7c37106bcf08f154404f1f212a3713ebf37cec

--------------------------------

[ Upstream commit 19d36c5f ]

We deal with IPv6 packets, so we need to use IP6CB(skb)->flags and
IP6SKB_REROUTED, instead of IPCB(skb)->flags and IPSKB_REROUTED

Found by code inspection, please double check that fixing this bug
does not surface other bugs.

Fixes: 09ee9dba ("ipv6: Reinject IPv6 packets if IPsec policy matches after SNAT")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tobias Brunner <tobias@strongswan.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: David Ahern <dsahern@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Tested-by: Tobias Brunner <tobias@strongswan.org>
Acked-by: Tobias Brunner <tobias@strongswan.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

aa79869e

proc/vmcore: fix clearing user buffer by properly using clear_user() · 23e6f885

Committed by David Hildenbrand on Jan 27, 2022

stable inclusion
from linux-4.19.219
commit 9ef384ed300d1bcfb23d0ab0b487d544444d4b52

--------------------------------

commit c1e63117 upstream.

To clear a user buffer we cannot simply use memset, we have to use
clear_user().  With a virtio-mem device that registers a vmcore_cb and
has some logically unplugged memory inside an added Linux memory block,
I can easily trigger a BUG by copying the vmcore via "cp":

  systemd[1]: Starting Kdump Vmcore Save Service...
  kdump[420]: Kdump is using the default log level(3).
  kdump[453]: saving to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
  kdump[458]: saving vmcore-dmesg.txt to /sysroot/var/crash/127.0.0.1-2021-11-11-14:59:22/
  kdump[465]: saving vmcore-dmesg.txt complete
  kdump[467]: saving vmcore
  BUG: unable to handle page fault for address: 00007f2374e01000
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0003) - permissions violation
  PGD 7a523067 P4D 7a523067 PUD 7a528067 PMD 7a525067 PTE 800000007048f867
  Oops: 0003 [#1] PREEMPT SMP NOPTI
  CPU: 0 PID: 468 Comm: cp Not tainted 5.15.0+ #6
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
  RIP: 0010:read_from_oldmem.part.0.cold+0x1d/0x86
  Code: ff ff ff e8 05 ff fe ff e9 b9 e9 7f ff 48 89 de 48 c7 c7 38 3b 60 82 e8 f1 fe fe ff 83 fd 08 72 3c 49 8d 7d 08 4c 89 e9 89 e8 <49> c7 45 00 00 00 00 00 49 c7 44 05 f8 00 00 00 00 48 83 e7 f81
  RSP: 0018:ffffc9000073be08 EFLAGS: 00010212
  RAX: 0000000000001000 RBX: 00000000002fd000 RCX: 00007f2374e01000
  RDX: 0000000000000001 RSI: 00000000ffffdfff RDI: 00007f2374e01008
  RBP: 0000000000001000 R08: 0000000000000000 R09: ffffc9000073bc50
  R10: ffffc9000073bc48 R11: ffffffff829461a8 R12: 000000000000f000
  R13: 00007f2374e01000 R14: 0000000000000000 R15: ffff88807bd421e8
  FS:  00007f2374e12140(0000) GS:ffff88807f000000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f2374e01000 CR3: 000000007a4aa000 CR4: 0000000000350eb0
  Call Trace:
   read_vmcore+0x236/0x2c0
   proc_reg_read+0x55/0xa0
   vfs_read+0x95/0x190
   ksys_read+0x4f/0xc0
   do_syscall_64+0x3b/0x90
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Some x86-64 CPUs have a CPU feature called "Supervisor Mode Access
Prevention (SMAP)", which is used to detect wrong access from the kernel
to user buffers like this: SMAP triggers a permissions violation on
wrong access.  In the x86-64 variant of clear_user(), SMAP is properly
handled via clac()+stac().

To fix, properly use clear_user() when we're dealing with a user buffer.

Link: https://lkml.kernel.org/r/20211112092750.6921-1-david@redhat.com
Fixes: 997c136f ("fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages")
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Philipp Rudo <prudo@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

23e6f885

tracing: Fix pid filtering when triggers are attached · aefce778

Committed by Steven Rostedt (VMware) on Jan 27, 2022

stable inclusion
from linux-4.19.219
commit bbaf4fe5ec34e7931af9a03a60b4071c2bc01ae0

--------------------------------

commit a55f224f upstream.

If an event is filtered by pid and a trigger that requires processing of
the event to happen is attached to the event, the discard portion does
not take the pid filtering into account, and the event will then be
recorded when it should not have been.

Cc: stable@vger.kernel.org
Fixes: 3fdaf80f ("tracing: Implement event pid filtering")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

aefce778

fuse: fix page stealing · fea5eeb2

Committed by Miklos Szeredi on Jan 27, 2022

stable inclusion
from linux-4.19.219
commit 65f1f3eb09a3907c5f5edaff6c473b99e49e6020

--------------------------------

commit 712a9510 upstream.

It is possible to trigger a crash by splicing anon pipe bufs to the fuse
device.

The reason for this is that anon_pipe_buf_release() will reuse buf->page if
the refcount is 1, but that page might have already been stolen and its
flags modified (e.g. PG_lru added).

This happens in the unlikely case of fuse_dev_splice_write() getting around
to calling pipe_buf_release() after a page has been stolen, added to the
page cache and removed from the page cache.

Fix by calling pipe_buf_release() right after the page was inserted into
the page cache.  In this case the page has an elevated refcount so any
release function will know that the page isn't reusable.

Reported-by: Frank Dinoff <fdinoff@google.com>
Link: https://lore.kernel.org/r/CAAmZXrsGg2xsP1CK+cbuEMumtrqdvD-NKnWzhNcvn71RV3c1yw@mail.gmail.com/
Fixes: dd3bb14f ("fuse: support splice() writing to fuse device")
Cc: <stable@vger.kernel.org> # v2.6.35
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

fea5eeb2

27 Jan, 2022 · 1 commit

ipmi_si: Phytium S2500 workaround for MMIO-based IPMI · 88f04d8b

Committed by Laibin Qiu on Jan 26, 2022

phytium inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4RK58
CVE: NA

--------------------------------

The system would hang up when the Phytium S2500 communicates with
some BMCs after several rounds of transactions, unless we reset
the controller timeout counter manually by calling firmware through
SMC.

Signed-off-by: Wang Yinfeng <wangyinfeng@phytium.com.cn>
Signed-off-by: Chen Baozi <chenbaozi@phytium.com.cn> #openEuler_contributor
Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

88f04d8b

26 Jan, 2022 · 3 commits

etmem: Add a scan flag to support specified page swap-out · 353db299

Committed by liubo on Jan 26, 2022

euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4QVXW
CVE: NA

-------------------------------------------------
etmem, the memory vertical expansion technology,

The existing memory expansion tool etmem swaps out all pages that can be
swapped out for the process by default, unless the page is marked with
the lock flag.

This patch adds the ability to swap out specified pages. The process
marks the pages to be swapped out with the VM_SWAPFLAG flag, and etmem
adds a filter to the scanning module and swaps out only these pages.

Signed-off-by: liubo <liubo254@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

353db299

etmem: add swapcache reclaim to etmem · d2869c60

Committed by liubo on Jan 26, 2022

euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4QVXW
CVE: NA

-------------------------------------------------
etmem, the memory vertical expansion technology,

In the current etmem process, memory page swapping is implemented by
invoking shrink_page_list. When this interface is invoked for the first
time, pages are added to the swap cache and written to disks. The swap
cache page is reclaimed only when this interface is invoked for the
second time and no process accesses the page. However, in the etmem
process, the user mode scans pages that have been accessed, and the
migration is not delivered to pages that are not accessed by processes.
Therefore, the swap cache may always be occupied.

To solve the preceding problem, add the logic for actively reclaiming
the swap cache. When the swap cache occupies a large amount of memory,
the system proactively scans the LRU linked list and reclaims the
swap cache to save memory within the specified range.

Signed-off-by: liubo <liubo254@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

d2869c60

etmem: add original kernel swap enabled options · 44983705

Committed by liubo on Jan 26, 2022

euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4QVXW
CVE: NA

-------------------------------------------------

etmem, the memory vertical expansion technology, uses DRAM and new
high-performance storage media to form multi-level memory storage.
By grading the stored data, etmem migrates the classified cold
storage data from the storage medium to the high-performance
storage medium, so as to achieve the purpose of memory capacity
expansion and memory cost reduction.

When the memory expansion function etmem is running, the native
swap function of the kernel needs to be disabled in certain
scenarios to avoid the impact of kernel swap.

This feature provides the preceding functions.

The /sys/kernel/mm/swap/ directory provides the kernel_swap_enable
sys interface to enable or disable the native swap function
of the kernel.

The default value of /sys/kernel/mm/swap/kernel_swap_enable is true,
that is, kernel swap is enabled by default.

Turn on kernel swap:
	echo true > /sys/kernel/mm/swap/kernel_swap_enable

Turn off kernel swap:
	echo false > /sys/kernel/mm/swap/kernel_swap_enable

Signed-off-by: liubo <liubo254@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

44983705

25 Jan, 2022 · 1 commit

net: bridge: clear bridge's private skb space on xmit · 724c2ccf

Committed by Nikolay Aleksandrov on Jan 24, 2022

mainline inclusion
from mainline-v5.9-rc1
commit fd65e5a9
category: bugfix
bugzilla: 186114
CVE: NA

--------------------------------

We need to clear all of the bridge private skb variables as they can be
stale due to the packet being recirculated through the stack and then
transmitted through the bridge device. Similar memset is already done on
bridge's input. We've seen cases where proxyarp_replied was 1 on routed
multicast packets transmitted through the bridge to ports with neigh
suppress which were getting dropped. Same thing can in theory happen with
the port isolation bit as well.
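
As a rough illustration of why the private area has to be cleared on
transmit, here is a user-space sketch. It is not the bridge code: struct
packet, struct bridge_cb, CB_SIZE and bridge_xmit_prepare() are hypothetical
names, and the two fields only mirror the proxyarp_replied and port
isolation bits mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CB_SIZE 48	/* assumed scratch size for this sketch */

/* Hypothetical per-layer private state kept in the packet scratch area. */
struct bridge_cb {
	unsigned char proxyarp_replied;
	unsigned char src_port_isolated;
};

struct packet {
	/* Scratch area reused by every layer the packet passes through. */
	union {
		unsigned char raw[CB_SIZE];
		struct bridge_cb br;
	} cb;
};

_Static_assert(sizeof(struct bridge_cb) <= CB_SIZE,
	       "private state must fit in the scratch area");

/* Transmit entry point: wipe stale private state before relying on it. */
static void bridge_xmit_prepare(struct packet *p)
{
	assert(p != NULL);
	memset(&p->cb.br, 0, sizeof(p->cb.br));
}
```

Without the memset, a packet that was recirculated through the stack could
still carry whatever an earlier pass left in the scratch area.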

Fixes: 821f1b21 ("bridge: add new BR_NEIGH_SUPPRESS port flag to suppress arp and nd flood")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Huang Guobin <huangguobin4@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

724c2ccf

20 Jan, 2022 · 1 commit

audit: bugfix for infinite loop when flush the hold queue · 67ab712f

Committed by Cui GaoSheng on Jan 20, 2022

hulk inclusion
category: bugfix
bugzilla: 186105, https://gitee.com/openeuler/kernel/issues/I4RGWS?from=project-issue
CVE: NA

-----------------------------------------------------------------

When "audit=1" is added to the cmdline and the audit_hold_queue is kept
non-empty, flushing the hold queue falls into an infinite loop. Fix it
by stopping the flush of the hold queue when netlink is abnormal.

Fixes: 3413ddc9 ("audit: improve robustness of the audit queue handling")
Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: weiyang wang <wangweiyang2@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

67ab712f

19 Jan, 2022 · 2 commits

blk-throttle: enable hierarchical throttle in cgroup v1 · a17cb07b

Committed by Yu Kuai on Jan 19, 2022

hulk inclusion
category: feature
bugzilla: 186072, https://gitee.com/openeuler/kernel/issues/I4RH0V
CVE: NA

-----------------------------------------------

The blkio subsystem is not under the default hierarchy in cgroup v1 by
default, which means io throttle configurations are only effective on
the current cgroup.

This patch introduces a new feature that enables the default hierarchy for
io throttle, which means configurations will be effective on child cgroups.
The feature is disabled by default, and can be enabled by adding
"blkcg_global_limit=1" or "blkcg_global_limit=Y" or "blkcg_global_limit=y"
to the boot cmdline.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

a17cb07b

xfs: map unwritten blocks in XFS_IOC_{ALLOC,FREE}SP just like fallocate · 2d646063

Committed by Darrick J. Wong on Jan 19, 2022

mainline inclusion
from mainline-v5.16-rc5
commit 983d8e60
category: bugfix
bugzilla: 186083
CVE: CVE-2021-4155

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=983d8e60f50806f90534cc5373d0ce867e5aaf79

--------------------------------

The old ALLOCSP/FREESP ioctls in XFS can be used to preallocate space at
the end of files, just like fallocate and RESVSP.  Make the behavior
consistent with the other ioctls.

Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

2d646063

18 Jan, 2022 · 1 commit

ip_gre: validate csum_start only on pull · a68f6a67

Committed by Willem de Bruijn on Jan 18, 2022

mainline inclusion
from mainline-v5.14
commit 8a0ed250
category: bugfix
bugzilla: NA
CVE: CVE-2021-39633

-------------------------------------------------

The GRE tunnel device can pull existing outer headers in ipgre_xmit.
This is a rare path, apparently unique to this device. The below
commit ensured that pulling does not move skb->data beyond csum_start.

But it has a false positive if ip_summed is not CHECKSUM_PARTIAL and
thus csum_start is irrelevant.

Refine to exclude this. At the same time simplify and strengthen the
test.

Simplify, by moving the check next to the offending pull, making it
more self documenting and removing an unnecessary branch from other
code paths.

Strengthen, by also ensuring that the transport header is correct and
therefore the inner headers will be after skb_reset_inner_headers.
The transport header is set to csum_start in skb_partial_csum_set.

Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/
Fixes: 1d011c48 ("ip_gre: add validation for csum_start")
Reported-by: Ido Schimmel <idosch@idosch.org>
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Huang Guobin <huangguobin4@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

a68f6a67

17 Jan, 2022 · 20 commits

hugetlbfs: fix issue of preallocation of gigantic pages can't work · d168c42d

Committed by Zhenguo Yao on Jan 17, 2022

mainline inclusion
from mainline-v5.16-rc5
commit 4178158e
category: bugfix
bugzilla: 186043
CVE: NA

--------------------------------

Preallocation of gigantic pages can't work because of commit
b5389086 ("hugetlbfs: extend the definition of hugepages parameter
to support node allocation").  When nid is NUMA_NO_NODE(-1),
alloc_bootmem_huge_page will always return without doing allocation.
Fix this by adding more checks.

Link: https://lkml.kernel.org/r/20211129133803.15653-1-yaozhenguo1@gmail.com
Fixes: b5389086 ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
Signed-off-by: Zhenguo Yao <yaozhenguo1@gmail.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

d168c42d

hugetlbfs: extend the definition of hugepages parameter to support node allocation · b0750f70

Committed by Zhenguo Yao on Jan 17, 2022

mainline inclusion
from mainline-v5.16-rc1
commit b5389086
category: feature
bugzilla: 186043
CVE: NA

--------------------------------

We can specify the number of hugepages to allocate at boot.  But at
present the hugepages are balanced across all nodes.  In some scenarios,
we only need hugepages in one node.  For example: DPDK needs hugepages
which are in the same node as the NIC.

If DPDK needs four hugepages of 1G size in node1 and the system has 16
numa nodes, we must reserve 64 hugepages on the kernel cmdline.  But only
four hugepages are used.  The others should be freed after boot.  If the
system memory is low (for example: 64G), it will be an impossible task.

So extend the hugepages parameter to support specifying hugepages on a
specific node.  For example, add the following parameter:

  hugepagesz=1G hugepages=0:1,1:3

It will allocate 1 hugepage in node0 and 3 hugepages in node1.
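
The node:count syntax can be illustrated with a small user-space parser.
This is only a sketch of the format described above, not the kernel's
parsing code: parse_node_hugepages() and MAX_NODES are hypothetical, and the
error handling is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_NODES 16	/* assumed upper bound for this sketch */

/*
 * Parse a "node:count[,node:count...]" string into per_node[].
 * Returns 0 on success, -1 on malformed input or an out-of-range node.
 */
static int parse_node_hugepages(const char *s, unsigned long per_node[MAX_NODES])
{
	assert(s != NULL && per_node != NULL);

	while (*s) {
		char *end;
		unsigned long node, count;

		node = strtoul(s, &end, 10);
		if (end == s || *end != ':' || node >= MAX_NODES)
			return -1;
		s = end + 1;

		count = strtoul(s, &end, 10);
		if (end == s)
			return -1;
		per_node[node] = count;

		if (*end == ',') {
			s = end + 1;
			if (*s == '\0')
				return -1;	/* trailing comma */
		} else if (*end == '\0') {
			s = end;
		} else {
			return -1;
		}
	}
	return 0;
}
```

Feeding it the string "0:1,1:3" fills per_node[0] with 1 and per_node[1]
with 3, matching the allocation described in the message.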

Link: https://lkml.kernel.org/r/20211005054729.86457-1-yaozhenguo1@gmail.com
Signed-off-by: Zhenguo Yao <yaozhenguo1@gmail.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zhenguo Yao <yaozhenguo1@gmail.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
	Documentation/admin-guide/kernel-parameters.txt
	Documentation/admin-guide/mm/hugetlbpage.rst
	arch/powerpc/mm/hugetlbpage.c
	include/linux/hugetlb.h
	mm/hugetlb.c
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

b0750f70

mm: remove sharepool sp_unshare_uva current->mm NULL check · 4ec99782

Committed by Guo Mengqi on Jan 17, 2022

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4ODJ6
CVE: NA

---------------------------

Remove the unnecessary current->mm NULL check in sp_unshare_uva, and
allow a process to unshare kernel mapped addresses in do_exit().

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

4ec99782

share pool: use rwsem to protect sp group exit · 3aa4f0a7

Committed by Guo Mengqi on Jan 17, 2022

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4ODMN
CVE: NA

-------------------------------------------------
Fix the following situation:

The last process in a group exits while a second process tries to join
this group.

The second process may get an invalid spg. However, the group's
use_count is increased by 1, which causes the first process to fail to
free the group when it exits. The second process then calls
sp_group_drop --> free_sp_group, causing a double request of the rwsem.

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

3aa4f0a7

Add new module parameters:time out · a819b461

Committed by Yanling Song on Jan 17, 2022

Ramaxel inclusion
category: features
bugzilla: https://gitee.com/openeuler/kernel/issues/I4ON8F
CVE: NA

Changes:
1. Split scmd_tmout_nonpt into two parameters:
   scmd_tmout_vd/scmd_tmout_rawdisk
2. Return -ETIME instead of -EINVAL when a command times out.
3. Add one module parameter: max_io_force.

Signed-off-by: Yanling Song <songyl@ramaxel.com>
Reviewed-by: Jiang Yu <yujiang@ramaxel.com>
Reviewed-by: Zhang Lei <zhanglei48@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

a819b461

virtio-blk: validate num_queues during probe · 9e4cd940

Committed by Jason Wang on Jan 17, 2022

mainline inclusion
from mainline-5.16
commit 6ae6ff6f
category: bugfix
bugzilla: NA
CVE: NA

-------------------------------------------------

If an untrusted device negotiates BLK_F_MQ but advertises a zero
num_queues, the driver may end up trying to allocate zero size
buffers where ZERO_SIZE_PTR is returned, which may pass the check
against NULL. This will lead to unexpected results.

Fix this by failing the probe in this case.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211019070152.8236-2-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Wenchao Hao <haowenchao@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

9e4cd940

virtio-blk: Use blk_validate_block_size() to validate block size · 8910dce8

Committed by Xie Yongji on Jan 17, 2022

mainline inclusion
from mainline-5.16
commit 57a13a5b
category: bugfix
bugzilla: NA
CVE: NA

-------------------------------------------------

The block layer can't support a block size larger than
page size yet. And a block size that's too small or
not a power of two won't work either. If a misconfigured
device presents an invalid block size in configuration space,
it will result in a kernel crash, something like below:

[  506.154324] BUG: kernel NULL pointer dereference, address: 0000000000000008
[  506.160416] RIP: 0010:create_empty_buffers+0x24/0x100
[  506.174302] Call Trace:
[  506.174651]  create_page_buffers+0x4d/0x60
[  506.175207]  block_read_full_page+0x50/0x380
[  506.175798]  ? __mod_lruvec_page_state+0x60/0xa0
[  506.176412]  ? __add_to_page_cache_locked+0x1b2/0x390
[  506.177085]  ? blkdev_direct_IO+0x4a0/0x4a0
[  506.177644]  ? scan_shadow_nodes+0x30/0x30
[  506.178206]  ? lru_cache_add+0x42/0x60
[  506.178716]  do_read_cache_page+0x695/0x740
[  506.179278]  ? read_part_sector+0xe0/0xe0
[  506.179821]  read_part_sector+0x36/0xe0
[  506.180337]  adfspart_check_ICS+0x32/0x320
[  506.180890]  ? snprintf+0x45/0x70
[  506.181350]  ? read_part_sector+0xe0/0xe0
[  506.181906]  bdev_disk_changed+0x229/0x5c0
[  506.182483]  blkdev_get_whole+0x6d/0x90
[  506.183013]  blkdev_get_by_dev+0x122/0x2d0
[  506.183562]  device_add_disk+0x39e/0x3c0
[  506.184472]  virtblk_probe+0x3f8/0x79b [virtio_blk]
[  506.185461]  virtio_dev_probe+0x15e/0x1d0 [virtio]

So let's use a block layer helper to validate the block size.

Conflict: the original patch used blk_cleanup_disk(), which was introduced
in f525464a (block: add blk_alloc_disk and blk_cleanup_disk APIs), to
clean up resources; this patch just calls blk_cleanup_queue() to perform
the same operations.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20211026144015.188-5-xieyongji@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Wenchao Hao <haowenchao@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

8910dce8

block: Add a helper to validate the block size · 996af2e0

Committed by Xie Yongji on Jan 17, 2022

mainline inclusion
from mainline-5.16
commit 570b1cac
category: bugfix
bugzilla: NA
CVE: NA

-------------------------------------------------

There is some duplicated code to validate the block
size in block drivers. This limitation actually comes
from the block layer, so this patch adds a new block
layer helper for that.
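
A minimal user-space sketch of such a check is shown below. It assumes the
constraints are: not too small, not larger than the page size, and a power
of two. The 512-byte minimum, the 4 KiB page size and the function names are
assumptions made for this example, not taken from the patch itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */
#define SKETCH_MIN_BLOCK 512u	/* assumption: 512-byte minimum */

/* True when n has exactly one bit set. */
static bool is_power_of_two(unsigned int n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Returns 0 for an acceptable block size, -EINVAL otherwise. */
static int validate_block_size(unsigned int bsize)
{
	if (bsize < SKETCH_MIN_BLOCK || bsize > SKETCH_PAGE_SIZE ||
	    !is_power_of_two(bsize))
		return -EINVAL;
	return 0;
}
```

A driver's probe path could then reject a device as soon as the reported
block size fails this check, instead of each driver open-coding the test.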

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20211026144015.188-2-xieyongji@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Wenchao Hao <haowenchao@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

996af2e0

Revert "virtio-blk: Add validation for block size in config space" · 0df07d96

Committed by Michael S. Tsirkin on Jan 17, 2022

mainline inclusion
from mainline-5.15
commit ff631988
category: bugfix
bugzilla: NA
CVE: NA

-------------------------------------------------

It turns out that access to config space before completing the feature
negotiation is broken for big endian guests at least with QEMU hosts up
to 6.1 inclusive.  This affects any device that accesses config space in
the validate callback: at the moment that is virtio-net with
VIRTIO_NET_F_MTU but since 82e89ea0 ("virtio-blk: Add validation for
block size in config space") that also started affecting virtio-blk with
VIRTIO_BLK_F_BLK_SIZE. Further, unlike VIRTIO_NET_F_MTU which is off by
default on QEMU, VIRTIO_BLK_F_BLK_SIZE is on by default, which resulted
in lots of people not being able to boot VMs on BE.

The spec is very clear that what we are doing is legal so QEMU needs to
be fixed, but given it's been broken for so many years and no one
noticed, we need to give QEMU a bit more time before applying this.

Further, this patch is incomplete (does not check blk size is a power
of two) and it duplicates the logic from nbd.

Revert for now, and we'll reapply a cleaner logic in the next release.

Cc: stable@vger.kernel.org
Fixes: 82e89ea0 ("virtio-blk: Add validation for block size in config space")
Cc: Xie Yongji <xieyongji@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Wenchao Hao <haowenchao@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

0df07d96

scsi: virtio_scsi: Rescan the entire target on transport reset when LUN is 0 · f3728a9d

Authored by Matej Genci on Jan 17, 2022

mainline inclusion
from mainline-5.10
commit beef6fd0
category: bugfix
bugzilla: NA
CVE: NA

-------------------------------------------------

VirtIO 1.0 spec says:

    The removed and rescan events ... when sent for LUN 0, they MAY
    apply to the entire target so the driver can ask the initiator
    to rescan the target to detect this.

This change introduces the behaviour described above by scanning the entire
SCSI target when LUN is set to 0. This is both a functional and a
performance fix. It aligns the driver with the spec and allows control
planes to hotplug targets with large numbers of LUNs without having to
request a RESCAN for each one of them.
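
The decision described above can be sketched in a few lines of userspace C. This only models the choice of scope; the enum and function names are hypothetical and no real SCSI mid-layer call is shown.

```c
#include <assert.h>

/* Hypothetical outcome of handling a rescan event. */
enum sketch_rescan_scope {
	SKETCH_RESCAN_SINGLE_LUN,
	SKETCH_RESCAN_WHOLE_TARGET,
};

/* A rescan event for LUN 0 is treated as covering the entire target. */
enum sketch_rescan_scope sketch_rescan_scope_for(unsigned int lun)
{
	return lun == 0 ? SKETCH_RESCAN_WHOLE_TARGET : SKETCH_RESCAN_SINGLE_LUN;
}

/* Usage: one event for LUN 0 replaces one event per hotplugged LUN. */
void sketch_rescan_demo(void)
{
	assert(sketch_rescan_scope_for(0) == SKETCH_RESCAN_WHOLE_TARGET);
	assert(sketch_rescan_scope_for(7) == SKETCH_RESCAN_SINGLE_LUN);
}
```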

Link: https://lore.kernel.org/r/CY4PR02MB33354370E0A81E75DD9DFE74FB520@CY4PR02MB3335.namprd02.prod.outlook.com
Suggested-by: Felipe Franciosi <felipe@nutanix.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Matej Genci <matej.genci@nutanix.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Wenchao Hao <haowenchao@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

f3728a9d

Revert "svm: Add support to get svm mpam configuration" · c3eecacf

Authored by Xingang Wang on Jan 17, 2022

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I49RB2
CVE: NA

-------------------------------------------------

This reverts commit 0cc88dd8.
The commit "svm: Add support to get svm mpam configuration"
adds and exports an interface in the svm module, which makes mpam depend
on the svm module. Revert it to avoid the coupling.

Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c3eecacf

Revert "svm: Add support to set svm mpam configuration" · 9f47aa00

Authored by Xingang Wang on Jan 17, 2022

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I49RB2
CVE: NA

-------------------------------------------------

This reverts commit 464f6990.
The commit "svm: Add support to set svm mpam configuration"
adds and exports an interface in the svm module, which makes mpam depend
on the svm module. Revert it to avoid the coupling.

Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

9f47aa00

Revert "svm: Add svm_set_user_mpam_en to enable/disable mpam for smmu" · 63518bb5

Authored by Xingang Wang on Jan 17, 2022

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I49RB2
CVE: NA

-------------------------------------------------

This reverts commit c50ad40d.
The commit "svm: Add svm_set_user_mpam_en to enable/disable mpam for smmu"
adds and exports an interface in the svm module, which makes mpam depend
on the svm module. Revert it to avoid the coupling.

Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

63518bb5

cgroup: Use open-time cgroup namespace for process migration perm checks · 6b6c3fb5

Authored by Tejun Heo on Jan 17, 2022

mainline inclusion
from mainline-v5.16
commit e5745764
category: bugfix
bugzilla: NA
CVE: CVE-2021-4197

------------------------------------------------------------------------

cgroup process migration permission checks are performed at write time as
whether a given operation is allowed or not is dependent on the content of
the write - the PID. This currently uses current's cgroup namespace which is
a potential security weakness as it may allow scenarios where a less
privileged process tricks a more privileged one into writing into a fd that
it created.

This patch makes cgroup remember the cgroup namespace at the time of open
and uses it for migration permission checks instead of current's. Note that
this only applies to cgroup2 as cgroup1 doesn't have namespace support.

This also fixes a use-after-free bug on cgroupns reported in

 https://lore.kernel.org/r/00000000000048c15c05d0083397@google.com

Note that backporting this fix also requires the preceding patch.

Reported-by: "Eric W. Biederman" <ebiederm@xmission.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Reported-by: syzbot+50f5cf33a284ce738b62@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/00000000000048c15c05d0083397@google.com
Fixes: 5136f636 ("cgroup: implement "nsdelegate" mount option")
Signed-off-by: Tejun Heo <tj@kernel.org>
Conflicts:
	kernel/cgroup/cgroup-internal.h
	kernel/cgroup/cgroup.c
Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Reviewed-by: weiyang wang <wangweiyang2@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

6b6c3fb5

cgroup: Allocate cgroup_file_ctx for kernfs_open_file->priv · 75acfe71

Authored by Tejun Heo on Jan 17, 2022

mainline inclusion
from mainline-v5.16
commit 0d2b5955
category: bugfix
bugzilla: NA
CVE: CVE-2021-4197

-------------------------------------------------------------------------

of->priv is currently used by each interface file implementation to store
private information. This patch collects the current two private data usages
into struct cgroup_file_ctx which is allocated and freed by the common path.
This allows generic private data which applies to multiple files, which will
be used in the following patch.

Note that cgroup_procs iterator is now embedded as procs.iter in the new
cgroup_file_ctx so that it doesn't need to be allocated and freed
separately.

v2: union dropped from cgroup_file_ctx and the procs iterator is embedded in
    cgroup_file_ctx as suggested by Linus.

v3: Michal pointed out that cgroup1's procs pidlist uses of->priv too.
    Converted. Didn't change to embedded allocation as cgroup1 pidlists get
    stored for caching.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Conflicts:
	kernel/cgroup/cgroup-internal.h
	kernel/cgroup/cgroup.c
Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Reviewed-by: weiyang wang <wangweiyang2@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

75acfe71

cgroup: Use open-time credentials for process migraton perm checks · d1bd89d1

Authored by Tejun Heo on Jan 17, 2022

mainline inclusion
from mainline-v5.16
commit 1756d799
category: bugfix
bugzilla: NA
CVE: CVE-2021-4197

---------------------------------------------------

cgroup process migration permission checks are performed at write time as
whether a given operation is allowed or not is dependent on the content of
the write - the PID. This currently uses current's credentials which is a
potential security weakness as it may allow scenarios where a less
privileged process tricks a more privileged one into writing into a fd that
it created.

This patch makes both cgroup2 and cgroup1 process migration interfaces
use the credentials saved at the time of open (file->f_cred) instead of
current's.
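
The idea can be illustrated with a small userspace model. Everything here is hypothetical (the `sketch_*` types, the "only uid 0 may migrate" policy); it only shows that the permission check consults the credentials captured at open time and ignores whoever performs the write.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a credential set; only a uid is modelled. */
struct sketch_cred {
	unsigned int uid;
};

/* Hypothetical open file: remembers who opened it, like file->f_cred. */
struct sketch_file {
	struct sketch_cred opener;
};

/* Capture the opener's credentials once, at open time. */
struct sketch_file sketch_open(struct sketch_cred opener)
{
	struct sketch_file f = { .opener = opener };
	return f;
}

/* Toy policy (an assumption): only uid 0 may migrate processes. */
bool sketch_may_migrate(const struct sketch_cred *cred)
{
	return cred->uid == 0;
}

/* The check runs at write time but uses the opener's credentials. */
bool sketch_write_allowed(const struct sketch_file *f,
			  const struct sketch_cred *writer)
{
	(void)writer; /* ignored on purpose: the writer may be a tricked privileged task */
	return sketch_may_migrate(&f->opener);
}

void sketch_cred_demo(void)
{
	struct sketch_cred user = { .uid = 1000 };
	struct sketch_cred root = { .uid = 0 };
	struct sketch_file by_user = sketch_open(user);
	struct sketch_file by_root = sketch_open(root);

	/* A privileged writer cannot lend its rights to an fd opened by a user. */
	assert(!sketch_write_allowed(&by_user, &root));
	assert(sketch_write_allowed(&by_root, &root));
}
```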

Reported-by: "Eric W. Biederman" <ebiederm@xmission.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Fixes: 187fe840 ("cgroup: require write perm on common ancestor when moving processes on the default hierarchy")
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Conflicts:
	kernel/cgroup/cgroup.c
Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

d1bd89d1

NFC: add necessary privilege flags in netlink layer · ee2ad765

Authored by Lin Ma on Jan 17, 2022

mainline inclusion
from mainline-v5.16-rc1
commit aedddb4e
category: bugfix
bugzilla: NA
CVE: CVE-2021-4202

--------------------------------

The CAP_NET_ADMIN checks are needed to prevent attackers from faking a
device under NCIUARTSETDRIVER and exploiting privileged commands.

This patch adds GENL_ADMIN_PERM flags in genl_ops to enforce the check,
except for commands like NFC_CMD_GET_DEVICE, NFC_CMD_GET_TARGET,
NFC_CMD_LLC_GET_PARAMS, and NFC_CMD_GET_SE, which are mainly
information-read operations.
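
A rough userspace model of a per-operation permission flag follows. The table layout, the `SKETCH_ADMIN_PERM` bit and the choice of `NFC_CMD_DEV_UP` as the privileged example are assumptions made for illustration; this is not the genetlink API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit playing the role of GENL_ADMIN_PERM. */
#define SKETCH_ADMIN_PERM 0x01u

/* Hypothetical operation-table entry. */
struct sketch_genl_op {
	const char *name;
	unsigned int flags;
};

/* Read-only queries stay open; state-changing commands are flagged. */
const struct sketch_genl_op sketch_nfc_ops[] = {
	{ "NFC_CMD_GET_DEVICE", 0 },
	{ "NFC_CMD_DEV_UP", SKETCH_ADMIN_PERM }, /* assumed to be privileged */
};

/* A flagged operation is permitted only for callers with the admin capability. */
bool sketch_op_permitted(const struct sketch_genl_op *op, bool has_net_admin)
{
	if (op->flags & SKETCH_ADMIN_PERM)
		return has_net_admin;
	return true;
}

void sketch_perm_demo(void)
{
	assert(sketch_op_permitted(&sketch_nfc_ops[0], false));
	assert(!sketch_op_permitted(&sketch_nfc_ops[1], false));
	assert(sketch_op_permitted(&sketch_nfc_ops[1], true));
}
```

Putting the flag in the table lets the dispatch layer reject unprivileged callers before any handler runs.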

Signed-off-by: Lin Ma <linma@zju.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
  net/nfc/netlink.c
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

ee2ad765

NFC: add NCI_UNREG flag to eliminate the race · b77d2ff8

Authored by Lin Ma on Jan 17, 2022

stable inclusion
from linux-4.19.219
commit 2350cffd71e74bf81dedc989fdec12aebe89a4a5
CVE: CVE-2021-4202

--------------------------------

commit 48b71a9e upstream.

Two sites call queue_work() after destroy_workqueue(), which can lead
to a UAF.

The first site is nci_send_cmd(), which can run after
nci_close_device, as shown below:

nfcmrvl_nci_unregister_dev   |  nfc_genl_dev_up
  nci_close_device           |
    flush_workqueue          |
    del_timer_sync           |
  nci_unregister_device      |    nfc_get_device
    destroy_workqueue        |    nfc_dev_up
    nfc_unregister_device    |      nci_dev_up
      device_del             |        nci_open_device
                             |          __nci_request
                             |            nci_send_cmd
                             |              queue_work !!!

The other site is nci_cmd_timer, woken by the nci_cmd_work from
nci_send_cmd.

  ...                        |  ...
  nci_unregister_device      |  queue_work
    destroy_workqueue        |
    nfc_unregister_device    |  ...
      device_del             |  nci_cmd_work
                             |  mod_timer
                             |  ...
                             |  nci_cmd_timer
                             |    queue_work !!!

For both UAFs above, the root cause is that nfc_dev_up can race with
the nci_unregister_device routine. Therefore, this patch introduces the
NCI_UNREG flag to eliminate the possible race. In addition, the
mutex_lock in nci_close_device can act as a barrier.
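
The flag-plus-barrier pattern can be modelled in userspace as below. This is a sketch under assumptions: the `sketch_ndev` structure, the `unreg` field standing in for NCI_UNREG, and the `workqueue_alive` bookkeeping are all hypothetical, and the demo is single-threaded.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device model; not the driver's real structure. */
struct sketch_ndev {
	pthread_mutex_t req_lock;
	atomic_bool unreg;      /* plays the role of the NCI_UNREG flag */
	bool workqueue_alive;   /* false once the work queue is destroyed */
	unsigned int queued;    /* counts simulated queue_work() calls */
};

/* Bring-up path: refuses to queue work once unregistration has begun. */
int sketch_open_device(struct sketch_ndev *ndev)
{
	int rc = 0;

	pthread_mutex_lock(&ndev->req_lock);
	if (atomic_load(&ndev->unreg)) {
		rc = -ENODEV;
	} else {
		assert(ndev->workqueue_alive); /* queueing on a live queue only */
		ndev->queued++;
	}
	pthread_mutex_unlock(&ndev->req_lock);
	return rc;
}

/* Teardown path: publish the flag, pass through the lock, then destroy. */
void sketch_unregister_device(struct sketch_ndev *ndev)
{
	atomic_store(&ndev->unreg, true);
	pthread_mutex_lock(&ndev->req_lock);   /* waits for an in-flight open */
	pthread_mutex_unlock(&ndev->req_lock);
	ndev->workqueue_alive = false;
}

void sketch_unreg_demo(void)
{
	struct sketch_ndev ndev = {
		.req_lock = PTHREAD_MUTEX_INITIALIZER,
		.unreg = false,
		.workqueue_alive = true,
		.queued = 0,
	};

	assert(sketch_open_device(&ndev) == 0);
	sketch_unregister_device(&ndev);
	assert(sketch_open_device(&ndev) == -ENODEV);
	assert(ndev.queued == 1);
}
```

Because the flag is set before the teardown path takes the lock, any open that got in first finishes before the queue is destroyed, and any later open sees the flag and bails out.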

Signed-off-by: Lin Ma <linma@zju.edu.cn>
Fixes: 6a2968aa ("NFC: basic NCI protocol implementation")
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Link: https://lore.kernel.org/r/20211116152732.19238-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

b77d2ff8

NFC: reorder the logic in nfc_{un,}register_device · f985b762

Authored by Lin Ma on Jan 17, 2022

stable inclusion
from linux-4.19.218
commit c45cea83e13699bdfd47842e04d09dd43af4c371
CVE: CVE-2021-4202

--------------------------------

[ Upstream commit 3e3b5dfc ]

There is a potential UAF between the unregistration routine and the NFC
netlink operations.

The race that causes the UAF is shown below:

 (FREE)                      |  (USE)
nfcmrvl_nci_unregister_dev   |  nfc_genl_dev_up
  nci_close_device           |
  nci_unregister_device      |    nfc_get_device
    nfc_unregister_device    |    nfc_dev_up
      rfkill_destroy         |
      device_del             |      rfkill_blocked
  ...                        |    ...

The root cause of this race is summarized below:
1. The rfkill_blocked (USE) in nfc_dev_up is supposed to be placed after
the device_is_registered check.
2. Since the netlink operations are possible just after the device_add
in nfc_register_device, the nfc_dev_up() can happen anywhere during the
rfkill creation process, which leads to data race.

This patch reorders these actions so that:
1. Once device_del is finished, nfc_dev_up cannot dereference the
rfkill object.
2. rfkill_register still comes after the device_add of nfc_dev,
because the parent device needs to be created first. So this patch keeps
that order but adds device_lock to prevent the data race.

Signed-off-by: Lin Ma <linma@zju.edu.cn>
Fixes: be055b2f ("NFC: RFKILL support")
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Link: https://lore.kernel.org/r/20211116152652.19217-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

f985b762

NFC: reorganize the functions in nci_request · b302dc7f

Authored by Lin Ma on Jan 17, 2022

stable inclusion
from linux-4.19.218
commit 62be2b1e7914b7340281f09412a7bbb62e6c8b67
CVE: CVE-2021-4202

--------------------------------

[ Upstream commit 86cdf8e3 ]

There is a possible data race as shown below:

thread-A in nci_request()       | thread-B in nci_close_device()
                                | mutex_lock(&ndev->req_lock);
test_bit(NCI_UP, &ndev->flags); |
...                             | test_and_clear_bit(NCI_UP, &ndev->flags)
mutex_lock(&ndev->req_lock);    |
                                |

This race will allow __nci_request() to be woken while the device is
getting removed.

Similar to commit e2cb6b89 ("bluetooth: eliminate the potential race
condition when removing the HCI controller"), this patch alters the
function sequence in nci_request() to prevent the data race with
nci_close_device().
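
A userspace sketch of the reordered sequence, in which the state flag is tested only after the lock is held. The `sketch_dev` structure and function names are hypothetical and the demo is single-threaded; the point is only the order of "lock, then test".

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical device: `up` plays the role of the NCI_UP bit. */
struct sketch_dev {
	pthread_mutex_t req_lock;
	bool up;
};

/* Request path: take the lock first, then look at the state. */
int sketch_request(struct sketch_dev *dev)
{
	int rc;

	pthread_mutex_lock(&dev->req_lock);
	if (dev->up)
		rc = 0;          /* the real code would issue the request here */
	else
		rc = -ENETDOWN;  /* device went down before we got the lock */
	pthread_mutex_unlock(&dev->req_lock);
	return rc;
}

/* Close path: clears the state under the same lock. */
void sketch_close_device(struct sketch_dev *dev)
{
	pthread_mutex_lock(&dev->req_lock);
	dev->up = false;
	pthread_mutex_unlock(&dev->req_lock);
}

void sketch_request_demo(void)
{
	struct sketch_dev dev = { .req_lock = PTHREAD_MUTEX_INITIALIZER, .up = true };

	assert(sketch_request(&dev) == 0);
	sketch_close_device(&dev);
	assert(sketch_request(&dev) == -ENETDOWN);
}
```

Testing the flag before taking the lock, as in the racy diagram, leaves a window in which the close path can clear it; testing under the lock closes that window.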

Signed-off-by: Lin Ma <linma@zju.edu.cn>
Fixes: 6a2968aa ("NFC: basic NCI protocol implementation")
Link: https://lore.kernel.org/r/20211115145600.8320-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

b302dc7f

13 Jan, 2022 3 commits

ext4: Fix BUG_ON in ext4_bread when write quota data · c113ae0d

Authored by Ye Bin on Jan 13, 2022

mainline inclusion
from mainline-v5.17
commit ce85548ab4295234b4f8e63a0eea0c157d2f6b25
category: bugfix
bugzilla: 185930
CVE: NA

-----------------------------------------------

We hit the following issue when running syzkaller:
[  167.936972] EXT4-fs error (device loop0): __ext4_remount:6314: comm rep: Abort forced by user
[  167.938306] EXT4-fs (loop0): Remounting filesystem read-only
[  167.981637] Assertion failure in ext4_getblk() at fs/ext4/inode.c:847: '(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) || handle != NULL || create == 0'
[  167.983601] ------------[ cut here ]------------
[  167.984245] kernel BUG at fs/ext4/inode.c:847!
[  167.984882] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[  167.985624] CPU: 7 PID: 2290 Comm: rep Tainted: G    B             5.16.0-rc5-next-20211217+ #123
[  167.986823] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[  167.988590] RIP: 0010:ext4_getblk+0x17e/0x504
[  167.989189] Code: c6 01 74 28 49 c7 c0 a0 a3 5c 9b b9 4f 03 00 00 48 c7 c2 80 9c 5c 9b 48 c7 c6 40 b6 5c 9b 48 c7 c7 20 a4 5c 9b e8 77 e3 fd ff <0f> 0b 8b 04 244
[  167.991679] RSP: 0018:ffff8881736f7398 EFLAGS: 00010282
[  167.992385] RAX: 0000000000000094 RBX: 1ffff1102e6dee75 RCX: 0000000000000000
[  167.993337] RDX: 0000000000000001 RSI: ffffffff9b6e29e0 RDI: ffffed102e6dee66
[  167.994292] RBP: ffff88816a076210 R08: 0000000000000094 R09: ffffed107363fa09
[  167.995252] R10: ffff88839b1fd047 R11: ffffed107363fa08 R12: ffff88816a0761e8
[  167.996205] R13: 0000000000000000 R14: 0000000000000021 R15: 0000000000000001
[  167.997158] FS:  00007f6a1428c740(0000) GS:ffff88839b000000(0000) knlGS:0000000000000000
[  167.998238] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  167.999025] CR2: 00007f6a140716c8 CR3: 0000000133216000 CR4: 00000000000006e0
[  167.999987] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  168.000944] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  168.001899] Call Trace:
[  168.002235]  <TASK>
[  168.007167]  ext4_bread+0xd/0x53
[  168.007612]  ext4_quota_write+0x20c/0x5c0
[  168.010457]  write_blk+0x100/0x220
[  168.010944]  remove_free_dqentry+0x1c6/0x440
[  168.011525]  free_dqentry.isra.0+0x565/0x830
[  168.012133]  remove_tree+0x318/0x6d0
[  168.014744]  remove_tree+0x1eb/0x6d0
[  168.017346]  remove_tree+0x1eb/0x6d0
[  168.019969]  remove_tree+0x1eb/0x6d0
[  168.022128]  qtree_release_dquot+0x291/0x340
[  168.023297]  v2_release_dquot+0xce/0x120
[  168.023847]  dquot_release+0x197/0x3e0
[  168.024358]  ext4_release_dquot+0x22a/0x2d0
[  168.024932]  dqput.part.0+0x1c9/0x900
[  168.025430]  __dquot_drop+0x120/0x190
[  168.025942]  ext4_clear_inode+0x86/0x220
[  168.026472]  ext4_evict_inode+0x9e8/0xa22
[  168.028200]  evict+0x29e/0x4f0
[  168.028625]  dispose_list+0x102/0x1f0
[  168.029148]  evict_inodes+0x2c1/0x3e0
[  168.030188]  generic_shutdown_super+0xa4/0x3b0
[  168.030817]  kill_block_super+0x95/0xd0
[  168.031360]  deactivate_locked_super+0x85/0xd0
[  168.031977]  cleanup_mnt+0x2bc/0x480
[  168.033062]  task_work_run+0xd1/0x170
[  168.033565]  do_exit+0xa4f/0x2b50
[  168.037155]  do_group_exit+0xef/0x2d0
[  168.037666]  __x64_sys_exit_group+0x3a/0x50
[  168.038237]  do_syscall_64+0x3b/0x90
[  168.038751]  entry_SYSCALL_64_after_hwframe+0x44/0xae

In order to reproduce this problem, the following conditions need to be met:
1. Ext4 filesystem with no journal;
2. Filesystem image with incorrect quota data;
3. Abort filesystem forced by user;
4. umount filesystem;

As in ext4_quota_write:
...
         if (EXT4_SB(sb)->s_journal && !handle) {
                 ext4_msg(sb, KERN_WARNING, "Quota write (off=%llu, len=%llu)"
                         " cancelled because transaction is not started",
                         (unsigned long long)off, (unsigned long long)len);
                 return -EIO;
         }
...
We only check whether handle is NULL when the filesystem has a journal.
The handle needs to be checked for NULL even when the filesystem has no
journal.
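
The difference between the two conditions can be shown with a small userspace model. The "old" check mirrors the condition quoted above; the "new" check is an assumption about the shape of the fix, inferred from the last sentence of the message. All names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical superblock model: only the journal presence matters here. */
struct sketch_sb {
	bool has_journal;
};

/* Condition as quoted: a NULL handle is rejected only with a journal. */
int sketch_quota_write_check_old(const struct sketch_sb *sb, const void *handle)
{
	if (sb->has_journal && handle == NULL)
		return -EIO;
	return 0;
}

/* Assumed fix: a NULL handle is rejected whether or not a journal exists. */
int sketch_quota_write_check_new(const struct sketch_sb *sb, const void *handle)
{
	(void)sb;
	if (handle == NULL)
		return -EIO;
	return 0;
}

void sketch_quota_demo(void)
{
	struct sketch_sb nojournal = { .has_journal = false };
	int dummy_handle = 0;

	/* Old check lets the write continue, which is the path that hit the BUG. */
	assert(sketch_quota_write_check_old(&nojournal, NULL) == 0);
	/* New check stops it early with -EIO. */
	assert(sketch_quota_write_check_new(&nojournal, NULL) == -EIO);
	/* A valid handle passes both checks. */
	assert(sketch_quota_write_check_old(&nojournal, &dummy_handle) == 0);
	assert(sketch_quota_write_check_new(&nojournal, &dummy_handle) == 0);
}
```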

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20211223015506.297766-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c113ae0d

PM: hibernate: use correct mode for swsusp_close() · 2ae13e7a

Authored by Thomas Zeitlhofer on Jan 13, 2022

stable inclusion
from linux-v4.19.219
commit 68945e943519df1532e598fafab16ac54488933f

---------------------------------------------------

[ Upstream commit cefcf24b ]

Commit 39fbef4b ("PM: hibernate: Get block device exclusively in
swsusp_check()") changed the opening mode of the block device to
(FMODE_READ | FMODE_EXCL).

In the corresponding calls to swsusp_close(), the mode is still just
FMODE_READ which triggers the warning in blkdev_flush_mapping() on
resume from hibernate.

So, use the mode (FMODE_READ | FMODE_EXCL) also when closing the
device.

Fixes: 39fbef4b ("PM: hibernate: Get block device exclusively in swsusp_check()")
Signed-off-by: Thomas Zeitlhofer <thomas.zeitlhofer+lkml@ze-it.at>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

2ae13e7a

Revert "watchdog: Fix check_preemption_disabled() error" · c8f15bf5

Authored by Yang Yingliang on Jan 13, 2022

hulk inclusion
category: bugfix
bugzilla: 173968, https://gitee.com/openeuler/kernel/issues/I3J87Y
CVE: NA

---------------------------

This reverts commit b2e484e9.

When CONFIG_LOCKDEP and CONFIG_DEBUG_LOCKDEP are enabled, it detects the following error:

[   10.145007] BUG: sleeping function called from invalid context at mm/slab.h:418
[   10.145394] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0
[   10.145765] Preemption disabled at:
[   10.145978] [<ffff000008f8e7b4>] hardlockup_detector_perf_init+0x20/0x100
[   10.146770] CPU: 6 PID: 1 Comm: swapper/0 Not tainted 4.19.90+ #3
[   10.148242] Hardware name: linux,dummy-virt (DT)
[   10.148572] Call trace:
[   10.148667]  dump_backtrace+0x0/0x190
[   10.148765]  show_stack+0x24/0x30
[   10.148875]  dump_stack+0xa4/0xf8
[   10.148964]  ___might_sleep+0x150/0x180
[   10.149065]  __might_sleep+0x58/0x90
[   10.149199]  kmem_cache_alloc_trace+0x244/0x2b0
[   10.149308]  perf_event_alloc+0x74/0x680
[   10.149402]  perf_event_create_kernel_counter+0x2c/0x190
[   10.149516]  arch_probe_cpu_freq+0x84/0x1ac
[   10.149611]  hw_nmi_get_sample_period+0xb8/0x180
[   10.149713]  hardlockup_detector_event_create+0x28/0xfc
[   10.149827]  hardlockup_detector_perf_init+0x24/0x100
[   10.149943]  watchdog_nmi_probe+0x14/0x1c
[   10.150037]  lockup_detector_init+0x58/0x98
[   10.150173]  kernel_init_freeable+0x10c/0x1c4
[   10.150298]  kernel_init+0x18/0x110
[   10.150422]  ret_from_fork+0x10/0x18

In 'b2e484e9 ("watchdog: Fix check_preemption_disabled() error")', we
tried to fix check_preemption_disabled() error by disabling preemption in
hardlockup_detector_perf_init(), but missed that function
perf_event_create_kernel_counter() may sleep.

Preemption is always disabled there, so the problem that the commit was
meant to fix does not exist; just revert it.

Fixes: b2e484e9 ("watchdog: Fix check_preemption_disabled() error")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
c8f15bf5

06 Jan, 2022 1 commit

arm64/mpam: fix mpam dts init arm_mpam_of_device_ids error · 7ea0c3fe

Authored by Xingang Wang on Jan 06, 2022

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I49RB2
CVE: NA

---------------------------------------------------

[    0.596145] BUG: KASAN: global-out-of-bounds in __of_match_node.part.0+0xe0/0x110
[    0.596731] Read of size 1 at addr ffff2000099a8288 by task swapper/0/1
[    0.597247]
[    0.597372] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.90+ #34
[    0.597858] Hardware name: linux,dummy-virt (DT)
[    0.598243] Call trace:
[    0.598443]  dump_backtrace+0x0/0x360
[    0.598734]  show_stack+0x24/0x30
[    0.599004]  dump_stack+0xdc/0x128
[    0.599323]  print_address_description+0x184/0x278
[    0.599771]  kasan_report+0x204/0x330
[    0.600117]  __asan_report_load1_noabort+0x30/0x40
[    0.600566]  __of_match_node.part.0+0xe0/0x110
[    0.600980]  of_match_node+0x6c/0xa8
[    0.601316]  of_match_device+0x48/0x70
[    0.601669]  platform_match+0xa4/0x260
[    0.602037]  __driver_attach+0x68/0x128
[    0.602397]  bus_for_each_dev+0x118/0x198
[    0.602773]  driver_attach+0x48/0x60
[    0.603112]  bus_add_driver+0x330/0x658
[    0.603472]  driver_register+0x148/0x398
[    0.603839]  __platform_driver_register+0xd4/0x108
[    0.604288]  arm_mpam_driver_init+0x64/0x78
[    0.604680]  do_one_initcall+0xbc/0x488
[    0.605039]  kernel_init_freeable+0x604/0x6f8
[    0.605447]  kernel_init+0x18/0x130
[    0.605775]  ret_from_fork+0x10/0x18
[    0.606130]
[    0.606274] The buggy address belongs to the variable:
[    0.606754]  arm_mpam_of_device_ids+0xc8/0x380
[    0.607168]
[    0.607314] Memory state around the buggy address:
[    0.607762]  ffff2000099a8180: 00 00 00 fa fa fa fa fa 00 00 00 00 00 00 00 00
[    0.608429]  ffff2000099a8200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    0.609095] >ffff2000099a8280: 00 fa fa fa fa fa fa fa 05 fa fa fa fa fa fa fa
[    0.609760]                       ^
[    0.610101]  ffff2000099a8300: 00 00 07 fa fa fa fa fa 00 04 fa fa fa fa fa fa
[    0.610771]  ffff2000099a8380: 00 00 00 06 fa fa fa fa 00 01 fa fa fa fa fa fa

The arm_mpam_of_device_ids array has no end item, so accesses to the array
might run out of bounds. With KASAN enabled, the out-of-bounds call trace
above occurred. Add an empty end item to the arm_mpam_of_device_ids array
to fix this issue.
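
Why the end item matters can be seen in a userspace sketch of a sentinel-terminated match table. The `sketch_device_id` type, the table contents and the matcher are hypothetical and only loosely modelled on device-tree match tables; the scan has no length argument, so the empty entry is the only thing that stops it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical match-table entry. */
struct sketch_device_id {
	char compatible[32];
};

/* The trailing empty entry is the sentinel that terminates the scan. */
const struct sketch_device_id sketch_ids[] = {
	{ .compatible = "vendor,device-a" },
	{ .compatible = "vendor,device-b" },
	{ .compatible = "" }, /* sentinel: without it the loop reads past the array */
};

/* Walk the table until the sentinel; return the matching entry or NULL. */
const struct sketch_device_id *
sketch_match(const struct sketch_device_id *table, const char *compatible)
{
	for (; table->compatible[0] != '\0'; table++) {
		if (strcmp(table->compatible, compatible) == 0)
			return table;
	}
	return NULL;
}

void sketch_match_demo(void)
{
	assert(sketch_match(sketch_ids, "vendor,device-b") == &sketch_ids[1]);
	/* A miss walks every entry and relies on the sentinel to stop. */
	assert(sketch_match(sketch_ids, "vendor,missing") == NULL);
}
```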

Fixes: b45bdb5a ("arm64/mpam: add device tree support for mpam initialization")
Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

7ea0c3fe
