Commits · 417ce49322c30dd85515b577ad2f9da83f53b78d · openeuler / Kernel

08 Feb, 2021 · 40 commits

io_uring: fix skipping disabling sqo on exec · 417ce493

Committed by Pavel Begunkov on Feb 01, 2021

stable inclusion
from stable-5.10.12
commit 186725a80c4e931b6fe31b94d66c989d5f2354c1
bugzilla: 47876

--------------------------------

[ Upstream commit 0b5cd6c3 ]

If there are no requests at the time __io_uring_task_cancel() is called,
tctx_inflight() returns zero and the function returns without ever going
through __io_uring_files_cancel() and calling io_disable_sqo_submit(). We
absolutely want SQPOLL submission disabled by the time cancellation ends.
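The early-return problem can be seen in a toy model. This is not the kernel code: task_model, files_cancel_model() and the two task_cancel_*() functions are hypothetical stand-ins that only mirror the control flow described above, assuming that one cancel pass is what disables SQPOLL submission.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the per-task cancel path; the real
 * io_uring structures and functions look different. */
struct task_model {
	int inflight;       /* what tctx_inflight() would report */
	bool sqo_disabled;  /* set once io_disable_sqo_submit() has run */
};

/* Stands in for __io_uring_files_cancel(): disables SQPOLL submission
 * and cancels whatever is still in flight. */
static void files_cancel_model(struct task_model *t)
{
	t->sqo_disabled = true;
	t->inflight = 0;
}

/* Shape of the bug: with nothing in flight the loop body never runs,
 * so submission is never disabled. */
static void task_cancel_buggy(struct task_model *t)
{
	while (t->inflight > 0)
		files_cancel_model(t);
}

/* Shape of the fix: disable submission unconditionally, then loop. */
static void task_cancel_fixed(struct task_model *t)
{
	files_cancel_model(t);
	while (t->inflight > 0)
		files_cancel_model(t);
}
```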

Cc: stable@vger.kernel.org # 5.5+
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

io_uring: fix uring_flush in exit_files() warning · 8940e1a8

Committed by Pavel Begunkov on Feb 01, 2021

stable inclusion
from stable-5.10.12
commit 54b4c4f9aba9e5d1ef6877f42a57895b189107c9
bugzilla: 47876

--------------------------------

[ Upstream commit 4325cb49 ]

WARNING: CPU: 1 PID: 11100 at fs/io_uring.c:9096
	io_uring_flush+0x326/0x3a0 fs/io_uring.c:9096
RIP: 0010:io_uring_flush+0x326/0x3a0 fs/io_uring.c:9096
Call Trace:
 filp_close+0xb4/0x170 fs/open.c:1280
 close_files fs/file.c:401 [inline]
 put_files_struct fs/file.c:416 [inline]
 put_files_struct+0x1cc/0x350 fs/file.c:413
 exit_files+0x7e/0xa0 fs/file.c:433
 do_exit+0xc22/0x2ae0 kernel/exit.c:820
 do_group_exit+0x125/0x310 kernel/exit.c:922
 get_signal+0x3e9/0x20a0 kernel/signal.c:2770
 arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811
 handle_signal_work kernel/entry/common.c:147 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
 exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:201
 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
 syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

The creator task of an SQPOLL ring may have dropped its file note during
exit and called io_disable_sqo_submit(), but the io_uring file is still
referenced through the fdtable. That reference is put during close_files(),
which triggers a false-positive warning.

First, split the warning into two so it is clearer which condition was hit,
and then add an sqo_dead check to handle the case described above.

Cc: stable@vger.kernel.org # 5.5+
Reported-by: syzbot+a32b546d58dde07875a1@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

io_uring: fix false positive sqo warning on flush · 79ee5666

Committed by Pavel Begunkov on Feb 01, 2021

stable inclusion
from stable-5.10.12
commit 0682759126bc761c325325ca809ae99c93fda2a0
bugzilla: 47876

--------------------------------

[ Upstream commit 6b393a1f ]

WARNING: CPU: 1 PID: 9094 at fs/io_uring.c:8884
	io_disable_sqo_submit+0x106/0x130 fs/io_uring.c:8884
Call Trace:
 io_uring_flush+0x28b/0x3a0 fs/io_uring.c:9099
 filp_close+0xb4/0x170 fs/open.c:1280
 close_fd+0x5c/0x80 fs/file.c:626
 __do_sys_close fs/open.c:1299 [inline]
 __se_sys_close fs/open.c:1297 [inline]
 __x64_sys_close+0x2f/0xa0 fs/open.c:1297
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

io_uring's final close() may be triggered by any task, not only the
creator. io_uring_flush() handles this correctly, including the SQPOLL case,
but the warning in io_disable_sqo_submit() fires spuriously. Fix this by
moving the warning out to the only call site that matters.

Cc: stable@vger.kernel.org # 5.5+
Reported-by: syzbot+2f5d1785dc624932da78@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

io_uring: do sqo disable on install_fd error · 43eac3ed

Committed by Pavel Begunkov on Feb 01, 2021

stable inclusion
from stable-5.10.12
commit 8cb6f4da831bc51145aee3a923f03114121dea6b
bugzilla: 47876

--------------------------------

[ Upstream commit 06585c49 ]

WARNING: CPU: 0 PID: 8494 at fs/io_uring.c:8717
	io_ring_ctx_wait_and_kill+0x4f2/0x600 fs/io_uring.c:8717
Call Trace:
 io_uring_release+0x3e/0x50 fs/io_uring.c:8759
 __fput+0x283/0x920 fs/file_table.c:280
 task_work_run+0xdd/0x190 kernel/task_work.c:140
 tracehook_notify_resume include/linux/tracehook.h:189 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
 exit_to_user_mode_prepare+0x249/0x250 kernel/entry/common.c:201
 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
 syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:302
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

A failed io_uring_install_fd() is a special case: we don't call
io_ring_ctx_wait_and_kill() directly but defer it to fput, though we still
need to call io_disable_sqo_submit() before that.

Note: this doesn't fix any real problem, only a warning. That's because
the SQ ring won't be available to userspace in this case, so SQPOLL
won't submit anything.

Cc: stable@vger.kernel.org # 5.5+
Reported-by: syzbot+9c9c35374c0ecac06516@syzkaller.appspotmail.com
Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

io_uring: fix null-deref in io_disable_sqo_submit · 8173ba66

Committed by Pavel Begunkov on Feb 01, 2021

stable inclusion
from stable-5.10.12
commit 0e3562e3b2aeb4a6aa4615185a8f59c51cade61b
bugzilla: 47876

--------------------------------

[ Upstream commit b4411616 ]

general protection fault, probably for non-canonical address
	0xdffffc0000000022: 0000 [#1] KASAN: null-ptr-deref
	in range [0x0000000000000110-0x0000000000000117]
RIP: 0010:io_ring_set_wakeup_flag fs/io_uring.c:6929 [inline]
RIP: 0010:io_disable_sqo_submit+0xdb/0x130 fs/io_uring.c:8891
Call Trace:
 io_uring_create fs/io_uring.c:9711 [inline]
 io_uring_setup+0x12b1/0x38e0 fs/io_uring.c:9739
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

io_disable_sqo_submit() may be called before the user rings are
allocated; don't call io_ring_set_wakeup_flag() in that case.
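A minimal sketch of the guard, assuming a context whose rings pointer stays NULL until setup allocates the shared rings. ctx_model, rings_model and SQ_NEED_WAKEUP_MODEL are hypothetical stand-ins (the last for IORING_SQ_NEED_WAKEUP); the point is only that the wakeup flag is touched when the rings exist.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real io_ring_ctx / io_rings differ. */
#define SQ_NEED_WAKEUP_MODEL 1u

struct rings_model {
	unsigned int sq_flags;
};

struct ctx_model {
	struct rings_model *rings;  /* NULL until the rings are allocated */
	int sqo_dead;
};

static void disable_sqo_submit_model(struct ctx_model *ctx)
{
	ctx->sqo_dead = 1;
	/* Only poke the shared rings once they exist; an early error in
	 * setup can reach this path with ctx->rings still NULL. */
	if (ctx->rings)
		ctx->rings->sq_flags |= SQ_NEED_WAKEUP_MODEL;
}
```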

Cc: stable@vger.kernel.org # 5.5+
Reported-by: syzbot+ab412638aeb652ded540@syzkaller.appspotmail.com
Fixes: d9d05217 ("io_uring: stop SQPOLL submit on creator's death")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

io_uring: stop SQPOLL submit on creator's death · 63238c74

Committed by Pavel Begunkov on Feb 01, 2021

stable inclusion
from stable-5.10.12
commit a63d9157571b52f7339d6db4c2ab7bc3bfe527c0
bugzilla: 47876

--------------------------------

[ Upstream commit d9d05217 ]

When the creator of an SQPOLL io_uring (i.e. sqo_task) dies, we don't want
its internals such as ->files and ->mm to be poked by the SQPOLL task; that
has never been nice and has recently become racy. It can happen when the
owner is being torn down while the SQPOLL task tries to submit new
requests in parallel, and so calls io_sq_thread_acquire*().

This patch halts SQPOLL submissions when sqo_task dies by introducing an
sqo_dead flag. Once it is set, the SQPOLL task must not submit anything;
this is synchronised by uring_lock as well as by the new flag.
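The flag-plus-lock scheme can be sketched in userspace with a pthread mutex standing in for uring_lock. ring_model, sqpoll_submit_model() and sqo_disable_locked() are hypothetical names; the sketch assumes every submission takes the lock and rechecks the flag, so once sqo_disable_locked() returns no later submission can slip through.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the io_uring structures. */
struct ring_model {
	pthread_mutex_t uring_lock;  /* stands in for ctx->uring_lock */
	bool sqo_dead;               /* stands in for ctx->sqo_dead */
	int submitted;
};

static void ring_model_init(struct ring_model *r)
{
	pthread_mutex_init(&r->uring_lock, NULL);
	r->sqo_dead = false;
	r->submitted = 0;
}

/* SQPOLL side: check the flag under the lock before submitting. */
static int sqpoll_submit_model(struct ring_model *r, int nr)
{
	int ret;

	pthread_mutex_lock(&r->uring_lock);
	if (r->sqo_dead) {
		ret = -1;  /* creator is gone: refuse to submit */
	} else {
		r->submitted += nr;
		ret = nr;
	}
	pthread_mutex_unlock(&r->uring_lock);
	return ret;
}

/* Creator's exit/cancel side: set the flag under the same lock. */
static void sqo_disable_locked(struct ring_model *r)
{
	pthread_mutex_lock(&r->uring_lock);
	r->sqo_dead = true;
	pthread_mutex_unlock(&r->uring_lock);
}
```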

The tricky part is making sure that disabling always happens: either the
ring is discovered by the creator's do_exit() -> cancel path, or the final
close() happens before that and the creator disables it there. The latter
is guaranteed because, for SQPOLL, the creator task (and only it) holds
exactly one file note, so the note either pins the ring until do_exit() or
is removed by the creator on the final put in flush (see the comments in
io_uring_flush() around file->f_count == 2).

One more place that can trigger io_sq_thread_acquire_*() is
__io_req_task_submit(). Fail requests there when sqo_dead is set, even
though this is not strictly necessary, because cancellation of sqo_task
should wait for the request before going any further.

Note 1: io_disable_sqo_submit() calls io_ring_set_wakeup_flag() so that the
caller enters the ring and gets an error, but this still doesn't
guarantee that the flag won't be cleared.

Note 2: if the final __userspace__ close happens from a task other than the
creator, the creator's file note will pin the ring until the creator dies.

Cc: stable@vger.kernel.org # 5.5+
Fixes: b1b6b5a3 ("kernel/io_uring: cancel io_uring before task works")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

io_uring: add warn_once for io_uring_flush() · db9b4d68

Committed by Pavel Begunkov on Feb 01, 2021

stable inclusion
from stable-5.10.12
commit da67631a33c342528245817cc61e36dd945665b0
bugzilla: 47876

--------------------------------

[ Upstream commit 6b5733eb ]

files_cancel() should cancel all relevant requests and drop file notes,
so we should never have file notes after that, including on-exit fput
and flush. Add a WARN_ONCE to be sure.

Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

io_uring: inline io_uring_attempt_task_drop() · 8157becf

Committed by Pavel Begunkov on Feb 01, 2021

stable inclusion
from stable-5.10.12
commit 18f31594ee52ed1f364e376767fb839935fd899c
bugzilla: 47876

--------------------------------

[ Upstream commit 4f793dc4 ]

A simple preparation change inlining io_uring_attempt_task_drop() into
io_uring_flush().

Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

kernel/io_uring: cancel io_uring before task works · 56ea41ef

Committed by Pavel Begunkov on Feb 01, 2021

stable inclusion
from stable-5.10.12
commit 7bf3fb6243a3b153ab1854b331ec19d67a4878bb
bugzilla: 47876

--------------------------------

[ Upstream commit b1b6b5a3 ]

To cancel io_uring requests, we need either to be able to run the
currently enqueued task_works or to have task_work shut down by that
moment. Otherwise io_uring_cancel_files() may wait for requests that will
never complete.

Go with the first option and do the cancellations before setting
PF_EXITING, and therefore before the task_work infrastructure enters a
transitional state in which task_work_run() had better not be called.

Cc: stable@vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

iwlwifi: dbg: Don't touch the tlv data · 2506fa14

Committed by Takashi Iwai on Feb 01, 2021

stable inclusion
from stable-5.10.12
commit 08a922a6fdf8c3eaea7f6a3beb13c6eed75994dc
bugzilla: 47876

--------------------------------

commit a6616bc9 upstream.

Commit ba8f6f4a ("iwlwifi: dbg: add dumping special device
memory") added NUL termination of the name string just to be safe, and
this seems to cause a regression: a GPF triggered at firmware loading.
Basically, we shouldn't modify firmware data, which may be provided
as read-only.

This patch drops the code that caused the regression and keeps the TLV
data as is.
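One way to consume a fixed-size name field that may lack a NUL terminator, without writing to the (possibly read-only) firmware buffer, is to bound the read with a printf precision. This is an illustrative sketch, not the iwlwifi code: tlv_model and tlv_name_copy() are hypothetical, and the real TLV layout differs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical TLV with a fixed-size name that is not guaranteed to be
 * NUL-terminated. */
struct tlv_model {
	char name[8];
};

/* Copy the name into a caller-owned buffer. "%.*s" reads at most
 * sizeof(name) bytes and stops early at a NUL, so the source is never
 * modified and never over-read. */
static void tlv_name_copy(const struct tlv_model *tlv, char *out, size_t outlen)
{
	snprintf(out, outlen, "%.*s", (int)sizeof(tlv->name), tlv->name);
}
```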

Fixes: ba8f6f4a ("iwlwifi: dbg: add dumping special device memory")
BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1180344
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=210733
Cc: stable@vger.kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20210112132449.22243-2-tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

RDMA/vmw_pvrdma: Fix network_hdr_type reported in WC · 4b3855f0

Committed by Bryan Tan on Feb 01, 2021

stable inclusion
from stable-5.10.12
commit 1fa2fa7932f9f2e695453ed7160d557eced07bb4
bugzilla: 47876

--------------------------------

commit 9f206f73 upstream.

The PVRDMA device HW interface defines network_hdr_type according to an
old definition of the internal kernel rdma_network_type enum that has
since changed, resulting in the wrong rdma_network_type being reported.

Fix this by explicitly defining the enum used by the PVRDMA device and
adding a function to convert the pvrdma_network_type to rdma_network_type
enum.
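The shape of such a conversion might look like the sketch below. The enums are local stand-ins with hypothetical names. The kernel-side ordering (IB, RoCE v1, IPv4, IPv6 as distinct values) is our reading of the post-1c15b4f2 definition and should be checked against include/rdma/ib_verbs.h; the device side keeps the older encoding in which IB and RoCE v1 share a value, which is why a plain cast reports the wrong type.

```c
#include <assert.h>

/* Kernel-side enum after the change (assumed ordering). */
enum rdma_network_type_model {
	RDMA_NET_IB_M,
	RDMA_NET_ROCE_V1_M,
	RDMA_NET_IPV4_M,
	RDMA_NET_IPV6_M,
};

/* Device-side encoding: part of the device ABI, so spelled out
 * explicitly instead of being inherited from the kernel enum. */
enum pvrdma_network_type_model {
	PVRDMA_NET_IB_M = 0,
	PVRDMA_NET_ROCE_V1_M = PVRDMA_NET_IB_M,
	PVRDMA_NET_IPV4_M = 1,
	PVRDMA_NET_IPV6_M = 2,
};

static enum rdma_network_type_model
pvrdma_to_rdma_network_type(enum pvrdma_network_type_model t)
{
	switch (t) {
	case PVRDMA_NET_ROCE_V1_M:  /* same value as PVRDMA_NET_IB_M */
		return RDMA_NET_ROCE_V1_M;
	case PVRDMA_NET_IPV4_M:
		return RDMA_NET_IPV4_M;
	case PVRDMA_NET_IPV6_M:
		return RDMA_NET_IPV6_M;
	}
	return RDMA_NET_ROCE_V1_M;  /* unknown device value: an assumption */
}
```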

Cc: stable@vger.kernel.org # 5.10+
Fixes: 1c15b4f2 ("RDMA/core: Modify enum ib_gid_type and enum rdma_network_type")
Link: https://lore.kernel.org/r/1611026189-17943-1-git-send-email-bryantan@vmware.com
Reviewed-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

media: v4l2-subdev.h: BIT() is not available in userspace · fd02776b

Committed by Hans Verkuil on Feb 01, 2021

stable inclusion
from stable-5.10.12
commit 77727dfda786ce1d333ebc9c8777d821fe86466a
bugzilla: 47876

--------------------------------

commit a53e3c18 upstream.

The BIT macro is not available in userspace, so replace BIT(0) by
0x00000001.
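A UAPI header is compiled by userspace programs that do not get the kernel-internal BIT() macro, so a flag defined as BIT(0) fails to build there unless the program happens to define BIT() itself. The sketch uses a hypothetical flag name standing in for the v4l2-subdev capability bit.

```c
#include <assert.h>

/* Hypothetical flag name; a plain literal compiles in kernel and
 * userspace builds alike, whereas BIT(0) would be undeclared here. */
#define SUBDEV_CAP_RO_MODEL 0x00000001

static int subdev_is_read_only(unsigned int capabilities)
{
	return (capabilities & SUBDEV_CAP_RO_MODEL) != 0;
}
```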

Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Fixes: 6446ec6c ("media: v4l2-subdev: add VIDIOC_SUBDEV_QUERYCAP ioctl")
Cc: <stable@vger.kernel.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

media: Revert "media: videobuf2: Fix length check for single plane dmabuf queueing" · b1987804

Committed by Naushir Patuck on Feb 01, 2021

stable inclusion
from stable-5.10.12
commit 94fb5ff34544897982f7773868fa10530f13bb2d
bugzilla: 47876

--------------------------------

commit 95e9295d upstream.

The updated length check for dmabuf types broke existing usage in v4l2
userland clients.

Fixes: 961d3b27 ("media: videobuf2: Fix length check for single plane dmabuf queueing")
Cc: stable@vger.kernel.org
Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
Tested-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

HID: multitouch: Apply MT_QUIRK_CONFIDENCE quirk for multi-input devices · f67712fa

Committed by Kai-Heng Feng on Feb 01, 2021

stable inclusion
from stable-5.10.12
commit 0fa0a05b4089d136b9d2c1a32fc21320af05629b
bugzilla: 47876

--------------------------------

commit 794c6133 upstream.

Palm rejection stops working on some Elan and Synaptics touchpads after
commit 40d5bb87 ("HID: multitouch: enable multi-input as a quirk for
some devices").

That commit changes the mt_class from MT_CLS_WIN_8 to
MT_CLS_WIN_8_FORCE_MULTI_INPUT, so MT_QUIRK_CONFIDENCE is no longer
applied.

Apply the quirk to MT_CLS_WIN_8_FORCE_MULTI_INPUT as well, since it is
essentially MT_CLS_WIN_8.
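The fix amounts to treating the forced-multi-input class like MT_CLS_WIN_8 wherever the confidence quirk is decided. The sketch below models that check with hypothetical enum and bit values; the real class table and quirk bits live in drivers/hid/hid-multitouch.c and are not reproduced here.

```c
#include <assert.h>

/* Hypothetical values; not the real hid-multitouch definitions. */
#define MT_QUIRK_CONFIDENCE_M (1u << 0)

enum mt_class_model {
	MT_CLS_DEFAULT_M,
	MT_CLS_WIN_8_M,
	MT_CLS_WIN_8_FORCE_MULTI_INPUT_M,
};

/* Before: only the plain WIN_8 class got the confidence quirk. */
static unsigned int confidence_quirk_before(enum mt_class_model cls,
					    unsigned int quirks)
{
	if (cls == MT_CLS_WIN_8_M)
		quirks |= MT_QUIRK_CONFIDENCE_M;
	return quirks;
}

/* After: the forced-multi-input class is essentially WIN_8, so it
 * gets the quirk too. */
static unsigned int confidence_quirk_after(enum mt_class_model cls,
					   unsigned int quirks)
{
	if (cls == MT_CLS_WIN_8_M || cls == MT_CLS_WIN_8_FORCE_MULTI_INPUT_M)
		quirks |= MT_QUIRK_CONFIDENCE_M;
	return quirks;
}
```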

Fixes: 40d5bb87 ("HID: multitouch: enable multi-input as a quirk for some devices")
Cc: stable@vger.kernel.org
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

HID: wacom: Correct NULL dereference on AES pen proximity · 86d382f5

Committed by Jason Gerecke on Feb 01, 2021

stable inclusion
from stable-5.10.12
commit a7f6d4ab44344d71a64028f1dbdfbb1e8781572b
bugzilla: 47876

--------------------------------

commit 179e8e47 upstream.

The recent commit to fix a memory leak introduced an inadvertent NULL
pointer dereference. The `wacom_wac->pen_fifo` variable was never
initialized, resulting in a crash whenever functions tried to use it.
Since the FIFO is only used by AES pens (to buffer events from pen
proximity until the hardware reports the pen serial number), this would
have been easily overlooked without testing an AES device.

This patch converts `wacom_wac->pen_fifo` to a pointer (since the
call to `devres_alloc` allocates memory for us) and ensures that we assign
it to point to the allocated and initialized `pen_fifo` before the function
returns.
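The pattern described, a pointer member assigned to the freshly allocated and initialised object before the function returns, can be sketched in userspace with calloc() standing in for devres_alloc(). fifo_model, wacom_wac_model and fifo_init_model() are hypothetical; only the ownership pattern is the point.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy types; not the wacom driver structures. */
struct fifo_model {
	unsigned int size;
	unsigned int len;
};

struct wacom_wac_model {
	struct fifo_model *pen_fifo;  /* a pointer, not an embedded fifo */
};

static int fifo_init_model(struct wacom_wac_model *wac, unsigned int size)
{
	/* calloc() stands in for devres_alloc(): the memory is owned by
	 * the allocator, so the struct only needs to point at it. */
	struct fifo_model *fifo = calloc(1, sizeof(*fifo));

	if (!fifo)
		return -1;
	fifo->size = size;     /* initialise the allocated object ... */
	fifo->len = 0;
	wac->pen_fifo = fifo;  /* ... and publish it before returning */
	return 0;
}
```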

Link: https://github.com/linuxwacom/input-wacom/issues/230
Fixes: 37309f47 ("HID: wacom: Fix memory leakage caused by kfifo_alloc")
CC: stable@vger.kernel.org # v4.19+
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Tested-by: Ping Cheng <ping.cheng@wacom.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

futex: Handle faults correctly for PI futexes · 9f3063e9

Committed by Thomas Gleixner on Feb 01, 2021

stable inclusion
from stable-5.10.12
commit e843e4f782582a10e5077b917948d1df943070f6
bugzilla: 47876

--------------------------------

commit 34b1a1ce upstream

fixup_pi_state_owner() tries to ensure that the state of the rtmutex,
pi_state and the user space value related to the PI futex are consistent
before returning to user space. If the user space value update faults
and the fault cannot be resolved by faulting the page in via
fault_in_user_writeable(), the function returns -EFAULT and leaves
the rtmutex and pi_state owner state inconsistent.

A subsequent futex_unlock_pi() operates on the inconsistent pi_state and
releases the rtmutex despite not owning it, which can corrupt the RB tree of
the rtmutex and cause a subsequent kernel stack use-after-free.

It was suggested to loop forever in fixup_pi_state_owner() if the fault
cannot be resolved, but that results in runaway tasks which is especially
undesired when the problem happens due to a programming error and not due
to malice.

As the user space value cannot be fixed up, the proper solution is to make
the rtmutex and the pi_state consistent so both have the same owner. This
leaves the user space value out of sync. Any subsequent operation on the
futex will fail because the 10th rule of PI futexes (pi_state owner and
user space value are consistent) has been violated.

As a consequence, this removes the inept attempts at 'fixing' the situation
when the current task owns the rtmutex and returns with an unresolvable
fault: those attempts unlocked the rtmutex, which left pi_state::owner and
rtmutex::owner out of sync in a different and only slightly less dangerous
way.

Fixes: 1b7558e4 ("futexes: fix fault handling in futex_lock_pi")
Reported-by: gzobqq@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

futex: Simplify fixup_pi_state_owner() · c26d6da1

Committed by Thomas Gleixner on Feb 01, 2021

stable inclusion
from stable-5.10.12
commit abc4dd792f8db54470a888ca825166fbba59ee78
bugzilla: 47876

--------------------------------

commit f2dac39d upstream

Too many gotos already and an upcoming fix would make it even more
unreadable.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

futex: Use pi_state_update_owner() in put_pi_state() · 219b2906

Committed by Thomas Gleixner on Feb 01, 2021

stable inclusion
from stable-5.10.12
commit a597f12e971c3859fdcc503a25008b37a891f043
bugzilla: 47876

--------------------------------

commit 6ccc84f9 upstream

No point in open coding it. This way it gains the extra sanity checks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

rtmutex: Remove unused argument from rt_mutex_proxy_unlock() · 181d90b5

Committed by Thomas Gleixner on Feb 01, 2021

stable inclusion
from stable-5.10.12
commit 6d28ac502f9a0cf30d7ca2eeeefbf8a98c01fe82
bugzilla: 47876

--------------------------------

commit 2156ac19 upstream

Nothing uses the argument. Remove it as preparation to use
pi_state_update_owner().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

futex: Provide and use pi_state_update_owner() · c1795954

Committed by Thomas Gleixner on Feb 01, 2021

stable inclusion
from stable-5.10.12
commit 5b2c5a9561c24f3bcdcb85bb897bdc3aa3375c49
bugzilla: 47876

--------------------------------

commit c5cade20 upstream

Updating pi_state::owner is done at several places with the same
code. Provide a function for it and use that at the obvious places.

This is also preparation for a bug fix, to avoid yet another copy of the
same code or, alternatively, introducing a completely impenetrable mess of
gotos.

Originally-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

futex: Replace pointless printk in fixup_owner() · 9e9a9183

Committed by Thomas Gleixner on Feb 01, 2021

stable inclusion
from stable-5.10.12
commit 5ede8ee2cb16f4dd066a37b38ad46576dbf20d45
bugzilla: 47876

--------------------------------

commit 04b79c55 upstream

If that unexpected case of inconsistent arguments ever happens, the
futex state is left completely inconsistent and the printk is not really
helpful. Replace it with a warning and make the state consistent.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

futex: Ensure the correct return value from futex_lock_pi() · d2f31b9e

Committed by Thomas Gleixner on Feb 01, 2021

stable inclusion
from stable-5.10.12
commit ab5e9a320e444fda64e5912f0e0f4f02021569ea
bugzilla: 47876

--------------------------------

commit 12bb3f7f upstream

If futex_lock_pi() was aborted by a signal or a timeout and the task
returned without acquiring the rtmutex, but is the designated owner of the
futex due to a concurrent futex_unlock_pi(), fixup_owner() is invoked to
establish a consistent state. In that case it invokes fixup_pi_state_owner(),
which in turn tries to acquire the rtmutex again. If that succeeds, it
does not propagate this success to fixup_owner(), and futex_lock_pi()
returns -EINTR or -ETIMEDOUT despite having the futex locked.

Return success from fixup_pi_state_owner() in all cases where the current
task owns the rtmutex and therefore the futex, and propagate it correctly
through fixup_owner(). Fix up the other call site, which does not expect a
positive return value.
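The return-value convention can be modelled in a few lines. This toy sketch is not the futex code: futex_lock_pi_model() is hypothetical, and the only thing carried over from the text is that a positive fixup result means "the current task owns the rtmutex", which must be turned into success even if the wait itself failed.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical toy model. fixup_ret follows the convention above:
 *   > 0  the current task owns the rtmutex (and therefore the futex),
 *     0  it does not,
 *   < 0  the fixup itself failed (for example -EFAULT). */
static int futex_lock_pi_model(int wait_ret, int fixup_ret)
{
	if (fixup_ret > 0)
		return 0;          /* owner: success overrides -EINTR/-ETIMEDOUT */
	if (fixup_ret < 0)
		return fixup_ret;  /* propagate the fixup error */
	return wait_ret;       /* not the owner: report why the wait ended */
}
```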

Fixes: c1e2f0ea ("futex: Avoid violating the 10th rule of futex")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

Revert "mm/slub: fix a memory leak in sysfs_slab_add()" · a35991a9

Committed by Wang Hai on Feb 01, 2021

stable inclusion
from stable-5.10.12
commit bf5eb7d21ab01c12c35df05dddd15f9f2ad5ba71
bugzilla: 47876

--------------------------------

commit 757fed1d upstream.

This reverts commit dde3c6b7.

syzbot reported a double-free bug. The following sequence can cause it.

 - mm/slab_common.c: create_cache(): if the __kmem_cache_create() fails,
   it does:

	out_free_cache:
		kmem_cache_free(kmem_cache, s);

 - but __kmem_cache_create() - at least for slub() - will have done

	sysfs_slab_add(s)
		-> sysfs_create_group() .. fails ..
		-> kobject_del(&s->kobj); .. which frees s ...

We can't remove the kmem_cache_free() in create_cache(), because other
error cases of __kmem_cache_create() do not free this.

So, revert the commit dde3c6b7 ("mm/slub: fix a memory leak in
sysfs_slab_add()") to fix this.
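The ownership conflict can be shown with a toy model that counts releases instead of freeing memory, so a double free appears as a count of two. All names are hypothetical: sysfs_add_with_free() models the reverted behaviour in which the sysfs error path already released s, and create_cache_model() models create_cache() freeing s on any error.

```c
#include <assert.h>

/* Hypothetical toy model; counts releases instead of freeing. */
struct cache_model {
	int frees;
};

static void release_model(struct cache_model *s)
{
	s->frees++;
}

/* Reverted behaviour: the sysfs error path releases s itself. */
static int sysfs_add_with_free(struct cache_model *s)
{
	release_model(s);  /* models the cleanup that freed s */
	return -1;         /* sysfs_create_group() failed */
}

/* After the revert: the error path leaves s to the caller. */
static int sysfs_add_no_free(struct cache_model *s)
{
	(void)s;
	return -1;
}

/* create_cache() frees s on any error, as its other error paths need. */
static void create_cache_model(struct cache_model *s,
			       int (*add)(struct cache_model *))
{
	if (add(s) != 0)
		release_model(s);  /* models kmem_cache_free(kmem_cache, s) */
}
```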

Reported-by: syzbot+d0bd96b4696c1ef67991@syzkaller.appspotmail.com
Fixes: dde3c6b7 ("mm/slub: fix a memory leak in sysfs_slab_add()")
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

gpio: mvebu: fix pwm .get_state period calculation · 7bea8744

Committed by Baruch Siach on Feb 01, 2021

stable inclusion
from stable-5.10.12
commit 43f2e6077f441d681f0337ab91f7c4c2d4c62761
bugzilla: 47876

--------------------------------

commit e73b0101 upstream.

The period is the sum of on and off values. That is, calculate period as

  ($on + $off) / clkrate

instead of

  $off / clkrate - $on / clkrate

which makes no sense.
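A minimal sketch of the corrected calculation, assuming on and off are counts of clock cycles and clkrate is in Hz. pwm_period_ns() and NSEC_PER_SEC_M are illustrative names, not the gpio-mvebu code, and the result is truncated toward zero.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC_M 1000000000ULL

/* Period in ns = (on + off) cycles / clkrate. The sum is widened to
 * 64 bits first; (2^32 - 1) * 2 * 1e9 still fits in a uint64_t. */
static uint64_t pwm_period_ns(uint32_t on, uint32_t off, uint64_t clkrate)
{
	uint64_t cycles = (uint64_t)on + off;

	assert(clkrate != 0);
	return cycles * NSEC_PER_SEC_M / clkrate;
}
```

For example, 250 on-cycles plus 750 off-cycles at 1 MHz give a 1,000,000 ns period, whereas the old formula would have produced 500,000 ns.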

Reported-by: Russell King <linux@armlinux.org.uk>
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Fixes: 757642f9 ("gpio: mvebu: Add limited PWM support")
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
[baruch: backport to kernels <= v5.10]
Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

PCI/AER: increment PCI bus reference count in aer-inject process · 74fcb978

Committed by Xiongfeng Wang on Jan 29, 2021

hulk inclusion
category: feature
bugzilla: 47454
CVE: NA

-------------------------------------------------------------------------

When I tested 'aer-inject' with the following procedure:
1. inject a fatal error into a PCI device
2. remove the parent device by sysfs
3. execute command 'rmmod aer-inject'

I came across the following use-after-free.

[  297.581524] ==================================================================
[  297.581543] BUG: KASAN: use-after-free in pci_bus_set_ops+0xb4/0xb8
[  297.581545] Read of size 8 at addr ffff802edbde80e0 by task rmmod/21839

[  297.581552] CPU: 119 PID: 21839 Comm: rmmod Kdump: loaded Not tainted 4.19.36 #1
[  297.581554] Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 1.05 09/18/2019
[  297.581556] Call trace:
[  297.581561]  dump_backtrace+0x0/0x360
[  297.581563]  show_stack+0x24/0x30
[  297.581569]  dump_stack+0xd8/0x104
[  297.581576]  print_address_description+0x68/0x278
[  297.581578]  kasan_report+0x204/0x330
[  297.581580]  __asan_report_load8_noabort+0x30/0x40
[  297.581582]  pci_bus_set_ops+0xb4/0xb8
[  297.581591]  aer_inject_exit+0x198/0x334 [aer_inject]
[  297.581595]  __arm64_sys_delete_module+0x310/0x490
[  297.581601]  el0_svc_common+0xfc/0x278
[  297.581603]  el0_svc_handler+0x50/0xc0
[  297.581605]  el0_svc+0x8/0xc

[  297.581608] Allocated by task 1:
[  297.581611]  kasan_kmalloc+0xe0/0x190
[  297.581614]  kmem_cache_alloc_trace+0x104/0x218
[  297.581616]  pci_alloc_bus+0x50/0x2e0
[  297.581618]  pci_add_new_bus+0xa8/0xe08
[  297.581620]  pci_scan_bridge_extend+0x884/0xb28
[  297.581623]  pci_scan_child_bus_extend+0x350/0x628
[  297.581625]  pci_scan_child_bus+0x24/0x30
[  297.581627]  pci_scan_bridge_extend+0x3b8/0xb28
[  297.581629]  pci_scan_child_bus_extend+0x350/0x628
[  297.581631]  pci_scan_child_bus+0x24/0x30
[  297.581635]  acpi_pci_root_create+0x558/0x888
[  297.581640]  pci_acpi_scan_root+0x198/0x330
[  297.581641]  acpi_pci_root_add+0x7bc/0xbb0
[  297.581646]  acpi_bus_attach+0x2f4/0x728
[  297.581647]  acpi_bus_attach+0x1b0/0x728
[  297.581649]  acpi_bus_attach+0x1b0/0x728
[  297.581651]  acpi_bus_scan+0xa0/0x110
[  297.581657]  acpi_scan_init+0x20c/0x500
[  297.581659]  acpi_init+0x54c/0x5d4
[  297.581661]  do_one_initcall+0xbc/0x480
[  297.581665]  kernel_init_freeable+0x5fc/0x6ac
[  297.581670]  kernel_init+0x18/0x128
[  297.581671]  ret_from_fork+0x10/0x18

[  297.581673] Freed by task 19270:
[  297.581675]  __kasan_slab_free+0x120/0x228
[  297.581677]  kasan_slab_free+0x10/0x18
[  297.581678]  kfree+0x80/0x1f8
[  297.581680]  release_pcibus_dev+0x54/0x68
[  297.581686]  device_release+0xd4/0x1c0
[  297.581689]  kobject_put+0x12c/0x400
[  297.581691]  device_unregister+0x30/0xc0
[  297.581693]  pci_remove_bus+0xe8/0x1c0
[  297.581695]  pci_remove_bus_device+0xd0/0x2f0
[  297.581697]  pci_stop_and_remove_bus_device_locked+0x2c/0x40
[  297.581701]  remove_store+0x1b8/0x1d0
[  297.581703]  dev_attr_store+0x60/0x80
[  297.581708]  sysfs_kf_write+0x104/0x170
[  297.581710]  kernfs_fop_write+0x23c/0x430
[  297.581713]  __vfs_write+0xec/0x4e0
[  297.581714]  vfs_write+0x12c/0x3d0
[  297.581715]  ksys_write+0xd0/0x190
[  297.581716]  __arm64_sys_write+0x70/0xa0
[  297.581718]  el0_svc_common+0xfc/0x278
[  297.581720]  el0_svc_handler+0x50/0xc0
[  297.581721]  el0_svc+0x8/0xc

[  297.581724] The buggy address belongs to the object at ffff802edbde8000
                which belongs to the cache kmalloc-2048 of size 2048
[  297.581726] The buggy address is located 224 bytes inside of
                2048-byte region [ffff802edbde8000, ffff802edbde8800)
[  297.581727] The buggy address belongs to the page:
[  297.581730] page:ffff7e00bb6f7a00 count:1 mapcount:0 mapping:ffff8026de810780 index:0x0 compound_mapcount: 0
[  297.591520] flags: 0x2ffffe0000008100(slab|head)
[  297.596121] raw: 2ffffe0000008100 ffff7e00bb6f5008 ffff7e00bb6ff608 ffff8026de810780
[  297.596123] raw: 0000000000000000 00000000000f000f 00000001ffffffff 0000000000000000
[  297.596124] page dumped because: kasan: bad access detected

[  297.596126] Memory state around the buggy address:
[  297.596128]  ffff802edbde7f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  297.596129]  ffff802edbde8000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  297.596131] >ffff802edbde8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  297.596132]                                                        ^
[  297.596133]  ffff802edbde8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  297.596135]  ffff802edbde8180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  297.596135] ==================================================================

This happens because, by the time we unload the module and restore
the 'pci_ops' member of 'pci_bus', the 'pci_bus' has already been
freed. This patch takes a reference on 'pci_bus' when we modify its
'pci_ops' member and drops it after the member has been restored.
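
This is a minimal userspace model of the pinning pattern described above, for illustration only. A plain integer refcount stands in for the device reference behind the kernel's pci_bus_get()/pci_bus_put(). The names bus_get(), bus_put(), inject_ops() and restore_ops() are hypothetical, and the actual patch may differ. The model records freeing in a flag instead of calling free(), so it never touches released memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for struct pci_ops and struct pci_bus. */
struct ops {
	const char *name;
};

struct bus {
	int refcount;           /* models the bus's device reference count */
	const struct ops *ops;  /* may point at the injector's ops */
	bool freed;             /* recorded instead of calling free() */
};

void bus_get(struct bus *b)
{
	b->refcount++;
}

void bus_put(struct bus *b)
{
	assert(b->refcount > 0);
	if (--b->refcount == 0)
		b->freed = true;    /* stands in for release_pcibus_dev() */
}

/* Swap in the injector's ops and pin the bus until they are restored. */
const struct ops *inject_ops(struct bus *b, const struct ops *inj)
{
	const struct ops *orig = b->ops;

	bus_get(b);
	b->ops = inj;
	return orig;
}

/* Module exit: restore the original ops, then drop the pin. */
void restore_ops(struct bus *b, const struct ops *orig)
{
	assert(!b->freed);      /* the pin keeps the bus alive until here */
	b->ops = orig;
	bus_put(b);
}
```

Without the extra bus_get(), the sysfs removal would drop the last reference, and restore_ops() would then write into freed memory. That is the situation in the KASAN report above.
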

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: yangerkun <yangerkun@huawei.com>

Conflicts:
	drivers/pci/pcie/aer/aer_inject.c
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

74fcb978

PCI: add a member in 'struct pci_bus' to record the original 'pci_ops' · 8dd667f5

Committed by Xiongfeng Wang on Jan 29, 2021

hulk inclusion
category: bugfix
bugzilla: 47453
CVE: NA

-------------------------------------------------------------------------

When I test 'aer-inject' with the following procedures:
1. inject a fatal error into an upstream PCI bridge
2. remove the upstream bridge by sysfs
3. rescan the PCI tree by 'echo 1 > /sys/bus/pci/rescan'
4. execute command 'rmmod aer-inject'
5. remove the upstream bridge by sysfs again

I came across the following Oops.

[  799.713238] Internal error: Oops: 96000007 [#1] SMP
[  799.718099] Process bash (pid: 10683, stack limit = 0x00000000125a3b1b)
[  799.724686] CPU: 108 PID: 10683 Comm: bash Kdump: loaded Not tainted 4.19.36 #2
[  799.731962] Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 1.05 09/18/2019
[  799.739325] pstate: 40400009 (nZcv daif +PAN -UAO)
[  799.744104] pc : pci_remove_bus+0xc0/0x1c0
[  799.748182] lr : pci_remove_bus+0x94/0x1c0
[  799.752260] sp : ffffa02e335df940
[  799.755560] x29: ffffa02e335df940 x28: ffff2000088216a8
[  799.760849] x27: 1ffff405c66bbfbc x26: ffff20000a9518c0
[  799.766139] x25: ffffa02dea6ec418 x24: 1ffff405bd4dd883
[  799.771427] x23: ffffa02e72576628 x22: 1ffff405ce4aecc0
[  799.776715] x21: ffffa02e72576608 x20: ffff200002e75080
[  799.782003] x19: ffffa02e72576600 x18: 0000000000000000
[  799.787291] x17: 0000000000000000 x16: 0000000000000000
[  799.792578] x15: 0000000000000001 x14: dfff200000000000
[  799.797866] x13: ffff20000a6dfaf0 x12: 0000000000000000
[  799.803154] x11: 1fffe4000159b217 x10: ffff04000159b217
[  799.808442] x9 : dfff200000000000 x8 : ffff20000acd90bf
[  799.813730] x7 : 0000000000000000 x6 : 0000000000000000
[  799.819017] x5 : 0000000000000001 x4 : 0000000000000000
[  799.824306] x3 : 1ffff405dbe62603 x2 : 1fffe400005cea11
[  799.829593] x1 : dfff200000000000 x0 : ffff200002e75088
[  799.834882] Call trace:
[  799.837323]  pci_remove_bus+0xc0/0x1c0
[  799.841056]  pci_remove_bus_device+0xd0/0x2f0
[  799.845392]  pci_stop_and_remove_bus_device_locked+0x2c/0x40
[  799.851028]  remove_store+0x1b8/0x1d0
[  799.854679]  dev_attr_store+0x60/0x80
[  799.858330]  sysfs_kf_write+0x104/0x170
[  799.862149]  kernfs_fop_write+0x23c/0x430
[  799.866143]  __vfs_write+0xec/0x4e0
[  799.869615]  vfs_write+0x12c/0x3d0
[  799.873001]  ksys_write+0xd0/0x190
[  799.876389]  __arm64_sys_write+0x70/0xa0
[  799.880298]  el0_svc_common+0xfc/0x278
[  799.884030]  el0_svc_handler+0x50/0xc0
[  799.887764]  el0_svc+0x8/0xc
[  799.890634] Code: d2c40001 f2fbffe1 91002280 d343fc02 (38e16841)
[  799.896700] kernel fault(0x1) notification starting on CPU 108

This happens because, when a new bus is allocated during the rescan,
the 'pci_ops' of the newly allocated 'pci_bus' is inherited from its
parent bus. However, the 'pci_ops' of the parent bus may have been
changed to 'aer_inj_pci_ops' in 'aer_inject()'. When the 'aer_inject'
module is unloaded, 'aer_inject_exit()' restores 'pci_ops' only for
the bus of the error-injected device and for the root port. After the
module has been unloaded, the 'pci_ops' of the newly allocated bus is
still 'aer_inj_pci_ops', and accessing it causes the Oops above.

This patch adds a member 'backup_ops' to 'struct pci_bus' to record
the original 'ops'. When a child bus is allocated, the 'backup_ops'
of the parent bus is assigned to the 'ops' of the child bus.
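
Here is a rough userspace sketch of the inheritance rule. It uses simplified struct definitions, not the real struct pci_bus or struct pci_ops. The helpers inject() and init_child() are hypothetical. Falling back to the parent's 'ops' when 'backup_ops' is unset is an assumption; the patch does not state it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the PCI structures. */
struct ops {
	const char *name;
};

struct bus {
	const struct ops *ops;         /* ops currently in use */
	const struct ops *backup_ops;  /* original ops, recorded on injection */
	struct bus *parent;
};

/* Error injector: remember the original ops once, then replace them. */
void inject(struct bus *b, const struct ops *inj)
{
	if (!b->backup_ops)
		b->backup_ops = b->ops;
	b->ops = inj;
}

/* Child allocation: inherit the parent's original ops, never the injector's. */
void init_child(struct bus *child, struct bus *parent)
{
	child->parent = parent;
	child->backup_ops = NULL;
	/* Assumption: a NULL backup_ops means the parent was never injected. */
	child->ops = parent->backup_ops ? parent->backup_ops : parent->ops;
}
```

A bus created by a rescan after injection therefore starts with the real ops. It is left untouched when the module restores only the buses it modified directly.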

The best approach might be to not modify the 'pci_ops' in
'struct pci_bus' at all, but that would require substantially
refactoring the 'aer_inject' framework. I haven't found a better
way to handle it.

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: yangerkun <yangerkun@huawei.com>

Conflicts:
	drivers/pci/probe.c
	include/linux/pci.h
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

8dd667f5

sched, rt: fix isolated CPUs leaving task_group indefinitely throttled · e82ef69c

Committed by Cheng Jian on Jan 30, 2021

hulk inclusion
category: bugfix
Bugzilla: 47618
CVE: NA

----------------------------------------

Commit e221d028 ("sched,rt: fix isolated CPUs leaving
root_task_group indefinitely throttled") only fixes isolated
CPUs leaving root_task_group throttled; it does not fix the
same problem for other, ordinary task_groups.

In scenarios where tasks in a task_group are bound to
isolated CPUs, the same problem occurs.

Isolated and non-isolated CPUs are not in the same
root_domain, and the hrtimer handler only checks the cpumask
of this_rq's root_domain. So when the RT_BANDWIDTH hrtimer
handler runs on an isolated CPU, it leaves the non-isolated
CPUs indefinitely throttled, because the bandwidth period
hrtimer cannot resume them, and vice versa.

Let the bandwidth timer check the rt_rq of every CPU in
cpu_online_mask.
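
This is a simplified model of the change. Per-CPU rt_rq state is reduced to a throttled flag and a consumed-runtime counter, and CPU sets are plain bitmasks. The struct rt_rq_model and the functions replenish() and do_period_timer() are hypothetical. They mirror only the idea of the fix (walking cpu_online_mask instead of the current root_domain's span), not the kernel's actual period-timer handler.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8

/* Per-CPU rt_rq state of one task_group, reduced to what the timer needs. */
struct rt_rq_model {
	bool throttled;
	unsigned long long rt_time;   /* runtime consumed in the current period */
};

/* Refill one rt_rq at a period boundary; returns true if it got unthrottled. */
bool replenish(struct rt_rq_model *rq, unsigned long long runtime)
{
	rq->rt_time = rq->rt_time > runtime ? rq->rt_time - runtime : 0;
	if (rq->throttled && rq->rt_time < runtime) {
		rq->throttled = false;
		return true;
	}
	return false;
}

/*
 * Period timer body: visit every CPU whose bit is set in 'span'.
 * Old behaviour: span = the root_domain of the CPU the timer fires on.
 * Fixed behaviour: span = all online CPUs.
 */
int do_period_timer(struct rt_rq_model rqs[NR_CPUS], unsigned int span,
		    unsigned long long runtime)
{
	int unthrottled = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if ((span & (1u << cpu)) && replenish(&rqs[cpu], runtime))
			unthrottled++;
	return unthrottled;
}
```

Suppose CPUs 4-7 are isolated and CPU 1 is throttled. A timer that fires on CPU 4 and walks only mask 0xF0 never unthrottles CPU 1. Walking the online mask 0xFF does.
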

Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Reviewed-by: xiu jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

e82ef69c

cgroup: wait for cgroup destruction to complete when umount · f7ab6d4d

Committed by Zefan Li on Jan 30, 2021

hulk inclusion
category: bugfix
bugzilla: 47617
CVE: N/A

-------------------------------------------------

Since commit 3c606d35 ("cgroup: prevent mount hang due to memory
controller lifetime"), a cgroup root won't be destroyed if there are any
child cgroups, dead or alive.

This introduced a small regression.

    # cat test.sh
    mount -t cgroup -o cpuset xxx /cgroup
    mkdir /cgroup/tmp
    rmdir /cgroup/tmp
    umount /cgroup

After running this script, you'll probably find the cgroup hierarchy
is still active.

    # cat /proc/cgroups | grep cpuset
    #subsys_name    hierarchy       num_cgroups     enabled
    cpuset  1       1       1
    ...

Fix this by waiting for a while during umount so that cgroup
destruction can complete. Now run the script again and you'll see:

    # cat /proc/cgroups | grep cpuset
    #subsys_name    hierarchy       num_cgroups     enabled
    cpuset  0       1       1
    ...
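
The sketch below shows why the hierarchy survives umount. It simplifies "waiting" to letting the queued destruction work for dying children run before deciding whether to destroy the root. struct root_model and its helpers are hypothetical, and the real patch may use a timed wait instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a cgroup v1 root whose children are released asynchronously. */
struct root_model {
	int nr_children;   /* child cgroups, dead or alive */
	int nr_dying;      /* removed children whose release work has not run yet */
	bool destroyed;
};

void mkdir_child(struct root_model *r)
{
	r->nr_children++;
}

/* rmdir: the child only becomes "dying"; its release is deferred. */
void rmdir_child(struct root_model *r)
{
	assert(r->nr_dying < r->nr_children);
	r->nr_dying++;
}

/* The deferred release work: each dying child finally goes away. */
void run_release_work(struct root_model *r)
{
	r->nr_children -= r->nr_dying;
	r->nr_dying = 0;
}

/* umount: the root is destroyed only if no children are left at all. */
void kill_sb(struct root_model *r, bool wait_for_destruction)
{
	if (wait_for_destruction)
		run_release_work(r);   /* models the added wait */
	if (r->nr_children == 0)
		r->destroyed = true;
}
```

Running mkdir, rmdir and umount back to back without the wait leaves the root alive. This matches the "hierarchy 1" line in the first /proc/cgroups output.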

Cc: stable@vger.kernel.org # 3.19+
Signed-off-by: Zefan Li <lizefan@huawei.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Changchun Yu <yuchangchun1@huawei.com>
Reviewed-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Lu Jialin <lujialin4@huawei.com>

Conflicts:
	kernel/cgroup/cgroup.c
Reviewed-by: xiu jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

f7ab6d4d

cgroup: check if cgroup root is alive in cgroupstats_show() · 8f1f7f47

Committed by Zefan Li on Jan 30, 2021

euler inclusion
category: bugfix
bugzilla: 47460
CVE: N/A

-------------------------------------------------

If a cgroup root is dying, show its hierarchy_id and num_cgroups
as 0.
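
Here is a small sketch of the reporting rule. It uses a trimmed-down root descriptor with a 'dying' flag and the tab-separated row layout shown in the /proc/cgroups output of the previous entry. struct root_info and format_row() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, trimmed-down view of what one /proc/cgroups row needs. */
struct root_info {
	int hierarchy_id;
	int nr_cgroups;
	bool dying;        /* e.g. killed by umount but not yet freed */
};

/* Format one row; a dying root is reported as hierarchy 0 with 0 cgroups. */
int format_row(char *buf, size_t len, const char *subsys,
	       const struct root_info *root, bool enabled)
{
	int id = root->dying ? 0 : root->hierarchy_id;
	int num = root->dying ? 0 : root->nr_cgroups;

	return snprintf(buf, len, "%s\t%d\t%d\t%d", subsys, id, num,
			enabled ? 1 : 0);
}
```
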

Signed-off-by: Zefan Li <lizefan@huawei.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Changchun Yu <yuchangchun1@huawei.com>
Reviewed-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Reviewed-by: xiu jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

8f1f7f47

mtd:avoid blktrans_open/release race and avoid insmod ftl.ko deadlock · d4e8b34c

Committed by Gu Zheng on Jan 28, 2021

hulk inclusion
category: bugfix
Bugzilla: 47615
CVE: N/A

-----------------------------------------------

Add the new functions mtd_table_mutex_lock()/mtd_table_mutex_unlock()
to replace mutex_lock(&mtd_table_mutex)/mutex_unlock(&mtd_table_mutex).
This avoids a deadlock when inserting ftl.ko.

The deadlock is caused by commit 857814ee65db ("mtd: fix: avoid
race condition when accessing mtd->usecount").

The call chain is as follows:
init_ftl
register_mtd_blktrans
mutex_lock(&mtd_table_mutex) //mtd_table_mutex locked
ftl_add_mtd
add_mtd_blktrans_dev
device_add_disk
register_disk
blkdev_get
__blkdev_get
blktrans_open
mutex_lock(&mtd_table_mutex) //deadlock

So we add mtd_table_mutex_owner to record the task holding the lock.
If the current task already holds the lock, the nested acquisition
that would deadlock is skipped. This solves the issue above and may
also prevent other, as yet undiscovered mtd_table_mutex deadlocks.
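
This is a single-task model of the owner-tracking wrapper. It adds a nesting depth counter on top of the owner field, which is an assumption: the message mentions only mtd_table_mutex_owner. The underlying mutex is modelled as a flag, and plain_lock() reports "would block forever" instead of actually blocking. All names are hypothetical stand-ins for mtd_table_mutex_lock()/mtd_table_mutex_unlock(). In the kernel, the owner would be compared with current.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of mtd_table_mutex plus the owner bookkeeping. */
struct lock_model {
	bool locked;   /* the underlying non-recursive mutex */
	int owner;     /* id of the holding task, 0 when free */
	int depth;     /* nesting depth of the owner */
};

/* Plain mutex_lock(): returns false where the real call would block forever. */
bool plain_lock(struct lock_model *l)
{
	if (l->locked)
		return false;
	l->locked = true;
	return true;
}

/* Owner-aware lock: re-entry by the owner only bumps the depth. */
void table_lock(struct lock_model *l, int task)
{
	if (l->owner != task) {
		bool ok = plain_lock(l);

		assert(ok);   /* single-task demo: contention is not modelled */
		(void)ok;
		l->owner = task;
	}
	l->depth++;
}

/* Release: only the outermost unlock drops the underlying mutex. */
void table_unlock(struct lock_model *l, int task)
{
	assert(l->owner == task && l->depth > 0);
	(void)task;
	if (--l->depth == 0) {
		l->owner = 0;
		l->locked = false;
	}
}
```

In the call chain above, register_mtd_blktrans() corresponds to the outer table_lock() and blktrans_open() to the nested one. The nested call only increments the depth.
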

Signed-off-by: Gu Zheng <guzheng1@huawei.com>
Acked-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Hou Tao <houta1@huawei.com>

conflict:
        drivers/mtd/mtdcore.c
        drivers/mtd/mtdcore.h
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

d4e8b34c

jffs2: move jffs2_init_inode_info() just after allocating inode · 3f9d6b86

Committed by zhangyi (F) on Jan 22, 2021

hulk inclusion
category: bugfix
bugzilla: 47443
CVE: NA
---------------------------

Commit 4fdcfab5 ("jffs2: fix use-after-free on symlink
traversal") exposed a problem of freeing uninitialized memory: it
moved the freeing of f->target to jffs2_i_callback(), but f->target
may not be initialized on some error paths when allocating a jffs2
inode (e.g. jffs2_iget()->iget_locked()->
destroy_inode()->..->jffs2_i_callback()->kfree(f->target)).

Fix this by initializing the jffs2_inode_info right after allocating it.
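
This userspace sketch shows the ordering that the fix establishes. It uses a stand-in struct with only the 'target' field. Initialization happens inside the allocator, so a destroy callback on an early error path frees a NULL pointer (a no-op) rather than garbage. alloc_inode_info() and destroy_inode_info() are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct jffs2_inode_info, reduced to one field. */
struct inode_info {
	char *target;   /* symlink target, freed by the destroy callback */
};

/* Allocate and initialize in one step, before any error path can run. */
struct inode_info *alloc_inode_info(void)
{
	struct inode_info *f = malloc(sizeof(*f));

	if (!f)
		return NULL;
	f->target = NULL;   /* never left uninitialized */
	return f;
}

/* Models jffs2_i_callback(): free(NULL) is a no-op, an uninitialized pointer is not. */
void destroy_inode_info(struct inode_info *f)
{
	free(f->target);
	free(f);
}
```
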

Reported-by: Guohua Zhong <zhongguohua1@huawei.com>
Reported-by: Huaijie Yi <yihuaijie@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
[backport from hulk-4.4]
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

3f9d6b86

jffs2: protect no-raw-node-ref check of inocache by erase_completion_lock · a7d48da7

Committed by Hou Tao on Jan 22, 2021

euler inclusion
category: bugfix
bugzilla: 47446
CVE: NA
--------------------------------------------------

In jffs2_do_clear_inode(), we check whether any jffs2_raw_node_ref
is associated with the current inocache. If there is no
raw-node-ref, the inocache can be freed. If some
jffs2_raw_node_refs are still linked in inocache->nodes, the
inocache cannot be freed yet, and freeing it is left to
jffs2_remove_node_refs_from_ino_list().

However there is a race between jffs2_do_clear_inode() and
jffs2_remove_node_refs_from_ino_list() as shown in the following
scenario:

CPU 0                   CPU 1
in sys_unlink()         in jffs2_garbage_collect_pass()

jffs2_do_unlink
  f->inocache->pino_nlink = 0
  set_nlink(inode, 0)

                        // contains all raw-node-refs of the unlinked inode
                        start GC a jeb

iput_final
jffs2_evict_inode
jffs2_do_clear_inode
  acquire f->sem
    mark all refs as obsolete

                        GC complete
                        jeb is moved to erase_pending_list
                        jffs2_erase_pending_blocks
                          jffs2_free_jeb_node_refs
                            jffs2_remove_node_refs_from_ino_list

    f->inocache = INO_STATE_CHECKEDABSENT

                              // no raw-node-ref is associated with the
                              // inocache of the unlinked inode
                              ic->nodes == (void *)ic && ic->pino_nlink == 0
                                jffs2_del_ino_cache

    f->inocache->nodes == f->nodes
      // double-free occurs
      jffs2_del_ino_cache

A double free of the inocache leads to all kinds of weird behaviour.
The following BUG_ON is one case in which two active inodes use the
same inocache (the freed inocache is reused by a new inode, then the
inocache is double-freed and reused by another new inode):

  jffs2: Raw node at 0x006c6000 wasn't in node lists for ino #662249
  ------------[ cut here ]------------
  kernel BUG at fs/jffs2/gc.c:645!
  invalid opcode: 0000 [#1] PREEMPT SMP
  Modules linked in: nandsim
  CPU: 0 PID: 15837 Comm: cp Not tainted 4.4.172 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
  RIP: [<ffffffff816f1256>] jffs2_garbage_collect_live+0x1578/0x1593
  Call Trace:
   [<ffffffff8154b8aa>] jffs2_garbage_collect_pass+0xf6a/0x15d0
   [<ffffffff81541bbd>] jffs2_reserve_space+0x2bd/0x8a0
   [<ffffffff81546a62>] jffs2_do_create+0x52/0x480
   [<ffffffff8153c9f2>] jffs2_create+0xe2/0x2a0
   [<ffffffff8133bed7>] vfs_create+0xe7/0x220
   [<ffffffff81340ab4>] path_openat+0x11f4/0x1c00
   [<ffffffff81343635>] do_filp_open+0xa5/0x140
   [<ffffffff813288ed>] do_sys_open+0x19d/0x320
   [<ffffffff81328a96>] SyS_open+0x26/0x30
   [<ffffffff81c3f8f8>] entry_SYSCALL_64_fastpath+0x18/0x73
  ---[ end trace dd5c02f1653e8cac ]---

Fix it by protecting the no-raw-node-ref check with
erase_completion_lock. The call to jffs2_set_inocache_state() also
needs to move under erase_completion_lock; otherwise the inocache may
be leaked, because jffs2_del_ino_cache() invoked by
jffs2_remove_node_refs_from_ino_list() may find the inocache still in
state INO_STATE_CHECKING and will not free it.
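
This is a deterministic model of the race and the fix. Each function below is assumed to run atomically, and frees are counted rather than performed. The clear_inode side is split into its two steps so the racy interleaving can be replayed. Holding erase_completion_lock is modelled by clear_locked(), which runs both steps back to back, so GC can only run entirely before it or entirely after it. The real jffs2_del_ino_cache() skips the free for some transitional states; here a single STATE_CLEARING stands in for them. All names are hypothetical.

```c
#include <assert.h>

/* STATE_CLEARING stands in for the transitional states in which the free is skipped. */
enum ic_state { STATE_CLEARING, STATE_CHECKEDABSENT };

/* Hypothetical model of a jffs2_inode_cache; frees are counted, not performed. */
struct ic_model {
	enum ic_state state;
	int nr_refs;      /* raw-node-refs still linked on ic->nodes */
	int pino_nlink;
	int frees;        /* how many times the cache would have been freed */
};

/* Models jffs2_del_ino_cache(). */
void del_ino_cache(struct ic_model *ic)
{
	if (ic->state != STATE_CLEARING)
		ic->frees++;
}

/* GC/erase side: unlink the erased block's refs, delete if nothing is left. */
void gc_remove_refs(struct ic_model *ic)
{
	ic->nr_refs = 0;   /* all refs of this inode were in the erased block */
	if (ic->nr_refs == 0 && ic->pino_nlink == 0)
		del_ino_cache(ic);
}

/* clear_inode side, step 1: publish the final state. */
void clear_set_state(struct ic_model *ic)
{
	ic->state = STATE_CHECKEDABSENT;
}

/* clear_inode side, step 2: the no-raw-node-ref check. */
void clear_check_and_del(struct ic_model *ic)
{
	if (ic->nr_refs == 0)
		del_ino_cache(ic);
}

/* With erase_completion_lock held, both steps form one critical section. */
void clear_locked(struct ic_model *ic)
{
	clear_set_state(ic);
	clear_check_and_del(ic);
}
```

If GC runs between clear_set_state() and clear_check_and_del(), the count reaches 2, which is the double free. With the two steps fused, either order yields exactly one free.
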

Link: http://lists.infradead.org/pipermail/linux-mtd/2019-February/087764.html
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Wei Fang <fangwei1@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
[cherry-pick from hulk-4.4]
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

a7d48da7

jffs2: handle INO_STATE_CLEARING in jffs2_do_read_inode() · 1b482fce

Committed by Hou Tao on Jan 22, 2021

hulk inclusion
category: bugfix
bugzilla: 47446
CVE: NA
--------------------------

For an inode whose creation fails midway, the GC procedure may
try to GC its dnode, and in the following case BUG() will be
triggered:

CPU 0                       CPU 1
in jffs2_do_create()        in jffs2_garbage_collect_pass()

jffs2_write_dnode succeed
// for dirent
jffs2_reserve_space fail

			    inum = ic->ino
			    nlink = ic->pino_nlink (> 0)

iget_failed
  make_bad_inode
    remove_inode_hash
  iput
    jffs2_evict_inode
      jffs2_do_clear_inode
        jffs2_set_inocache_state(INO_STATE_CLEARING)

			    jffs2_gc_fetch_inode
			      jffs2_iget
			        // a new inode is created because
			        // the old inode had been unhashed
			        iget_locked
			      jffs2_do_read_inode
			        jffs2_get_ino_cache
				// assert BUG()
				f->inocache->state = INO_STATE_CLEARING

Fix it by waiting for its state to change to INO_STATE_CHECKEDABSENT.
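
This is a simplified model of the state switch in jffs2_do_read_inode(). It assumes that the fix routes the clearing state to the existing wait-and-retry path used for the checking and GC states. The enum names are shortened, hypothetical stand-ins for the kernel's INO_STATE_* values, and read_inode_action() is not a real jffs2 function.

```c
#include <assert.h>

enum ino_state {
	INO_UNCHECKED, INO_CHECKING, INO_PRESENT, INO_CHECKEDABSENT,
	INO_GC, INO_READING, INO_CLEARING
};

enum read_action { READ_PROCEED, READ_WAIT, READ_FAIL, READ_BUG };

/*
 * fixed == 0: CLEARING falls through to the BUG() default.
 * fixed != 0: CLEARING waits like CHECKING/GC; once clear_inode moves the
 *             cache to CHECKEDABSENT, the retry proceeds.
 */
enum read_action read_inode_action(enum ino_state s, int fixed)
{
	switch (s) {
	case INO_UNCHECKED:
	case INO_CHECKEDABSENT:
		return READ_PROCEED;
	case INO_CLEARING:
		if (!fixed)
			return READ_BUG;
		/* fall through */
	case INO_CHECKING:
	case INO_GC:
		return READ_WAIT;
	case INO_READING:
	case INO_PRESENT:
		return READ_FAIL;
	default:
		return READ_BUG;
	}
}
```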

Link: http://lists.infradead.org/pipermail/linux-mtd/2019-February/087762.html
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Wei Fang <fangwei1@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
[cherry-pick from hulk-4.4]
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

1b482fce

jffs2: reset pino_nlink to 0 when inode creation failed · 97a21e98

Committed by Hou Tao on Jan 22, 2021

hulk inclusion
category: bugfix
bugzilla: 47446
CVE: NA
-------------------------------------------------

Reset pino_nlink to 0 when inode creation fails, so that
jffs2_do_clear_inode() can mark all flash nodes used by the inode
as obsolete and the GC procedure can reclaim them; otherwise this
flash space would never be reclaimed.
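
This toy model shows why the reset matters. It assumes, as the message implies, that jffs2_do_clear_inode() obsoletes an inode's flash nodes only when its pino_nlink is 0. struct ic_model, clear_inode_model() and create_failed() are hypothetical.

```c
#include <assert.h>

/* Hypothetical model: which flash nodes does clear_inode mark obsolete? */
struct ic_model {
	int pino_nlink;
	int nr_nodes;      /* flash nodes already written for this inode */
	int nr_obsolete;   /* nodes GC is allowed to reclaim */
};

/* Models jffs2_do_clear_inode(): only a deleted inode (nlink 0) is obsoleted. */
void clear_inode_model(struct ic_model *ic)
{
	if (ic->pino_nlink == 0)
		ic->nr_obsolete = ic->nr_nodes;
}

/* Failure path of inode creation; reset_nlink models the fix. */
void create_failed(struct ic_model *ic, int reset_nlink)
{
	if (reset_nlink)
		ic->pino_nlink = 0;
	clear_inode_model(ic);
}
```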

Link: http://lists.infradead.org/pipermail/linux-mtd/2019-February/087763.html
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Wei Fang <fangwei1@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

97a21e98

jffs2: GC deadlock reading a page that is used in jffs2_write_begin() · 5c27edc5

Committed by Kyeong Yoo on Jan 22, 2021

hulk inclusion
category: bugfix
bugzilla: 47446
CVE: NA
-------------------------------------------------

The GC task can deadlock in read_cache_page() because it may attempt
to release a page that is actually allocated by another task in
jffs2_write_begin().
The reason is that jffs2_write_begin() has a small window in which
a cache page is allocated for use but not yet set Uptodate.

This ends up in a deadlock between two tasks:
1) A task (e.g. file copy)
   - jffs2_write_begin() locks a cache page
   - jffs2_write_end() tries to lock "alloc_sem" from
     jffs2_reserve_space() <-- STUCK
2) GC task (jffs2_gcd_mtd3)
   - jffs2_garbage_collect_pass() locks "alloc_sem"
   - tries to lock the same cache page in read_cache_page() <-- STUCK

To avoid this deadlock, hold "alloc_sem" in jffs2_write_begin()
while reading data into a cache page.

Signed-off-by: Kyeong Yoo <kyeong.yoo@alliedtelesis.co.nz>
Link: http://lists.infradead.org/pipermail/linux-mtd/2017-July/075581.html
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Wei Fang <fangwei1@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>

[backport from hulk-4.4]
Conflicts:
	fs/jffs2/file.c
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

5c27edc5

jffs2: make the overwritten xattr invisible after remount · 939350e6

Committed by Hou Tao on Jan 22, 2021

euler inclusion
category: bugfix
bugzilla: 47447
CVE: NA
-------------------------------------------------

For xattr modification, we do not write a new jffs2_raw_xref with a
delete marker to flash. So if an xattr is modified and then removed,
and the old xref and xdatum have not been erased by GC, then after a
reboot or remount the new xref will be dead but the old xref will be
alive, and reading the removed xattr returns the overwritten value
instead of a non-existent error.

Fix it by writing the delete marker for xattr overwrite.
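
This toy model illustrates the remount behaviour. It assumes a flat on-flash log of xref records, where a record is live unless a delete marker was written for it, and it assumes GC has not erased the old records. None of these names are jffs2 APIs. The mark_old flag of set_xattr() models the fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_XREFS 8

/* Hypothetical on-flash xref record. */
struct xref_rec {
	const char *name;
	const char *value;
	bool deleted;      /* a delete marker exists on flash for this xref */
};

struct xlog {
	struct xref_rec recs[MAX_XREFS];
	int n;
};

/* Newest live xref for 'name' as a rescan would see it, or -1. */
int find_live(const struct xlog *log, const char *name)
{
	for (int i = log->n - 1; i >= 0; i--)
		if (!log->recs[i].deleted && strcmp(log->recs[i].name, name) == 0)
			return i;
	return -1;
}

/* Overwrite: append a new xref; mark_old writes a delete marker for the old one. */
void set_xattr(struct xlog *log, const char *name, const char *value, bool mark_old)
{
	int old = find_live(log, name);

	if (old >= 0 && mark_old)
		log->recs[old].deleted = true;
	assert(log->n < MAX_XREFS);
	log->recs[log->n++] = (struct xref_rec){ name, value, false };
}

void remove_xattr(struct xlog *log, const char *name)
{
	int cur = find_live(log, name);

	if (cur >= 0)
		log->recs[cur].deleted = true;
}

/* What getxattr returns after remount; NULL stands for "no such xattr". */
const char *get_after_remount(const struct xlog *log, const char *name)
{
	int i = find_live(log, name);

	return i >= 0 ? log->recs[i].value : NULL;
}
```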

Fixes: 8a13695c ("[JFFS2][XATTR] rid unnecessary writing of delete marker.")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
[cherry-pick from hulk-4.4]
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

939350e6

Revert "mm: fix initialization of struct page for holes in memory layout" · b5528c77

Committed by Linus Torvalds on Jan 28, 2021

stable inclusion
from stable-5.10.11
commit 1daa298a04181a6acb26050f06c9c367dab66836
bugzilla: 47621

--------------------------------

commit 377bf660 upstream.

This reverts commit d3921cb8.

Chris Wilson reports that it causes boot problems:

 "We have half a dozen or so different machines in CI that are silently
  failing to boot, that we believe is bisected to this patch"

and the CI team confirmed that a revert fixed the issues.

The cause is unknown for now, so let's revert it.

Link: https://lore.kernel.org/lkml/161160687463.28991.354987542182281928@build.alporthouse.com/
Reported-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

b5528c77

mm: fix initialization of struct page for holes in memory layout · 612fe2d2

Committed by Mike Rapoport on Jan 28, 2021

stable inclusion
from stable-5.10.11
commit f2a79851c776a5345643e0234957f98528ada168
bugzilla: 47621

--------------------------------

commit d3921cb8 upstream.

There could be struct pages that are not backed by actual physical
memory.  This can happen when the actual memory bank is not a multiple
of SECTION_SIZE or when an architecture does not register memory holes
reserved by the firmware as memblock.memory.

Such pages are currently initialized using the init_unavailable_mem()
function, which iterates through PFNs in holes in memblock.memory; if
there is a struct page corresponding to a PFN, the fields of this page
are set to default values and the page is marked as Reserved.

init_unavailable_mem() does not take into account the zone and node the
page belongs to and sets both the zone and node links in struct page to
zero.

On a system that has firmware reserved holes in a zone above ZONE_DMA,
for instance in a configuration below:

	# grep -A1 E820 /proc/iomem
	7a17b000-7a216fff : Unknown E820 type
	7a217000-7bffffff : System RAM

an unset zone link in struct page will trigger

	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);

because there are pages in both ZONE_DMA32 and ZONE_DMA (unset zone link
in struct page) in the same pageblock.

Update init_unavailable_mem() to use the zone constraints defined by an
architecture to properly set up the zone link, and use the node ID of the
adjacent range in memblock.memory to set the node link.
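
This sketch shows the zone lookup for a PFN in a hole. It assumes x86-64-style zone limits with 4 KiB pages: ZONE_DMA below 16 MiB and ZONE_DMA32 below 4 GiB. zone_for_pfn() and the max_zone_pfn table are hypothetical stand-ins for the architecture's constraints, and the node link is not modelled. For the firmware hole at 0x7a17b000 in the example above, the lookup yields ZONE_DMA32 rather than the zero-valued ZONE_DMA link.

```c
#include <assert.h>

enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, MAX_NR_ZONES };

/* Assumed exclusive upper PFN limit of each zone (4 KiB pages). */
static const unsigned long max_zone_pfn[MAX_NR_ZONES] = {
	[ZONE_DMA]    = 0x1000UL,     /* 16 MiB >> 12 */
	[ZONE_DMA32]  = 0x100000UL,   /* 4 GiB >> 12 */
	[ZONE_NORMAL] = ~0UL,
};

/* Pick the lowest zone whose upper limit lies above pfn. */
enum zone_type zone_for_pfn(unsigned long pfn)
{
	int z;

	for (z = 0; z < MAX_NR_ZONES - 1; z++)
		if (pfn < max_zone_pfn[z])
			break;
	return (enum zone_type)z;
}
```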

Link: https://lkml.kernel.org/r/20210111194017.22696-3-rppt@kernel.org
Fixes: 73a6e474 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qian Cai <cai@lca.pw>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

612fe2d2

Commit ("tty: implement write_iter") converted the tty layer to... · d534b825

Committed by Sami Tolvanen on Jan 28, 2021

Commit 9bb48c82 ("tty: implement write_iter") converted the tty layer to use write_iter. Fix the redirected_tty_write declaration in n_tty as well, and change the comparisons to use write_iter instead of write.

stable inclusion
from stable-5.10.11
commit 5405cb30db87e027281f3b62202c207f1d5a1163
bugzilla: 47621

--------------------------------

commit 9f12e37c upstream.

[ Also moved the declaration of redirected_tty_write() to the proper
  location in a header file. The reason for the bug was the bogus extern
  declaration in n_tty.c silently not matching the changed definition in
  tty_io.c, and because it wasn't in a shared header file, there was no
  cross-checking of the declaration.

  Sami noticed because Clang's Control Flow Integrity checking ended up
  incidentally noticing the inconsistent declaration.    - Linus ]

Fixes: 9bb48c82 ("tty: implement write_iter")
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

d534b825

fs/pipe: allow sendfile() to pipe again · 893baf1e

Committed by Johannes Berg on Jan 28, 2021

stable inclusion
from stable-5.10.11
commit e8572713897eb9e4bfaef90bf15d5dd00d7126fc
bugzilla: 47621

--------------------------------

commit f8ad8187 upstream.

After commit 36e2c742 ("fs: don't allow splice read/write
without explicit ops") sendfile() could no longer send data
from a real file to a pipe, breaking for example certain cgit
setups (e.g. when running behind fcgiwrap), because in this
case cgit will try to do exactly this: sendfile() to a pipe.

Fix this by using iter_file_splice_write for the splice_write
method of pipes, as suggested by Christoph.
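
Here is a minimal userspace reproducer of the pattern cgit relies on: sendfile() from a regular file into a pipe. It uses only standard Linux calls, and sendfile_roundtrip() is a hypothetical helper name. On kernels that carry 36e2c742 without this fix, the sendfile() call is expected to fail. Keep 'len' below the pipe capacity (64 KiB by default) so the call cannot block.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write 'data' to a temporary regular file, sendfile() it into a pipe,
 * and read it back. Returns the number of bytes read from the pipe,
 * or -1 on any failure (including sendfile() being rejected).
 */
ssize_t sendfile_roundtrip(const char *data, size_t len, char *out, size_t outlen)
{
	FILE *f = tmpfile();
	int p[2];
	off_t off = 0;
	ssize_t sent, got = -1;

	if (!f)
		return -1;
	if (fwrite(data, 1, len, f) != len || fflush(f) != 0 || pipe(p) != 0) {
		fclose(f);
		return -1;
	}
	/* file -> pipe: the case that broke after 36e2c742 */
	sent = sendfile(p[1], fileno(f), &off, len);
	if (sent >= 0)
		got = read(p[0], out, outlen);
	close(p[0]);
	close(p[1]);
	fclose(f);
	return sent < 0 ? -1 : got;
}
```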

Cc: stable@vger.kernel.org
Fixes: 36e2c742 ("fs: don't allow splice read/write without explicit ops")
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

893baf1e
