提交 · b0c7dc7382b23c4be1867185f87a7e0c8e6bd234 · openeuler / Kernel

09 3月, 2022 18 次提交

fuse: rename FUSE_WRITE_KILL_PRIV to FUSE_WRITE_KILL_SUIDGID · b0c7dc73

由 Miklos Szeredi 提交于 3月 09, 2022

mainline inclusion
from mainline-v5.11-rc1
commit 10c52c84
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SIR8

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=10c52c84e3f4872689a64ac7666b34d67e630691

--------------------------------

Kernel has:
ATTR_KILL_PRIV -> clear "security.capability"
ATTR_KILL_SUID -> clear S_ISUID
ATTR_KILL_SGID -> clear S_ISGID if executable

Fuse has:
FUSE_WRITE_KILL_PRIV -> clear S_ISUID and S_ISGID if executable

So FUSE_WRITE_KILL_PRIV implies the complement of ATTR_KILL_PRIV, which is
somewhat confusing.  Also PRIV implies all privileges, including
"security.capability".

Change the name to FUSE_WRITE_KILL_SUIDGID and make FUSE_WRITE_KILL_PRIV an
alias to perserve API compatibility
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
Signed-off-by: NBaokun Li <libaokun1@huawei.com>
Reviewed-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

b0c7dc73

fuse: introduce the notion of FUSE_HANDLE_KILLPRIV_V2 · a077e800

由 Vivek Goyal 提交于 3月 09, 2022

mainline inclusion
from mainline-v5.11-rc1
commit 63f9909f
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SIR8

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=63f9909ff602082597849f684655e93336c50b11

--------------------------------

We already have FUSE_HANDLE_KILLPRIV flag that says that file server will
remove suid/sgid/caps on truncate/chown/write. But that's little different
from what Linux VFS implements.

To be consistent with Linux VFS behavior what we want is.

- caps are always cleared on chown/write/truncate
- suid is always cleared on chown, while for truncate/write it is cleared
only if caller does not have CAP_FSETID.
- sgid is always cleared on chown, while for truncate/write it is cleared
only if caller does not have CAP_FSETID as well as file has group execute
permission.

As previous flag did not provide above semantics. Implement a V2 of the
protocol with above said constraints.

Server does not know if caller has CAP_FSETID or not. So for the case
of write()/truncate(), client will send information in special flag to
indicate whether to kill priviliges or not. These changes are in subsequent
patches.

FUSE_HANDLE_KILLPRIV_V2 relies on WRITE being sent to server to clear
suid/sgid/security.capability. But with ->writeback_cache, WRITES are
cached in guest. So it is not recommended to use FUSE_HANDLE_KILLPRIV_V2
and writeback_cache together. Though it probably might be good enough
for lot of use cases.
Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
Signed-off-by: NBaokun Li <libaokun1@huawei.com>
Reviewed-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

a077e800

xfs: remove dead stale buf unpin handling code · 55619cc7

由 Brian Foster 提交于 3月 09, 2022

mainline inclusion
from mainline-v5.13-rc4
commit e53d3aa0
category: bugfix
bugzilla: 185862 https://gitee.com/openeuler/kernel/issues/I4KIAO

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e53d3aa0b605c49d780e1b2fd0b49dba4154f32b

--------------------------------

This code goes back to a time when transaction commits wrote
directly to iclogs. The associated log items were pinned, written to
the log, and then "uncommitted" if some part of the log write had
failed. This uncommit sequence called an ->iop_unpin_remove()
handler that was eventually folded into ->iop_unpin() via the remove
parameter. The log subsystem has since changed significantly in that
transactions commit to the CIL instead of direct to iclogs, though
log items must still be aborted in the event of an eventual log I/O
error. However, the context for a log item abort is now asynchronous
from transaction commit, which means the committing transaction has
been freed by this point in time and the transaction uncommit
sequence of events is no longer relevant.

Further, since stale buffers remain locked at transaction commit
through unpin, we can be certain that the buffer is not associated
with any transaction when the unpin callback executes. Remove this
unused hunk of code and replace it with an assertion that the buffer
is disassociated from transaction context.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com>
Reviewed-by: NLihong Kou <koulihong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

55619cc7

xfs: hold buffer across unpin and potential shutdown processing · baa590d3

由 Brian Foster 提交于 3月 09, 2022

mainline inclusion
from mainline-v5.13-rc4
commit 84d8949e
category: bugfix
bugzilla: 185862 https://gitee.com/openeuler/kernel/issues/I4KIAO

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=84d8949e770745b16a7e8a68dcb1d0f3687bdee9

--------------------------------

The special processing used to simulate a buffer I/O failure on fs
shutdown has a difficult to reproduce race that can result in a use
after free of the associated buffer. Consider a buffer that has been
committed to the on-disk log and thus is AIL resident. The buffer
lands on the writeback delwri queue, but is subsequently locked,
committed and pinned by another transaction before submitted for
I/O. At this point, the buffer is stuck on the delwri queue as it
cannot be submitted for I/O until it is unpinned. A log checkpoint
I/O failure occurs sometime later, which aborts the bli. The unpin
handler is called with the aborted log item, drops the bli reference
count, the pin count, and falls into the I/O failure simulation
path.

The potential problem here is that once the pin count falls to zero
in ->iop_unpin(), xfsaild is free to retry delwri submission of the
buffer at any time, before the unpin handler even completes. If
delwri queue submission wins the race to the buffer lock, it
observes the shutdown state and simulates the I/O failure itself.
This releases both the bli and delwri queue holds and frees the
buffer while xfs_buf_item_unpin() sits on xfs_buf_lock() waiting to
run through the same failure sequence. This problem is rare and
requires many iterations of fstest generic/019 (which simulates disk
I/O failures) to reproduce.

To avoid this problem, grab a hold on the buffer before the log item
is unpinned if the associated item has been aborted and will require
a simulated I/O failure. The hold is already required for the
simulated I/O failure, so the ordering simply guarantees the unpin
handler access to the buffer before it is unpinned and thus
processed by the AIL. This particular ordering is required so long
as the AIL does not acquire a reference on the bli, which is the
long term solution to this problem.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com>
Reviewed-by: NLihong Kou <koulihong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

baa590d3

xfs: fix an ABBA deadlock in xfs_rename · 644dc676

由 Darrick J. Wong 提交于 3月 09, 2022

mainline inclusion
from mainline-v5.11-rc4
commit 6da1b4b1
category: bugfix
bugzilla: 185867 https://gitee.com/openeuler/kernel/issues/I4KIAO

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6da1b4b1ab36d80a3994fd4811c8381de10af604

--------------------------------

When overlayfs is running on top of xfs and the user unlinks a file in
the overlay, overlayfs will create a whiteout inode and ask xfs to
"rename" the whiteout file atop the one being unlinked.  If the file
being unlinked loses its one nlink, we then have to put the inode on the
unlinked list.

This requires us to grab the AGI buffer of the whiteout inode to take it
off the unlinked list (which is where whiteouts are created) and to grab
the AGI buffer of the file being deleted.  If the whiteout was created
in a higher numbered AG than the file being deleted, we'll lock the AGIs
in the wrong order and deadlock.

Therefore, grab all the AGI locks we think we'll need ahead of time, and
in order of increasing AG number per the locking rules.
Reported-by: Nwenli xie <wlxie7296@gmail.com>
Fixes: 93597ae8 ("xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()")
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com>
Reviewed-by: NLihong Kou <koulihong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

644dc676

Revert "efi/libstub: arm64: Relax 2M alignment again for relocatable kernels" · daa17ae4

由 Yang Yingliang 提交于 3月 09, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4VSGH
CVE: NA

--------------------------------

This reverts commit c6d2a109.

I got the following messages when booting kernel:

  EFI stub: Booting Linux Kernel...
  EFI stub: EFI_RNG_PROTOCOL unavailable, KASLR will be disabled
  EFI stub: Using DTB from configuration table
  EFI stub: Exiting boot services and installing virtual address map...

  ...

  [ 0.000000] CPU features: kernel page table isolation forced ON by KASLR
  [ 0.000000] CPU features: detected: Kernel page table isolation (KPTI)
  [ 3.393380] KASLR disabled due to lack of seed

KPTI is forced on by KASLR, but in fact KASLR is not enabled, it's
because kaslr_offset() returns non-zero in kaslr_requires_kpti().

To avoid this problem, when efi kaslr is disabled, make image
MIN_KIMG_ALIGN align which is used to get KASLR offset in
primary_entry(), so kaslr_offset() will returns 0.
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
Reviewed-by: NXie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

daa17ae4

crypto: hisilicon/qm - fix memset during queues clearing · 8b0fe7a4

由 Kai Ye 提交于 3月 09, 2022

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4W3OQ

----------------------------------------------------------------------

Due to that extra page addr is used as a qp error flag when the device
resetting. So it not should to clear this qp flag in userspace.
Signed-off-by: NKai Ye <yekai13@huawei.com>
Signed-off-by: NYang Shen <shenyang39@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

8b0fe7a4

crypto: hisilicon/qm - modify device status check parameter · 6e44486d

由 Weili Qian 提交于 3月 09, 2022

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4W3OQ

----------------------------------------------------------------------

If the device master ooo is blocked, there is
no need to empty the queue. Only the PF can obtain the
status of the device. If the VF runs on the host,
the device status can be obtained by PF.
Signed-off-by: NWeili Qian <qianweili@huawei.com>
Signed-off-by: NYang Shen <shenyang39@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

6e44486d

crypto: hisilicon/qm - remove redundant cache writeback · ed3b55da

由 Weili Qian 提交于 3月 09, 2022

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4W3OQ

----------------------------------------------------------------------

Currently, the memory of the queue's sqe is freed when
the driver is removed, not the put queue. Therefore, it is only
necessary to write back the data in the hardware cache to
memory before removing the driver.
Signed-off-by: NWeili Qian <qianweili@huawei.com>
Signed-off-by: NYang Shen <shenyang39@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

ed3b55da

crypto: hisilicon/qm - disable queue when 'CQ' error · 39d29e2f

由 Weili Qian 提交于 3月 09, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 696645d2
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4W3OQ
CVE: NA

--------------------------------

If the hardware reports the 'CQ' overflow or 'CQE' error by the abnormal
interrupt, disable the queue and stop tasks send to hardware.
Signed-off-by: NWeili Qian <qianweili@huawei.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NYang Shen <shenyang39@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

39d29e2f

crypto: hisilicon/qm - reset function if event queue overflows · 7d8d68c8

由 Weili Qian 提交于 3月 09, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 95f0b6d5
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4W3OQ
CVE: NA

--------------------------------

If the hardware reports the event queue overflow by the abnormal interrupt,
the driver needs to reset the function and re-enable the event queue
interrupt and abnormal interrupt.
Signed-off-by: NWeili Qian <qianweili@huawei.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NYang Shen <shenyang39@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

7d8d68c8

crypto: hisilicon/qm - use request_threaded_irq instead · 21085e5a

由 Weili Qian 提交于 3月 09, 2022

mainline inclusion
from mainline-v5.17-rc1
commit a0a9486b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4W3OQ
CVE: NA

--------------------------------

The abnormal interrupt method needs to be changed, and the changed method
needs to be locked in order to maintain atomicity. Therefore,
replace request_irq() with request_threaded_irq().
Signed-off-by: NWeili Qian <qianweili@huawei.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NYang Shen <shenyang39@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

21085e5a

crypto: hisilicon/qm - modify the handling method after abnormal interruption · ad7d5d93

由 Weili Qian 提交于 3月 09, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 145dcedd
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4W3OQ
CVE: NA

--------------------------------

After processing an interrupt event and the interrupt function is
enabled by writing the QM_DOORBELL_CMD_AEQ register, the hardware
may generate new interrupt events due to processing other user's task
when the subsequent interrupt events have not been processed. The new
interrupt event will disrupt the current normal processing flow and
cause other problems.

Therefore, the operation of writing the QM_DOORBELL_CMD_AEQ doorbell
register needs to be placed after all interrupt events processing
are completed.
Signed-off-by: NWeili Qian <qianweili@huawei.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NYang Shen <shenyang39@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

ad7d5d93

crypto: hisilicon/qm - code movement · 18f4481c

由 Weili Qian 提交于 3月 09, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 9ee401ea
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4W3OQ
CVE: NA

--------------------------------

This patch does not change any code, just code movement. Preparing for
next patch.
Signed-off-by: NWeili Qian <qianweili@huawei.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NYang Shen <shenyang39@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

18f4481c

crypto: hisilicon/qm - remove unnecessary device memory reset · 6ecd5db9

由 Weili Qian 提交于 3月 09, 2022

mainline inclusion
from mainline-v5.17-rc1
commit f123e66d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4W3OQ
CVE: NA

--------------------------------

The internal memory of the device needs to be reset only when
the device is globally initialized. Other scenarios, such as
function reset, do not need to perform reset.
Signed-off-by: NWeili Qian <qianweili@huawei.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NYang Shen <shenyang39@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

6ecd5db9

crypto: hisilicon/qm - fix deadlock for remove driver · 9a3b668e

由 Yang Shen 提交于 3月 09, 2022

mainline inclusion
from mainline-v5.16-rc1
commit fc6c01f0
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4W3EG
CVE: NA

--------------------------------

When remove the driver and executing the task occur at the same time,
the following deadlock will be triggered:

Chain exists of:
    sva_lock --> uacce_mutex --> &qm->qps_lock
    Possible unsafe locking scenario:
		CPU0                    CPU1
		----                    ----
	lock(&qm->qps_lock);
					lock(uacce_mutex);
					lock(&qm->qps_lock);
	lock(sva_lock);

And the lock 'qps_lock' is used to protect qp. Therefore, it's reasonable
cycle is to continue until the qp memory is released. So move the release
lock infront of 'uacce_remove'.
Signed-off-by: NYang Shen <shenyang39@huawei.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NYang Shen <shenyang39@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

9a3b668e

crypto: hisilicon/sec - add some comments for soft fallback · 09741c21

由 Kai Ye 提交于 3月 09, 2022

mainline inclusion
from mainline-crypto-master
commit e764d81d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WC3O
CVE: NA

--------------------------------

Modify the print of information that might lead to user misunderstanding.
Currently only XTS mode need the fallback tfm when using 192bit key.
Others algs not need soft fallback tfm. So others algs can return
directly.
Signed-off-by: NKai Ye <yekai13@huawei.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NYang Shen <shenyang39@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

09741c21

crypto: hisilicon/sec - fix the aead software fallback for engine · 7abba214

由 Kai Ye 提交于 3月 09, 2022

mainline inclusion
from mainline-crypto-master
commit 0a2a464f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4W4WU
CVE: NA

--------------------------------

Due to the subreq pointer misuse the private context memory. The aead
soft crypto occasionally casues the OS panic as setting the 64K page.
Here is fix it.

Fixes: 6c46a329 ("crypto: hisilicon/sec - add fallback tfm...")
Signed-off-by: NKai Ye <yekai13@huawei.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NYang Shen <shenyang39@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

7abba214

08 3月, 2022 22 次提交

blk-throttle: Set BIO_THROTTLED when bio has been throttled · 2c7b5509

由 Laibin Qiu 提交于 3月 08, 2022

hulk inclusion
category: bugfix
bugzilla: 185779, https://gitee.com/openeuler/kernel/issues/I4WFIY
CVE: NA

-------------------------------------------------

1.In current process, all bio will set the BIO_THROTTLED flag
after __blk_throtl_bio().

2.If bio needs to be throttled, it will start the timer and
stop submit bio directly. Bio will submit in
blk_throtl_dispatch_work_fn() when the timer expires.But in
the current process, if bio is throttled. The BIO_THROTTLED
will be set to bio after timer start. If the bio has been
completed, it may cause use-after-free blow.

BUG: KASAN: use-after-free in blk_throtl_bio+0x12f0/0x2c70
Read of size 2 at addr ffff88801b8902d4 by task fio/26380

 dump_stack+0x9b/0xce
 print_address_description.constprop.6+0x3e/0x60
 kasan_report.cold.9+0x22/0x3a
 blk_throtl_bio+0x12f0/0x2c70
 submit_bio_checks+0x701/0x1550
 submit_bio_noacct+0x83/0xc80
 submit_bio+0xa7/0x330
 mpage_readahead+0x380/0x500
 read_pages+0x1c1/0xbf0
 page_cache_ra_unbounded+0x471/0x6f0
 do_page_cache_ra+0xda/0x110
 ondemand_readahead+0x442/0xae0
 page_cache_async_ra+0x210/0x300
 generic_file_buffered_read+0x4d9/0x2130
 generic_file_read_iter+0x315/0x490
 blkdev_read_iter+0x113/0x1b0
 aio_read+0x2ad/0x450
 io_submit_one+0xc8e/0x1d60
 __se_sys_io_submit+0x125/0x350
 do_syscall_64+0x2d/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Allocated by task 26380:
 kasan_save_stack+0x19/0x40
 __kasan_kmalloc.constprop.2+0xc1/0xd0
 kmem_cache_alloc+0x146/0x440
 mempool_alloc+0x125/0x2f0
 bio_alloc_bioset+0x353/0x590
 mpage_alloc+0x3b/0x240
 do_mpage_readpage+0xddf/0x1ef0
 mpage_readahead+0x264/0x500
 read_pages+0x1c1/0xbf0
 page_cache_ra_unbounded+0x471/0x6f0
 do_page_cache_ra+0xda/0x110
 ondemand_readahead+0x442/0xae0
 page_cache_async_ra+0x210/0x300
 generic_file_buffered_read+0x4d9/0x2130
 generic_file_read_iter+0x315/0x490
 blkdev_read_iter+0x113/0x1b0
 aio_read+0x2ad/0x450
 io_submit_one+0xc8e/0x1d60
 __se_sys_io_submit+0x125/0x350
 do_syscall_64+0x2d/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 0:
 kasan_save_stack+0x19/0x40
 kasan_set_track+0x1c/0x30
 kasan_set_free_info+0x1b/0x30
 __kasan_slab_free+0x111/0x160
 kmem_cache_free+0x94/0x460
 mempool_free+0xd6/0x320
 bio_free+0xe0/0x130
 bio_put+0xab/0xe0
 bio_endio+0x3a6/0x5d0
 blk_update_request+0x590/0x1370
 scsi_end_request+0x7d/0x400
 scsi_io_completion+0x1aa/0xe50
 scsi_softirq_done+0x11b/0x240
 blk_mq_complete_request+0xd4/0x120
 scsi_mq_done+0xf0/0x200
 virtscsi_vq_done+0xbc/0x150
 vring_interrupt+0x179/0x390
 __handle_irq_event_percpu+0xf7/0x490
 handle_irq_event_percpu+0x7b/0x160
 handle_irq_event+0xcc/0x170
 handle_edge_irq+0x215/0xb20
 common_interrupt+0x60/0x120
 asm_common_interrupt+0x1e/0x40

Fix this by move BIO_THROTTLED set into the queue_lock.
Signed-off-by: NLaibin Qiu <qiulaibin@huawei.com>
Reviewed-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

2c7b5509

bpf, selftests: Add ringbuf memory type confusion test · 6b1836a9

由 Daniel Borkmann 提交于 3月 08, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 37c8d480
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WT90
CVE: CVE-2021-4204

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=37c8d4807d1b8b521b30310dce97f6695dc2c2c6

--------------------------------

Add two tests, one which asserts that ring buffer memory can be passed to
other helpers for populating its entry area, and another one where verifier
rejects different type of memory passed to bpf_ringbuf_submit().
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NPu Lehui <pulehui@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

6b1836a9

bpf/selftests: Test bpf_d_path on rdonly_mem. · 76f59b4d

由 Hao Luo 提交于 3月 08, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 44bab87d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WT90
CVE: CVE-2021-4204

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=44bab87d8ca6f0544a9f8fc97bdf33aa5b3c899e

--------------------------------

The second parameter of bpf_d_path() can only accept writable
memories. Rdonly_mem obtained from bpf_per_cpu_ptr() can not
be passed into bpf_d_path for modification. This patch adds
a selftest to verify this behavior.
Signed-off-by: NHao Luo <haoluo@google.com>
Signed-off-by: NAndrii Nakryiko <andrii@kernel.org>
Acked-by: NYonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220106205525.2116218-1-haoluo@google.comSigned-off-by: NPu Lehui <pulehui@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

76f59b4d

bpf, selftests: Add various ringbuf tests with invalid offset · 017bd71f

由 Daniel Borkmann 提交于 3月 08, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 722e4db3
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WT90
CVE: CVE-2021-4204

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=722e4db3ae0d52b2e3801280afbe19cf2d188e91

--------------------------------

Assert that the verifier is rejecting invalid offsets on the ringbuf entries:

  # ./test_verifier | grep ring
  #947/u ringbuf: invalid reservation offset 1 OK
  #947/p ringbuf: invalid reservation offset 1 OK
  #948/u ringbuf: invalid reservation offset 2 OK
  #948/p ringbuf: invalid reservation offset 2 OK
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NPu Lehui <pulehui@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

017bd71f

selftests/bpf: Add verifier test for PTR_TO_MEM spill · c1046afc

由 Gilad Reti 提交于 3月 08, 2022

mainline inclusion
from mainline-v5.11-rc5
commit 4237e9f4
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WT90
CVE: CVE-2021-4204

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4237e9f4a96228ccc8a7abe5e4b30834323cd353

--------------------------------

Add a test to check that the verifier is able to recognize spilling of
PTR_TO_MEM registers, by reserving a ringbuf buffer, forcing the spill
of a pointer holding the buffer address to the stack, filling it back
in from the stack and writing to the memory area pointed by it.

The patch was partially contributed by CyberArk Software, Inc.
Signed-off-by: NGilad Reti <gilad.reti@gmail.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYonghong Song <yhs@fb.com>
Acked-by: NKP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210113053810.13518-2-gilad.reti@gmail.comSigned-off-by: NPu Lehui <pulehui@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

c1046afc

bpf: Fix ringbuf memory type confusion when passing to helpers · e1601ef1

由 Daniel Borkmann 提交于 3月 08, 2022

mainline inclusion
from mainline-v5.17-rc1
commit a672b2e3
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WT90
CVE: CVE-2021-4204

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a672b2e36a648afb04ad3bda93b6bda947a479a5

--------------------------------

The bpf_ringbuf_submit() and bpf_ringbuf_discard() have ARG_PTR_TO_ALLOC_MEM
in their bpf_func_proto definition as their first argument, and thus both expect
the result from a prior bpf_ringbuf_reserve() call which has a return type of
RET_PTR_TO_ALLOC_MEM_OR_NULL.

While the non-NULL memory from bpf_ringbuf_reserve() can be passed to other
helpers, the two sinks (bpf_ringbuf_submit(), bpf_ringbuf_discard()) right now
only enforce a register type of PTR_TO_MEM.

This can lead to potential type confusion since it would allow other PTR_TO_MEM
memory to be passed into the two sinks which did not come from bpf_ringbuf_reserve().

Add a new MEM_ALLOC composable type attribute for PTR_TO_MEM, and enforce that:

 - bpf_ringbuf_reserve() returns NULL or PTR_TO_MEM | MEM_ALLOC
 - bpf_ringbuf_submit() and bpf_ringbuf_discard() only take PTR_TO_MEM | MEM_ALLOC
   but not plain PTR_TO_MEM arguments via ARG_PTR_TO_ALLOC_MEM
 - however, other helpers might treat PTR_TO_MEM | MEM_ALLOC as plain PTR_TO_MEM
   to populate the memory area when they use ARG_PTR_TO_{UNINIT_,}MEM in their
   func proto description

Fixes: 457f4436 ("bpf: Implement BPF ring buffer and verifier support for it")
Reported-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NPu Lehui <pulehui@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

e1601ef1

bpf: Fix out of bounds access for ringbuf helpers · 65c1d0b1

由 Daniel Borkmann 提交于 3月 08, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 64620e0a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WT90
CVE: CVE-2021-4204

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=64620e0a1e712a778095bd35cbb277dc2259281f

--------------------------------

Both bpf_ringbuf_submit() and bpf_ringbuf_discard() have ARG_PTR_TO_ALLOC_MEM
in their bpf_func_proto definition as their first argument. They both expect
the result from a prior bpf_ringbuf_reserve() call which has a return type of
RET_PTR_TO_ALLOC_MEM_OR_NULL.

Meaning, after a NULL check in the code, the verifier will promote the register
type in the non-NULL branch to a PTR_TO_MEM and in the NULL branch to a known
zero scalar. Generally, pointer arithmetic on PTR_TO_MEM is allowed, so the
latter could have an offset.

The ARG_PTR_TO_ALLOC_MEM expects a PTR_TO_MEM register type. However, the non-
zero result from bpf_ringbuf_reserve() must be fed into either bpf_ringbuf_submit()
or bpf_ringbuf_discard() but with the original offset given it will then read
out the struct bpf_ringbuf_hdr mapping.

The verifier missed to enforce a zero offset, so that out of bounds access
can be triggered which could be used to escalate privileges if unprivileged
BPF was enabled (disabled by default in kernel).

Fixes: 457f4436 ("bpf: Implement BPF ring buffer and verifier support for it")
Reported-by: <tr3e.wang@gmail.com> (SecCoder Security Lab)
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NPu Lehui <pulehui@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

65c1d0b1

bpf: Generally fix helper register offset check · e13a7ac9

由 Daniel Borkmann 提交于 3月 08, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 6788ab23
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WT90
CVE: CVE-2021-4204

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6788ab23508bddb0a9d88e104284922cb2c22b77

--------------------------------

Right now the assertion on check_ptr_off_reg() is only enforced for register
types PTR_TO_CTX (and open coded also for PTR_TO_BTF_ID), however, this is
insufficient since many other PTR_TO_* register types such as PTR_TO_FUNC do
not handle/expect register offsets when passed to helper functions.

Given this can slip-through easily when adding new types, make this an explicit
allow-list and reject all other current and future types by default if this is
encountered.

Also, extend check_ptr_off_reg() to handle PTR_TO_BTF_ID as well instead of
duplicating it. For PTR_TO_BTF_ID, reg->off is used for BTF to match expected
BTF ids if struct offset is used. This part still needs to be allowed, but the
dynamic off from the tnum must be rejected.

Fixes: 69c087ba ("bpf: Add bpf_for_each_map_elem() helper")
Fixes: eaa6bcb7 ("bpf: Introduce bpf_per_cpu_ptr()")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NPu Lehui <pulehui@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

e13a7ac9

bpf: Mark PTR_TO_FUNC register initially with zero offset · cafeab3c

由 Daniel Borkmann 提交于 3月 08, 2022

mainline inclusion
from mainline-v5.17-rc1
commit d400a6cf
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WT90
CVE: CVE-2021-4204

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d400a6cf1c8a57cdf10f35220ead3284320d85ff

--------------------------------

Similar as with other pointer types where we use ldimm64, clear the register
content to zero first, and then populate the PTR_TO_FUNC type and subprogno
number. Currently this is not done, and leads to reuse of stale register
tracking data.

Given for special ldimm64 cases we always clear the register offset, make it
common for all cases, so it won't be forgotten in future.

Fixes: 69c087ba ("bpf: Add bpf_for_each_map_elem() helper")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NPu Lehui <pulehui@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

cafeab3c

bpf: Generalize check_ctx_reg for reuse with other types · c3f64b90

由 Daniel Borkmann 提交于 3月 08, 2022

mainline inclusion
from mainline-v5.17-rc1
commit be80a1d3
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WT90
CVE: CVE-2021-4204

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=be80a1d3f9dbe5aee79a325964f7037fe2d92f30

--------------------------------

Generalize the check_ctx_reg() helper function into a more generic named one
so that it can be reused for other register types as well to check whether
their offset is non-zero. No functional change.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Conflicts:
	include/linux/bpf_verifier.h
	kernel/bpf/btf.c
Signed-off-by: NPu Lehui <pulehui@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

c3f64b90

bpf/selftests: Test PTR_TO_RDONLY_MEM · d952048c

由 Hao Luo 提交于 3月 08, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 9497c458
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WRPV
CVE: CVE-2022-0500

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9497c458c10b049438ef6e6ddda898edbc3ec6a8

--------------------------------

This test verifies that a ksym of non-struct can not be directly
updated.
Signed-off-by: NHao Luo <haoluo@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NAndrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20211217003152.48334-10-haoluo@google.com
Conflicts:
	tools/testing/selftests/bpf/prog_tests/ksyms_btf.c
Signed-off-by: NPu Lehui <pulehui@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

d952048c

bpf: Add MEM_RDONLY for helper args that are pointers to rdonly mem. · 68e07357

由 Hao Luo 提交于 3月 08, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 216e3cd2
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WRPV
CVE: CVE-2022-0500

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=216e3cd2f28dbbf1fe86848e0e29e6693b9f0a20

--------------------------------

Some helper functions may modify its arguments, for example,
bpf_d_path, bpf_get_stack etc. Previously, their argument types
were marked as ARG_PTR_TO_MEM, which is compatible with read-only
mem types, such as PTR_TO_RDONLY_BUF. Therefore it's legitimate,
but technically incorrect, to modify a read-only memory by passing
it into one of such helper functions.

This patch tags the bpf_args compatible with immutable memory with
MEM_RDONLY flag. The arguments that don't have this flag will be
only compatible with mutable memory types, preventing the helper
from modifying a read-only memory. The bpf_args that have
MEM_RDONLY are compatible with both mutable memory and immutable
memory.
Signed-off-by: NHao Luo <haoluo@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211217003152.48334-9-haoluo@google.com
Conflicts:
	kernel/bpf/btf.c
	kernel/bpf/helpers.c
	kernel/bpf/syscall.c
	kernel/trace/bpf_trace.c
	net/core/filter.c
Signed-off-by: NPu Lehui <pulehui@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

68e07357

bpf: Make per_cpu_ptr return rdonly PTR_TO_MEM. · 1e1f28a5

由 Hao Luo 提交于 3月 08, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 34d3a78c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WRPV
CVE: CVE-2022-0500

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34d3a78c681e8e7844b43d1a2f4671a04249c821

--------------------------------

Tag the return type of {per, this}_cpu_ptr with RDONLY_MEM. The
returned value of this pair of helpers is kernel object, which
can not be updated by bpf programs. Previously these two helpers
return PTR_OT_MEM for kernel objects of scalar type, which allows
one to directly modify the memory. Now with RDONLY_MEM tagging,
the verifier will reject programs that write into RDONLY_MEM.

Fixes: 63d9b80d ("bpf: Introducte bpf_this_cpu_ptr()")
Fixes: eaa6bcb7 ("bpf: Introduce bpf_per_cpu_ptr()")
Fixes: 4976b718 ("bpf: Introduce pseudo_btf_id")
Signed-off-by: NHao Luo <haoluo@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211217003152.48334-8-haoluo@google.comSigned-off-by: NPu Lehui <pulehui@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

1e1f28a5

bpf: Convert PTR_TO_MEM_OR_NULL to composable types. · ac5104f8

由 Hao Luo 提交于 3月 08, 2022

mainline inclusion
from mainline-v5.17-rc1
commit cf9f2f8d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WRPV
CVE: CVE-2022-0500

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cf9f2f8d62eca810afbd1ee6cc0800202b000e57

--------------------------------

Remove PTR_TO_MEM_OR_NULL and replace it with PTR_TO_MEM combined with
flag PTR_MAYBE_NULL.
Signed-off-by: NHao Luo <haoluo@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211217003152.48334-7-haoluo@google.com
Conflicts:
	kernel/bpf/btf.c
	kernel/bpf/verifier.c
Signed-off-by: NPu Lehui <pulehui@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

ac5104f8

bpf: Introduce MEM_RDONLY flag · bceb8062

由 Hao Luo 提交于 3月 08, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 20b2aff4
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WRPV
CVE: CVE-2022-0500

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=20b2aff4bc15bda809f994761d5719827d66c0b4

--------------------------------

This patch introduce a flag MEM_RDONLY to tag a reg value
pointing to read-only memory. It makes the following changes:

1. PTR_TO_RDWR_BUF -> PTR_TO_BUF
2. PTR_TO_RDONLY_BUF -> PTR_TO_BUF | MEM_RDONLY
Signed-off-by: NHao Luo <haoluo@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211217003152.48334-6-haoluo@google.com
Conflicts:
	kernel/bpf/verifier.c
Signed-off-by: NPu Lehui <pulehui@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

bceb8062

bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL · 3dae5870

由 Hao Luo 提交于 3月 08, 2022

mainline inclusion
from mainline-v5.17-rc1
commit c25b2ae1
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WRPV
CVE: CVE-2022-0500

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c25b2ae136039ffa820c26138ed4a5e5f3ab3841

--------------------------------

We have introduced a new type to make bpf_reg composable, by
allocating bits in the type to represent flags.

One of the flags is PTR_MAYBE_NULL which indicates a pointer
may be NULL. This patch switches the qualified reg_types to
use this flag. The reg_types changed in this patch include:

1. PTR_TO_MAP_VALUE_OR_NULL
2. PTR_TO_SOCKET_OR_NULL
3. PTR_TO_SOCK_COMMON_OR_NULL
4. PTR_TO_TCP_SOCK_OR_NULL
5. PTR_TO_BTF_ID_OR_NULL
6. PTR_TO_MEM_OR_NULL
7. PTR_TO_RDONLY_BUF_OR_NULL
8. PTR_TO_RDWR_BUF_OR_NULL
Signed-off-by: NHao Luo <haoluo@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/r/20211217003152.48334-5-haoluo@google.com
Conflicts:
	include/linux/bpf.h
	include/linux/bpf_verifier.h
	kernel/bpf/verifier.c
Signed-off-by: NPu Lehui <pulehui@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

3dae5870

bpf: Replace RET_XXX_OR_NULL with RET_XXX | PTR_MAYBE_NULL · ce6bd518

由 Hao Luo 提交于 3月 08, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 3c480732
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WRPV
CVE: CVE-2022-0500

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c4807322660d4290ac9062c034aed6b87243861

--------------------------------

We have introduced a new type to make bpf_ret composable, by
reserving high bits to represent flags.

One of the flag is PTR_MAYBE_NULL, which indicates a pointer
may be NULL. When applying this flag to ret_types, it means
the returned value could be a NULL pointer. This patch
switches the qualified arg_types to use this flag.
The ret_types changed in this patch include:

1. RET_PTR_TO_MAP_VALUE_OR_NULL
2. RET_PTR_TO_SOCKET_OR_NULL
3. RET_PTR_TO_TCP_SOCK_OR_NULL
4. RET_PTR_TO_SOCK_COMMON_OR_NULL
5. RET_PTR_TO_ALLOC_MEM_OR_NULL
6. RET_PTR_TO_MEM_OR_BTF_ID_OR_NULL
7. RET_PTR_TO_BTF_ID_OR_NULL

This patch doesn't eliminate the use of these names, instead
it makes them aliases to 'RET_PTR_TO_XXX | PTR_MAYBE_NULL'.
Signed-off-by: NHao Luo <haoluo@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211217003152.48334-4-haoluo@google.com
Conflicts:
	kernel/bpf/verifier.c
Signed-off-by: NPu Lehui <pulehui@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

ce6bd518

bpf: Replace ARG_XXX_OR_NULL with ARG_XXX | PTR_MAYBE_NULL · 1810f9e0

由 Hao Luo 提交于 3月 08, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 48946bd6
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WRPV
CVE: CVE-2022-0500

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=48946bd6a5d695c50b34546864b79c1f910a33c1

--------------------------------

We have introduced a new type to make bpf_arg composable, by
reserving high bits of bpf_arg to represent flags of a type.

One of the flags is PTR_MAYBE_NULL which indicates a pointer
may be NULL. When applying this flag to an arg_type, it means
the arg can take NULL pointer. This patch switches the
qualified arg_types to use this flag. The arg_types changed
in this patch include:

1. ARG_PTR_TO_MAP_VALUE_OR_NULL
2. ARG_PTR_TO_MEM_OR_NULL
3. ARG_PTR_TO_CTX_OR_NULL
4. ARG_PTR_TO_SOCKET_OR_NULL
5. ARG_PTR_TO_ALLOC_MEM_OR_NULL
6. ARG_PTR_TO_STACK_OR_NULL

This patch does not eliminate the use of these arg_types, instead
it makes them an alias to the 'ARG_XXX | PTR_MAYBE_NULL'.
Signed-off-by: NHao Luo <haoluo@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211217003152.48334-3-haoluo@google.com
Conflicts:
	include/linux/bpf.h
	kernel/bpf/verifier.c
Signed-off-by: NPu Lehui <pulehui@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

1810f9e0

bpf: Introduce composable reg, ret and arg types. · e3efa53e

由 Hao Luo 提交于 3月 08, 2022

mainline inclusion
from mainline-v5.17-rc1
commit d639b9d1
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WRPV
CVE: CVE-2022-0500

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d639b9d13a39cf15639cbe6e8b2c43eb60148a73

--------------------------------

There are some common properties shared between bpf reg, ret and arg
values. For instance, a value may be a NULL pointer, or a pointer to
a read-only memory. Previously, to express these properties, enumeration
was used. For example, in order to test whether a reg value can be NULL,
reg_type_may_be_null() simply enumerates all types that are possibly
NULL. The problem of this approach is that it's not scalable and causes
a lot of duplication. These properties can be combined, for example, a
type could be either MAYBE_NULL or RDONLY, or both.

This patch series rewrites the layout of reg_type, arg_type and
ret_type, so that common properties can be extracted and represented as
composable flag. For example, one can write

ARG_PTR_TO_MEM | PTR_MAYBE_NULL

which is equivalent to the previous

ARG_PTR_TO_MEM_OR_NULL

The type ARG_PTR_TO_MEM are called "base type" in this patch. Base
types can be extended with flags. A flag occupies the higher bits while
base types sits in the lower bits.

This patch in particular sets up a set of macro for this purpose. The
following patches will rewrite arg_types, ret_types and reg_types
respectively.
Signed-off-by: NHao Luo <haoluo@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211217003152.48334-2-haoluo@google.com
Conflicts:
include/linux/bpf.h
include/linux/bpf_verifier.h
Signed-off-by: NPu Lehui <pulehui@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

e3efa53e

bpf: Fix out of bounds access from invalid *_or_null type verification · 026193f7

由 Daniel Borkmann 提交于 3月 08, 2022

stable inclusion
from stable-v5.10.92
commit 35ab8c9085b0af847df7fac9571ccd26d9f0f513
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WRPV

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=35ab8c9085b0af847df7fac9571ccd26d9f0f513

--------------------------------

[ no upstream commit given implicitly fixed through the larger refactoring
  in c25b2ae1 ]

While auditing some other code, I noticed missing checks inside the pointer
arithmetic simulation, more specifically, adjust_ptr_min_max_vals(). Several
*_OR_NULL types are not rejected whereas they are _required_ to be rejected
given the expectation is that they get promoted into a 'real' pointer type
for the success case, that is, after an explicit != NULL check.

One case which stands out and is accessible from unprivileged (iff enabled
given disabled by default) is BPF ring buffer. From crafting a PoC, the NULL
check can be bypassed through an offset, and its id marking will then lead
to promotion of mem_or_null to a mem type.

bpf_ringbuf_reserve() helper can trigger this case through passing of reserved
flags, for example.

  func#0 @0
  0: R1=ctx(id=0,off=0,imm=0) R10=fp0
  0: (7a) *(u64 *)(r10 -8) = 0
  1: R1=ctx(id=0,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm
  1: (18) r1 = 0x0
  3: R1_w=map_ptr(id=0,off=0,ks=0,vs=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm
  3: (b7) r2 = 8
  4: R1_w=map_ptr(id=0,off=0,ks=0,vs=0,imm=0) R2_w=invP8 R10=fp0 fp-8_w=mmmmmmmm
  4: (b7) r3 = 0
  5: R1_w=map_ptr(id=0,off=0,ks=0,vs=0,imm=0) R2_w=invP8 R3_w=invP0 R10=fp0 fp-8_w=mmmmmmmm
  5: (85) call bpf_ringbuf_reserve#131
  6: R0_w=mem_or_null(id=2,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2
  6: (bf) r6 = r0
  7: R0_w=mem_or_null(id=2,ref_obj_id=2,off=0,imm=0) R6_w=mem_or_null(id=2,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2
  7: (07) r0 += 1
  8: R0_w=mem_or_null(id=2,ref_obj_id=2,off=1,imm=0) R6_w=mem_or_null(id=2,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2
  8: (15) if r0 == 0x0 goto pc+4
   R0_w=mem(id=0,ref_obj_id=0,off=0,imm=0) R6_w=mem(id=0,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2
  9: R0_w=mem(id=0,ref_obj_id=0,off=0,imm=0) R6_w=mem(id=0,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2
  9: (62) *(u32 *)(r6 +0) = 0
   R0_w=mem(id=0,ref_obj_id=0,off=0,imm=0) R6_w=mem(id=0,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2
  10: R0_w=mem(id=0,ref_obj_id=0,off=0,imm=0) R6_w=mem(id=0,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2
  10: (bf) r1 = r6
  11: R0_w=mem(id=0,ref_obj_id=0,off=0,imm=0) R1_w=mem(id=0,ref_obj_id=2,off=0,imm=0) R6_w=mem(id=0,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2
  11: (b7) r2 = 0
  12: R0_w=mem(id=0,ref_obj_id=0,off=0,imm=0) R1_w=mem(id=0,ref_obj_id=2,off=0,imm=0) R2_w=invP0 R6_w=mem(id=0,ref_obj_id=2,off=0,imm=0) R10=fp0 fp-8_w=mmmmmmmm refs=2
  12: (85) call bpf_ringbuf_submit#132
  13: R6=invP(id=0) R10=fp0 fp-8=mmmmmmmm
  13: (b7) r0 = 0
  14: R0_w=invP0 R6=invP(id=0) R10=fp0 fp-8=mmmmmmmm
  14: (95) exit

  from 8 to 13: safe
  processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 0
  OK

All three commits, that is b121b341 ("bpf: Add PTR_TO_BTF_ID_OR_NULL support"),
457f4436 ("bpf: Implement BPF ring buffer and verifier support for it"), and the
afbf21dc ("bpf: Support readonly/readwrite buffers in verifier") suffer the same
cause and their *_OR_NULL type pendants must be rejected in adjust_ptr_min_max_vals().

Make the test more robust by reusing reg_type_may_be_null() helper such that we catch
all *_OR_NULL types we have today and in future.

Note that pointer arithmetic on PTR_TO_BTF_ID, PTR_TO_RDONLY_BUF, and PTR_TO_RDWR_BUF
is generally allowed.

Fixes: b121b341 ("bpf: Add PTR_TO_BTF_ID_OR_NULL support")
Fixes: 457f4436 ("bpf: Implement BPF ring buffer and verifier support for it")
Fixes: afbf21dc ("bpf: Support readonly/readwrite buffers in verifier")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NPu Lehui <pulehui@huawei.com>
Reviewed-by: NKuohai Xu <xukuohai@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

026193f7

blk-mq: decrease pending_queues when it expires · 28796f6d

由 Yu Kuai 提交于 3月 08, 2022

hulk inclusion
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/I4S8DW

---------------------------

If pending_queues is increased once, it will only be decreased when
nr_active is zero, and that will lead to the under-utilization of
host tags because pending_queues is non-zero and the available
tags for the queue will be max(host tags / active_queues, 4)
instead of the needed tags of the queue.

Fix it by adding an expiration time for the increasement of pending_queues,
and decrease it when it expires, so pending_queues will be decreased
to zero if there is no tag allocation failure, and the available tags
for the queue will be the whole host tags.
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

28796f6d

blk-mq: add debugfs to print information for blk_mq_tag_set · ff4b4572

由 Yu Kuai 提交于 3月 08, 2022

hulk inclusion
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/I4S8DW

---------------------------
Signed-off-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

ff4b4572

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功