Commits · 5977c106fa5f2074a45c7888b45be3f2e7aa546b · openeuler / Kernel

09 Mar, 2022 34 commits

xfs: move xlog_commit_record to xfs_log_cil.c · 5977c106

Committed by Dave Chinner on Mar 09, 2022

mainline-inclusion
from mainline-v5.14-rc4
commit 2ce82b72
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V7IK
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ce82b722de980deef809438603b7e95156d3818

-------------------------------------------------

It is only used by the CIL checkpoints, and is the counterpart to
start record formatting and writing that is already local to
xfs_log_cil.c.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: guoxuenan <guoxuenan@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

5977c106

xfs: log head and tail aren't reliable during shutdown · 41769f8e

Committed by Dave Chinner on Mar 09, 2022

mainline-inclusion
from mainline-v5.14-rc4
commit 2562c322
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V7IK
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2562c322404d81ee5fa82f3cf601a2e27393ab57

-------------------------------------------------

I'm seeing assert failures from xlog_space_left() after a shutdown
has begun that look like:

XFS (dm-0): log I/O error -5
XFS (dm-0): xfs_do_force_shutdown(0x2) called from line 1338 of file fs/xfs/xfs_log.c. Return address = xlog_ioend_work+0x64/0xc0
XFS (dm-0): Log I/O Error Detected.
XFS (dm-0): Shutting down filesystem. Please unmount the filesystem and rectify the problem(s)
XFS (dm-0): xlog_space_left: head behind tail
XFS (dm-0):   tail_cycle = 6, tail_bytes = 2706944
XFS (dm-0):   GH   cycle = 6, GH   bytes = 1633867
XFS: Assertion failed: 0, file: fs/xfs/xfs_log.c, line: 1310
------------[ cut here ]------------
Call Trace:
 xlog_space_left+0xc3/0x110
 xlog_grant_push_threshold+0x3f/0xf0
 xlog_grant_push_ail+0x12/0x40
 xfs_log_reserve+0xd2/0x270
 ? __might_sleep+0x4b/0x80
 xfs_trans_reserve+0x18b/0x260
.....

There are two things here. Firstly, after a shutdown, the log head
and tail can be out of whack as things abort and release (or don't
release) resources, so checking them for sanity doesn't make much
sense. Secondly, xfs_log_reserve() can race with shutdown and so it
can still fail like this even though it has already checked for a
log shutdown before calling xlog_grant_push_ail().

So, before ASSERT failing in xlog_space_left(), make sure we haven't
already shut down....

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: guoxuenan <guoxuenan@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

41769f8e
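
Illustrative note on 41769f8e: the guard described in that commit message can be sketched in userspace as follows. This is only a minimal model under stated assumptions; `struct toy_grant_log` and `toy_space_left()` are hypothetical names and do not reproduce the real `struct xlog` or `xlog_space_left()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the log state; not the real struct xlog. */
struct toy_grant_log {
	int  logsize;                 /* total log size in bytes */
	int  tail_cycle, tail_bytes;  /* position of the log tail */
	int  head_cycle, head_bytes;  /* position of the grant head */
	bool shutdown;                /* set once the log has been shut down */
};

/* Free space between head and tail, tolerating inconsistency after shutdown. */
static int toy_space_left(const struct toy_grant_log *log)
{
	if (log->tail_cycle == log->head_cycle &&
	    log->head_bytes >= log->tail_bytes)
		return log->logsize - (log->head_bytes - log->tail_bytes);
	if (log->tail_cycle + 1 < log->head_cycle)
		return 0;

	/* After a shutdown head and tail may be out of whack: do not assert. */
	if (log->shutdown)
		return log->logsize;

	if (log->tail_cycle < log->head_cycle)
		return log->tail_bytes - log->head_bytes;

	/* Head behind tail on a live log is a genuine bug. */
	assert(0 && "head behind tail");
	return log->logsize;
}
```

With the head and tail values from the trace above and `shutdown` set, the sketch returns the full log size instead of tripping the assertion.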

xfs: don't run shutdown callbacks on active iclogs · 2654e8f6

Committed by Dave Chinner on Mar 09, 2022

mainline-inclusion
from mainline-v5.14-rc4
commit 502a01fa
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V7IK
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=502a01fac0983406e7c312764d7a03e06b3d0748

-------------------------------------------------

When the log is shut down, it currently walks all the iclogs and runs
callbacks that are attached to the iclogs, regardless of whether the
iclog is queued for IO completion or not. This creates a problem for
contexts attaching callbacks to iclogs in that a racing shutdown can
run the callbacks even before the attaching context has finished
processing the iclog and releasing it for IO submission.

If the callback processing of the iclog frees the structure that is
attached to the iclog, then this leads to a UAF scenario that can
only be protected against by holding the icloglock from the point
callbacks are attached through to the release of the iclog. While we
currently do this, it is not practical or sustainable.

Hence we need to make shutdown processing the responsibility of the
context that holds active references to the iclog. We know that the
contexts attaching callbacks to the iclog must have active
references to the iclog, and that means they must be in either
ACTIVE or WANT_SYNC states. xlog_state_do_callback() will skip over
iclogs in these states -except- when the log is shut down.

xlog_state_do_callback() checks the state of the iclogs while
holding the icloglock, therefore the reference count/state change
that occurs in xlog_state_release_iclog() after the callbacks are
attached is atomic w.r.t. shutdown processing.

We can't push the responsibility of callback cleanup onto the CIL
context because we can have ACTIVE iclogs that have callbacks
attached that have already been released. Hence we really need to
internalise the cleanup of callbacks into xlog_state_release_iclog()
processing.

Indeed, we already have that internalisation via:

xlog_state_release_iclog
  drop last reference
    ->SYNCING
  xlog_sync
    xlog_write_iclog
      if (log_is_shutdown)
        xlog_state_done_syncing()
          xlog_state_do_callback()
            <process shutdown on iclog that is now in SYNCING state>

The problem is that xlog_state_release_iclog() aborts before doing
anything if the log is already shut down. It assumes that the
callbacks have already been cleaned up, and it doesn't need to do
any cleanup.

Hence the fix is to remove the xlog_is_shutdown() check from
xlog_state_release_iclog() so that reference counts are correctly
released from the iclogs, and when the reference count is zero we
always transition to SYNCING if the log is shut down. Hence we'll
always enter the xlog_sync() path in a shutdown and eventually end
up erroring out the iclog IO and running xlog_state_do_callback() to
process the callbacks attached to the iclog.

This allows us to stop processing referenced ACTIVE/WANT_SYNC iclogs
directly in the shutdown code, and in doing so gets rid of the UAF
vector that currently exists. This then decouples the adding of
callbacks to the iclogs from xlog_state_release_iclog() as we
guarantee that xlog_state_release_iclog() will process the callbacks
if the log has been shut down before xlog_state_release_iclog() has
been called.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: guoxuenan <guoxuenan@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

2654e8f6
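
Illustrative note on 2654e8f6: the rule that shutdown callbacks run only when the last reference is dropped can be modelled roughly as below. This is a single-threaded sketch with locking omitted; `struct toy_iclog`, `toy_release_iclog()` and `toy_count_abort()` are hypothetical names, and the state machine is much simpler than the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_state { TOY_ACTIVE, TOY_WANT_SYNC, TOY_SYNCING, TOY_CALLBACK_DONE };

/* Hypothetical, single-threaded model of an iclog; locking is omitted. */
struct toy_iclog {
	int            refcount;
	enum toy_state state;
	void         (*callback)(void *arg, bool aborted);
	void          *cb_arg;
};

/* Example callback: counts how often it was invoked with aborted == true. */
static void toy_count_abort(void *arg, bool aborted)
{
	if (aborted)
		++*(int *)arg;
}

/* Run and detach the attached callback, telling it whether the log aborted. */
static void toy_run_callback(struct toy_iclog *ic, bool aborted)
{
	if (ic->callback) {
		ic->callback(ic->cb_arg, aborted);
		ic->callback = NULL;
	}
	ic->state = TOY_CALLBACK_DONE;
}

/*
 * Drop one reference. While other holders remain nothing else happens.
 * On the last reference a shut down log always moves to SYNCING, the "IO"
 * is failed at once and callbacks run with aborted == true.
 */
static void toy_release_iclog(struct toy_iclog *ic, bool log_is_shutdown)
{
	assert(ic->refcount > 0);
	if (--ic->refcount > 0)
		return;
	if (log_is_shutdown) {
		ic->state = TOY_SYNCING;
		toy_run_callback(ic, true);
	} else if (ic->state == TOY_WANT_SYNC) {
		ic->state = TOY_SYNCING;	/* real code would submit the IO here */
	}
}
```

Because the callback is only reached from the final release, a context that still holds a reference never sees its attached structure freed underneath it in this model.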

xfs: separate out log shutdown callback processing · 6cc9892d

Committed by Dave Chinner on Mar 09, 2022

mainline-inclusion
from mainline-v5.14-rc4
commit aad7272a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V7IK
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=aad7272a920869b950d937b87562e494af72523c

-------------------------------------------------

The iclog callback processing done during a forced log shutdown has
different logic to normal runtime IO completion callback processing.
Separate out the shutdown callbacks into their own function and call
that from the shutdown code instead.

We don't need this shutdown specific logic in the normal runtime
completion code - we'll always run the shutdown version on shutdown,
and it will do what shutdown needs regardless of whether there are
racing IO completion callbacks scheduled or in progress. Hence we
can also simplify the normal IO completion callpath and only abort
if shutdown occurred while we were actively processing callbacks.

Further, separating out the IO completion logic from the shutdown
logic avoids callback race conditions from being triggered by log IO
completion after a shutdown. IO completion will now only run
callbacks on iclogs that are in the correct state for a callback to
be run, avoiding the possibility of running callbacks on a
referenced iclog that hasn't yet been submitted for IO.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: guoxuenan <guoxuenan@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

6cc9892d

xfs: rework xlog_state_do_callback() · e66c6279

Committed by Dave Chinner on Mar 09, 2022

mainline-inclusion
from mainline-v5.14-rc4
commit 8bb92005
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V7IK
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8bb92005b0e4682a6e5dad131c5f3636c7d56dc1

-------------------------------------------------

Clean it up a bit by factoring and rearranging some of the code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: guoxuenan <guoxuenan@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

e66c6279

xfs: make forced shutdown processing atomic · e4976011

Committed by Dave Chinner on Mar 09, 2022

mainline-inclusion
from mainline-v5.14-rc4
commit b36d4651
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V7IK
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b36d4651e1650082d27fa477318183c4a7210e30

-------------------------------------------------

The running of a forced shutdown is a bit of a mess. It does racy
checks for XFS_MOUNT_SHUTDOWN in xfs_do_force_shutdown(), then
does more racy checks in xfs_log_force_unmount() before finally
setting XFS_MOUNT_SHUTDOWN and XLOG_IO_ERROR under the
log->icloglock.

Move the checking and setting of XFS_MOUNT_SHUTDOWN into
xfs_do_force_shutdown() so we only process a shutdown once and once
only. Serialise this with the mp->m_sb_lock spinlock so that the
state change is atomic and won't race. Move all the mount specific
shutdown state changes from xfs_log_force_unmount() to
xfs_do_force_shutdown() so they are done atomically with setting
XFS_MOUNT_SHUTDOWN.

Then get rid of the racy xlog_is_shutdown() check from
xlog_force_shutdown(), and gate the log shutdown on the
test_and_set_bit(XLOG_IO_ERROR) test under the icloglock. This
means that the log is shut down once and once only, and code that
needs to prevent races with shutdown can do so by holding the
icloglock and checking the return value of xlog_is_shutdown().

This results in a predictable shutdown execution process - we set the
shutdown flags once and process the shutdown once rather than the
current "as many concurrent shutdowns as can race to the flag
setting" situation we have now.

Also, now that shutdown is atomic, always emit a stack trace when the
error level for the filesystem is high enough. This means that we
always get a stack trace when trying to diagnose the cause of
shutdowns in the field, rather than just for SHUTDOWN_CORRUPT_INCORE
cases.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

e4976011
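
Illustrative note on e4976011: the "once and once only" gating on a test-and-set of the shutdown bit can be sketched in userspace with C11 atomics. This is an analogy, not the kernel code: the kernel performs the test under the icloglock, whereas this sketch relies on an atomic read-modify-write; `struct toy_oplog`, `TOY_IO_ERROR_BIT` and the helper names are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_IO_ERROR_BIT 0u	/* hypothetical bit number, not the kernel's value */

struct toy_oplog {
	atomic_uint opstate;	/* operational state bits */
	int         shutdowns;	/* how many times shutdown processing ran */
};

/* Userspace analogue of test_and_set_bit(): returns the bit's old value. */
static bool toy_test_and_set_bit(unsigned int bit, atomic_uint *word)
{
	assert(bit < 32);
	unsigned int mask = 1u << bit;

	return (atomic_fetch_or(word, mask) & mask) != 0;
}

/* Returns true only for the caller that actually performed the shutdown. */
static bool toy_force_shutdown(struct toy_oplog *log)
{
	if (toy_test_and_set_bit(TOY_IO_ERROR_BIT, &log->opstate))
		return false;	/* someone else already shut the log down */
	log->shutdowns++;	/* shutdown work happens exactly once */
	return true;
}

/* Lockless reader, in the spirit of xlog_is_shutdown(). */
static bool toy_is_shutdown(struct toy_oplog *log)
{
	return (atomic_load(&log->opstate) & (1u << TOY_IO_ERROR_BIT)) != 0;
}
```

Any number of callers may race into `toy_force_shutdown()`, but only the one that flips the bit from clear to set goes on to do the shutdown work.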

xfs: convert log flags to an operational state field · d5ca9d5d

Committed by Dave Chinner on Mar 09, 2022

mainline-inclusion
from mainline-v5.14-rc4
commit e1d06e5f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V7IK
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e1d06e5f668a403f48538f0d6b163edfd4342adf

-------------------------------------------------

log->l_flags doesn't actually contain "flags" as such, it contains
operational state information that can change at runtime. For the
shutdown state, this at least should be an atomic bit because
it is read without holding locks in many places and so using atomic
bitops for the state field modifications makes sense.

This allows us to use things like test_and_set_bit() on state
changes (e.g. setting XLOG_TAIL_WARN) to avoid races in setting the
state when we aren't holding locks.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: guoxuenan <guoxuenan@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

d5ca9d5d

xfs: move recovery needed state updates to xfs_log_mount_finish · 32a8357e

Committed by Dave Chinner on Mar 09, 2022

mainline-inclusion
from mainline-v5.14-rc4
commit fd67d8a0
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V7IK
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fd67d8a07208ab06560287b7b9334c2d50b7d6d7

-------------------------------------------------

xfs_log_mount_finish() needs to know if recovery is needed or not to
make decisions on whether to flush the log and AIL.  Move the
handling of the NEED_RECOVERY state out to this function rather than
needing a temporary variable to store this state over the call to
xlog_recover_finish().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: guoxuenan <guoxuenan@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

32a8357e

xfs: XLOG_STATE_IOERROR must die · 4d20b959

Committed by Dave Chinner on Mar 09, 2022

mainline-inclusion
from mainline-v5.14-rc4
commit 5112e206
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V7IK
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5112e2067bd94bd56aace4c7e4d45ff13b9152f8

-------------------------------------------------

We don't need an iclog state field to tell us the log has been shut
down. We can just check xlog_is_shutdown() instead. This avoids
the need to have shutdown overwrite the current iclog state while
it is being actively used by the log code, and so having to ensure
that every iclog state check handles XLOG_STATE_IOERROR appropriately.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: guoxuenan <guoxuenan@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

4d20b959

xfs: convert XLOG_FORCED_SHUTDOWN() to xlog_is_shutdown() · 8804a686

Committed by Dave Chinner on Mar 09, 2022

mainline-inclusion
from mainline-v5.14-rc4
commit 2039a272
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4V7IK
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2039a272300b949c05888428877317b834c0b1fb

-------------------------------------------------

Make it less shouty and a static inline before adding more calls
through the log code.

Also convert internal log code that uses XFS_FORCED_SHUTDOWN(mount)
to use xlog_is_shutdown(log) as well.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: guoxuenan <guoxuenan@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

8804a686
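
Illustrative note on 8804a686: the general shape of replacing a "shouty" macro with a static inline helper looks roughly like this. The names `TOY_FORCED_SHUTDOWN`, `toy_is_shutdown()` and `struct toy_flag_log` are hypothetical and the flag value is made up for the sketch; the kernel definitions are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_IO_ERROR 0x1u	/* hypothetical flag bit */

struct toy_flag_log {
	unsigned int flags;
};

/* Before: an upper-case macro that accepts anything with a ->flags member. */
#define TOY_FORCED_SHUTDOWN(log) (((log)->flags & TOY_IO_ERROR) != 0)

/* After: a static inline helper, so the compiler checks the argument type. */
static inline bool toy_is_shutdown(const struct toy_flag_log *log)
{
	assert(log);
	return (log->flags & TOY_IO_ERROR) != 0;
}
```

Both forms evaluate to the same result; the inline function simply reads more quietly at call sites and gives better diagnostics when misused.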

Revert "nfs: ensure correct writeback errors are returned on close()" · d2e650b8

Committed by ChenXiaoSong on Mar 09, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4T2WV
CVE: NA

--------------------------------

This reverts commit 67dd23f9.

filemap_sample_wb_err() will return 0 if nobody has seen the error yet,
then filemap_check_wb_err() will return the unchanged writeback error.

Reproducer:
        nfs server               |       nfs client
 --------------------------------|----------------------------------------------
 # No space left on server       |
 fallocate -l 100G /server/nospc |
                                 |
                                 | mount -t nfs $nfs_server_ip:/ /mnt
                                 |
                                 | # Expected error: No space left on device
                                 | dd if=/dev/zero of=/mnt/file count=1 ibs=1K
                                 |
                                 | # Release space on mountpoint
                                 | rm /mnt/nospc
                                 |
                                 | # Unexpected error: No space left on device
                                 | dd if=/dev/zero of=/mnt/file count=1 ibs=1K
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

d2e650b8
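
Illustrative note on d2e650b8: the sample-then-check behaviour described in that commit message can be reproduced with a simplified, non-atomic model of an error sequence counter. This is an approximation loosely modelled on the kernel's errseq_t; the `toy_errseq_*` names and the exact bit layout used here are assumptions made for the sketch.

```c
#include <assert.h>
#include <errno.h>

/* Simplified, non-atomic model of an error sequence value. */
#define TOY_ERR_MASK 0xfffu		/* low bits hold the errno */
#define TOY_SEEN     (1u << 12)		/* set once someone has observed the error */
#define TOY_CTR_INC  (1u << 13)		/* counter lives above the SEEN flag */

typedef unsigned int toy_errseq_t;

/* Record a new error (err is a negative errno). */
static void toy_errseq_set(toy_errseq_t *eseq, int err)
{
	assert(err < 0 && -err <= (int)TOY_ERR_MASK);
	toy_errseq_t cur = *eseq;
	toy_errseq_t next = (cur & ~(TOY_ERR_MASK | TOY_SEEN)) | (unsigned int)(-err);

	if (cur & TOY_SEEN)
		next += TOY_CTR_INC;	/* bump so that old samples differ */
	*eseq = next;
}

/* Sample the current state; an error nobody has seen yet samples as 0. */
static toy_errseq_t toy_errseq_sample(const toy_errseq_t *eseq)
{
	return (*eseq & TOY_SEEN) ? *eseq : 0;
}

/* Report the stored error if the value differs from the earlier sample. */
static int toy_errseq_check(const toy_errseq_t *eseq, toy_errseq_t since)
{
	return (*eseq == since) ? 0 : -(int)(*eseq & TOY_ERR_MASK);
}
```

In this model, sampling a value whose error has never been seen yields 0, so a check against that sample reports the old error even though nothing new has failed in between, which matches the unexpected second ENOSPC in the reproducer.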

fuse: support SB_NOSEC flag to improve write performance · 08dca36f

Committed by Vivek Goyal on Mar 09, 2022

mainline inclusion
from mainline-v5.11-rc1
commit 9d769e6a
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SIR8

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9d769e6aa2524e1762e3b8681e0ed78f8acf6cad

--------------------------------

Virtiofs can be slow with small writes if xattr are enabled and we are
doing cached writes (No direct I/O). Ganesh Mahalingam noticed this.

Some debugging showed that file_remove_privs() is called in cached write
path on every write. And every time it calls security_inode_need_killpriv()
which results in call to __vfs_getxattr(XATTR_NAME_CAPS). And this goes to
file server to fetch xattr. This extra round trip for every write slows
down writes tremendously.

Normally to avoid paying this penalty on every write, vfs has the notion of
caching this information in inode (S_NOSEC). So vfs sets S_NOSEC, if
filesystem opted for it using super block flag SB_NOSEC. And S_NOSEC is
cleared when setuid/setgid bit is set or when security xattr is set on
inode so that next time a write happens, we check inode again for clearing
setuid/setgid bits as well as any security.capability xattr.

This seems to work well for local file systems but for remote file systems
it is possible that VFS does not have full picture and a different client
sets setuid/setgid bit or security.capability xattr on file and that means
VFS information about S_NOSEC on another client will be stale. So for
remote filesystems SB_NOSEC was disabled by default.

Commit 9e1f1de0 ("more conservative S_NOSEC handling") mentioned that
these filesystems can still make use of SB_NOSEC as long as they clear
S_NOSEC when they are refreshing inode attributes from server.

So this patch tries to enable SB_NOSEC on fuse (regular fuse as well as
virtiofs). And clear S_NOSEC when we are refreshing inode attributes.

This is enabled only if server supports FUSE_HANDLE_KILLPRIV_V2. This says
that server will clear setuid/setgid/security.capability on
chown/truncate/write as appropriate.

This should provide tighter coherency because now suid/sgid/
security.capability will be cleared even if fuse client cache has not seen
these attrs.

Basic idea is that fuse client will trigger suid/sgid/security.capability
clearing based on its attr cache. But even if cache has gone stale, it is
fine because FUSE_HANDLE_KILLPRIV_V2 will make sure WRITE clear
suid/sgid/security.capability.

We make this change only if server supports FUSE_HANDLE_KILLPRIV_V2. This
should make sure that existing filesystems which might be relying on
security.capability always being queried from server are not impacted.

This tighter coherency relies on WRITE showing up on server (and not being
cached in guest). So writeback_cache mode will not provide that tight
coherency and it is not recommended to use the two together. Having said that
it might work reasonably well for a lot of use cases.

This change improves random write performance very significantly. Running
virtiofsd with cache=auto and following fio command:

fio --ioengine=libaio --direct=1 --name=test --filename=/mnt/virtiofs/random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randwrite

Bandwidth increases from around 50MB/s to around 250MB/s as a result of
applying this patch. So improvement is very significant.

Link: https://github.com/kata-containers/runtime/issues/2815
Reported-by: "Mahalingam, Ganesh" <ganesh.mahalingam@intel.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

08dca36f
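
Illustrative note on 08dca36f: the effect of caching a "nothing to strip" flag on the inode can be sketched with a toy model that counts round trips to the file server. All names here (`struct toy_inode`, `toy_write()`, `toy_refresh_attrs()`) are hypothetical, and the model ignores suid/sgid and everything else the real write path does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one inode flag caching "no privileges to strip". */
struct toy_inode {
	bool nosec;		/* analogue of S_NOSEC */
	bool has_caps;		/* server-side security.capability state */
	int  round_trips;	/* xattr lookups sent to the file server */
};

/* Ask the "server" whether the file carries security.capability. */
static bool toy_fetch_caps(struct toy_inode *inode)
{
	assert(inode);
	inode->round_trips++;
	return inode->has_caps;
}

/* Write path: only query the server when the cached flag is not set. */
static void toy_write(struct toy_inode *inode, bool sb_nosec)
{
	if (inode->nosec)
		return;				/* cached: nothing to strip */
	if (toy_fetch_caps(inode))
		inode->has_caps = false;	/* strip the capability */
	if (sb_nosec)
		inode->nosec = true;		/* remember the answer */
}

/* Refreshing attributes from the server invalidates the cached flag. */
static void toy_refresh_attrs(struct toy_inode *inode)
{
	inode->nosec = false;
}
```

With the superblock opt-in enabled, a run of writes costs one lookup instead of one per write, and a refresh of the attributes forces exactly one more lookup on the next write.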

fuse: add a flag FUSE_OPEN_KILL_SUIDGID for open() request · 2779839f

Committed by Vivek Goyal on Mar 09, 2022

mainline inclusion
from mainline-v5.11-rc1
commit 643a666a
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SIR8

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=643a666a89c358ef588d2b3ef9f2dc1efc421e61

--------------------------------

With FUSE_HANDLE_KILLPRIV_V2 support, server will need to kill suid/sgid/
security.capability on open(O_TRUNC), if server supports
FUSE_ATOMIC_O_TRUNC.

But server needs to kill suid/sgid only if caller does not have CAP_FSETID.
Given server does not have this information, client needs to send this info
to server.

So add a flag FUSE_OPEN_KILL_SUIDGID to fuse_open_in request which tells
server to kill suid/sgid (only if group execute is set).

This flag is added to the FUSE_OPEN request, as well as the FUSE_CREATE
request if the create was non-exclusive, since that might result in an
existing file being opened/truncated.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

2779839f

fuse: don't send ATTR_MODE to kill suid/sgid for handle_killpriv_v2 · 718e25b6

Committed by Vivek Goyal on Mar 09, 2022

mainline inclusion
from mainline-v5.11-rc1
commit 8981bdfd
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SIR8

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8981bdfda7445af5d5a8c277c923bf91873a0c98

--------------------------------

If client does a write() on a suid/sgid file, VFS will first call
fuse_setattr() with ATTR_KILL_S[UG]ID set. This requires sending setattr
to file server with ATTR_MODE set to kill suid/sgid. But to do that client
needs to know latest mode otherwise it is racy.

To reduce the race window, current code first calls fuse_do_getattr() to get
latest ->i_mode and then resets suid/sgid bits and sends rest to server
with setattr(ATTR_MODE). This does not eliminate the race completely but
narrows the race window significantly.

With fc->handle_killpriv_v2 enabled, it should be possible to remove this
race completely. Do not kill suid/sgid with ATTR_MODE at all. It will be
killed by server when WRITE request is sent to server soon. This is
similar to fc->handle_killpriv logic. V2 is just more refined version of
protocol. Hence this patch does not send ATTR_MODE to kill suid/sgid if
fc->handle_killpriv_v2 is enabled.

This creates an issue if fc->writeback_cache is enabled. In that case
WRITE can be cached in guest and server might not see WRITE request and
hence will not kill suid/sgid. Miklos suggested that in such cases, we
should fall back to a writethrough WRITE instead and that will generate
WRITE request and kill suid/sgid. This patch implements that too.

But this relies on client seeing the suid/sgid set. If another client sets
suid/sgid and this client does not see it immediately, then we will not
fall back to writethrough WRITE. So this is one limitation with both
fc->handle_killpriv_v2 and fc->writeback_cache enabled. Both the options
are not fully compatible. But might be good enough for many use cases.

Note: This patch is not checking whether security.capability is set or not
when falling back to writethrough path. If suid/sgid is not set and
only security.capability is set, that will be taken care of by
file_remove_privs() call in ->writeback_cache path.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

718e25b6

fuse: setattr should set FATTR_KILL_SUIDGID · d9f7a979

Committed by Vivek Goyal on Mar 09, 2022

mainline inclusion
from mainline-v5.11-rc1
commit 31792161
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SIR8

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3179216135ec09825d7c7875580951a6e69dc5df

--------------------------------

If fc->handle_killpriv_v2 is enabled, we expect file server to clear
suid/sgid/security.capability upon chown/truncate/write as appropriate.

Upon truncate (ATTR_SIZE), suid/sgid are cleared only if caller does not
have CAP_FSETID.  File server does not know whether caller has CAP_FSETID
or not.  Hence set FATTR_KILL_SUIDGID upon truncate to let file server know
that caller does not have CAP_FSETID and it should kill suid/sgid as
appropriate.

On chown (ATTR_UID/ATTR_GID) suid/sgid need to be cleared irrespective of
capabilities of calling process, so set FATTR_KILL_SUIDGID unconditionally
in that case.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

d9f7a979
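
Illustrative note on d9f7a979: the decision the commit message describes (truncate depends on CAP_FSETID, chown does not) can be written as a small predicate. The `TOY_ATTR_*` bits and `toy_want_kill_suidgid()` are hypothetical names for this sketch and are not the kernel's `ATTR_*` values or the fuse setattr code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical attribute bits, standing in for ATTR_SIZE/ATTR_UID/ATTR_GID. */
enum toy_attr {
	TOY_ATTR_SIZE = 1 << 0,
	TOY_ATTR_UID  = 1 << 1,
	TOY_ATTR_GID  = 1 << 2,
};

/*
 * Should the client ask the server to kill suid/sgid for this setattr?
 * killpriv_v2: the server negotiated the V2 behaviour.
 * has_fsetid:  the calling process holds CAP_FSETID.
 */
static bool toy_want_kill_suidgid(unsigned int ia_valid, bool killpriv_v2,
				  bool has_fsetid)
{
	assert((ia_valid & ~7u) == 0);	/* only the three toy bits are defined */
	if (!killpriv_v2)
		return false;		/* without V2 the flag is never sent */
	if (ia_valid & (TOY_ATTR_UID | TOY_ATTR_GID))
		return true;		/* chown: unconditional */
	if (ia_valid & TOY_ATTR_SIZE)
		return !has_fsetid;	/* truncate: only without CAP_FSETID */
	return false;
}
```

The asymmetry is the point: the server cannot see the caller's capabilities, so the client encodes the outcome of the capability check in the request.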

fuse: set FUSE_WRITE_KILL_SUIDGID in cached write path · 7baffd4f

Committed by Vivek Goyal on Mar 09, 2022

mainline inclusion
from mainline-v5.11-rc1
commit b8667395
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SIR8

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b866739596ae3c3c60c43f1cf04a516c5aa20fd1

--------------------------------

With HANDLE_KILLPRIV_V2, server will need to kill suid/sgid if caller does
not have CAP_FSETID.  We already have a flag FUSE_WRITE_KILL_SUIDGID in
WRITE request and we already set it in direct I/O path.

To make it work in cached write path also, start setting
FUSE_WRITE_KILL_SUIDGID in this path too.

Set it only if fc->handle_killpriv_v2 is set.  Otherwise client is
responsible for killing suid/sgid.

In case of direct I/O we set FUSE_WRITE_KILL_SUIDGID unconditionally
because we don't call file_remove_privs() in that path (with cache=none
option).

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

7baffd4f

fuse: rename FUSE_WRITE_KILL_PRIV to FUSE_WRITE_KILL_SUIDGID · b0c7dc73

Committed by Miklos Szeredi on Mar 09, 2022

mainline inclusion
from mainline-v5.11-rc1
commit 10c52c84
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SIR8

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=10c52c84e3f4872689a64ac7666b34d67e630691

--------------------------------

Kernel has:
ATTR_KILL_PRIV -> clear "security.capability"
ATTR_KILL_SUID -> clear S_ISUID
ATTR_KILL_SGID -> clear S_ISGID if executable

Fuse has:
FUSE_WRITE_KILL_PRIV -> clear S_ISUID and S_ISGID if executable

So FUSE_WRITE_KILL_PRIV implies the complement of ATTR_KILL_PRIV, which is
somewhat confusing.  Also PRIV implies all privileges, including
"security.capability".

Change the name to FUSE_WRITE_KILL_SUIDGID and make FUSE_WRITE_KILL_PRIV an
alias to preserve API compatibility.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

b0c7dc73
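
Illustrative note on b0c7dc73: renaming a flag while keeping the old name as an alias is a small preprocessor pattern. The `EX_` names and the bit value below are placeholders chosen for this sketch; consult the real fuse.h for the actual ABI values.

```c
#include <assert.h>

/* Hypothetical flag value for illustration only. */
#define EX_WRITE_KILL_SUIDGID (1u << 2)

/* Old name kept as an alias so existing users keep compiling unchanged. */
#define EX_WRITE_KILL_PRIV EX_WRITE_KILL_SUIDGID

/* The alias must never drift away from the new name. */
static_assert(EX_WRITE_KILL_PRIV == EX_WRITE_KILL_SUIDGID,
	      "alias and new name must be the same bit");

/* Code written against the old name... */
static unsigned int ex_old_style_flags(void)
{
	return EX_WRITE_KILL_PRIV;
}

/* ...produces the same bits as code using the new one. */
static unsigned int ex_new_style_flags(void)
{
	return EX_WRITE_KILL_SUIDGID;
}
```

Since the alias expands to the new macro rather than repeating the literal, the two names cannot diverge if the value is ever changed.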

fuse: introduce the notion of FUSE_HANDLE_KILLPRIV_V2 · a077e800

Committed by Vivek Goyal on Mar 09, 2022

mainline inclusion
from mainline-v5.11-rc1
commit 63f9909f
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/I4SIR8

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=63f9909ff602082597849f684655e93336c50b11

--------------------------------

We already have FUSE_HANDLE_KILLPRIV flag that says that file server will
remove suid/sgid/caps on truncate/chown/write. But that's a little different
from what Linux VFS implements.

To be consistent with Linux VFS behavior, what we want is:

- caps are always cleared on chown/write/truncate
- suid is always cleared on chown, while for truncate/write it is cleared
  only if caller does not have CAP_FSETID.
- sgid is always cleared on chown, while for truncate/write it is cleared
  only if caller does not have CAP_FSETID as well as file has group execute
  permission.

The previous flag did not provide the above semantics, so implement a V2 of
the protocol with the above constraints.

Server does not know if caller has CAP_FSETID or not. So for the case
of write()/truncate(), client will send information in special flag to
indicate whether to kill privileges or not. These changes are in subsequent
patches.

FUSE_HANDLE_KILLPRIV_V2 relies on WRITE being sent to server to clear
suid/sgid/security.capability. But with ->writeback_cache, WRITES are
cached in guest. So it is not recommended to use FUSE_HANDLE_KILLPRIV_V2
and writeback_cache together. Though it is probably good enough
for a lot of use cases.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

a077e800
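
Illustrative note on a077e800: the suid/sgid rules listed in that commit message can be expressed as a pure function on the file mode. This is a sketch of the stated semantics only; `toy_killpriv_v2_mode()` and `enum toy_op` are hypothetical names, and clearing of security.capability (which the message says always happens on these operations) is not modelled because it is not part of the mode bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

enum toy_op { TOY_CHOWN, TOY_TRUNCATE, TOY_WRITE };

/*
 * Return the mode a file should have after `op`, following the rules in the
 * commit message. `has_fsetid` says whether the caller holds CAP_FSETID.
 */
static mode_t toy_killpriv_v2_mode(mode_t mode, enum toy_op op, bool has_fsetid)
{
	assert(op == TOY_CHOWN || op == TOY_TRUNCATE || op == TOY_WRITE);

	if (op == TOY_CHOWN)	/* chown: suid and sgid are always cleared */
		return mode & ~(mode_t)(S_ISUID | S_ISGID);

	if (has_fsetid)		/* privileged truncate/write keeps both bits */
		return mode;

	mode &= ~(mode_t)S_ISUID;
	if (mode & S_IXGRP)	/* sgid goes only if group execute is set */
		mode &= ~(mode_t)S_ISGID;
	return mode;
}
```

For example, an unprivileged write to a 0755 file with both bits set drops both, while the same write to a 0644 file with only sgid set leaves the mode untouched because group execute is absent.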

xfs: remove dead stale buf unpin handling code · 55619cc7

Committed by Brian Foster on Mar 09, 2022

mainline inclusion
from mainline-v5.13-rc4
commit e53d3aa0
category: bugfix
bugzilla: 185862 https://gitee.com/openeuler/kernel/issues/I4KIAO

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e53d3aa0b605c49d780e1b2fd0b49dba4154f32b

--------------------------------

This code goes back to a time when transaction commits wrote
directly to iclogs. The associated log items were pinned, written to
the log, and then "uncommitted" if some part of the log write had
failed. This uncommit sequence called an ->iop_unpin_remove()
handler that was eventually folded into ->iop_unpin() via the remove
parameter. The log subsystem has since changed significantly in that
transactions commit to the CIL instead of direct to iclogs, though
log items must still be aborted in the event of an eventual log I/O
error. However, the context for a log item abort is now asynchronous
from transaction commit, which means the committing transaction has
been freed by this point in time and the transaction uncommit
sequence of events is no longer relevant.

Further, since stale buffers remain locked at transaction commit
through unpin, we can be certain that the buffer is not associated
with any transaction when the unpin callback executes. Remove this
unused hunk of code and replace it with an assertion that the buffer
is disassociated from transaction context.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
Reviewed-by: Lihong Kou <koulihong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

55619cc7

xfs: hold buffer across unpin and potential shutdown processing · baa590d3

Committed by Brian Foster on Mar 09, 2022

mainline inclusion
from mainline-v5.13-rc4
commit 84d8949e
category: bugfix
bugzilla: 185862 https://gitee.com/openeuler/kernel/issues/I4KIAO

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=84d8949e770745b16a7e8a68dcb1d0f3687bdee9

--------------------------------

The special processing used to simulate a buffer I/O failure on fs
shutdown has a difficult to reproduce race that can result in a use
after free of the associated buffer. Consider a buffer that has been
committed to the on-disk log and thus is AIL resident. The buffer
lands on the writeback delwri queue, but is subsequently locked,
committed and pinned by another transaction before submitted for
I/O. At this point, the buffer is stuck on the delwri queue as it
cannot be submitted for I/O until it is unpinned. A log checkpoint
I/O failure occurs sometime later, which aborts the bli. The unpin
handler is called with the aborted log item, drops the bli reference
count, the pin count, and falls into the I/O failure simulation
path.

The potential problem here is that once the pin count falls to zero
in ->iop_unpin(), xfsaild is free to retry delwri submission of the
buffer at any time, before the unpin handler even completes. If
delwri queue submission wins the race to the buffer lock, it
observes the shutdown state and simulates the I/O failure itself.
This releases both the bli and delwri queue holds and frees the
buffer while xfs_buf_item_unpin() sits on xfs_buf_lock() waiting to
run through the same failure sequence. This problem is rare and
requires many iterations of fstest generic/019 (which simulates disk
I/O failures) to reproduce.

To avoid this problem, grab a hold on the buffer before the log item
is unpinned if the associated item has been aborted and will require
a simulated I/O failure. The hold is already required for the
simulated I/O failure, so the ordering simply guarantees the unpin
handler access to the buffer before it is unpinned and thus
processed by the AIL. This particular ordering is required so long
as the AIL does not acquire a reference on the bli, which is the
long term solution to this problem.
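
The ordering argument above can be illustrated with a toy reference-count model. This is a sketch in Python rather than kernel code: `Buf`, `racing_delwri_failure` and `unpin_aborted` are hypothetical names, and the race is modelled as the worst case in which the delwri path always wins as soon as the pin count reaches zero.

```python
class Buf:
    """Toy buffer with a hold (reference) count and a pin count."""

    def __init__(self, holds, pins=1):
        self.holds = holds
        self.pins = pins
        self.freed = False

    def hold(self):
        self.holds += 1

    def rele(self):
        self.holds -= 1
        if self.holds == 0:
            self.freed = True


def racing_delwri_failure(buf):
    # The delwri submission path observes the shutdown and simulates the
    # I/O failure itself, dropping the bli hold and the delwri queue hold.
    buf.rele()
    buf.rele()


def unpin_aborted(buf, hold_before_unpin):
    """Return True if the buffer is still alive when the unpin handler
    gets to use it after the pin count has dropped."""
    if hold_before_unpin:
        buf.hold()
    buf.pins -= 1
    # Worst case: the other path wins the race once pins reach zero.
    racing_delwri_failure(buf)
    alive = not buf.freed
    if hold_before_unpin:
        buf.rele()  # drop our own hold once we are done with the buffer
    return alive
```

With two holds outstanding (bli and delwri queue), taking the extra hold first keeps the buffer alive for the handler, while unpinning first lets the racing path free it.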

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
Reviewed-by: Lihong Kou <koulihong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

baa590d3

xfs: fix an ABBA deadlock in xfs_rename · 644dc676

Committed by Darrick J. Wong on Mar 09, 2022

mainline inclusion
from mainline-v5.11-rc4
commit 6da1b4b1
category: bugfix
bugzilla: 185867 https://gitee.com/openeuler/kernel/issues/I4KIAO

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6da1b4b1ab36d80a3994fd4811c8381de10af604

--------------------------------

When overlayfs is running on top of xfs and the user unlinks a file in
the overlay, overlayfs will create a whiteout inode and ask xfs to
"rename" the whiteout file atop the one being unlinked.  If the file
being unlinked loses its one nlink, we then have to put the inode on the
unlinked list.

This requires us to grab the AGI buffer of the whiteout inode to take it
off the unlinked list (which is where whiteouts are created) and to grab
the AGI buffer of the file being deleted.  If the whiteout was created
in a higher numbered AG than the file being deleted, we'll lock the AGIs
in the wrong order and deadlock.

Therefore, grab all the AGI locks we think we'll need ahead of time, and
in order of increasing AG number per the locking rules.
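
The locking rule can be sketched outside the kernel as "collect every AG you will need, then lock in increasing AG number". The Python below is only a model of that rule: the helper names are hypothetical and `threading.Lock` objects stand in for AGI buffer locks.

```python
import threading


def agi_lock_order(agnos):
    """Return the AG numbers to lock: deduplicated, in increasing order."""
    return sorted(set(agnos))


def lock_agis(locks, agnos):
    """Acquire the per-AG locks in increasing AG order; return that order."""
    order = agi_lock_order(agnos)
    for agno in order:
        locks[agno].acquire()
    return order


def unlock_agis(locks, order):
    """Release the locks taken by lock_agis()."""
    for agno in reversed(order):
        locks[agno].release()
```

Since every caller sorts the same way, two tasks that need the same pair of AGs can no longer take them in opposite orders.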

Reported-by: wenli xie <wlxie7296@gmail.com>
Fixes: 93597ae8 ("xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
Reviewed-by: Lihong Kou <koulihong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

644dc676

Revert "efi/libstub: arm64: Relax 2M alignment again for relocatable kernels" · daa17ae4

Committed by Yang Yingliang on Mar 09, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4VSGH
CVE: NA

--------------------------------

This reverts commit c6d2a109.

I got the following messages when booting kernel:

  EFI stub: Booting Linux Kernel...
  EFI stub: EFI_RNG_PROTOCOL unavailable, KASLR will be disabled
  EFI stub: Using DTB from configuration table
  EFI stub: Exiting boot services and installing virtual address map...

  ...

  [ 0.000000] CPU features: kernel page table isolation forced ON by KASLR
  [ 0.000000] CPU features: detected: Kernel page table isolation (KPTI)
  [ 3.393380] KASLR disabled due to lack of seed

KPTI is forced on by KASLR even though KASLR is not actually enabled,
because kaslr_offset() returns non-zero in kaslr_requires_kpti().

To avoid this problem, when EFI KASLR is disabled, align the image to
MIN_KIMG_ALIGN, which is used to get the KASLR offset in
primary_entry(), so that kaslr_offset() returns 0.
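
The arithmetic behind this can be sketched as follows, assuming (as the message implies) that the offset is derived from the bits of the load address below the MIN_KIMG_ALIGN boundary, and assuming a 2 MiB value for MIN_KIMG_ALIGN as suggested by the reverted commit's title. The function names and the example address are made up for illustration.

```python
MIN_KIMG_ALIGN = 2 * 1024 * 1024  # assumed 2 MiB for this sketch


def offset_within_alignment(load_addr, align=MIN_KIMG_ALIGN):
    """Bits of the load address below the alignment boundary."""
    return load_addr & (align - 1)


def align_up(addr, align=MIN_KIMG_ALIGN):
    """Round addr up to the next multiple of align (a power of two)."""
    return (addr + align - 1) & ~(align - 1)
```

An image loaded at an address that is not a multiple of the alignment yields a non-zero residue, while an aligned load address yields 0.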

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

daa17ae4

crypto: hisilicon/qm - fix memset during queues clearing · 8b0fe7a4

Committed by Kai Ye on Mar 09, 2022

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4W3OQ

----------------------------------------------------------------------

The extra page address is used as a qp error flag while the device is
resetting, so this qp flag should not be cleared in userspace.

Signed-off-by: Kai Ye <yekai13@huawei.com>
Signed-off-by: Yang Shen <shenyang39@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

8b0fe7a4

crypto: hisilicon/qm - modify device status check parameter · 6e44486d

Committed by Weili Qian on Mar 09, 2022

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4W3OQ

----------------------------------------------------------------------

If the device master ooo is blocked, there is
no need to empty the queue. Only the PF can obtain the
status of the device. If the VF runs on the host,
the device status can be obtained by PF.

Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Yang Shen <shenyang39@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

6e44486d

crypto: hisilicon/qm - remove redundant cache writeback · ed3b55da

Committed by Weili Qian on Mar 09, 2022

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4W3OQ

----------------------------------------------------------------------

Currently, the memory of the queue's sqe is freed when
the driver is removed, not when the queue is put. Therefore, it is only
necessary to write back the data in the hardware cache to
memory before removing the driver.

Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Yang Shen <shenyang39@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

ed3b55da

crypto: hisilicon/qm - disable queue when 'CQ' error · 39d29e2f

Committed by Weili Qian on Mar 09, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 696645d2
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4W3OQ
CVE: NA

--------------------------------

If the hardware reports a 'CQ' overflow or 'CQE' error through the abnormal
interrupt, disable the queue and stop sending tasks to the hardware.

Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Yang Shen <shenyang39@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

39d29e2f

crypto: hisilicon/qm - reset function if event queue overflows · 7d8d68c8

Committed by Weili Qian on Mar 09, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 95f0b6d5
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4W3OQ
CVE: NA

--------------------------------

If the hardware reports the event queue overflow by the abnormal interrupt,
the driver needs to reset the function and re-enable the event queue
interrupt and abnormal interrupt.

Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Yang Shen <shenyang39@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

7d8d68c8

crypto: hisilicon/qm - use request_threaded_irq instead · 21085e5a

Committed by Weili Qian on Mar 09, 2022

mainline inclusion
from mainline-v5.17-rc1
commit a0a9486b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4W3OQ
CVE: NA

--------------------------------

The abnormal interrupt method needs to be changed, and the changed method
needs to be locked in order to maintain atomicity. Therefore,
replace request_irq() with request_threaded_irq().

Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Yang Shen <shenyang39@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

21085e5a

crypto: hisilicon/qm - modify the handling method after abnormal interruption · ad7d5d93

Committed by Weili Qian on Mar 09, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 145dcedd
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4W3OQ
CVE: NA

--------------------------------

After an interrupt event has been processed and the interrupt function has
been re-enabled by writing the QM_DOORBELL_CMD_AEQ register, the hardware
may generate new interrupt events while processing other users' tasks,
before the subsequent interrupt events have been processed. The new
interrupt event will disrupt the current normal processing flow and
cause other problems.

Therefore, the write to the QM_DOORBELL_CMD_AEQ doorbell register needs
to be placed after the processing of all interrupt events has
completed.
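
The intended ordering (drain every pending event first, ring the doorbell once at the end) can be sketched as below. This is a Python model rather than driver code: `handle_abnormal_events`, `process` and `ring_doorbell` are hypothetical names, with `ring_doorbell` standing in for the QM_DOORBELL_CMD_AEQ register write.

```python
from collections import deque


def handle_abnormal_events(events, process, ring_doorbell):
    """Process all pending events, then re-enable the interrupt once."""
    handled = 0
    while events:
        process(events.popleft())
        handled += 1
    # Only now is it safe to let the hardware raise new events.
    ring_doorbell()
    return handled
```

Ringing the doorbell inside the loop instead would correspond to the old behaviour, where a new event could arrive while earlier ones were still being handled.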

Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Yang Shen <shenyang39@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

ad7d5d93

crypto: hisilicon/qm - code movement · 18f4481c

Committed by Weili Qian on Mar 09, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 9ee401ea
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4W3OQ
CVE: NA

--------------------------------

This patch does not change any code, just code movement. Preparing for
next patch.

Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Yang Shen <shenyang39@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

18f4481c

crypto: hisilicon/qm - remove unnecessary device memory reset · 6ecd5db9

Committed by Weili Qian on Mar 09, 2022

mainline inclusion
from mainline-v5.17-rc1
commit f123e66d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4W3OQ
CVE: NA

--------------------------------

The internal memory of the device needs to be reset only when
the device is globally initialized. Other scenarios, such as
function reset, do not need to perform reset.

Signed-off-by: Weili Qian <qianweili@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Yang Shen <shenyang39@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

6ecd5db9

crypto: hisilicon/qm - fix deadlock for remove driver · 9a3b668e

Committed by Yang Shen on Mar 09, 2022

mainline inclusion
from mainline-v5.16-rc1
commit fc6c01f0
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4W3EG
CVE: NA

--------------------------------

When removing the driver and executing a task occur at the same time,
the following deadlock will be triggered:

Chain exists of:
    sva_lock --> uacce_mutex --> &qm->qps_lock
    Possible unsafe locking scenario:
		CPU0                    CPU1
		----                    ----
	lock(&qm->qps_lock);
					lock(uacce_mutex);
					lock(&qm->qps_lock);
	lock(sva_lock);

The lock 'qps_lock' is used to protect the qp, so it only needs to be held
until the qp memory is released. Therefore, release the lock before calling
'uacce_remove'.

Signed-off-by: Yang Shen <shenyang39@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Yang Shen <shenyang39@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

9a3b668e

crypto: hisilicon/sec - add some comments for soft fallback · 09741c21

Committed by Kai Ye on Mar 09, 2022

mainline inclusion
from mainline-crypto-master
commit e764d81d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WC3O
CVE: NA

--------------------------------

Modify the printed information that might lead to user misunderstanding.
Currently only XTS mode needs the fallback tfm when using a 192-bit key.
Other algorithms do not need a soft fallback tfm, so they can return
directly.

Signed-off-by: Kai Ye <yekai13@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Yang Shen <shenyang39@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

09741c21

crypto: hisilicon/sec - fix the aead software fallback for engine · 7abba214

Committed by Kai Ye on Mar 09, 2022

mainline inclusion
from mainline-crypto-master
commit 0a2a464f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4W4WU
CVE: NA

--------------------------------

Because the subreq pointer misuses the private context memory, the aead
soft crypto occasionally causes an OS panic when the 64K page size is set.
Fix it.

Fixes: 6c46a329 ("crypto: hisilicon/sec - add fallback tfm...")
Signed-off-by: Kai Ye <yekai13@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Yang Shen <shenyang39@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

7abba214

08 Mar, 2022 6 commits

blk-throttle: Set BIO_THROTTLED when bio has been throttled · 2c7b5509

Committed by Laibin Qiu on Mar 08, 2022

hulk inclusion
category: bugfix
bugzilla: 185779, https://gitee.com/openeuler/kernel/issues/I4WFIY
CVE: NA

-------------------------------------------------

1. In the current process, every bio has the BIO_THROTTLED flag set
after __blk_throtl_bio().

2. If a bio needs to be throttled, the timer is started and the bio is
not submitted directly. The bio is submitted in
blk_throtl_dispatch_work_fn() when the timer expires. But in
the current process, if the bio is throttled, BIO_THROTTLED
is set on the bio after the timer has started. If the bio has already
completed by then, this may cause the use-after-free below.

BUG: KASAN: use-after-free in blk_throtl_bio+0x12f0/0x2c70
Read of size 2 at addr ffff88801b8902d4 by task fio/26380

 dump_stack+0x9b/0xce
 print_address_description.constprop.6+0x3e/0x60
 kasan_report.cold.9+0x22/0x3a
 blk_throtl_bio+0x12f0/0x2c70
 submit_bio_checks+0x701/0x1550
 submit_bio_noacct+0x83/0xc80
 submit_bio+0xa7/0x330
 mpage_readahead+0x380/0x500
 read_pages+0x1c1/0xbf0
 page_cache_ra_unbounded+0x471/0x6f0
 do_page_cache_ra+0xda/0x110
 ondemand_readahead+0x442/0xae0
 page_cache_async_ra+0x210/0x300
 generic_file_buffered_read+0x4d9/0x2130
 generic_file_read_iter+0x315/0x490
 blkdev_read_iter+0x113/0x1b0
 aio_read+0x2ad/0x450
 io_submit_one+0xc8e/0x1d60
 __se_sys_io_submit+0x125/0x350
 do_syscall_64+0x2d/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Allocated by task 26380:
 kasan_save_stack+0x19/0x40
 __kasan_kmalloc.constprop.2+0xc1/0xd0
 kmem_cache_alloc+0x146/0x440
 mempool_alloc+0x125/0x2f0
 bio_alloc_bioset+0x353/0x590
 mpage_alloc+0x3b/0x240
 do_mpage_readpage+0xddf/0x1ef0
 mpage_readahead+0x264/0x500
 read_pages+0x1c1/0xbf0
 page_cache_ra_unbounded+0x471/0x6f0
 do_page_cache_ra+0xda/0x110
 ondemand_readahead+0x442/0xae0
 page_cache_async_ra+0x210/0x300
 generic_file_buffered_read+0x4d9/0x2130
 generic_file_read_iter+0x315/0x490
 blkdev_read_iter+0x113/0x1b0
 aio_read+0x2ad/0x450
 io_submit_one+0xc8e/0x1d60
 __se_sys_io_submit+0x125/0x350
 do_syscall_64+0x2d/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 0:
 kasan_save_stack+0x19/0x40
 kasan_set_track+0x1c/0x30
 kasan_set_free_info+0x1b/0x30
 __kasan_slab_free+0x111/0x160
 kmem_cache_free+0x94/0x460
 mempool_free+0xd6/0x320
 bio_free+0xe0/0x130
 bio_put+0xab/0xe0
 bio_endio+0x3a6/0x5d0
 blk_update_request+0x590/0x1370
 scsi_end_request+0x7d/0x400
 scsi_io_completion+0x1aa/0xe50
 scsi_softirq_done+0x11b/0x240
 blk_mq_complete_request+0xd4/0x120
 scsi_mq_done+0xf0/0x200
 virtscsi_vq_done+0xbc/0x150
 vring_interrupt+0x179/0x390
 __handle_irq_event_percpu+0xf7/0x490
 handle_irq_event_percpu+0x7b/0x160
 handle_irq_event+0xcc/0x170
 handle_edge_irq+0x215/0xb20
 common_interrupt+0x60/0x120
 asm_common_interrupt+0x1e/0x40

Fix this by moving the setting of BIO_THROTTLED under the queue_lock.
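
The idea of the fix, marking the bio while it is still owned by the submitter and before it becomes visible to the dispatch path, can be sketched as below. This is a Python model, not the block layer code: `ToyThrottle`, its `throttle` method and the dict-based bio are hypothetical stand-ins.

```python
import threading


class ToyThrottle:
    """Toy throttle queue protected by a single lock."""

    def __init__(self):
        self.queue_lock = threading.Lock()
        self.queued = []

    def throttle(self, bio, over_limit):
        """Return True if the bio was queued for later dispatch."""
        with self.queue_lock:
            # Mark the bio while we still own it, inside the lock.
            bio["throttled"] = True
            if over_limit:
                # Once queued, the bio may be dispatched, completed and
                # freed by others, so it must not be touched afterwards.
                self.queued.append(bio)
                return True
        return False
```

The buggy ordering would correspond to writing the flag after the `with` block has been left, when a queued bio may already have been completed.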

Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

2c7b5509

bpf, selftests: Add ringbuf memory type confusion test · 6b1836a9

Committed by Daniel Borkmann on Mar 08, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 37c8d480
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WT90
CVE: CVE-2021-4204

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=37c8d4807d1b8b521b30310dce97f6695dc2c2c6

--------------------------------

Add two tests, one which asserts that ring buffer memory can be passed to
other helpers for populating its entry area, and another one where verifier
rejects different type of memory passed to bpf_ringbuf_submit().

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

6b1836a9

bpf/selftests: Test bpf_d_path on rdonly_mem. · 76f59b4d

Committed by Hao Luo on Mar 08, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 44bab87d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WT90
CVE: CVE-2021-4204

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=44bab87d8ca6f0544a9f8fc97bdf33aa5b3c899e

--------------------------------

The second parameter of bpf_d_path() can only accept writable
memories. Rdonly_mem obtained from bpf_per_cpu_ptr() can not
be passed into bpf_d_path for modification. This patch adds
a selftest to verify this behavior.

Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220106205525.2116218-1-haoluo@google.com
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

76f59b4d

bpf, selftests: Add various ringbuf tests with invalid offset · 017bd71f

Committed by Daniel Borkmann on Mar 08, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 722e4db3
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WT90
CVE: CVE-2021-4204

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=722e4db3ae0d52b2e3801280afbe19cf2d188e91

--------------------------------

Assert that the verifier is rejecting invalid offsets on the ringbuf entries:

  # ./test_verifier | grep ring
  #947/u ringbuf: invalid reservation offset 1 OK
  #947/p ringbuf: invalid reservation offset 1 OK
  #948/u ringbuf: invalid reservation offset 2 OK
  #948/p ringbuf: invalid reservation offset 2 OK

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

017bd71f

selftests/bpf: Add verifier test for PTR_TO_MEM spill · c1046afc

Committed by Gilad Reti on Mar 08, 2022

mainline inclusion
from mainline-v5.11-rc5
commit 4237e9f4
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WT90
CVE: CVE-2021-4204

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4237e9f4a96228ccc8a7abe5e4b30834323cd353

--------------------------------

Add a test to check that the verifier is able to recognize spilling of
PTR_TO_MEM registers, by reserving a ringbuf buffer, forcing the spill
of a pointer holding the buffer address to the stack, filling it back
in from the stack and writing to the memory area pointed by it.

The patch was partially contributed by CyberArk Software, Inc.

Signed-off-by: Gilad Reti <gilad.reti@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210113053810.13518-2-gilad.reti@gmail.com
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

c1046afc

bpf: Fix ringbuf memory type confusion when passing to helpers · e1601ef1

Committed by Daniel Borkmann on Mar 08, 2022

mainline inclusion
from mainline-v5.17-rc1
commit a672b2e3
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4WT90
CVE: CVE-2021-4204

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a672b2e36a648afb04ad3bda93b6bda947a479a5

--------------------------------

The bpf_ringbuf_submit() and bpf_ringbuf_discard() have ARG_PTR_TO_ALLOC_MEM
in their bpf_func_proto definition as their first argument, and thus both expect
the result from a prior bpf_ringbuf_reserve() call which has a return type of
RET_PTR_TO_ALLOC_MEM_OR_NULL.

While the non-NULL memory from bpf_ringbuf_reserve() can be passed to other
helpers, the two sinks (bpf_ringbuf_submit(), bpf_ringbuf_discard()) right now
only enforce a register type of PTR_TO_MEM.

This can lead to potential type confusion since it would allow other PTR_TO_MEM
memory to be passed into the two sinks which did not come from bpf_ringbuf_reserve().

Add a new MEM_ALLOC composable type attribute for PTR_TO_MEM, and enforce that:

 - bpf_ringbuf_reserve() returns NULL or PTR_TO_MEM | MEM_ALLOC
 - bpf_ringbuf_submit() and bpf_ringbuf_discard() only take PTR_TO_MEM | MEM_ALLOC
   but not plain PTR_TO_MEM arguments via ARG_PTR_TO_ALLOC_MEM
 - however, other helpers might treat PTR_TO_MEM | MEM_ALLOC as plain PTR_TO_MEM
   to populate the memory area when they use ARG_PTR_TO_{UNINIT_,}MEM in their
   func proto description
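
The three rules above amount to a small compatibility check between a helper's argument type and a register's type flags. The sketch below models that check in Python. It is not the verifier's code: `RegType` and `helper_accepts` are hypothetical names, the flag values are arbitrary, and only the argument types named in this message are handled.

```python
from enum import IntFlag


class RegType(IntFlag):
    PTR_TO_MEM = 1
    MEM_ALLOC = 2  # composable attribute: memory came from a ringbuf reserve


def helper_accepts(arg_type, reg):
    """Return True if a register of type `reg` may be passed as `arg_type`."""
    if not (reg & RegType.PTR_TO_MEM):
        return False
    if arg_type == "ARG_PTR_TO_ALLOC_MEM":
        # The ringbuf sinks require memory that was reserved from a ringbuf.
        return bool(reg & RegType.MEM_ALLOC)
    if arg_type in ("ARG_PTR_TO_MEM", "ARG_PTR_TO_UNINIT_MEM"):
        # Other helpers treat PTR_TO_MEM | MEM_ALLOC as plain PTR_TO_MEM.
        return True
    raise ValueError(f"unhandled argument type: {arg_type}")
```

In this model the sinks reject plain PTR_TO_MEM, which is the type confusion the message describes, while other helpers can still populate ringbuf memory.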

Fixes: 457f4436 ("bpf: Implement BPF ring buffer and verifier support for it")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

e1601ef1

openeuler / Kernel synced successfully more than 1 year ago
