提交 · 8c3f9cd1603d0e4af6c50ebc6d974ab7bdd03cf4 · openeuler / Kernel

12 4月, 2021 6 次提交

io_uring: use better types for cflags · 8c3f9cd1

由 Pavel Begunkov 提交于 2月 28, 2021

__io_cqring_fill_event() takes cflags as long to squeeze it into u32 in
an CQE, awhile all users pass int or unsigned. Replace it with unsigned
int and store it as u32 in struct io_completion to match CQE.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

8c3f9cd1

io_uring: refactor provide/remove buffer locking · 9fb8cb49

由 Pavel Begunkov 提交于 2月 28, 2021

Always complete request holding the mutex instead of doing that strange
dancing with conditional ordering.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9fb8cb49

io_uring: add a helper failing not issued requests · f41db273

由 Pavel Begunkov 提交于 2月 28, 2021

Add a simple helper doing CQE posting, marking request for link-failure,
and putting both submission and completion references.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f41db273

io_uring: further deduplicate file slot selection · dafecf19

由 Pavel Begunkov 提交于 2月 28, 2021

io_fixed_file_slot() and io_file_from_index() behave pretty similarly,
DRY and call one from another.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

dafecf19

io_uring: reuse io_req_task_queue_fail() · 2c4b8eb6

由 Pavel Begunkov 提交于 2月 28, 2021

Use io_req_task_queue_fail() on the fail path of io_req_task_queue().
It's unlikely to happen, so don't care about additional overhead, but
allows to keep all the req->result invariant in a single function.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

2c4b8eb6

io_uring: avoid taking ctx refs for task-cancel · e83acd7d

由 Pavel Begunkov 提交于 2月 28, 2021

Don't bother to take a ctx->refs for io_req_task_cancel() because it
take uring_lock before putting a request, and the context is promised to
stay alive until unlock happens.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e83acd7d

10 4月, 2021 3 次提交

btrfs: zoned: move superblock logging zone location · 53b74fa9

由 Naohiro Aota 提交于 4月 08, 2021

Moves the location of the superblock logging zones. The new locations of
the logging zones are now determined based on fixed block addresses
instead of on fixed zone numbers.

The old placement method based on fixed zone numbers causes problems when
one needs to inspect a file system image without access to the drive zone
information. In such case, the super block locations cannot be reliably
determined as the zone size is unknown. By locating the superblock logging
zones using fixed addresses, we can scan a dumped file system image without
the zone information since a super block copy will always be present at or
after the fixed known locations.

Introduce the following three pairs of zones containing fixed offset
locations, regardless of the device zone size.

  - primary superblock: offset   0B (and the following zone)
  - first copy:         offset 512G (and the following zone)
  - Second copy:        offset   4T (4096G, and the following zone)

If a logging zone is outside of the disk capacity, we do not record the
superblock copy.

The first copy position is much larger than for a non-zoned filesystem,
which is at 64M.  This is to avoid overlapping with the log zones for
the primary superblock. This higher location is arbitrary but allows
supporting devices with very large zone sizes, plus some space around in
between.

Such large zone size is unrealistic and very unlikely to ever be seen in
real devices. Currently, SMR disks have a zone size of 256MB, and we are
expecting ZNS drives to be in the 1-4GB range, so this limit gives us
room to breathe. For now, we only allow zone sizes up to 8GB. The
maximum zone size that would still fit in the space is 256G.

The fixed location addresses are somewhat arbitrary, with the intent of
maintaining superblock reliability for smaller and larger devices, with
the preference for the latter. For this reason, there are two superblocks
under the first 1T. This should cover use cases for physical devices and
for emulated/device-mapper devices.

The superblock logging zones are reserved for superblock logging and
never used for data or metadata blocks. Note that we only reserve the
two zones per primary/copy actually used for superblock logging. We do
not reserve the ranges of zones possibly containing superblocks with the
largest supported zone size (0-16GB, 512G-528GB, 4096G-4112G).

The zones containing the fixed location offsets used to store
superblocks on a non-zoned volume are also reserved to avoid confusion.
Signed-off-by: NNaohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

53b74fa9

fs: direct-io: fix missing sdio->boundary · df41872b

由 Jack Qiu 提交于 4月 09, 2021

I encountered a hung task issue, but not a performance one.  I run DIO
on a device (need lba continuous, for example open channel ssd), maybe
hungtask in below case:

  DIO:						Checkpoint:
  get addr A(at boundary), merge into BIO,
  no submit because boundary missing
						flush dirty data(get addr A+1), wait IO(A+1)
						writeback timeout, because DIO(A) didn't submit
  get addr A+2 fail, because checkpoint is doing

dio_send_cur_page() may clear sdio->boundary, so prevent it from missing
a boundary.

Link: https://lkml.kernel.org/r/20210322042253.38312-1-jack.qiu@huawei.com
Fixes: b1058b98 ("direct-io: submit bio after boundary buffer is added to it")
Signed-off-by: NJack Qiu <jack.qiu@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

df41872b

ocfs2: fix deadlock between setattr and dio_end_io_write · 90bd070a

由 Wengang Wang 提交于 4月 09, 2021

The following deadlock is detected:

  truncate -> setattr path is waiting for pending direct IO to be done (inode->i_dio_count become zero) with inode->i_rwsem held (down_write).

  PID: 14827  TASK: ffff881686a9af80  CPU: 20  COMMAND: "ora_p005_hrltd9"
   #0  __schedule at ffffffff818667cc
   #1  schedule at ffffffff81866de6
   #2  inode_dio_wait at ffffffff812a2d04
   #3  ocfs2_setattr at ffffffffc05f322e [ocfs2]
   #4  notify_change at ffffffff812a5a09
   #5  do_truncate at ffffffff812808f5
   #6  do_sys_ftruncate.constprop.18 at ffffffff81280cf2
   #7  sys_ftruncate at ffffffff81280d8e
   #8  do_syscall_64 at ffffffff81003949
   #9  entry_SYSCALL_64_after_hwframe at ffffffff81a001ad

dio completion path is going to complete one direct IO (decrement
inode->i_dio_count), but before that it hung at locking inode->i_rwsem:

   #0  __schedule+700 at ffffffff818667cc
   #1  schedule+54 at ffffffff81866de6
   #2  rwsem_down_write_failed+536 at ffffffff8186aa28
   #3  call_rwsem_down_write_failed+23 at ffffffff8185a1b7
   #4  down_write+45 at ffffffff81869c9d
   #5  ocfs2_dio_end_io_write+180 at ffffffffc05d5444 [ocfs2]
   #6  ocfs2_dio_end_io+85 at ffffffffc05d5a85 [ocfs2]
   #7  dio_complete+140 at ffffffff812c873c
   #8  dio_aio_complete_work+25 at ffffffff812c89f9
   #9  process_one_work+361 at ffffffff810b1889
  #10  worker_thread+77 at ffffffff810b233d
  #11  kthread+261 at ffffffff810b7fd5
  #12  ret_from_fork+62 at ffffffff81a0035e

Thus above forms ABBA deadlock.  The same deadlock was mentioned in
upstream commit 28f5a8a7 ("ocfs2: should wait dio before inode lock
in ocfs2_setattr()").  It seems that that commit only removed the
cluster lock (the victim of above dead lock) from the ABBA deadlock
party.

End-user visible effects: Process hang in truncate -> ocfs2_setattr path
and other processes hang at ocfs2_dio_end_io_write path.

This is to fix the deadlock itself.  It removes inode_lock() call from
dio completion path to remove the deadlock and add ip_alloc_sem lock in
setattr path to synchronize the inode modifications.

[wen.gang.wang@oracle.com: remove the "had_alloc_lock" as suggested]
  Link: https://lkml.kernel.org/r/20210402171344.1605-1-wen.gang.wang@oracle.com

Link: https://lkml.kernel.org/r/20210331203654.3911-1-wen.gang.wang@oracle.comSigned-off-by: NWengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

90bd070a

09 4月, 2021 2 次提交

io-wq: cancel unbounded works on io-wq destroy · c60eb049

由 Pavel Begunkov 提交于 4月 08, 2021

WARNING: CPU: 5 PID: 227 at fs/io_uring.c:8578 io_ring_exit_work+0xe6/0x470
RIP: 0010:io_ring_exit_work+0xe6/0x470
Call Trace:
 process_one_work+0x206/0x400
 worker_thread+0x4a/0x3d0
 kthread+0x129/0x170
 ret_from_fork+0x22/0x30

INFO: task lfs-openat:2359 blocked for more than 245 seconds.
task:lfs-openat      state:D stack:    0 pid: 2359 ppid:     1 flags:0x00000004
Call Trace:
 ...
 wait_for_completion+0x8b/0xf0
 io_wq_destroy_manager+0x24/0x60
 io_wq_put_and_exit+0x18/0x30
 io_uring_clean_tctx+0x76/0xa0
 __io_uring_files_cancel+0x1b9/0x2e0
 do_exit+0xc0/0xb40
 ...

Even after io-wq destroy has been issued io-wq worker threads will
continue executing all left work items as usual, and may hang waiting
for I/O that won't ever complete (aka unbounded).

[<0>] pipe_read+0x306/0x450
[<0>] io_iter_do_read+0x1e/0x40
[<0>] io_read+0xd5/0x330
[<0>] io_issue_sqe+0xd21/0x18a0
[<0>] io_wq_submit_work+0x6c/0x140
[<0>] io_worker_handle_work+0x17d/0x400
[<0>] io_wqe_worker+0x2c0/0x330
[<0>] ret_from_fork+0x22/0x30

Cancel all unbounded I/O instead of executing them. This changes the
user visible behaviour, but that's inevitable as io-wq is not per task.
Suggested-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cd4b543154154cba055cf86f351441c2174d7f71.1617842918.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

c60eb049

io_uring: fix rw req completion · 97284637

由 Pavel Begunkov 提交于 4月 08, 2021

WARNING: at fs/io_uring.c:8578 io_ring_exit_work.cold+0x0/0x18

As reissuing is now passed back by REQ_F_REISSUE and kiocb_done()
internally uses __io_complete_rw(), it may stop after setting the flag
so leaving a dangling request.

There are tricky edge cases, e.g. reading beyound file, boundary, so
the easiest way is to hand code reissue in kiocb_done() as
__io_complete_rw() was doing for us before.

Fixes: 230d50d4 ("io_uring: move reissue into regular IO path")
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f602250d292f8a84cca9a01d747744d1e797be26.1617842918.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

97284637

08 4月, 2021 4 次提交

io_uring: clear F_REISSUE right after getting it · 6ad7f233

由 Pavel Begunkov 提交于 4月 08, 2021

There are lots of ways r/w request may continue its path after getting
REQ_F_REISSUE, it's not necessarily io-wq and can be, e.g. apoll,
and submitted via io_async_task_func() -> __io_req_task_submit()

Clear the flag right after getting it, so the next attempt is well
prepared regardless how the request will be executed.

Fixes: 230d50d4 ("io_uring: move reissue into regular IO path")
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/11dcead939343f4e27cab0074d34afcab771bfa4.1617842918.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

6ad7f233

cifs: escape spaces in share names · 0fc9322a

由 Maciek Borzecki 提交于 4月 06, 2021

Commit 653a5efb ("cifs: update super_operations to show_devname")
introduced the display of devname for cifs mounts. However, when mounting
a share which has a whitespace in the name, that exact share name is also
displayed in mountinfo. Make sure that all whitespace is escaped.
Signed-off-by: NMaciek Borzecki <maciek.borzecki@gmail.com>
CC: <stable@vger.kernel.org> # 5.11+
Reviewed-by: NShyam Prasad N <sprasad@microsoft.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>

0fc9322a

fs: cifs: Remove unnecessary struct declaration · d135be0a

由 Wan Jiabing 提交于 4月 01, 2021

struct cifs_readdata is declared twice. One is declared
at 208th line.
And struct cifs_readdata is defined blew.
The declaration here is not needed. Remove the duplicate.
Signed-off-by: NWan Jiabing <wanjiabing@vivo.com>
Reviewed-by: NShyam Prasad N <sprasad@microsoft.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>

d135be0a

cifs: On cifs_reconnect, resolve the hostname again. · 4e456b30

由 Shyam Prasad N 提交于 3月 31, 2021

On cifs_reconnect, make sure that DNS resolution happens again.
It could be the cause of connection to go dead in the first place.

This also contains the fix for a build issue identified by Intel bot.
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NShyam Prasad N <sprasad@microsoft.com>
Reviewed-by: NPaulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
CC: <stable@vger.kernel.org> # 5.11+
Signed-off-by: NSteve French <stfrench@microsoft.com>

4e456b30

07 4月, 2021 2 次提交

LOOKUP_MOUNTPOINT: we are cleaning "jumped" flag too late · 4f0ed93f

由 Al Viro 提交于 4月 06, 2021

That (and traversals in case of umount .) should be done before
complete_walk().  Either a braino or mismerge damage on queue
reorders - either way, I should've spotted that much earlier.
Fucked-up-by: NAl Viro <viro@zeniv.linux.org.uk>
X-Paperbag: Brown
Fixes: 161aff1d "LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat()"
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

4f0ed93f

Make sure nd->path.mnt and nd->path.dentry are always valid pointers · 7d01ef75

由 Al Viro 提交于 4月 06, 2021

Initialize them in set_nameidata() and make sure that terminate_walk() clears them
once the pointers become potentially invalid (i.e. we leave RCU mode or drop them
in non-RCU one).  Currently we have "path_init() always initializes them and nobody
accesses them outside of path_init()/terminate_walk() segments", which is asking
for trouble.

With that change we would have nd->path.{mnt,dentry}
	1) always valid - NULL or pointing to currently allocated objects.
	2) non-NULL while we are successfully walking
	3) NULL when we are not walking at all
	4) contributing to refcounts whenever non-NULL outside of RCU mode.

Fixes: 6c6ec2b0 ("fs: add support for LOOKUP_CACHED")
Reported-by: syzbot+c88a7030da47945a3cc3@syzkaller.appspotmail.com
Tested-by: NChristian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

7d01ef75

03 4月, 2021 1 次提交

io_uring: fix !CONFIG_BLOCK compilation failure · e82ad485

由 Jens Axboe 提交于 4月 02, 2021

kernel test robot correctly pinpoints a compilation failure if
CONFIG_BLOCK isn't set:

fs/io_uring.c: In function '__io_complete_rw':
>> fs/io_uring.c:2509:48: error: implicit declaration of function 'io_rw_should_reissue'; did you mean 'io_rw_reissue'? [-Werror=implicit-function-declaration]
    2509 |  if ((res == -EAGAIN || res == -EOPNOTSUPP) && io_rw_should_reissue(req)) {
         |                                                ^~~~~~~~~~~~~~~~~~~~
         |                                                io_rw_reissue
    cc1: some warnings being treated as errors

Ensure that we have a stub declaration of io_rw_should_reissue() for
!CONFIG_BLOCK.

Fixes: 230d50d4 ("io_uring: move reissue into regular IO path")
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e82ad485

02 4月, 2021 3 次提交

io_uring: move reissue into regular IO path · 230d50d4

由 Jens Axboe 提交于 4月 01, 2021

It's non-obvious how retry is done for block backed files, when it happens
off the kiocb done path. It also makes it tricky to deal with the iov_iter
handling.

Just mark the req as needing a reissue, and handling it from the
submission path instead. This makes it directly obvious that we're not
re-importing the iovec from userspace past the submit point, and it means
that we can just reuse our usual -EAGAIN retry path from the read/write
handling.

At some point in the future, we'll gain the ability to always reliably
return -EAGAIN through the stack. A previous attempt on the block side
didn't pan out and got reverted, hence the need to check for this
information out-of-band right now.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

230d50d4

block: don't ignore REQ_NOWAIT for direct IO · f8b78caf

由 Pavel Begunkov 提交于 11月 20, 2020

If IOCB_NOWAIT is set on submission, then that needs to get propagated to
REQ_NOWAIT on the block side. Otherwise we completely lose this
information, and any issuer of IOCB_NOWAIT IO will potentially end up
blocking on eg request allocation on the storage side.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f8b78caf

file: fix close_range() for unshare+cloexec · 9b5b8722

由 Christian Brauner 提交于 4月 02, 2021

syzbot reported a bug when putting the last reference to a tasks file
descriptor table. Debugging this showed we didn't recalculate the
current maximum fd number for CLOSE_RANGE_UNSHARE | CLOSE_RANGE_CLOEXEC
after we unshared the file descriptors table. So max_fd could exceed the
current fdtable maximum causing us to set excessive bits. As a concrete
example, let's say the user requested everything from fd 4 to ~0UL to be
closed and their current fdtable size is 256 with their highest open fd
being 4. With CLOSE_RANGE_UNSHARE the caller will end up with a new
fdtable which has room for 64 file descriptors since that is the lowest
fdtable size we accept. But now max_fd will still point to 255 and needs
to be adjusted. Fix this by retrieving the correct maximum fd value in
__range_cloexec().

Reported-by: syzbot+283ce5a46486d6acdbaf@syzkaller.appspotmail.com
Fixes: 582f1fb6 ("fs, close_range: add flag CLOSE_RANGE_CLOEXEC")
Fixes: fec8a6a6 ("close_range: unshare all fds for CLOSE_RANGE_UNSHARE | CLOSE_RANGE_CLOEXEC")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Giuseppe Scrivano <gscrivan@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>

9b5b8722

01 4月, 2021 3 次提交

io_uring: fix EIOCBQUEUED iter revert · 07204f21

由 Pavel Begunkov 提交于 4月 01, 2021

iov_iter_revert() is done in completion handlers that happensf before
read/write returns -EIOCBQUEUED, no need to repeat reverting afterwards.
Moreover, even though it may appear being just a no-op, it's actually
races with 1) user forging a new iovec of a different size 2) reissue,
that is done via io-wq continues completely asynchronously.

Fixes: 3e6a0d3c ("io_uring: fix -EAGAIN retry with IOPOLL")
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

07204f21

io_uring/io-wq: protect against sprintf overflow · 696ee88a

由 Pavel Begunkov 提交于 4月 01, 2021

task_pid may be large enough to not fit into the left space of
TASK_COMM_LEN-sized buffers and overflow in sprintf. We not so care
about uniqueness, so replace it with safer snprintf().
Reported-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1702c6145d7e1c46fbc382f28334c02e1a3d3994.1617267273.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

696ee88a

io_uring: don't mark S_ISBLK async work as unbounded · 4b982bd0

由 Jens Axboe 提交于 4月 01, 2021

S_ISBLK is marked as unbounded work for async preparation, because it
doesn't match S_ISREG. That is incorrect, as any read/write to a block
device is also a bounded operation. Fix it up and ensure that S_ISBLK
isn't marked unbounded.
Signed-off-by: NJens Axboe <axboe@kernel.dk>

4b982bd0

31 3月, 2021 2 次提交

reiserfs: update reiserfs_xattrs_initialized() condition · 5e46d1b7

由 Tetsuo Handa 提交于 3月 21, 2021

syzbot is reporting NULL pointer dereference at reiserfs_security_init()
[1], for commit ab17c4f0 ("reiserfs: fixup xattr_root caching")
is assuming that REISERFS_SB(s)->xattr_root != NULL in
reiserfs_xattr_jcreate_nblocks() despite that commit made
REISERFS_SB(sb)->priv_root != NULL && REISERFS_SB(s)->xattr_root == NULL
case possible.

I guess that commit 6cb4aff0 ("reiserfs: fix oops while creating
privroot with selinux enabled") wanted to check xattr_root != NULL
before reiserfs_xattr_jcreate_nblocks(), for the changelog is talking
about the xattr root.

  The issue is that while creating the privroot during mount
  reiserfs_security_init calls reiserfs_xattr_jcreate_nblocks which
  dereferences the xattr root. The xattr root doesn't exist, so we get
  an oops.

Therefore, update reiserfs_xattrs_initialized() to check both the
privroot and the xattr root.

Link: https://syzkaller.appspot.com/bug?id=8abaedbdeb32c861dc5340544284167dd0e46cde # [1]
Reported-and-tested-by: Nsyzbot <syzbot+690cb1e51970435f9775@syzkaller.appspotmail.com>
Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 6cb4aff0 ("reiserfs: fix oops while creating privroot with selinux enabled")
Acked-by: NJeff Mahoney <jeffm@suse.com>
Acked-by: NJan Kara <jack@suse.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5e46d1b7

io_uring: drop sqd lock before handling signals for SQPOLL · 82734c5b

由 Jens Axboe 提交于 3月 29, 2021

Don't call into get_signal() with the sqd mutex held, it'll fail if we're
freezing the task and we'll get complaints on locks still being held:

====================================
WARNING: iou-sqp-8386/8387 still has locks held!
5.12.0-rc4-syzkaller #0 Not tainted
------------------------------------
1 lock held by iou-sqp-8386/8387:
 #0: ffff88801e1d2470 (&sqd->lock){+.+.}-{3:3}, at: io_sq_thread+0x24c/0x13a0 fs/io_uring.c:6731

 stack backtrace:
 CPU: 1 PID: 8387 Comm: iou-sqp-8386 Not tainted 5.12.0-rc4-syzkaller #0
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
 Call Trace:
  __dump_stack lib/dump_stack.c:79 [inline]
  dump_stack+0x141/0x1d7 lib/dump_stack.c:120
  try_to_freeze include/linux/freezer.h:66 [inline]
  get_signal+0x171a/0x2150 kernel/signal.c:2576
  io_sq_thread+0x8d2/0x13a0 fs/io_uring.c:6748

Fold the get_signal() case in with the parking checks, as we need to drop
the lock in both cases, and since we need to be checking for parking when
juggling the lock anyway.

Reported-by: syzbot+796d767eb376810256f5@syzkaller.appspotmail.com
Fixes: dbe1bdbb ("io_uring: handle signals for IO threads like a normal thread")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

82734c5b

29 3月, 2021 2 次提交

io_uring: handle setup-failed ctx in kill_timeouts · 51520426

由 Pavel Begunkov 提交于 3月 29, 2021

general protection fault, probably for non-canonical address
	0xdffffc0000000018: 0000 [#1] KASAN: null-ptr-deref
	in range [0x00000000000000c0-0x00000000000000c7]
RIP: 0010:io_commit_cqring+0x37f/0xc10 fs/io_uring.c:1318
Call Trace:
 io_kill_timeouts+0x2b5/0x320 fs/io_uring.c:8606
 io_ring_ctx_wait_and_kill+0x1da/0x400 fs/io_uring.c:8629
 io_uring_create fs/io_uring.c:9572 [inline]
 io_uring_setup+0x10da/0x2ae0 fs/io_uring.c:9599
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xae

It can get into wait_and_kill() before setting up ctx->rings, and hence
io_commit_cqring() fails. Mimic poll cancel and do it only when we
completed events, there can't be any requests if it failed before
initialising rings.

Fixes: 80c4cbdb ("io_uring: do post-completion chore on t-out cancel")
Reported-by: syzbot+0e905eb8228070c457a0@syzkaller.appspotmail.com
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/660261a48f0e7abf260c8e43c87edab3c16736fa.1617014345.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

51520426

io_uring: always go for cancellation spin on exec · 5a978dcf

由 Pavel Begunkov 提交于 3月 27, 2021

Always try to do cancellation in __io_uring_task_cancel() at least once,
so it actually goes and cleans its sqpoll tasks (i.e. via
io_sqpoll_cancel_sync()), otherwise sqpoll task may submit new requests
after cancellation and it's racy for many reasons.

Fixes: 521d6a73 ("io_uring: cancel sqpoll via task_work")
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0a21bd6d794bb1629bc906dd57a57b2c2985a8ac.1616839147.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

5a978dcf

28 3月, 2021 6 次提交

io_uring: remove unsued assignment to pointer io · 2b8ed1c9

由 Colin Ian King 提交于 3月 26, 2021

There is an assignment to io that is never read after the assignment,
the assignment is redundant and can be removed.
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

2b8ed1c9

io_uring: don't cancel extra on files match · 78d9d7c2

由 Pavel Begunkov 提交于 3月 25, 2021

As tasks always wait and kill their io-wq on exec/exit, files are of no
more concern to us, so we don't need to specifically cancel them by hand
in those cases. Moreover we should not, because io_match_task() looks at
req->task->files now, which is always true and so leads to extra
cancellations, that wasn't a case before per-task io-wq.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0566c1de9b9dd417f5de345c817ca953580e0e2e.1616696997.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

78d9d7c2

io_uring: don't cancel-track common timeouts · 2482b58f

由 Pavel Begunkov 提交于 3月 25, 2021

Don't account usual timeouts (i.e. not linked) as REQ_F_INFLIGHT but
keep behaviour prior to dd59a3d5 ("io_uring: reliably cancel linked
timeouts").
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/104441ef5d97e3932113d44501fda0df88656b83.1616696997.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

2482b58f

io_uring: do post-completion chore on t-out cancel · 80c4cbdb

由 Pavel Begunkov 提交于 3月 25, 2021

Don't forget about io_commit_cqring() + io_cqring_ev_posted() after
exit/exec cancelling timeouts. Both functions declared only after
io_kill_timeouts(), so to avoid tons of forward declarations move
it down.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/72ace588772c0f14834a6a4185d56c445a366fb4.1616696997.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

80c4cbdb

io_uring: fix timeout cancel return code · 1ee4160c

由 Pavel Begunkov 提交于 3月 25, 2021

When we cancel a timeout we should emit a sensible return code, like
-ECANCELED but not 0, otherwise it may trick users.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7b0ad1065e3bd1994722702bd0ba9e7bc9b0683b.1616696997.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

1ee4160c

io_uring: handle signals for IO threads like a normal thread · dbe1bdbb

由 Jens Axboe 提交于 3月 25, 2021

We go through various hoops to disallow signals for the IO threads, but
there's really no reason why we cannot just allow them. The IO threads
never return to userspace like a normal thread, and hence don't go through
normal signal processing. Instead, just check for a pending signal as part
of the work loop, and call get_signal() to handle it for us if anything
is pending.

With that, we can support receiving signals, including special ones like
SIGSTOP.
Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

dbe1bdbb

27 3月, 2021 4 次提交

smb3: fix cached file size problems in duplicate extents (reflink) · cfc63fc8

由 Steve French 提交于 3月 26, 2021

There were two problems (one of which could cause data corruption)
that were noticed with duplicate extents (ie reflink)
when debugging why various xfstests were being incorrectly skipped
(e.g. generic/138, generic/140, generic/142). First, we were not
updating the file size locally in the cache when extending a
file due to reflink (it would refresh after actimeo expires)
but xfstest was checking the size immediately which was still
0 so caused the test to be skipped.  Second, we were setting
the target file size (which could shrink the file) in all cases
to the end of the reflinked range rather than only setting the
target file size when reflink would extend the file.

CC: <stable@vger.kernel.org>
Signed-off-by: NSteve French <stfrench@microsoft.com>

cfc63fc8

cifs: Silently ignore unknown oplock break handle · 219481a8

由 Vincent Whitchurch 提交于 3月 19, 2021

Make SMB2 not print out an error when an oplock break is received for an
unknown handle, similar to SMB1. The debug message which is printed for
these unknown handles may also be misleading, so fix that too.

The SMB2 lease break path is not affected by this patch.

Without this, a program which writes to a file from one thread, and
opens, reads, and writes the same file from another thread triggers the
below errors several times a minute when run against a Samba server
configured with "smb2 leases = no".

CIFS: VFS: \\192.168.0.1 No task to wake, unknown frame received! NumMids 2
00000000: 424d53fe 00000040 00000000 00000012 .SMB@...........
00000010: 00000001 00000000 ffffffff ffffffff ................
00000020: 00000000 00000000 00000000 00000000 ................
00000030: 00000000 00000000 00000000 00000000 ................
Signed-off-by: NVincent Whitchurch <vincent.whitchurch@axis.com>
Reviewed-by: NTom Talpey <tom@talpey.com>
Reviewed-by: NPaulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: NSteve French <stfrench@microsoft.com>

219481a8

cifs: revalidate mapping when we open files for SMB1 POSIX · cee8f4f6

由 Ronnie Sahlberg 提交于 3月 25, 2021

RHBZ: 1933527

Under SMB1 + POSIX, if an inode is reused on a server after we have read and
cached a part of a file, when we then open the new file with the
re-cycled inode there is a chance that we may serve the old data out of cache
to the application.
This only happens for SMB1 (deprecated) and when posix are used.
The simplest solution to avoid this race is to force a revalidate
on smb1-posix open.
Signed-off-by: NRonnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: NPaulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: NSteve French <stfrench@microsoft.com>

cee8f4f6

cifs: Fix chmod with modefromsid when an older ACE already exists. · 3bffbe9e

由 Shyam Prasad N 提交于 3月 26, 2021

My recent fixes to cifsacl to maintain inherited ACEs had
regressed modefromsid when an older ACL already exists.

Found testing xfstest 495 with modefromsid mount option

Fixes: f5065508 ("cifs: Retain old ACEs when converting between mode bits and ACL")
Signed-off-by: NShyam Prasad N <sprasad@microsoft.com>
Reviewed-by: NPaulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: NSteve French <stfrench@microsoft.com>

3bffbe9e

26 3月, 2021 2 次提交

cifs: Adjust key sizes and key generation routines for AES256 encryption · 45a4546c

由 Shyam Prasad N 提交于 3月 25, 2021

For AES256 encryption (GCM and CCM), we need to adjust the size of a few
fields to 32 bytes instead of 16 to accommodate the larger keys.

Also, the L value supplied to the key generator needs to be changed from
to 256 when these algorithms are used.

Keeping the ioctl struct for dumping keys of the same size for now.
Will send out a different patch for that one.
Signed-off-by: NShyam Prasad N <sprasad@microsoft.com>
Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
CC: <stable@vger.kernel.org> # v5.10+
Signed-off-by: NSteve French <stfrench@microsoft.com>

45a4546c

hostfs: fix memory handling in follow_link() · 7f6c411c

由 Al Viro 提交于 3月 25, 2021

1) argument should not be freed in any case - the caller already has
it as ->s_fs_info (and uses it a lot afterwards)
2) allocate readlink buffer with kmalloc() - the caller has no way
to tell if it's got that (on absolute symlink) or a result of
kasprintf().  Sure, for SLAB and SLUB kfree() works on results of
kmem_cache_alloc(), but that's not documented anywhere, might change
in the future *and* is already not true for SLOB.

Fixes: 52b209f7 ("get rid of hostfs_read_inode()")
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

7f6c411c

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功