提交 · 8137824eddd2e790c61c70c20d70a087faca95fa · openeuler / Kernel

29 3月, 2021 5 次提交

erofs: don't use erofs_map_blocks() any more · 8137824e

由 Yue Hu 提交于 3月 25, 2021

Currently, erofs_map_blocks() will be called only from
erofs_{bmap, read_raw_page} which are all for uncompressed files.
So, the compression branch in erofs_map_blocks() is pointless. Let's
remove it and use erofs_map_blocks_flatmode() directly. Also update
related comments.

Link: https://lore.kernel.org/r/20210325071008.573-1-zbestahu@gmail.comReviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NYue Hu <huyue2@yulong.com>
Signed-off-by: NGao Xiang <hsiangkao@redhat.com>

8137824e

erofs: complete a missing case for inplace I/O · 0b964600

由 Gao Xiang 提交于 3月 22, 2021

Add a missing case which could cause unnecessary page allocation but
not directly use inplace I/O instead, which increases runtime extra
memory footprint.

The detail is, considering an online file-backed page, the right half
of the page is chosen to be cached (e.g. the end page of a readahead
request) and some of its data doesn't exist in managed cache, so the
pcluster will be definitely kept in the submission chain. (IOWs, it
cannot be decompressed without I/O, e.g., due to the bypass queue).

Currently, DELAYEDALLOC/TRYALLOC cases can be downgraded as NOINPLACE,
and stop online pages from inplace I/O. After this patch, unneeded page
allocations won't be observed in pickup_page_for_submission() then.

Link: https://lore.kernel.org/r/20210321183227.5182-1-hsiangkao@aol.comSigned-off-by: NGao Xiang <hsiangkao@redhat.com>

0b964600

erofs: use sync decompression for atomic contexts only · 30048cda

由 Huang Jianan 提交于 3月 17, 2021

Sync decompression was introduced to get rid of additional kworker
scheduling overhead. But there is no such overhead in non-atomic
contexts. Therefore, it should be better to turn off sync decompression
to avoid the current thread waiting in z_erofs_runqueue.

Link: https://lore.kernel.org/r/20210317035448.13921-3-huangjianan@oppo.comReviewed-by: NGao Xiang <hsiangkao@redhat.com>
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NHuang Jianan <huangjianan@oppo.com>
Signed-off-by: NGuo Weichao <guoweichao@oppo.com>
Signed-off-by: NGao Xiang <hsiangkao@redhat.com>

30048cda

erofs: use workqueue decompression for atomic contexts only · 648f2de0

由 Huang Jianan 提交于 3月 17, 2021

z_erofs_decompressqueue_endio may not be executed in the atomic
context, for example, when dm-verity is turned on. In this scenario,
data can be decompressed directly to get rid of additional kworker
scheduling overhead.

Link: https://lore.kernel.org/r/20210317035448.13921-2-huangjianan@oppo.comReviewed-by: NGao Xiang <hsiangkao@redhat.com>
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NHuang Jianan <huangjianan@oppo.com>
Signed-off-by: NGuo Weichao <guoweichao@oppo.com>
Signed-off-by: NGao Xiang <hsiangkao@redhat.com>

648f2de0

erofs: avoid memory allocation failure during rolling decompression · b4892fa3

由 Huang Jianan 提交于 3月 16, 2021

Currently, err would be treated as io error. Therefore, it'd be
better to ensure memory allocation during rolling decompression
to avoid such io error.

In the long term, we might consider adding another !Uptodate case
for such case.

Link: https://lore.kernel.org/r/20210316031515.90954-1-huangjianan@oppo.comReviewed-by: NGao Xiang <hsiangkao@redhat.com>
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NHuang Jianan <huangjianan@oppo.com>
Signed-off-by: NGuo Weichao <guoweichao@oppo.com>
Signed-off-by: NGao Xiang <hsiangkao@redhat.com>

b4892fa3

28 3月, 2021 6 次提交

io_uring: remove unsued assignment to pointer io · 2b8ed1c9

由 Colin Ian King 提交于 3月 26, 2021

There is an assignment to io that is never read after the assignment,
the assignment is redundant and can be removed.
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

2b8ed1c9

io_uring: don't cancel extra on files match · 78d9d7c2

由 Pavel Begunkov 提交于 3月 25, 2021

As tasks always wait and kill their io-wq on exec/exit, files are of no
more concern to us, so we don't need to specifically cancel them by hand
in those cases. Moreover we should not, because io_match_task() looks at
req->task->files now, which is always true and so leads to extra
cancellations, that wasn't a case before per-task io-wq.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0566c1de9b9dd417f5de345c817ca953580e0e2e.1616696997.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

78d9d7c2

io_uring: don't cancel-track common timeouts · 2482b58f

由 Pavel Begunkov 提交于 3月 25, 2021

Don't account usual timeouts (i.e. not linked) as REQ_F_INFLIGHT but
keep behaviour prior to dd59a3d5 ("io_uring: reliably cancel linked
timeouts").
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/104441ef5d97e3932113d44501fda0df88656b83.1616696997.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

2482b58f

io_uring: do post-completion chore on t-out cancel · 80c4cbdb

由 Pavel Begunkov 提交于 3月 25, 2021

Don't forget about io_commit_cqring() + io_cqring_ev_posted() after
exit/exec cancelling timeouts. Both functions declared only after
io_kill_timeouts(), so to avoid tons of forward declarations move
it down.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/72ace588772c0f14834a6a4185d56c445a366fb4.1616696997.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

80c4cbdb

io_uring: fix timeout cancel return code · 1ee4160c

由 Pavel Begunkov 提交于 3月 25, 2021

When we cancel a timeout we should emit a sensible return code, like
-ECANCELED but not 0, otherwise it may trick users.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7b0ad1065e3bd1994722702bd0ba9e7bc9b0683b.1616696997.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

1ee4160c

io_uring: handle signals for IO threads like a normal thread · dbe1bdbb

由 Jens Axboe 提交于 3月 25, 2021

We go through various hoops to disallow signals for the IO threads, but
there's really no reason why we cannot just allow them. The IO threads
never return to userspace like a normal thread, and hence don't go through
normal signal processing. Instead, just check for a pending signal as part
of the work loop, and call get_signal() to handle it for us if anything
is pending.

With that, we can support receiving signals, including special ones like
SIGSTOP.
Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

dbe1bdbb

27 3月, 2021 4 次提交

smb3: fix cached file size problems in duplicate extents (reflink) · cfc63fc8

由 Steve French 提交于 3月 26, 2021

There were two problems (one of which could cause data corruption)
that were noticed with duplicate extents (ie reflink)
when debugging why various xfstests were being incorrectly skipped
(e.g. generic/138, generic/140, generic/142). First, we were not
updating the file size locally in the cache when extending a
file due to reflink (it would refresh after actimeo expires)
but xfstest was checking the size immediately which was still
0 so caused the test to be skipped.  Second, we were setting
the target file size (which could shrink the file) in all cases
to the end of the reflinked range rather than only setting the
target file size when reflink would extend the file.

CC: <stable@vger.kernel.org>
Signed-off-by: NSteve French <stfrench@microsoft.com>

cfc63fc8

cifs: Silently ignore unknown oplock break handle · 219481a8

由 Vincent Whitchurch 提交于 3月 19, 2021

Make SMB2 not print out an error when an oplock break is received for an
unknown handle, similar to SMB1. The debug message which is printed for
these unknown handles may also be misleading, so fix that too.

The SMB2 lease break path is not affected by this patch.

Without this, a program which writes to a file from one thread, and
opens, reads, and writes the same file from another thread triggers the
below errors several times a minute when run against a Samba server
configured with "smb2 leases = no".

CIFS: VFS: \\192.168.0.1 No task to wake, unknown frame received! NumMids 2
00000000: 424d53fe 00000040 00000000 00000012 .SMB@...........
00000010: 00000001 00000000 ffffffff ffffffff ................
00000020: 00000000 00000000 00000000 00000000 ................
00000030: 00000000 00000000 00000000 00000000 ................
Signed-off-by: NVincent Whitchurch <vincent.whitchurch@axis.com>
Reviewed-by: NTom Talpey <tom@talpey.com>
Reviewed-by: NPaulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: NSteve French <stfrench@microsoft.com>

219481a8

cifs: revalidate mapping when we open files for SMB1 POSIX · cee8f4f6

由 Ronnie Sahlberg 提交于 3月 25, 2021

RHBZ: 1933527

Under SMB1 + POSIX, if an inode is reused on a server after we have read and
cached a part of a file, when we then open the new file with the
re-cycled inode there is a chance that we may serve the old data out of cache
to the application.
This only happens for SMB1 (deprecated) and when posix are used.
The simplest solution to avoid this race is to force a revalidate
on smb1-posix open.
Signed-off-by: NRonnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: NPaulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: NSteve French <stfrench@microsoft.com>

cee8f4f6

cifs: Fix chmod with modefromsid when an older ACE already exists. · 3bffbe9e

由 Shyam Prasad N 提交于 3月 26, 2021

My recent fixes to cifsacl to maintain inherited ACEs had
regressed modefromsid when an older ACL already exists.

Found testing xfstest 495 with modefromsid mount option

Fixes: f5065508 ("cifs: Retain old ACEs when converting between mode bits and ACL")
Signed-off-by: NShyam Prasad N <sprasad@microsoft.com>
Reviewed-by: NPaulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: NSteve French <stfrench@microsoft.com>

3bffbe9e

26 3月, 2021 5 次提交

cifs: Adjust key sizes and key generation routines for AES256 encryption · 45a4546c

由 Shyam Prasad N 提交于 3月 25, 2021

For AES256 encryption (GCM and CCM), we need to adjust the size of a few
fields to 32 bytes instead of 16 to accommodate the larger keys.

Also, the L value supplied to the key generator needs to be changed from
to 256 when these algorithms are used.

Keeping the ioctl struct for dumping keys of the same size for now.
Will send out a different patch for that one.
Signed-off-by: NShyam Prasad N <sprasad@microsoft.com>
Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
CC: <stable@vger.kernel.org> # v5.10+
Signed-off-by: NSteve French <stfrench@microsoft.com>

45a4546c

io_uring: maintain CQE order of a failed link · 90b87490

由 Pavel Begunkov 提交于 3月 25, 2021

Arguably we want CQEs of linked requests be in a strict order of
submission as it always was. Now if init of a request fails its CQE may
be posted before all prior linked requests including the head of the
link. Fix it by failing it last.

Fixes: de59bc10 ("io_uring: fail links more in io_submit_sqe()")
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b7a96b05832e7ab23ad55f84092a2548c4a888b0.1616699075.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

90b87490

squashfs: fix xattr id and id lookup sanity checks · 8b44ca2b

由 Phillip Lougher 提交于 3月 24, 2021

The checks for maximum metadata block size is missing
SQUASHFS_BLOCK_OFFSET (the two byte length count).

Link: https://lkml.kernel.org/r/2069685113.2081245.1614583677427@webmail.123-reg.co.uk
Fixes: f37aa4c7 ("squashfs: add more sanity checks in id lookup")
Signed-off-by: NPhillip Lougher <phillip@squashfs.org.uk>
Cc: Sean Nyekjaer <sean@geanix.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8b44ca2b

squashfs: fix inode lookup sanity checks · c1b20283

由 Sean Nyekjaer 提交于 3月 24, 2021

When mouting a squashfs image created without inode compression it fails
with: "unable to read inode lookup table"

It turns out that the BLOCK_OFFSET is missing when checking the
SQUASHFS_METADATA_SIZE agaist the actual size.

Link: https://lkml.kernel.org/r/20210226092903.1473545-1-sean@geanix.com
Fixes: eabac19e ("squashfs: add more sanity checks in inode lookup")
Signed-off-by: NSean Nyekjaer <sean@geanix.com>
Acked-by: NPhillip Lougher <phillip@squashfs.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c1b20283

io-wq: fix race around pending work on teardown · f5d2d23b

由 Jens Axboe 提交于 3月 25, 2021

syzbot reports that it's triggering the warning condition on having
pending work on shutdown:

WARNING: CPU: 1 PID: 12346 at fs/io-wq.c:1061 io_wq_destroy fs/io-wq.c:1061 [inline]
WARNING: CPU: 1 PID: 12346 at fs/io-wq.c:1061 io_wq_put+0x153/0x260 fs/io-wq.c:1072
Modules linked in:
CPU: 1 PID: 12346 Comm: syz-executor.5 Not tainted 5.12.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:io_wq_destroy fs/io-wq.c:1061 [inline]
RIP: 0010:io_wq_put+0x153/0x260 fs/io-wq.c:1072
Code: 8d e8 71 90 ea 01 49 89 c4 41 83 fc 40 7d 4f e8 33 4d 97 ff 42 80 7c 2d 00 00 0f 85 77 ff ff ff e9 7a ff ff ff e8 1d 4d 97 ff <0f> 0b eb b9 8d 6b ff 89 ee 09 de bf ff ff ff ff e8 18 51 97 ff 09
RSP: 0018:ffffc90001ebfb08 EFLAGS: 00010293
RAX: ffffffff81e16083 RBX: ffff888019038040 RCX: ffff88801e86b780
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000040
RBP: 1ffff1100b2f8a80 R08: ffffffff81e15fce R09: ffffed100b2f8a82
R10: ffffed100b2f8a82 R11: 0000000000000000 R12: 0000000000000000
R13: dffffc0000000000 R14: ffff8880597c5400 R15: ffff888019038000
FS:  00007f8dcd89c700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055e9a054e160 CR3: 000000001dfb8000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 io_uring_clean_tctx+0x1b7/0x210 fs/io_uring.c:8802
 __io_uring_files_cancel+0x13c/0x170 fs/io_uring.c:8820
 io_uring_files_cancel include/linux/io_uring.h:47 [inline]
 do_exit+0x258/0x2340 kernel/exit.c:780
 do_group_exit+0x168/0x2d0 kernel/exit.c:922
 get_signal+0x1734/0x1ef0 kernel/signal.c:2773
 arch_do_signal_or_restart+0x3c/0x610 arch/x86/kernel/signal.c:811
 handle_signal_work kernel/entry/common.c:147 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
 exit_to_user_mode_prepare+0xac/0x1e0 kernel/entry/common.c:208
 __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
 syscall_exit_to_user_mode+0x48/0x180 kernel/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x465f69

which shouldn't happen, but seems to be possible due to a race on whether
or not the io-wq manager sees a fatal signal first, or whether the io-wq
workers do. If we race with queueing work and then send a fatal signal to
the owning task, and the io-wq worker sees that before the manager sets
IO_WQ_BIT_EXIT, then it's possible to have the worker exit and leave work
behind.

Just turn the WARN_ON_ONCE() into a cancelation condition instead.

Reported-by: syzbot+77a738a6bc947bf639ca@syzkaller.appspotmail.com
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f5d2d23b

25 3月, 2021 1 次提交

cachefiles: do not yet allow on idmapped mounts · bf1c82a5

由 Christian Brauner 提交于 3月 24, 2021

Based on discussions (e.g. in [1]) my understanding of cachefiles and
the cachefiles userspace daemon is that it creates a cache on a local
filesystem (e.g. ext4, xfs etc.) for a network filesystem. The way this
is done is by writing "bind" to /dev/cachefiles and pointing it to the
directory to use as the cache.

Currently this directory can technically also be an idmapped mount but
cachefiles aren't yet fully aware of such mounts and thus don't take the
idmapping into account when creating cache entries. This could leave
users confused as the ownership of the files wouldn't match to what they
expressed in the idmapping. Block cache files on idmapped mounts until
the fscache rework is done and we have ported it to support idmapped
mounts.
Signed-off-by: NChristian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/lkml/20210303161528.n3jzg66ou2wa43qb@wittgenstein [1]
Link: https://lore.kernel.org/r/20210316112257.2974212-1-christian.brauner@ubuntu.com/ # v1
Link: https://listman.redhat.com/archives/linux-cachefs/2021-March/msg00044.html # v2
Link: https://lore.kernel.org/r/20210319114146.410329-1-christian.brauner@ubuntu.com/ # v3
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bf1c82a5

24 3月, 2021 3 次提交

io_uring: do ctx sqd ejection in a clear context · a185f1db

由 Pavel Begunkov 提交于 3月 23, 2021

WARNING: CPU: 1 PID: 27907 at fs/io_uring.c:7147 io_sq_thread_park+0xb5/0xd0 fs/io_uring.c:7147
CPU: 1 PID: 27907 Comm: iou-sqp-27905 Not tainted 5.12.0-rc4-syzkaller #0
RIP: 0010:io_sq_thread_park+0xb5/0xd0 fs/io_uring.c:7147
Call Trace:
 io_ring_ctx_wait_and_kill+0x214/0x700 fs/io_uring.c:8619
 io_uring_release+0x3e/0x50 fs/io_uring.c:8646
 __fput+0x288/0x920 fs/file_table.c:280
 task_work_run+0xdd/0x1a0 kernel/task_work.c:140
 io_run_task_work fs/io_uring.c:2238 [inline]
 io_run_task_work fs/io_uring.c:2228 [inline]
 io_uring_try_cancel_requests+0x8ec/0xc60 fs/io_uring.c:8770
 io_uring_cancel_sqpoll+0x1cf/0x290 fs/io_uring.c:8974
 io_sqpoll_cancel_cb+0x87/0xb0 fs/io_uring.c:8907
 io_run_task_work_head+0x58/0xb0 fs/io_uring.c:1961
 io_sq_thread+0x3e2/0x18d0 fs/io_uring.c:6763
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

May happen that last ctx ref is killed in io_uring_cancel_sqpoll(), so
fput callback (i.e. io_uring_release()) is enqueued through task_work,
and run by same cancellation. As it's deeply nested we can't do parking
or taking sqd->lock there, because its state is unclear. So avoid
ctx ejection from sqd list from io_ring_ctx_wait_and_kill() and do it
in a clear context in io_ring_exit_work().

Fixes: f6d54255 ("io_uring: halt SQO submission on ctx exit")
Reported-by: syzbot+e3a3f84f5cecf61f0583@syzkaller.appspotmail.com
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e90df88b8ff2cabb14a7534601d35d62ab4cb8c7.1616496707.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

a185f1db

afs: Use wait_on_page_writeback_killable · 75b69799

由 Matthew Wilcox (Oracle) 提交于 3月 20, 2021

Open-coding this function meant it missed out on the recent bugfix
for waiters being woken by a delayed wake event from a previous
instantiation of the page[1].

[DH: Changed the patch to use vmf->page rather than variable page which
 doesn't exist yet upstream]

Fixes: 1cf7a151 ("afs: Implement shared-writeable mmap")
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Tested-by: kafs-testing@auristor.com
cc: linux-afs@lists.infradead.org
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20210320054104.1300774-4-willy@infradead.org
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c2407cf7d22d0c0d94cf20342b3b8f06f1d904e7 [1]

75b69799

fs/cachefiles: Remove wait_bit_key layout dependency · 39f985c8

由 Matthew Wilcox (Oracle) 提交于 3月 20, 2021

Cachefiles was relying on wait_page_key and wait_bit_key being the
same layout, which is fragile.  Now that wait_page_key is exposed in
the pagemap.h header, we can remove that fragility

A comment on the need to maintain structure layout equivalence was added by
Linus[1] and that is no longer applicable.

Fixes: 62906027 ("mm: add PageWaiters indicating tasks are waiting for a page bit")
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Tested-by: kafs-testing@auristor.com
cc: linux-cachefs@redhat.com
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/20210320054104.1300774-2-willy@infradead.org/
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3510ca20ece0150af6b10c77a74ff1b5c198e3e2 [1]

39f985c8

23 3月, 2021 1 次提交

block: clear GD_NEED_PART_SCAN later in bdev_disk_changed · 51167840

由 Chris Chiu 提交于 3月 23, 2021

The GD_NEED_PART_SCAN is set by bdev_check_media_change to initiate
a partition scan while removing a block device. It should be cleared
after blk_drop_paritions because blk_drop_paritions could return
-EBUSY and then the consequence __blkdev_get has no chance to do
delete_partition if GD_NEED_PART_SCAN already cleared.

It causes some problems on some card readers. Ex. Realtek card
reader 0bda:0328 and 0bda:0158. The device node of the partition
will not disappear after the memory card removed. Thus the user
applications can not update the device mapping correctly.

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1920874Signed-off-by: NChris Chiu <chris.chiu@canonical.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210323085219.24428-1-chris.chiu@canonical.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

51167840

22 3月, 2021 4 次提交

io_uring: fix provide_buffers sign extension · d81269fe

由 Pavel Begunkov 提交于 3月 19, 2021

io_provide_buffers_prep()'s "p->len * p->nbufs" to sign extension
problems. Not a huge problem as it's only used for access_ok() and
increases the checked length, but better to keep typing right.
Reported-by: NColin Ian King <colin.king@canonical.com>
Fixes: efe68c1c ("io_uring: validate the full range of provided buffers for access")
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Reviewed-by: NColin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/562376a39509e260d8532186a06226e56eb1f594.1616149233.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

d81269fe

io_uring: don't skip file_end_write() on reissue · b65c128f

由 Pavel Begunkov 提交于 3月 22, 2021

Don't miss to call kiocb_end_write() from __io_complete_rw() on reissue.
Shouldn't be much of a problem as the function actually does some work
only for ISREG, and NONBLOCK won't be reissued.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/32af9b77c5b874e1bee1a3c46396094bd969e577.1616366969.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

b65c128f

io_uring: correct io_queue_async_work() traces · d07f1e8a

由 Pavel Begunkov 提交于 3月 22, 2021

Request's io-wq work is hashed in io_prep_async_link(), so
as trace_io_uring_queue_async_work() looks at it should follow after
prep has been done.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/709c9f872f4d2e198c7aed9c49019ca7095dd24d.1616366969.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

d07f1e8a

io_uring: don't use {test,clear}_tsk_thread_flag() for current · 0b8cfa97

由 Jens Axboe 提交于 3月 21, 2021

Linus correctly points out that this is both unnecessary and generates
much worse code on some archs as going from current to thread_info is
actually backwards - and obviously just wasteful, since the thread_info
is what we care about.

Since io_uring only operates on current for these operations, just use
test_thread_flag() instead. For io-wq, we can further simplify and use
tracehook_notify_signal() to handle the TIF_NOTIFY_SIGNAL work and clear
the flag. The latter isn't an actual bug right now, but it may very well
be in the future if we place other work items under TIF_NOTIFY_SIGNAL.
Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/io-uring/CAHk-=wgYhNck33YHKZ14mFB5MzTTk8gqXHcfj=RWTAXKwgQJgg@mail.gmail.com/Signed-off-by: NJens Axboe <axboe@kernel.dk>

0b8cfa97

21 3月, 2021 10 次提交

io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL · 0031275d

由 Stefan Metzmacher 提交于 3月 20, 2021

Without that it's not safe to use them in a linked combination with
others.

Now combinations like IORING_OP_SENDMSG followed by IORING_OP_SPLICE
should be possible.

We already handle short reads and writes for the following opcodes:

- IORING_OP_READV
- IORING_OP_READ_FIXED
- IORING_OP_READ
- IORING_OP_WRITEV
- IORING_OP_WRITE_FIXED
- IORING_OP_WRITE
- IORING_OP_SPLICE
- IORING_OP_TEE

Now we have it for these as well:

- IORING_OP_SENDMSG
- IORING_OP_SEND
- IORING_OP_RECVMSG
- IORING_OP_RECV

For IORING_OP_RECVMSG we also check for the MSG_TRUNC and MSG_CTRUNC
flags in order to call req_set_fail_links().

There might be applications arround depending on the behavior
that even short send[msg]()/recv[msg]() retuns continue an
IOSQE_IO_LINK chain.

It's very unlikely that such applications pass in MSG_WAITALL,
which is only defined in 'man 2 recvmsg', but not in 'man 2 sendmsg'.

It's expected that the low level sock_sendmsg() call just ignores
MSG_WAITALL, as MSG_ZEROCOPY is also ignored without explicitly set
SO_ZEROCOPY.

We also expect the caller to know about the implicit truncation to
MAX_RW_COUNT, which we don't detect.

cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/c4e1a4cc0d905314f4d5dc567e65a7b09621aab3.1615908477.git.metze@samba.orgSigned-off-by: NStefan Metzmacher <metze@samba.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0031275d

io-wq: ensure task is running before processing task_work · 00ddff43

由 Jens Axboe 提交于 3月 21, 2021

Mark the current task as running if we need to run task_work from the
io-wq threads as part of work handling. If that is the case, then return
as such so that the caller can appropriately loop back and reset if it
was part of a going-to-sleep flush.

Fixes: 3bfe6106 ("io-wq: fork worker threads from original task")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

00ddff43

T
ext4: initialize ret to suppress smatch warning · 64395d95
由 Theodore Ts'o 提交于 3月 21, 2021
```
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
```
64395d95

ext4: stop inode update before return · 512c15ef

由 Pan Bian 提交于 1月 17, 2021

The inode update should be stopped before returing the error code.
Signed-off-by: NPan Bian <bianpan2016@163.com>
Link: https://lore.kernel.org/r/20210117085732.93788-1-bianpan2016@163.com
Fixes: 8016e29f ("ext4: fast commit recovery path")
Cc: stable@kernel.org
Reviewed-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

512c15ef

ext4: fix rename whiteout with fast commit · 8210bb29

由 Harshad Shirwadkar 提交于 3月 16, 2021

This patch adds rename whiteout support in fast commits. Note that the
whiteout object that gets created is actually char device. Which
imples, the function ext4_inode_journal_mode(struct inode *inode)
would return "JOURNAL_DATA" for this inode. This has a consequence in
fast commit code that it will make creation of the whiteout object a
fast-commit ineligible behavior and thus will fall back to full
commits. With this patch, this can be observed by running fast commits
with rename whiteout and seeing the stats generated by ext4_fc_stats
tracepoint as follows:

ext4_fc_stats: dev 254:32 fc ineligible reasons:
XATTR:0, CROSS_RENAME:0, JOURNAL_FLAG_CHANGE:0, NO_MEM:0, SWAP_BOOT:0,
RESIZE:0, RENAME_DIR:0, FALLOC_RANGE:0, INODE_JOURNAL_DATA:16;
num_commits:6, ineligible: 6, numblks: 3

So in short, this patch guarantees that in case of rename whiteout, we
fall back to full commits.

Amir mentioned that instead of creating a new whiteout object for
every rename, we can create a static whiteout object with irrelevant
nlink. That will make fast commits to not fall back to full
commit. But until this happens, this patch will ensure correctness by
falling back to full commits.

Fixes: 8016e29f ("ext4: fast commit recovery path")
Cc: stable@kernel.org
Signed-off-by: NHarshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20210316221921.1124955-1-harshadshirwadkar@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

8210bb29

ext4: fix timer use-after-free on failed mount · 2a4ae3bc

由 Jan Kara 提交于 3月 15, 2021

When filesystem mount fails because of corrupted filesystem we first
cancel the s_err_report timer reminding fs errors every day and only
then we flush s_error_work. However s_error_work may report another fs
error and re-arm timer thus resulting in timer use-after-free. Fix the
problem by first flushing the work and only after that canceling the
s_err_report timer.

Reported-by: syzbot+628472a2aac693ab0fcd@syzkaller.appspotmail.com
Fixes: 2d01ddc8 ("ext4: save error info to sb through journal if available")
CC: stable@vger.kernel.org
Signed-off-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210315165906.2175-1-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

2a4ae3bc

ext4: fix potential error in ext4_do_update_inode · 7d8bd3c7

由 Shijie Luo 提交于 3月 12, 2021

If set_large_file = 1 and errors occur in ext4_handle_dirty_metadata(),
the error code will be overridden, go to out_brelse to avoid this
situation.
Signed-off-by: NShijie Luo <luoshijie1@huawei.com>
Link: https://lore.kernel.org/r/20210312065051.36314-1-luoshijie1@huawei.com
Cc: stable@kernel.org
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

7d8bd3c7

ext4: do not try to set xattr into ea_inode if value is empty · 6b224899

由 zhangyi (F) 提交于 3月 05, 2021

Syzbot report a warning that ext4 may create an empty ea_inode if set
an empty extent attribute to a file on the file system which is no free
blocks left.

  WARNING: CPU: 6 PID: 10667 at fs/ext4/xattr.c:1640 ext4_xattr_set_entry+0x10f8/0x1114 fs/ext4/xattr.c:1640
  ...
  Call trace:
   ext4_xattr_set_entry+0x10f8/0x1114 fs/ext4/xattr.c:1640
   ext4_xattr_block_set+0x1d0/0x1b1c fs/ext4/xattr.c:1942
   ext4_xattr_set_handle+0x8a0/0xf1c fs/ext4/xattr.c:2390
   ext4_xattr_set+0x120/0x1f0 fs/ext4/xattr.c:2491
   ext4_xattr_trusted_set+0x48/0x5c fs/ext4/xattr_trusted.c:37
   __vfs_setxattr+0x208/0x23c fs/xattr.c:177
  ...

Now, ext4 try to store extent attribute into an external inode if
ext4_xattr_block_set() return -ENOSPC, but for the case of store an
empty extent attribute, store the extent entry into the extent
attribute block is enough. A simple reproduce below.

  fallocate test.img -l 1M
  mkfs.ext4 -F -b 2048 -O ea_inode test.img
  mount test.img /mnt
  dd if=/dev/zero of=/mnt/foo bs=2048 count=500
  setfattr -n "user.test" /mnt/foo

Reported-by: syzbot+98b881fdd8ebf45ab4ae@syzkaller.appspotmail.com
Fixes: 9c6e7853 ("ext4: reserve space for xattr entries/names")
Cc: stable@kernel.org
Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20210305120508.298465-1-yi.zhang@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

6b224899

ext4: do not iput inode under running transaction in ext4_rename() · 5dccdc5a

由 zhangyi (F) 提交于 3月 03, 2021

In ext4_rename(), when RENAME_WHITEOUT failed to add new entry into
directory, it ends up dropping new created whiteout inode under the
running transaction. After commit <9b88f9fb> ("ext4: Do not iput inode
under running transaction"), we follow the assumptions that evict() does
not get called from a transaction context but in ext4_rename() it breaks
this suggestion. Although it's not a real problem, better to obey it, so
this patch add inode to orphan list and stop transaction before final
iput().
Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20210303131703.330415-2-yi.zhang@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

5dccdc5a

ext4: find old entry again if failed to rename whiteout · b7ff91fd

由 zhangyi (F) 提交于 3月 03, 2021

If we failed to add new entry on rename whiteout, we cannot reset the
old->de entry directly, because the old->de could have moved from under
us during make indexed dir. So find the old entry again before reset is
needed, otherwise it may corrupt the filesystem as below.

  /dev/sda: Entry '00000001' in ??? (12) has deleted/unused inode 15. CLEARED.
  /dev/sda: Unattached inode 75
  /dev/sda: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.

Fixes: 6b4b8e6b ("ext4: fix bug for rename with RENAME_WHITEOUT")
Cc: stable@vger.kernel.org
Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20210303131703.330415-1-yi.zhang@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

b7ff91fd

20 3月, 2021 1 次提交

cifs: fix allocation size on newly created files · 65af8f01

由 Steve French 提交于 3月 19, 2021

Applications that create and extend and write to a file do not
expect to see 0 allocation size.  When file is extended,
set its allocation size to a plausible value until we have a
chance to query the server for it.  When the file is cached
this will prevent showing an impossible number of allocated
blocks (like 0).  This fixes e.g. xfstests 614 which does

    1) create a file and set its size to 64K
    2) mmap write 64K to the file
    3) stat -c %b for the file (to query the number of allocated blocks)

It was failing because we returned 0 blocks.  Even though we would
return the correct cached file size, we returned an impossible
allocation size.
Signed-off-by: NSteve French <stfrench@microsoft.com>
CC: <stable@vger.kernel.org>
Reviewed-by: NAurelien Aptel <aaptel@suse.com>

65af8f01

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功