提交 · 4c67dbea4d2b335634ab37daa148dca8e0e18555 · openanolis / cloud-kernel

13 12月, 2019 27 次提交

iomap: partially revert 4721a601099 (simulated directio short read on EFAULT) · 4c67dbea

由 Darrick J. Wong 提交于 12月 02, 2018

[ Upstream commit 8f67b5adc030553fbc877124306f3f3bdab89aa8 ]

In commit 4721a601099, we tried to fix a problem wherein directio reads
into a splice pipe will bounce EFAULT/EAGAIN all the way out to
userspace by simulating a zero-byte short read.  This happens because
some directio read implementations (xfs) will call
bio_iov_iter_get_pages to grab pipe buffer pages and issue asynchronous
reads, but as soon as we run out of pipe buffers that _get_pages call
returns EFAULT, which the splice code translates to EAGAIN and bounces
out to userspace.

In that commit, the iomap code catches the EFAULT and simulates a
zero-byte read, but that causes assertion errors on regular splice reads
because xfs doesn't allow short directio reads.  This causes infinite
splice() loops and assertion failures on generic/095 on overlayfs
because xfs only permit total success or total failure of a directio
operation.  The underlying issue in the pipe splice code has now been
fixed by changing the pipe splice loop to avoid avoid reading more data
than there is space in the pipe.

Therefore, it's no longer necessary to simulate the short directio, so
remove the hack from iomap.

Fixes: 4721a601099 ("iomap: dio data corruption and spurious errors when pipes fill")
Reported-by: NMurphy Zhou <jencce.kernel@gmail.com>
Ranted-by: NAmir Goldstein <amir73il@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

4c67dbea

splice: don't read more than available pipe space · 019b6325

由 Darrick J. Wong 提交于 11月 30, 2018

[ Upstream commit 17614445576b6af24e9cf36607c6448164719c96 ]

In commit 4721a601099, we tried to fix a problem wherein directio reads
into a splice pipe will bounce EFAULT/EAGAIN all the way out to
userspace by simulating a zero-byte short read.  This happens because
some directio read implementations (xfs) will call
bio_iov_iter_get_pages to grab pipe buffer pages and issue asynchronous
reads, but as soon as we run out of pipe buffers that _get_pages call
returns EFAULT, which the splice code translates to EAGAIN and bounces
out to userspace.

In that commit, the iomap code catches the EFAULT and simulates a
zero-byte read, but that causes assertion errors on regular splice reads
because xfs doesn't allow short directio reads.

The brokenness is compounded by splice_direct_to_actor immediately
bailing on do_splice_to returning <= 0 without ever calling ->actor
(which empties out the pipe), so if userspace calls back we'll EFAULT
again on the full pipe, and nothing ever gets copied.

Therefore, teach splice_direct_to_actor to clamp its requests to the
amount of free space in the pipe and remove the simulated short read
from the iomap directio code.

Fixes: 4721a601099 ("iomap: dio data corruption and spurious errors when pipes fill")
Reported-by: NMurphy Zhou <jencce.kernel@gmail.com>
Ranted-by: NAmir Goldstein <amir73il@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

019b6325

iomap: Fix pipe page leakage during splicing · b59116ff

由 Jan Kara 提交于 11月 21, 2019

commit 419e9c38aa075ed0cd3c13d47e15954b686bcdb6 upstream.

When splicing using iomap_dio_rw() to a pipe, we may leak pipe pages
because bio_iov_iter_get_pages() records that the pipe will have full
extent worth of data however if file size is not block size aligned
iomap_dio_rw() returns less than what bio_iov_iter_get_pages() set up
and splice code gets confused leaking a pipe page with the file tail.

Handle the situation similarly to the old direct IO implementation and
revert iter to actually returned read amount which makes iter consistent
with value returned from iomap_dio_rw() and thus the splice code is
happy.

Fixes: ff6a9292 ("iomap: implement direct I/O")
CC: stable@vger.kernel.org
Reported-by: syzbot+991400e8eba7e00a26e1@syzkaller.appspotmail.com
Signed-off-by: NJan Kara <jack@suse.cz>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

b59116ff

kernfs: fix ino wrap-around detection · 18493bac

由 Tejun Heo 提交于 11月 04, 2019

commit e23f568aa63f64cd6b355094224cc9356c0f696b upstream.

When the 32bit ino wraps around, kernfs increments the generation
number to distinguish reused ino instances.  The wrap-around detection
tests whether the allocated ino is lower than what the cursor but the
cursor is pointing to the next ino to allocate so the condition never
triggers.

Fix it by remembering the last ino and comparing against that.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Fixes: 4a3ef68a ("kernfs: implement i_generation")
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

18493bac

CIFS: Fix SMB2 oplock break processing · d4785d88

由 Pavel Shilovsky 提交于 10月 31, 2019

commit fa9c2362497fbd64788063288dc4e74daf977ebb upstream.

Even when mounting modern protocol version the server may be
configured without supporting SMB2.1 leases and the client
uses SMB2 oplock to optimize IO performance through local caching.

However there is a problem in oplock break handling that leads
to missing a break notification on the client who has a file
opened. It latter causes big latencies to other clients that
are trying to open the same file.

The problem reproduces when there are multiple shares from the
same server mounted on the client. The processing code tries to
match persistent and volatile file ids from the break notification
with an open file but it skips all share besides the first one.
Fix this by looking up in all shares belonging to the server that
issued the oplock break.

Cc: Stable <stable@vger.kernel.org>
Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

d4785d88

CIFS: Fix NULL-pointer dereference in smb2_push_mandatory_locks · df871e55

由 Pavel Shilovsky 提交于 11月 27, 2019

commit 6f582b273ec23332074d970a7fb25bef835df71f upstream.

Currently when the client creates a cifsFileInfo structure for
a newly opened file, it allocates a list of byte-range locks
with a pointer to the new cfile and attaches this list to the
inode's lock list. The latter happens before initializing all
other fields, e.g. cfile->tlink. Thus a partially initialized
cifsFileInfo structure becomes available to other threads that
walk through the inode's lock list. One example of such a thread
may be an oplock break worker thread that tries to push all
cached byte-range locks. This causes NULL-pointer dereference
in smb2_push_mandatory_locks() when accessing cfile->tlink:

[598428.945633] BUG: kernel NULL pointer dereference, address: 0000000000000038
...
[598428.945749] Workqueue: cifsoplockd cifs_oplock_break [cifs]
[598428.945793] RIP: 0010:smb2_push_mandatory_locks+0xd6/0x5a0 [cifs]
...
[598428.945834] Call Trace:
[598428.945870]  ? cifs_revalidate_mapping+0x45/0x90 [cifs]
[598428.945901]  cifs_oplock_break+0x13d/0x450 [cifs]
[598428.945909]  process_one_work+0x1db/0x380
[598428.945914]  worker_thread+0x4d/0x400
[598428.945921]  kthread+0x104/0x140
[598428.945925]  ? process_one_work+0x380/0x380
[598428.945931]  ? kthread_park+0x80/0x80
[598428.945937]  ret_from_fork+0x35/0x40

Fix this by reordering initialization steps of the cifsFileInfo
structure: initialize all the fields first and then add the new
byte-range lock list to the inode's lock list.

Cc: Stable <stable@vger.kernel.org>
Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: NAurelien Aptel <aaptel@suse.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

df871e55

fuse: verify attributes · 710c33ad

由 Miklos Szeredi 提交于 11月 12, 2019

commit eb59bd17d2fa6e5e84fba61a5ebdea984222e6d5 upstream.

If a filesystem returns negative inode sizes, future reads on the file were
causing the cpu to spin on truncate_pagecache.

Create a helper to validate the attributes.  This now does two things:

 - check the file mode
 - check if the file size fits in i_size without overflowing
Reported-by: NArijit Banerjee <arijit@rubrik.com>
Fixes: d8a5ba45 ("[PATCH] FUSE - core")
Cc: <stable@vger.kernel.org> # v2.6.14
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

710c33ad

fuse: verify nlink · 9f435a5e

由 Miklos Szeredi 提交于 11月 12, 2019

commit c634da718db9b2fac201df2ae1b1b095344ce5eb upstream.

When adding a new hard link, make sure that i_nlink doesn't overflow.

Fixes: ac45d613 ("fuse: fix nlink after unlink")
Cc: <stable@vger.kernel.org> # v3.4
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

9f435a5e

nfsd: Return EPERM, not EACCES, in some SETATTR cases · 1ff89e6d

由 zhengbin 提交于 11月 30, 2018

[ Upstream commit 255fbca65137e25b12bced18ec9a014dc77ecda0 ]

As the man(2) page for utime/utimes states, EPERM is returned when the
second parameter of utime or utimes is not NULL, the caller's effective UID
does not match the owner of the file, and the caller is not privileged.

However, in a NFS directory mounted from knfsd, it will return EACCES
(from nfsd_setattr-> fh_verify->nfsd_permission).  This patch fixes
that.
Signed-off-by: Nzhengbin <zhengbin13@huawei.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

1ff89e6d

pstore/ram: Avoid NULL deref in ftrace merging failure path · 6b6f6030

由 Kees Cook 提交于 12月 03, 2018

[ Upstream commit 8665569e97dd52920713b95675409648986b5b0d ]

Given corruption in the ftrace records, it might be possible to allocate
tmp_prz without assigning prz to it, but still marking it as needing to
be freed, which would cause at least a NULL dereference.

smatch warnings:
fs/pstore/ram.c:340 ramoops_pstore_read() error: we previously assumed 'prz' could be null (see line 255)

https://lists.01.org/pipermail/kbuild-all/2018-December/055528.htmlReported-by: NDan Carpenter <dan.carpenter@oracle.com>
Fixes: 2fbea82b ("pstore: Merge per-CPU ftrace records into one")
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>

6b6f6030

dlm: fix invalid cluster name warning · 446a04d8

由 David Teigland 提交于 12月 03, 2018

[ Upstream commit 3595c559326d0b660bb088a88e22e0ca630a0e35 ]

The warning added in commit 3b0e761b
  "dlm: print log message when cluster name is not set"

did not account for the fact that lockspaces created
from userland do not supply a cluster name, so bogus
warnings are printed every time a userland lockspace
is created.
Signed-off-by: NDavid Teigland <teigland@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

446a04d8

nfsd: fix a warning in __cld_pipe_upcall() · 3ff6af8e

由 Scott Mayhew 提交于 11月 06, 2018

[ Upstream commit b493fd31c0b89d9453917e977002de58bebc3802 ]

__cld_pipe_upcall() emits a "do not call blocking ops when
!TASK_RUNNING" warning due to the dput() call in rpc_queue_upcall().
Fix it by using a completion instead of hand coding the wait.
Signed-off-by: NScott Mayhew <smayhew@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

3ff6af8e

dlm: NULL check before kmem_cache_destroy is not needed · 3b0107ca

由 Wen Yang 提交于 11月 28, 2018

[ Upstream commit f31a89692830061bceba8469607e4e4b0f900159 ]

kmem_cache_destroy(NULL) is safe, so removes NULL check before
freeing the mem. This patch also fix ifnullfree.cocci warnings.
Signed-off-by: NWen Yang <wen.yang99@zte.com.cn>
Signed-off-by: NDavid Teigland <teigland@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

3b0107ca

lockd: fix decoding of TEST results · a2b7010f

由 J. Bruce Fields 提交于 11月 26, 2018

[ Upstream commit b8db159239b3f51e2b909859935cc25cb3ff3eed ]

We fail to advance the read pointer when reading the stat.oh field that
identifies the lock-holder in a TEST result.

This turns out not to matter if the server is knfsd, which always
returns a zero-length field.  But other servers (Ganesha is an example)
may not do this.  The result is bad values in fcntl F_GETLK results.

Fix this.
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

a2b7010f

f2fs: fix to allow node segment for GC by ioctl path · c8aa27cf

由 Sahitya Tummala 提交于 11月 26, 2018

[ Upstream commit 08ac9a3870f6babb2b1fff46118536ca8a71ef19 ]

Allow node type segments also to be GC'd via f2fs ioctl
F2FS_IOC_GARBAGE_COLLECT_RANGE.
Signed-off-by: NSahitya Tummala <stummala@codeaurora.org>
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>

c8aa27cf

f2fs: change segment to section in f2fs_ioc_gc_range · 313f1fef

由 Yunlong Song 提交于 10月 30, 2018

[ Upstream commit 67b0e42b768c9ddc3fd5ca1aee3db815cfaa635c ]

f2fs_ioc_gc_range skips blocks_per_seg each time, however, f2fs_gc moves
blocks of section each time, so fix it from segment to section.
Signed-off-by: NYunlong Song <yunlong.song@huawei.com>
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>

313f1fef

f2fs: fix count of seg_freed to make sec_freed correct · 859c93a0

由 Yunlong Song 提交于 10月 24, 2018

[ Upstream commit d6c66cd19ef322fe0d51ba09ce1b7f386acab04a ]

When sbi->segs_per_sec > 1, and if some segno has 0 valid blocks before
gc starts, do_garbage_collect will skip counting seg_freed++, and this
will cause seg_freed < sbi->segs_per_sec and finally skip sec_freed++.
Signed-off-by: NYunlong Song <yunlong.song@huawei.com>
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>

859c93a0

f2fs: fix to account preflush command for noflush_merge mode · c1054aeb

由 Chao Yu 提交于 10月 24, 2018

[ Upstream commit a8075dc484cf10ebdb07bee2b17322fb0a846309 ]

Previously, we only account preflush command for flush_merge mode,
so for noflush_merge mode, we can not know in-flight preflush
command count, fix it.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>

c1054aeb

iomap: readpages doesn't zero page tail beyond EOF · c3a62b65

由 Dave Chinner 提交于 11月 21, 2018

[ Upstream commit 8c110d43c6bca4b24dd13272a9d4e0ba6f2ec957 ]

When we read the EOF page of the file via readpages, we need
to zero the region beyond EOF that we either do not read or
should not contain data so that mmap does not expose stale data to
user applications.

However, iomap_adjust_read_range() fails to detect EOF correctly,
and so fsx on 1k block size filesystems fails very quickly with
mapreads exposing data beyond EOF. There are two problems here.

Firstly, when calculating the end block of the EOF byte, we have
to round the size by one to avoid a block aligned EOF from reporting
a block too large. i.e. a size of 1024 bytes is 1 block, which in
index terms is block 0. Therefore we have to calculate the end block
from (isize - 1), not isize.

The second bug is determining if the current page spans EOF, and so
whether we need split it into two half, one for the IO, and the
other for zeroing. Unfortunately, the code that checks whether
we should split the block doesn't actually check if we span EOF, it
just checks if the read spans the /offset in the page/ that EOF
sits on. So it splits every read into two if EOF is not page
aligned, regardless of whether we are reading the EOF block or not.

Hence we need to restrict the "does the read span EOF" check to
just the page that spans EOF, not every page we read.

This patch results in correct EOF detection through readpages:

xfs_vm_readpages: dev 259:0 ino 0x43 nr_pages 24
xfs_iomap_found: dev 259:0 ino 0x43 size 0x66c00 offset 0x4f000 count 98304 type hole startoff 0x13c startblock 1368 blockcount 0x4
iomap_readpage_actor: orig pos 323584 pos 323584, length 4096, poff 0 plen 4096, isize 420864
xfs_iomap_found: dev 259:0 ino 0x43 size 0x66c00 offset 0x50000 count 94208 type hole startoff 0x140 startblock 1497 blockcount 0x5c
iomap_readpage_actor: orig pos 327680 pos 327680, length 94208, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 331776 pos 331776, length 90112, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 335872 pos 335872, length 86016, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 339968 pos 339968, length 81920, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 344064 pos 344064, length 77824, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 348160 pos 348160, length 73728, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 352256 pos 352256, length 69632, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 356352 pos 356352, length 65536, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 360448 pos 360448, length 61440, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 364544 pos 364544, length 57344, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 368640 pos 368640, length 53248, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 372736 pos 372736, length 49152, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 376832 pos 376832, length 45056, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 380928 pos 380928, length 40960, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 385024 pos 385024, length 36864, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 389120 pos 389120, length 32768, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 393216 pos 393216, length 28672, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 397312 pos 397312, length 24576, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 401408 pos 401408, length 20480, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 405504 pos 405504, length 16384, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 409600 pos 409600, length 12288, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 413696 pos 413696, length 8192, poff 0 plen 4096, isize 420864
iomap_readpage_actor: orig pos 417792 pos 417792, length 4096, poff 0 plen 3072, isize 420864
iomap_readpage_actor: orig pos 420864 pos 420864, length 1024, poff 3072 plen 1024, isize 420864

As you can see, it now does full page reads until the last one which
is split correctly at the block aligned EOF, reading 3072 bytes and
zeroing the last 1024 bytes. The original version of the patch got
this right, but it got another case wrong.

The EOF detection crossing really needs to the the original length
as plen, while it starts at the end of the block, will be shortened
as up-to-date blocks are found on the page. This means "orig_pos +
plen" no longer points to the end of the page, and so will not
correctly detect EOF crossing. Hence we have to use the length
passed in to detect this partial page case:

xfs_filemap_fault: dev 259:1 ino 0x43 write_fault 0
xfs_vm_readpage: dev 259:1 ino 0x43 nr_pages 1
xfs_iomap_found: dev 259:1 ino 0x43 size 0x2cc00 offset 0x2c000 count 4096 type hole startoff 0xb0 startblock 282 blockcount 0x4
iomap_readpage_actor: orig pos 180224 pos 181248, length 4096, poff 1024 plen 2048, isize 183296
xfs_iomap_found: dev 259:1 ino 0x43 size 0x2cc00 offset 0x2cc00 count 1024 type hole startoff 0xb3 startblock 285 blockcount 0x1
iomap_readpage_actor: orig pos 183296 pos 183296, length 1024, poff 3072 plen 1024, isize 183296

Heere we see a trace where the first block on the EOF page is up to
date, hence poff = 1024 bytes. The offset into the page of EOF is
3072, so the range we want to read is 1024 - 3071, and the range we
want to zero is 3072 - 4095. You can see this is split correctly
now.

This fixes the stale data beyond EOF problem that fsx quickly
uncovers on 1k block size filesystems.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

c3a62b65

iomap: dio data corruption and spurious errors when pipes fill · 807a5972

由 Dave Chinner 提交于 11月 19, 2018

[ Upstream commit 4721a6010990971440b4ffefbdf014976b8eda2f ]

When doing direct IO to a pipe for do_splice_direct(), then pipe is
trivial to fill up and overflow as it can only hold 16 pages. At
this point bio_iov_iter_get_pages() then returns -EFAULT, and we
abort the IO submission process. Unfortunately, iomap_dio_rw()
propagates the error back up the stack.

The error is converted from the EFAULT to EAGAIN in
generic_file_splice_read() to tell the splice layers that the pipe
is full. do_splice_direct() completely fails to handle EAGAIN errors
(it aborts on error) and returns EAGAIN to the caller.

copy_file_write() then completely fails to handle EAGAIN as well,
and so returns EAGAIN to userspace, having failed to copy the data
it was asked to.

Avoid this whole steaming pile of fail by having iomap_dio_rw()
silently swallow EFAULT errors and so do short reads.

To make matters worse, iomap_dio_actor() has a stale data exposure
bug bio_iov_iter_get_pages() fails - it does not zero the tail block
that it may have been left uncovered by partial IO. Fix the error
handling case to drop to the sub-block zeroing rather than
immmediately returning the -EFAULT error.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

807a5972

iomap: sub-block dio needs to zeroout beyond EOF · b61fdcdb

由 Dave Chinner 提交于 11月 19, 2018

[ Upstream commit b450672fb66b4a991a5b55ee24209ac7ae7690ce ]

If we are doing sub-block dio that extends EOF, we need to zero
the unused tail of the block to initialise the data in it it. If we
do not zero the tail of the block, then an immediate mmap read of
the EOF block will expose stale data beyond EOF to userspace. Found
with fsx running sub-block DIO sizes vs MAPREAD/MAPWRITE operations.

Fix this by detecting if the end of the DIO write is beyond EOF
and zeroing the tail if necessary.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

b61fdcdb

iomap: FUA is wrong for DIO O_DSYNC writes into unwritten extents · ac3ec5a4

由 Dave Chinner 提交于 11月 19, 2018

[ Upstream commit 0929d8580071c6a1cec1a7916a8f674c243ceee1 ]

When we write into an unwritten extent via direct IO, we dirty
metadata on IO completion to convert the unwritten extent to
written. However, when we do the FUA optimisation checks, the inode
may be clean and so we issue a FUA write into the unwritten extent.
This means we then bypass the generic_write_sync() call after
unwritten extent conversion has ben done and we don't force the
modified metadata to stable storage.

This violates O_DSYNC semantics. The window of exposure is a single
IO, as the next DIO write will see the inode has dirty metadata and
hence will not use the FUA optimisation. Calling
generic_write_sync() after completion of the second IO will also
sync the first write and it's metadata.

Fix this by avoiding the FUA optimisation when writing to unwritten
extents.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

ac3ec5a4

xfs: extent shifting doesn't fully invalidate page cache · 249279c6

由 Dave Chinner 提交于 11月 19, 2018

[ Upstream commit 7f9f71be84bcab368e58020a42f6d0dd97adf0ce ]

The extent shifting code uses a flush and invalidate mechainsm prior
to shifting extents around. This is similar to what
xfs_free_file_space() does, but it doesn't take into account things
like page cache vs block size differences, and it will fail if there
is a page that it currently busy.

xfs_flush_unmap_range() handles all of these cases, so just convert
xfs_prepare_shift() to us that mechanism rather than having it's own
special sauce.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

249279c6

dlm: fix missing idr_destroy for recover_idr · 1bcac298

由 David Teigland 提交于 11月 15, 2018

[ Upstream commit 8fc6ed9a3508a0435b9270c313600799d210d319 ]

Which would leak memory for the idr internals.
Signed-off-by: NDavid Teigland <teigland@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

1bcac298

dlm: fix possible call to kfree() for non-initialized pointer · ed224150

由 Denis V. Lunev 提交于 11月 13, 2018

[ Upstream commit 58a923adf4d9aca8bf7205985c9c8fc531c65d72 ]

Technically dlm_config_nodes() could return error and keep nodes
uninitialized. After that on the fail path of we'll call kfree()
for that uninitialized value.

The patch is simple - we should just initialize nodes with NULL.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Signed-off-by: NDavid Teigland <teigland@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

ed224150

exportfs_decode_fh(): negative pinned may become positive without the parent locked · af17e1fc

由 Al Viro 提交于 11月 08, 2019

[ Upstream commit a2ece088882666e1dc7113744ac912eb161e3f87 ]
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NSasha Levin <sashal@kernel.org>

af17e1fc

autofs: fix a leak in autofs_expire_indirect() · d4bc855a

由 Al Viro 提交于 10月 25, 2019

[ Upstream commit 03ad0d703df75c43f78bd72e16124b5b94a95188 ]

if the second call of should_expire() in there ends up
grabbing and returning a new reference to dentry, we need
to drop it before continuing.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NSasha Levin <sashal@kernel.org>

d4bc855a

05 12月, 2019 13 次提交

ext4: add more paranoia checking in ext4_expand_extra_isize handling · e91cce02

由 Theodore Ts'o 提交于 11月 07, 2019

commit 4ea99936a1630f51fc3a2d61a58ec4a1c4b7d55a upstream.

It's possible to specify a non-zero s_want_extra_isize via debugging
option, and this can cause bad things(tm) to happen when using a file
system with an inode size of 128 bytes.

Add better checking when the file system is mounted, as well as when
we are actually doing the trying to do the inode expansion.

Link: https://lore.kernel.org/r/20191110121510.GH23325@mit.edu
Reported-by: syzbot+f8d6f8386ceacdbfff57@syzkaller.appspotmail.com
Reported-by: syzbot+33d7ea72e47de3bdf4e1@syzkaller.appspotmail.com
Reported-by: syzbot+44b6763edfc17144296f@syzkaller.appspotmail.com
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

e91cce02

xfs: end sync buffer I/O properly on shutdown error · fe685954

由 Brian Foster 提交于 2月 03, 2019

[ Upstream commit 465fa17f4a303d9fdff9eac4d45f91ece92e96ca ]

As of commit e339dd8d ("xfs: use sync buffer I/O for sync delwri
queue submission"), the delwri submission code uses sync buffer I/O
for sync delwri I/O. Instead of waiting on async I/O to unlock the
buffer, it uses the underlying sync I/O completion mechanism.

If delwri buffer submission fails due to a shutdown scenario, an
error is set on the buffer and buffer completion never occurs. This
can cause xfs_buf_delwri_submit() to deadlock waiting on a
completion event.

We could check the error state before waiting on such buffers, but
that doesn't serialize against the case of an error set via a racing
I/O completion. Instead, invoke I/O completion in the shutdown case
regardless of buffer I/O type.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

fe685954

ocfs2: clear journal dirty flag after shutdown journal · 1b0e201c

由 Junxiao Bi 提交于 12月 28, 2018

[ Upstream commit d85400af790dba2aa294f0a77e712f166681f977 ]

Dirty flag of the journal should be cleared at the last stage of umount,
if do it before jbd2_journal_destroy(), then some metadata in uncommitted
transaction could be lost due to io error, but as dirty flag of journal
was already cleared, we can't find that until run a full fsck.  This may
cause system panic or other corruption.

Link: http://lkml.kernel.org/r/20181121020023.3034-3-junxiao.bi@oracle.comSigned-off-by: NJunxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: NYiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: NJoseph Qi <jiangqi903@gmail.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@versity.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>

1b0e201c

f2fs: fix to dirty inode synchronously · 18bd3588

由 Chao Yu 提交于 12月 18, 2018

[ Upstream commit b32e019049e959ee10ec359893c9dd5d057dad55 ]

If user change inode's i_flags via ioctl, let's add it into global
dirty list, so that checkpoint can guarantee its persistence before
fsync, it can make checkpoint keeping strong consistency.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>

18bd3588

f2fs: fix block address for __check_sit_bitmap · 75491eae

由 Qiuyang Sun 提交于 12月 18, 2018

[ Upstream commit 9249dded7b5cb539a8c8698b25d08a3c15261470 ]

Should use lstart (logical start address) instead of start (in dev) here.
This fixes a bug in multi-device scenarios.
Signed-off-by: NQiuyang Sun <sunqiuyang@huawei.com>
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>

75491eae

xfs: Fix bulkstat compat ioctls on x32 userspace. · 68cb344c

由 Nick Bowler 提交于 12月 17, 2018

[ Upstream commit 7ca860e3c1a74ad6bd8949364073ef1044cad758 ]

The bulkstat family of ioctls are problematic on x32, because there is
a mixup of native 32-bit and 64-bit conventions.  The xfs_fsop_bulkreq
struct contains pointers and 32-bit integers so that matches the native
32-bit layout, and that means the ioctl implementation goes into the
regular compat path on x32.

However, the 'ubuffer' member of that struct in turn refers to either
struct xfs_inogrp or xfs_bstat (or an array of these).  On x32, those
structures match the native 64-bit layout.  The compat implementation
writes out the 32-bit version of these structures.  This is not the
expected format for x32 userspace, causing problems.

Fortunately the functions which actually output these xfs_inogrp and
xfs_bstat structures have an easy way to select which output format
is required, so we just need a little tweak to select the right format
on x32.
Signed-off-by: NNick Bowler <nbowler@draconx.ca>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

68cb344c

xfs: Align compat attrlist_by_handle with native implementation. · 5d8d2116

由 Nick Bowler 提交于 12月 17, 2018

[ Upstream commit c456d64449efe37da50832b63d91652a85ea1d20 ]

While inspecting the ioctl implementations, I noticed that the compat
implementation of XFS_IOC_ATTRLIST_BY_HANDLE does not do exactly the
same thing as the native implementation.  Specifically, the "cursor"
does not appear to be written out to userspace on the compat path,
like it is on the native path.

This adjusts the compat implementation to copy out the cursor just
like the native implementation does.  The attrlist cursor does not
require any special compat handling.  This fixes xfstests xfs/269
on both IA-32 and x32 userspace, when running on an amd64 kernel.
Signed-off-by: NNick Bowler <nbowler@draconx.ca>
Fixes: 0facef7f ("xfs: in _attrlist_by_handle, copy the cursor back to userspace")
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

5d8d2116

gfs2: take jdata unstuff into account in do_grow · 7baf8fd1

由 Bob Peterson 提交于 12月 18, 2018

[ Upstream commit bc0205612bbd4dd4026d4ba6287f5643c37366ec ]

Before this patch, function do_grow would not reserve enough journal
blocks in the transaction to unstuff jdata files while growing them.
This patch adds the logic to add one more block if the file to grow
is jdata.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Reviewed-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

7baf8fd1

exofs_mount(): fix leaks on failure exits · bd5acf3e

由 Al Viro 提交于 11月 09, 2018

[ Upstream commit 26cb5a328c6b2bda9e859307ce4cfc60df3a2c28 ]

... and don't abuse mount_nodev(), while we are at it.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Reviewed-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

bd5acf3e

btrfs: only track ref_heads in delayed_ref_updates · df3a72c9

由 Josef Bacik 提交于 12月 03, 2018

[ Upstream commit 158ffa364bf723fa1ef128060646d23dc3942994 ]

We use this number to figure out how many delayed refs to run, but
__btrfs_run_delayed_refs really only checks every time we need a new
delayed ref head, so we always run at least one ref head completely no
matter what the number of items on it.  Fix the accounting to only be
adjusted when we add/remove a ref head.

In addition to using this number to limit the number of delayed refs
run, a future patch is also going to use it to calculate the amount of
space required for delayed refs space reservation.
Reviewed-by: NNikolay Borisov <nborisov@suse.com>
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

df3a72c9

Btrfs: allow clear_extent_dirty() to receive a cached extent state record · e9a7415a

由 Filipe Manana 提交于 11月 16, 2018

[ Upstream commit 0e6ec385b55f6001da8c6b1532494241e52c550d ]

We can have a lot freed extents during the life span of transaction, so
the red black tree that keeps track of the ranges of each freed extent
(fs_info->freed_extents[]) can get quite big. When finishing a
transaction commit we find each range, process it (discard the extents,
unpin them) and then remove it from the red black tree.

We can use an extent state record as a cache when searching for a range,
so that when we clean the range we can use the cached extent state we
passed to the search function instead of iterating the red black tree
again. Doing things as fast as possible when finishing a transaction (in
state TRANS_STATE_UNBLOCKED) is convenient as it reduces the time we
block another task that wants to commit the next transaction.

So change clear_extent_dirty() to allow an optional extent state record to
be passed as an argument, which will be passed down to __clear_extent_bit.
Reviewed-by: NNikolay Borisov <nborisov@suse.com>
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

e9a7415a

btrfs: dev-replace: set result code of cancel by status of scrub · d87ca19c

由 Anand Jain 提交于 11月 11, 2018

[ Upstream commit b47dda2ef6d793b67fd5979032dcd106e3f0a5c9 ]

The device-replace needs to check the result code of the scrub workers
in btrfs_dev_replace_cancel and distinguish if successful cancel
operation and when the there was no operation running.

If btrfs_scrub_cancel() fails, return
BTRFS_IOCTL_DEV_REPLACE_RESULT_NOT_STARTED so that user can try
to cancel the replace again.
Signed-off-by: NAnand Jain <anand.jain@oracle.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

d87ca19c

btrfs: fix ncopies raid_attr for RAID56 · 37410d6a

由 Hans van Kranenburg 提交于 10月 04, 2018

[ Upstream commit da612e31aee51bd13231c78a47c714b543bd3ad8 ]

RAID5 and RAID6 profile store one copy of the data, not 2 or 3. These
values are not yet used anywhere so there's no change.
Reviewed-by: NNikolay Borisov <nborisov@suse.com>
Signed-off-by: NHans van Kranenburg <hans.van.kranenburg@mendix.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

37410d6a

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功