提交 · def596e9557c91d9846fc4d84d26f2c564644416 · openeuler / Kernel

28 2月, 2019 3 次提交

io_uring: support for IO polling · def596e9

由 Jens Axboe 提交于 6年前

Add support for a polled io_uring instance. When a read or write is
submitted to a polled io_uring, the application must poll for
completions on the CQ ring through io_uring_enter(2). Polled IO may not
generate IRQ completions, hence they need to be actively found by the
application itself.

To use polling, io_uring_setup() must be used with the
IORING_SETUP_IOPOLL flag being set. It is illegal to mix and match
polled and non-polled IO on an io_uring.
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

def596e9

io_uring: add fsync support · c992fe29

由 Christoph Hellwig 提交于 6年前

Add a new fsync opcode, which either syncs a range if one is passed,
or the whole file if the offset and length fields are both cleared
to zero.  A flag is provided to use fdatasync semantics, that is only
force out metadata which is required to retrieve the file data, but
not others like metadata.
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c992fe29

Add io_uring IO interface · 2b188cc1

由 Jens Axboe 提交于 6年前

The submission queue (SQ) and completion queue (CQ) rings are shared
between the application and the kernel. This eliminates the need to
copy data back and forth to submit and complete IO.

IO submissions use the io_uring_sqe data structure, and completions
are generated in the form of io_uring_cqe data structures. The SQ
ring is an index into the io_uring_sqe array, which makes it possible
to submit a batch of IOs without them being contiguous in the ring.
The CQ ring is always contiguous, as completion events are inherently
unordered, and hence any io_uring_cqe entry can point back to an
arbitrary submission.

Two new system calls are added for this:

io_uring_setup(entries, params)
Sets up an io_uring instance for doing async IO. On success,
returns a file descriptor that the application can mmap to
gain access to the SQ ring, CQ ring, and io_uring_sqes.

io_uring_enter(fd, to_submit, min_complete, flags, sigset, sigsetsize)
Initiates IO against the rings mapped to this fd, or waits for
them to complete, or both. The behavior is controlled by the
parameters passed in. If 'to_submit' is non-zero, then we'll
try and submit new IO. If IORING_ENTER_GETEVENTS is set, the
kernel will wait for 'min_complete' events, if they aren't
already available. It's valid to set IORING_ENTER_GETEVENTS
and 'min_complete' == 0 at the same time, this allows the
kernel to return already completed events without waiting
for them. This is useful only for polling, as for IRQ
driven IO, the application can just check the CQ ring
without entering the kernel.

With this setup, it's possible to do async IO with a single system
call. Future developments will enable polled IO with this interface,
and polled submission as well. The latter will enable an application
to do IO without doing ANY system calls at all.

For IRQ driven IO, an application only needs to enter the kernel for
completions if it wants to wait for them to occur.

Each io_uring is backed by a workqueue, to support buffered async IO
as well. We will only punt to an async context if the command would
need to wait for IO on the device side. Any data that can be accessed
directly in the page cache is done inline. This avoids the slowness
issue of usual threadpools, since cached data is accessed as quickly
as a sync interface.

Sample application: http://git.kernel.dk/cgit/fio/plain/t/io_uring.cReviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

2b188cc1

24 2月, 2019 3 次提交

iomap: wire up the iopoll method · 81214bab

由 Christoph Hellwig 提交于 6年前

Store the request queue the last bio was submitted to in the iocb
private data in addition to the cookie so that we find the right block
device.  Also refactor the common direct I/O bio submission code into a
nice little helper.
Signed-off-by: NChristoph Hellwig <hch@lst.de>

Modified to use bio_set_polled().
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

81214bab

block: add bio_set_polled() helper · 0bbb280d

由 Jens Axboe 提交于 6年前

For the upcoming async polled IO, we can't sleep allocating requests.
If we do, then we introduce a deadlock where the submitter already
has async polled IO in-flight, but can't wait for them to complete
since polled requests must be active found and reaped.

Utilize the helper in the blockdev DIRECT_IO code.
Reviewed-by: NHannes Reinecke <hare@suse.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0bbb280d

block: wire up block device iopoll method · eae83ce1

由 Christoph Hellwig 提交于 6年前

Just call blk_poll on the iocb cookie, we can derive the block device
from the inode trivially.
Reviewed-by: NHannes Reinecke <hare@suse.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

eae83ce1

15 2月, 2019 5 次提交

block: enable multipage bvecs · 07173c3e

由 Ming Lei 提交于 5年前

This patch pulls the trigger for multi-page bvecs.
Reviewed-by: NOmar Sandoval <osandov@fb.com>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

07173c3e

block: allow bio_for_each_segment_all() to iterate over multi-page bvec · 6dc4f100

由 Ming Lei 提交于 5年前

This patch introduces one extra iterator variable to bio_for_each_segment_all(),
then we can allow bio_for_each_segment_all() to iterate over multi-page bvec.

Given it is just one mechannical & simple change on all bio_for_each_segment_all()
users, this patch does tree-wide change in one single patch, so that we can
avoid to use a temporary helper for this conversion.
Reviewed-by: NOmar Sandoval <osandov@fb.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

6dc4f100

btrfs: use mp_bvec_last_segment to get bio's last page · c3a7ce73

由 Ming Lei 提交于 5年前

Preparing for supporting multi-page bvec.
Reviewed-by: NOmar Sandoval <osandov@fb.com>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c3a7ce73

fs/buffer.c: use bvec iterator to truncate the bio · f70f4464

由 Ming Lei 提交于 5年前

Once multi-page bvec is enabled, the last bvec may include more than one
page, this patch use mp_bvec_last_segment() to truncate the bio.
Reviewed-by: NOmar Sandoval <osandov@fb.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f70f4464

btrfs: look at bi_size for repair decisions · 8a2ee44a

由 Christoph Hellwig 提交于 5年前

bio_readpage_error currently uses bi_vcnt to decide if it is worth
retrying an I/O.  But the vector count is mostly an implementation
artifact - it really should figure out if there is more than a
single sector worth retrying.  Use bi_size for that and shift by
PAGE_SHIFT.  This really should be blocks/sectors, but given that
btrfs doesn't support a sector size different from the PAGE_SIZE
using the page size keeps the changes to a minimum.
Reviewed-by: NOmar Sandoval <osandov@fb.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

8a2ee44a

07 2月, 2019 2 次提交

nfsd: Fix error return values for nfsd4_clone_file_range() · e3fdc89c

由 Trond Myklebust 提交于 6年前

If the parameter 'count' is non-zero, nfsd4_clone_file_range() will
currently clobber all errors returned by vfs_clone_file_range() and
replace them with EINVAL.

Fixes: 42ec3d4c ("vfs: make remap_file_range functions take and...")
Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

e3fdc89c

fs: ratelimit __find_get_block_slow() failure message. · 43636c80

由 Tetsuo Handa 提交于 6年前

When something let __find_get_block_slow() hit all_mapped path, it calls
printk() for 100+ times per a second. But there is no need to print same
message with such high frequency; it is just asking for stall warning, or
at least bloating log files.

  [  399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
  [  399.873324][T15342] b_state=0x00000029, b_size=512
  [  399.878403][T15342] device loop0 blocksize: 4096
  [  399.883296][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
  [  399.890400][T15342] b_state=0x00000029, b_size=512
  [  399.895595][T15342] device loop0 blocksize: 4096
  [  399.900556][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
  [  399.907471][T15342] b_state=0x00000029, b_size=512
  [  399.912506][T15342] device loop0 blocksize: 4096

This patch reduces frequency to up to once per a second, in addition to
concatenating three lines into one.

  [  399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8, b_state=0x00000029, b_size=512, device loop0 blocksize: 4096
Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: NJan Kara <jack@suse.cz>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

43636c80

06 2月, 2019 1 次提交

aio: initialize kiocb private in case any filesystems expect it. · ec51f8ee

由 Mike Marshall 提交于 5年前

A recent optimization had left private uninitialized.

Fixes: 2bc4ca9b ("aio: don't zero entire aio_kiocb aio_get_req()")
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMike Marshall <hubcap@omnibond.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ec51f8ee

04 2月, 2019 3 次提交

xfs: set buffer ops when repair probes for btree type · add46b3b

由 Darrick J. Wong 提交于 5年前

In xrep_findroot_block, we work out the btree type and correctness of a
given block by calling different btree verifiers on root block
candidates.  However, we leave the NULL b_ops while ->verify_read
validates the block, which means that if the verifier calls
xfs_buf_verifier_error it'll crash on the null b_ops.  Fix it to set
b_ops before calling the verifier and unsetting it if the verifier
fails.

Furthermore, improve the documentation around xfs_buf_ensure_ops, which
is the function that is responsible for cleaning up the b_ops state of
buffers that go through xrep_findroot_block but don't match anything.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

add46b3b

xfs: end sync buffer I/O properly on shutdown error · 465fa17f

由 Brian Foster 提交于 5年前

As of commit e339dd8d ("xfs: use sync buffer I/O for sync delwri
queue submission"), the delwri submission code uses sync buffer I/O
for sync delwri I/O. Instead of waiting on async I/O to unlock the
buffer, it uses the underlying sync I/O completion mechanism.

If delwri buffer submission fails due to a shutdown scenario, an
error is set on the buffer and buffer completion never occurs. This
can cause xfs_buf_delwri_submit() to deadlock waiting on a
completion event.

We could check the error state before waiting on such buffers, but
that doesn't serialize against the case of an error set via a racing
I/O completion. Instead, invoke I/O completion in the shutdown case
regardless of buffer I/O type.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

465fa17f

xfs: eof trim writeback mapping as soon as it is cached · aa6ee4ab

由 Brian Foster 提交于 5年前

The cached writeback mapping is EOF trimmed to try and avoid races
between post-eof block management and writeback that result in
sending cached data to a stale location. The cached mapping is
currently trimmed on the validation check, which leaves a race
window between the time the mapping is cached and when it is trimmed
against the current inode size.

For example, if a new mapping is cached by delalloc conversion on a
blocksize == page size fs, we could cycle various locks, perform
memory allocations, etc.  in the writeback codepath before the
associated mapping is eventually trimmed to i_size. This leaves
enough time for a post-eof truncate and file append before the
cached mapping is trimmed. The former event essentially invalidates
a range of the cached mapping and the latter bumps the inode size
such the trim on the next writepage event won't trim all of the
invalid blocks. fstest generic/464 reproduces this scenario
occasionally and causes a lost writeback and stale delalloc blocks
warning on inode inactivation.

To work around this problem, trim the cached writeback mapping as
soon as it is cached in addition to on subsequent validation checks.
This is a minor tweak to tighten the race window as much as possible
until a proper invalidation mechanism is available.

Fixes: 40214d12 ("xfs: trim writepage mapping to within eof")
Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

aa6ee4ab

02 2月, 2019 4 次提交

autofs: fix error return in autofs_fill_super() · f585b283

由 Ian Kent 提交于 5年前

In autofs_fill_super() on error of get inode/make root dentry the return
should be ENOMEM as this is the only failure case of the called
functions.

Link: http://lkml.kernel.org/r/154725123240.11260.796773942606871359.stgit@pluto-themaw-netSigned-off-by: NIan Kent <raven@themaw.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f585b283

autofs: drop dentry reference only when it is never used · 63ce5f55

由 Pan Bian 提交于 5年前

autofs_expire_run() calls dput(dentry) to drop the reference count of
dentry. However, dentry is read via autofs_dentry_ino(dentry) after
that. This may result in a use-free-bug. The patch drops the reference
count of dentry only when it is never used.

Link: http://lkml.kernel.org/r/154725122396.11260.16053424107144453867.stgit@pluto-themaw-netSigned-off-by: NPan Bian <bianpan2016@163.com>
Signed-off-by: NIan Kent <raven@themaw.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

63ce5f55

fs/drop_caches.c: avoid softlockups in drop_pagecache_sb() · c27d82f5

由 Jan Kara 提交于 5年前

When superblock has lots of inodes without any pagecache (like is the
case for /proc), drop_pagecache_sb() will iterate through all of them
without dropping sb->s_inode_list_lock which can lead to softlockups
(one of our customers hit this).

Fix the problem by going to the slow path and doing cond_resched() in
case the process needs rescheduling.

Link: http://lkml.kernel.org/r/20190114085343.15011-1-jack@suse.czSigned-off-by: NJan Kara <jack@suse.cz>
Acked-by: NMichal Hocko <mhocko@suse.com>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c27d82f5

proc: fix /proc/net/* after setns(2) · 1fde6f21

由 Alexey Dobriyan 提交于 5年前

/proc entries under /proc/net/* can't be cached into dcache because
setns(2) can change current net namespace.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: avoid vim miscolorization]
[adobriyan@gmail.com: write test, add dummy ->d_revalidate hook: necessary if /proc/net/* is pinned at setns time]
  Link: http://lkml.kernel.org/r/20190108192350.GA12034@avx2
Link: http://lkml.kernel.org/r/20190107162336.GA9239@avx2
Fixes: 1da4d377 ("proc: revalidate misc dentries")
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Reported-by: NMateusz Stępień <mateusz.stepien@netrounds.com>
Reported-by: NAhmad Fatoum <a.fatoum@pengutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1fde6f21

01 2月, 2019 1 次提交

gfs2: Revert "Fix loop in gfs2_rbm_find" · e74c98ca

由 Andreas Gruenbacher 提交于 5年前

This reverts commit 2d29f6b9.

It turns out that the fix can lead to a ~20 percent performance regression
in initial writes to the page cache according to iozone.  Let's revert this
for now to have more time for a proper fix.

Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e74c98ca

31 1月, 2019 7 次提交

S
cifs: update internal module version number · b9b9378b
由 Steve French 提交于 5年前
```
To 2.17
Signed-off-by: NSteve French <stfrench@microsoft.com>
```
b9b9378b

CIFS: fix use-after-free of the lease keys · d339adc1

由 Aurelien Aptel 提交于 5年前

The request buffers are freed right before copying the pointers.
Use the func args instead which are identical and still valid.

Simple reproducer (requires KASAN enabled) on a cifs mount:

echo foo > foo ; tail -f foo & rm foo

Cc: <stable@vger.kernel.org> # 4.20
Fixes: 179e44d4 ("smb3: add tracepoint for sending lease break responses to server")
Signed-off-by: NAurelien Aptel <aaptel@suse.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>
Reviewed-by: NPaulo Alcantara <palcantara@suse.de>

d339adc1

fs/dcache: Track & report number of negative dentries · af0c9af1

由 Waiman Long 提交于 5年前

The current dentry number tracking code doesn't distinguish between
positive & negative dentries.  It just reports the total number of
dentries in the LRU lists.

As excessive number of negative dentries can have an impact on system
performance, it will be wise to track the number of positive and
negative dentries separately.

This patch adds tracking for the total number of negative dentries in
the system LRU lists and reports it in the 5th field in the
/proc/sys/fs/dentry-state file.  The number, however, does not include
negative dentries that are in flight but not in the LRU yet as well as
those in the shrinker lists which are on the way out anyway.

The number of positive dentries in the LRU lists can be roughly found by
subtracting the number of negative dentries from the unused count.

Matthew Wilcox had confirmed that since the introduction of the
dentry_stat structure in 2.1.60, the dummy array was there, probably for
future extension.  They were not replacements of pre-existing fields.
So no sane applications that read the value of /proc/sys/fs/dentry-state
will do dummy thing if the last 2 fields of the sysctl parameter are not
zero.  IOW, it will be safe to use one of the dummy array entry for
negative dentry count.
Signed-off-by: NWaiman Long <longman@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

af0c9af1

fs/dcache: Fix incorrect nr_dentry_unused accounting in shrink_dcache_sb() · 1dbd449c

由 Waiman Long 提交于 5年前

The nr_dentry_unused per-cpu counter tracks dentries in both the LRU
lists and the shrink lists where the DCACHE_LRU_LIST bit is set.

The shrink_dcache_sb() function moves dentries from the LRU list to a
shrink list and subtracts the dentry count from nr_dentry_unused.  This
is incorrect as the nr_dentry_unused count will also be decremented in
shrink_dentry_list() via d_shrink_del().

To fix this double decrement, the decrement in the shrink_dcache_sb()
function is taken out.

Fixes: 4e717f5c ("list_lru: remove special case function list_lru_dispose_all."
Cc: stable@kernel.org
Signed-off-by: NWaiman Long <longman@redhat.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1dbd449c

btrfs: On error always free subvol_name in btrfs_mount · 532b618b

由 Eric W. Biederman 提交于 5年前

The subvol_name is allocated in btrfs_parse_subvol_options and is
consumed and freed in mount_subvol.  Add a free to the error paths that
don't call mount_subvol so that it is guaranteed that subvol_name is
freed when an error happens.

Fixes: 312c89fb ("btrfs: cleanup btrfs_mount() using btrfs_mount_root()")
Cc: stable@vger.kernel.org # v4.19+
Reviewed-by: NNikolay Borisov <nborisov@suse.com>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

532b618b

btrfs: clean up pending block groups when transaction commit aborts · c7cc64a9

由 David Sterba 提交于 6年前

The fstests generic/475 stresses transaction aborts and can reveal
space accounting or use-after-free bugs regarding block goups.

In this case the pending block groups that remain linked to the
structures after transaction commit aborts in the middle.

The corrupted slabs lead to failures in following tests, eg. generic/476

  [ 8172.752887] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
  [ 8172.755799] #PF error: [normal kernel read fault]
  [ 8172.757571] PGD 661ae067 P4D 661ae067 PUD 3db8e067 PMD 0
  [ 8172.759000] Oops: 0000 [#1] PREEMPT SMP
  [ 8172.760209] CPU: 0 PID: 39 Comm: kswapd0 Tainted: G        W         5.0.0-rc2-default #408
  [ 8172.762495] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
  [ 8172.765772] RIP: 0010:shrink_page_list+0x2f9/0xe90
  [ 8172.770453] RSP: 0018:ffff967f00663b18 EFLAGS: 00010287
  [ 8172.771184] RAX: 0000000000000000 RBX: ffff967f00663c20 RCX: 0000000000000000
  [ 8172.772850] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8c0620ab20e0
  [ 8172.774629] RBP: ffff967f00663dd8 R08: 0000000000000000 R09: 0000000000000000
  [ 8172.776094] R10: ffff8c0620ab22f8 R11: ffff8c063f772688 R12: ffff967f00663b78
  [ 8172.777533] R13: ffff8c063f625600 R14: ffff8c063f625608 R15: dead000000000200
  [ 8172.778886] FS:  0000000000000000(0000) GS:ffff8c063d400000(0000) knlGS:0000000000000000
  [ 8172.780545] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 8172.781787] CR2: 0000000000000058 CR3: 000000004e962000 CR4: 00000000000006f0
  [ 8172.783547] Call Trace:
  [ 8172.784112]  shrink_inactive_list+0x194/0x410
  [ 8172.784747]  shrink_node_memcg.constprop.85+0x3a5/0x6a0
  [ 8172.785472]  shrink_node+0x62/0x1e0
  [ 8172.786011]  balance_pgdat+0x216/0x460
  [ 8172.786577]  kswapd+0xe3/0x4a0
  [ 8172.787085]  ? finish_wait+0x80/0x80
  [ 8172.787795]  ? balance_pgdat+0x460/0x460
  [ 8172.788799]  kthread+0x116/0x130
  [ 8172.789640]  ? kthread_create_on_node+0x60/0x60
  [ 8172.790323]  ret_from_fork+0x24/0x30
  [ 8172.794253] CR2: 0000000000000058

or accounting errors at umount time:

  [ 8159.537251] WARNING: CPU: 2 PID: 19031 at fs/btrfs/extent-tree.c:5987 btrfs_free_block_groups+0x3d5/0x410 [btrfs]
  [ 8159.543325] CPU: 2 PID: 19031 Comm: umount Tainted: G        W         5.0.0-rc2-default #408
  [ 8159.545472] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
  [ 8159.548155] RIP: 0010:btrfs_free_block_groups+0x3d5/0x410 [btrfs]
  [ 8159.554030] RSP: 0018:ffff967f079cbde8 EFLAGS: 00010206
  [ 8159.555144] RAX: 0000000001000000 RBX: ffff8c06366cf800 RCX: 0000000000000000
  [ 8159.556730] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff8c06255ad800
  [ 8159.558279] RBP: ffff8c0637ac0000 R08: 0000000000000001 R09: 0000000000000000
  [ 8159.559797] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8c0637ac0108
  [ 8159.561296] R13: ffff8c0637ac0158 R14: 0000000000000000 R15: dead000000000100
  [ 8159.562852] FS:  00007f7f693b9fc0(0000) GS:ffff8c063d800000(0000) knlGS:0000000000000000
  [ 8159.564839] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 8159.566160] CR2: 00007f7f68fab7b0 CR3: 000000000aec7000 CR4: 00000000000006e0
  [ 8159.567898] Call Trace:
  [ 8159.568597]  close_ctree+0x17f/0x350 [btrfs]
  [ 8159.569628]  generic_shutdown_super+0x64/0x100
  [ 8159.570808]  kill_anon_super+0x14/0x30
  [ 8159.571857]  btrfs_kill_super+0x12/0xa0 [btrfs]
  [ 8159.573063]  deactivate_locked_super+0x29/0x60
  [ 8159.574234]  cleanup_mnt+0x3b/0x70
  [ 8159.575176]  task_work_run+0x98/0xc0
  [ 8159.576177]  exit_to_usermode_loop+0x83/0x90
  [ 8159.577315]  do_syscall_64+0x15b/0x180
  [ 8159.578339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

This fix is based on 2 Josef's patches that used sideefects of
btrfs_create_pending_block_groups, this fix introduces the helper that
does what we need.

CC: stable@vger.kernel.org # 4.4+
CC: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: NNikolay Borisov <nborisov@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

c7cc64a9

btrfs: fix potential oops in device_list_add · 92900e51

由 Al Viro 提交于 5年前

alloc_fs_devices() can return ERR_PTR(-ENOMEM), so dereferencing its
result before the check for IS_ERR() is a bad idea.

Fixes: d1a63002 ("btrfs: add members to fs_devices to track fsid changes")
Reviewed-by: NNikolay Borisov <nborisov@suse.com>
Reviewed-by: NAnand Jain <anand.jain@oracle.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

92900e51

30 1月, 2019 9 次提交

debugfs: debugfs_lookup() should return NULL if not found · 37ea7b63

由 Greg Kroah-Hartman 提交于 5年前

Lots of callers of debugfs_lookup() were just checking NULL to see if
the file/directory was found or not.  By changing this in ff9fb72b
("debugfs: return error values, not NULL") we caused some subsystems to
easily crash.

Fixes: ff9fb72b ("debugfs: return error values, not NULL")
Reported-by: syzbot+b382ba6a802a3d242790@syzkaller.appspotmail.com
Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

37ea7b63

CIFS: Do not consider -ENODATA as stat failure for reads · 082aaa87

由 Pavel Shilovsky 提交于 6年前

When doing reads beyound the end of a file the server returns
error STATUS_END_OF_FILE error which is mapped to -ENODATA.
Currently we report it as a failure which confuses read stats.
Change it to not consider -ENODATA as failure for stat purposes.
Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>

082aaa87

CIFS: Do not count -ENODATA as failure for query directory · 8e6e72ae

由 Pavel Shilovsky 提交于 5年前

Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>

8e6e72ae

CIFS: Fix trace command logging for SMB2 reads and writes · 7d42e72f

由 Pavel Shilovsky 提交于 6年前

Currently we log success once we send an async IO request to
the server. Instead we need to analyse a response and then log
success or failure for a particular command. Also fix argument
list for read logging.

Cc: <stable@vger.kernel.org> # 4.18
Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>

7d42e72f

CIFS: Fix possible oops and memory leaks in async IO · 9bda8723

由 Pavel Shilovsky 提交于 6年前

Allocation of a page array for non-cached IO was separated from
allocation of rdata and wdata structures and this introduced memory
leaks and a possible null pointer dereference. This patch fixes
these problems.

Cc: <stable@vger.kernel.org>
Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>

9bda8723

cifs: limit amount of data we request for xattrs to CIFSMaxBufSize · c4627e66

由 Ronnie Sahlberg 提交于 5年前

minus the various headers and blobs that will be part of the reply.

or else we might trigger a session reconnect.
Signed-off-by: NRonnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>
Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>

c4627e66

cifs: fix computation for MAX_SMB2_HDR_SIZE · 58d15ed1

由 Ronnie Sahlberg 提交于 5年前

The size of the fixed part of the create response is 88 bytes not 56.
Signed-off-by: NRonnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>
Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>

58d15ed1

NFS: Fix up return value on fatal errors in nfs_page_async_flush() · 8fc75bed

由 Trond Myklebust 提交于 5年前

Ensure that we return the fatal error value that caused us to exit
nfs_page_async_flush().

Fixes: c373fff7 ("NFSv4: Don't special case "launder"")
Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.12+
Reviewed-by: NBenjamin Coddington <bcodding@redhat.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

8fc75bed

debugfs: return error values, not NULL · ff9fb72b

由 Greg Kroah-Hartman 提交于 6年前

When an error happens, debugfs should return an error pointer value, not
NULL.  This will prevent the totally theoretical error where a debugfs
call fails due to lack of memory, returning NULL, and that dentry value
is then passed to another debugfs call, which would end up succeeding,
creating a file at the root of the debugfs tree, but would then be
impossible to remove (because you can not remove the directory NULL).

So, to make everyone happy, always return errors, this makes the users
of debugfs much simpler (they do not have to ever check the return
value), and everyone can rest easy.
Reported-by: NGary R Hook <ghook@amd.com>
Reported-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Reported-by: NMasami Hiramatsu <mhiramat@kernel.org>
Reported-by: NMichal Hocko <mhocko@kernel.org>
Reported-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Reported-by: NUlf Hansson <ulf.hansson@linaro.org>
Reviewed-by: NMasami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

ff9fb72b

29 1月, 2019 1 次提交

nfs: Fix NULL pointer dereference of dev_name · 80ff0017

由 Yao Liu 提交于 5年前

There is a NULL pointer dereference of dev_name in nfs_parse_devname()

The oops looks something like:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  ...
  RIP: 0010:nfs_fs_mount+0x3b6/0xc20 [nfs]
  ...
  Call Trace:
   ? ida_alloc_range+0x34b/0x3d0
   ? nfs_clone_super+0x80/0x80 [nfs]
   ? nfs_free_parsed_mount_data+0x60/0x60 [nfs]
   mount_fs+0x52/0x170
   ? __init_waitqueue_head+0x3b/0x50
   vfs_kern_mount+0x6b/0x170
   do_mount+0x216/0xdc0
   ksys_mount+0x83/0xd0
   __x64_sys_mount+0x25/0x30
   do_syscall_64+0x65/0x220
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fix this by adding a NULL check on dev_name
Signed-off-by: NYao Liu <yotta.liu@ucloud.cn>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

80ff0017

28 1月, 2019 1 次提交

btrfs: don't end the transaction for delayed refs in throttle · 302167c5

由 Josef Bacik 提交于 6年前

Previously callers to btrfs_end_transaction_throttle() would commit the
transaction if there wasn't enough delayed refs space.  This happens in
relocation, and if the fs is relatively empty we'll run out of delayed
refs space basically immediately, so we'll just be stuck in this loop of
committing the transaction over and over again.

This code existed because we didn't have a good feedback mechanism for
running delayed refs, but with the delayed refs rsv we do now.  Delete
this throttling code and let the btrfs_start_transaction() in relocation
deal with putting pressure on the delayed refs infrastructure.  With
this patch we no longer take 5 minutes to balance a metadata only fs.

Qu has submitted a fstest to catch slow balance or excessive transaction
commits. Steps to reproduce:

* create subvolume
* create many (eg. 16000) inlined files, of size 2KiB
* iteratively snapshot and touch several files to trigger metadata
  updates
* start balance -m
Reported-by: NQu Wenruo <wqu@suse.com>
Fixes: 64403612 ("btrfs: rework btrfs_check_space_for_delayed_refs")
Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
[ add tags and steps to reproduce ]
Signed-off-by: NDavid Sterba <dsterba@suse.com>

302167c5

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功