1. 10 Aug, 2021 · 1 commit
  2. 03 Aug, 2021 · 4 commits
  3. 28 Jul, 2021 · 1 commit
  4. 30 Jun, 2021 · 1 commit
  5. 25 Jun, 2021 · 2 commits
  6. 24 Jun, 2021 · 1 commit
  7. 04 Jun, 2021 · 1 commit
  8. 01 Jun, 2021 · 6 commits
  9. 20 May, 2021 · 2 commits
  10. 06 May, 2021 · 2 commits
    • block: reexpand iov_iter after read/write · cf7b39a0
      yangerkun authored
      We get a bug:
      
      BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404
      lib/iov_iter.c:1139
      Read of size 8 at addr ffff0000d3fb11f8 by task
      
      CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted
      5.10.0-00843-g352c8610ccd2 #2
      Hardware name: linux,dummy-virt (DT)
      Call trace:
       dump_backtrace+0x0/0x2d0 arch/arm64/kernel/stacktrace.c:132
       show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x110/0x164 lib/dump_stack.c:118
       print_address_description+0x78/0x5c8 mm/kasan/report.c:385
       __kasan_report mm/kasan/report.c:545 [inline]
       kasan_report+0x148/0x1e4 mm/kasan/report.c:562
       check_memory_region_inline mm/kasan/generic.c:183 [inline]
       __asan_load8+0xb4/0xbc mm/kasan/generic.c:252
       iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139
       io_read fs/io_uring.c:3421 [inline]
       io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943
       __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260
       io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326
       io_submit_sqe fs/io_uring.c:6395 [inline]
       io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624
       __do_sys_io_uring_enter fs/io_uring.c:9013 [inline]
       __se_sys_io_uring_enter fs/io_uring.c:8960 [inline]
       __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960
       __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
       el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
       do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227
       el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
       el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
       el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670
      
      Allocated by task 12570:
       stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
       kasan_save_stack mm/kasan/common.c:48 [inline]
       kasan_set_track mm/kasan/common.c:56 [inline]
       __kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461
       kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475
       __kmalloc+0x23c/0x334 mm/slub.c:3970
       kmalloc include/linux/slab.h:557 [inline]
       __io_alloc_async_data+0x68/0x9c fs/io_uring.c:3210
       io_setup_async_rw fs/io_uring.c:3229 [inline]
       io_read fs/io_uring.c:3436 [inline]
       io_issue_sqe+0x2954/0x2d64 fs/io_uring.c:5943
       __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260
       io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326
       io_submit_sqe fs/io_uring.c:6395 [inline]
       io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624
       __do_sys_io_uring_enter fs/io_uring.c:9013 [inline]
       __se_sys_io_uring_enter fs/io_uring.c:8960 [inline]
       __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960
       __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
       invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
       el0_svc_common arch/arm64/kernel/syscall.c:158 [inline]
       do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227
       el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367
       el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383
       el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670
      
      Freed by task 12570:
       stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121
       kasan_save_stack mm/kasan/common.c:48 [inline]
       kasan_set_track+0x38/0x6c mm/kasan/common.c:56
       kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355
       __kasan_slab_free+0x124/0x150 mm/kasan/common.c:422
       kasan_slab_free+0x10/0x1c mm/kasan/common.c:431
       slab_free_hook mm/slub.c:1544 [inline]
       slab_free_freelist_hook mm/slub.c:1577 [inline]
       slab_free mm/slub.c:3142 [inline]
       kfree+0x104/0x38c mm/slub.c:4124
       io_dismantle_req fs/io_uring.c:1855 [inline]
       __io_free_req+0x70/0x254 fs/io_uring.c:1867
       io_put_req_find_next fs/io_uring.c:2173 [inline]
       __io_queue_sqe+0x1fc/0x520 fs/io_uring.c:6279
       __io_req_task_submit+0x154/0x21c fs/io_uring.c:2051
       io_req_task_submit+0x2c/0x44 fs/io_uring.c:2063
       task_work_run+0xdc/0x128 kernel/task_work.c:151
       get_signal+0x6f8/0x980 kernel/signal.c:2562
       do_signal+0x108/0x3a4 arch/arm64/kernel/signal.c:658
       do_notify_resume+0xbc/0x25c arch/arm64/kernel/signal.c:722
       work_pending+0xc/0x180
      
      blkdev_read_iter can truncate the iov_iter's count, since count + pos
      may exceed the size of the blkdev. This misleads io_read into thinking
      the whole iovec has been consumed, and once io_read calls
      iov_iter_revert we trigger the slab-out-of-bounds. Fix it by
      re-expanding the count by the amount that was truncated.
      
      blkdev_write_iter can trigger the problem too.
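      
      A minimal sketch of the re-expand pattern described above (condensed
      from the shape of blkdev_read_iter in fs/block_dev.c; treat the exact
      function body as an illustration, not the verbatim fix):
      
       static ssize_t blkdev_read_iter(struct kiocb *iocb, struct iov_iter *to)
       {
               struct inode *bd_inode = iocb->ki_filp->f_mapping->host;
               loff_t size = i_size_read(bd_inode);
               loff_t pos = iocb->ki_pos;
               size_t shorted = 0;
               ssize_t ret;
      
               if (pos >= size)
                       return 0;
      
               size -= pos;
               if (iov_iter_count(to) > size) {
                       /* Remember how much of the request we cut off... */
                       shorted = iov_iter_count(to) - size;
                       iov_iter_truncate(to, size);
               }
      
               ret = generic_file_read_iter(iocb, to);
               /* ...and give it back so a later iov_iter_revert stays in bounds. */
               iov_iter_reexpand(to, iov_iter_count(to) + shorted);
               return ret;
       }
      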
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Acked-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/20210401071807.3328235-1-yangerkun@huawei.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • mm: introduce and use mapping_empty() · 7716506a
      Matthew Wilcox (Oracle) authored
      Patch series "Remove nrexceptional tracking", v2.
      
      We actually use nrexceptional for very little these days.  It's a minor
      pain to keep in sync with nrpages, but the pain becomes much bigger with
      the THP patches because we don't know how many indices a shadow entry
      occupies.  It's easier to just remove it than keep it accurate.
      
      Also, we save 8 bytes per inode which is nothing to sneeze at; on my
      laptop, it would improve shmem_inode_cache from 22 to 23 objects per
      16kB, and inode_cache from 26 to 27 objects.  Combined, that saves
      a megabyte of memory from a combined usage of 25MB for both caches.
      Unfortunately, ext4 doesn't cross a magic boundary, so it doesn't save
      any memory for ext4.
      
      This patch (of 4):
      
      Instead of checking the two counters (nrpages and nrexceptional), we can
      just check whether i_pages is empty.
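      
      A sketch of the helper this introduces (assuming the xarray-backed
      i_pages; essentially a one-liner in include/linux/pagemap.h):
      
       /* True iff the mapping holds no pages and no shadow/DAX/swap entries. */
       static inline bool mapping_empty(struct address_space *mapping)
       {
               return xa_empty(&mapping->i_pages);
       }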
      
      Link: https://lkml.kernel.org/r/20201026151849.24232-1-willy@infradead.org
      Link: https://lkml.kernel.org/r/20201026151849.24232-2-willy@infradead.org
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Tested-by: Vishal Verma <vishal.l.verma@intel.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 12 Apr, 2021 · 2 commits
  12. 09 Apr, 2021 · 1 commit
  13. 02 Apr, 2021 · 1 commit
  14. 29 Mar, 2021 · 1 commit
  15. 23 Mar, 2021 · 1 commit
  16. 11 Mar, 2021 · 1 commit
  17. 06 Mar, 2021 · 1 commit
    • block: Try to handle busy underlying device on discard · 56887cff
      Jan Kara authored
      Commit 384d87ef ("block: Do not discard buffers under a mounted
      filesystem") made paths issuing discard or zeroout requests to the
      underlying device try to grab the block device in exclusive mode. If
      that failed, we returned EBUSY to userspace. This however caused
      unexpected fallout in userspace where e.g. FUSE filesystems issue
      discard requests from userspace daemons although the device is open
      exclusively by the kernel. Also shrinking of a logical volume by LVM
      issues discard requests to a device which may be claimed exclusively
      because there's another LV on the same PV. So to avoid these userspace
      regressions, fall back to invalidate_inode_pages2_range() instead of
      returning EBUSY to userspace, and return EBUSY only if that call fails
      as well (meaning that there's indeed someone using the particular
      device range we are trying to discard).
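      
      A rough sketch of the resulting fallback in truncate_bdev_range() (the
      claim-helper names and signatures are assumed from this era of
      fs/block_dev.c, so read this as the shape of the change rather than the
      verbatim diff):
      
       static int truncate_bdev_range(struct block_device *bdev, fmode_t mode,
                                      loff_t lstart, loff_t lend)
       {
               /* If the caller doesn't already hold the device exclusively,
                * try to claim it while we drop the buffer cache. */
               if (!(mode & FMODE_EXCL)) {
                       int err = bd_prepare_to_claim(bdev, truncate_bdev_range);
                       if (err)
                               goto invalidate;
               }
      
               truncate_inode_pages_range(bdev->bd_inode->i_mapping, lstart, lend);
               if (!(mode & FMODE_EXCL))
                       bd_abort_claiming(bdev, truncate_bdev_range);
               return 0;
      
       invalidate:
               /* Someone else holds the device exclusively (e.g. the kernel
                * for a FUSE-backed device, or another LV on the same PV).
                * Invalidate the range instead of failing; 'lend' is
                * inclusive, so the shift rounding is safe. */
               return invalidate_inode_pages2_range(bdev->bd_inode->i_mapping,
                                                    lstart >> PAGE_SHIFT,
                                                    lend >> PAGE_SHIFT);
       }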
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=211167
      Fixes: 384d87ef ("block: Do not discard buffers under a mounted filesystem")
      CC: stable@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  18. 27 Feb, 2021 · 1 commit
  19. 25 Feb, 2021 · 1 commit
  20. 28 Jan, 2021 · 2 commits
    • block: use an on-stack bio in blkdev_issue_flush · c6bf3f0e
      Christoph Hellwig authored
      There is no point in allocating memory for a synchronous flush.
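      
      A sketch of the on-stack variant this describes (the exact
      blkdev_issue_flush prototype in this tree is an assumption; the point
      is bio_init on a stack bio plus submit_bio_wait, with no allocation):
      
       int blkdev_issue_flush(struct block_device *bdev)
       {
               struct bio bio;
      
               /* The bio lives on the stack for this synchronous wait. */
               bio_init(&bio, NULL, 0);
               bio_set_dev(&bio, bdev);
               bio.bi_opf = REQ_OP_WRITE | REQ_PREFLUSH;
               return submit_bio_wait(&bio);
       }
      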
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Revert "block: simplify set_init_blocksize" to regain lost performance · 8dc932d3
      Maxim Mikityanskiy authored
      The cited commit introduced a serious regression in SATA write speed,
      as found by bisecting. This patch reverts it, restoring write speed to
      the values observed before the faulty commit.
      
      The performance tests were done on a Helios4 NAS (2nd batch) with 4 HDDs
      (WD8003FFBX) using dd (bs=1M count=2000). "Direct" is a test with a
      single HDD, the rest are different RAID levels built over the first
      partitions of 4 HDDs. Test results are in MB/s, R is read, W is write.
      
                      | Direct | RAID0 | RAID10 f2 | RAID10 n2 | RAID6
      ----------------+--------+-------+-----------+-----------+--------
      9011495c        | R:256  | R:313 | R:276     | R:313     | R:323
      (before faulty) | W:254  | W:253 | W:195     | W:204     | W:117
      ----------------+--------+-------+-----------+-----------+--------
      5ff9f192        | R:257  | R:398 | R:312     | R:344     | R:391
      (faulty commit) | W:154  | W:122 | W:67.7    | W:66.6    | W:67.2
      ----------------+--------+-------+-----------+-----------+--------
      5.10.10         | R:256  | R:401 | R:312     | R:356     | R:375
      unpatched       | W:149  | W:123 | W:64      | W:64.1    | W:61.5
      ----------------+--------+-------+-----------+-----------+--------
      5.10.10         | R:255  | R:396 | R:312     | R:340     | R:393
      patched         | W:247  | W:274 | W:220     | W:225     | W:121
      
      Applying this patch doesn't hurt read performance, while it improves
      the write speed by 1.5x - 3.5x (with more impact on the RAID tests).
      The write speed is restored to the state before the faulty commit, and
      is even a bit higher in the RAID tests (which aren't HDD-bound on this
      device); that is likely related to other optimizations done between
      the faulty commit and 5.10.10 which also improved the read speed.
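      
      For context, a sketch of the logic this revert restores: grow the soft
      block size up to PAGE_SIZE as long as it still evenly divides the
      device size, instead of always using the logical block size (a
      reconstruction of the pre-5ff9f192 helper, not a quote from the tree):
      
       static void set_init_blocksize(struct block_device *bdev)
       {
               unsigned int bsize = bdev_logical_block_size(bdev);
               loff_t size = i_size_read(bdev->bd_inode);
      
               /* Double bsize while it still evenly divides the device size. */
               while (bsize < PAGE_SIZE) {
                       if (size & bsize)
                               break;
                       bsize <<= 1;
               }
               bdev->bd_inode->i_blkbits = blksize_bits(bsize);
       }
      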
      Signed-off-by: Maxim Mikityanskiy <maxtram95@gmail.com>
      Fixes: 5ff9f192 ("block: simplify set_init_blocksize")
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Acked-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  21. 27 Jan, 2021 · 2 commits
  22. 25 Jan, 2021 · 1 commit
  23. 08 Jan, 2021 · 2 commits
    • block: pre-initialize struct block_device in bdev_alloc_inode · 2d2f6f1b
      Christoph Hellwig authored
      bdev_evict_inode and bdev_free_inode are also called for the root inode
      of bdevfs, for which bdev_alloc is never called.  Move the zeroing of
      struct block_device and the initialization of the bd_bdi field into
      bdev_alloc_inode to make sure they are initialized for the root inode
      as well.
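      
      A sketch of where that initialization ends up (type and cache names
      follow fs/block_dev.c; simplified):
      
       static struct inode *bdev_alloc_inode(struct super_block *sb)
       {
               struct bdev_inode *ei = kmem_cache_alloc(bdev_cachep, GFP_KERNEL);
      
               if (!ei)
                       return NULL;
               /* Zero and initialize here so the bdevfs root inode is covered
                * too, even though bdev_alloc() is never called for it. */
               memset(&ei->bdev, 0, sizeof(ei->bdev));
               ei->bdev.bd_bdi = &noop_backing_dev_info;
               return &ei->vfs_inode;
       }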
      
      Fixes: e6cb5382 ("block: initialize struct block_device in bdev_alloc")
      Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • fs: Fix freeze_bdev()/thaw_bdev() accounting of bd_fsfreeze_sb · 04a6a536
      Satya Tangirala authored
      freeze/thaw_bdev() currently use bdev->bd_fsfreeze_count to infer
      whether or not bdev->bd_fsfreeze_sb is valid (it's valid iff
      bd_fsfreeze_count is non-zero). thaw_bdev() doesn't nullify
      bd_fsfreeze_sb.
      
      But this means a freeze_bdev() call followed by a thaw_bdev() call can
      leave bd_fsfreeze_sb with a non-null value, while bd_fsfreeze_count is
      zero. If freeze_bdev() is called again, and this time
      get_active_super() returns NULL (e.g. because the FS is unmounted),
      we'll end up with bd_fsfreeze_count > 0, but bd_fsfreeze_sb is
      *untouched* - it stays the same (now garbage) value. A subsequent
      thaw_bdev() will decide that the bd_fsfreeze_sb value is legitimate
      (since bd_fsfreeze_count > 0), and attempt to use it.
      
      Fix this by always setting bd_fsfreeze_sb to NULL when
      bd_fsfreeze_count is successfully decremented to 0 in thaw_bdev().
      Alternatively, we could set bd_fsfreeze_sb to whatever
      get_active_super() returns in freeze_bdev() whenever bd_fsfreeze_count
      is successfully incremented to 1 from 0 (which can be achieved cleanly
      by moving the line currently setting bd_fsfreeze_sb to immediately
      after the "sync:" label, but it might be a little too subtle/easily
      overlooked in future).
      
      This fixes the currently panicking xfstests generic/085.
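      
      A sketch of the resulting thaw path with the fix applied (structure
      follows fs/block_dev.c of this era; helper details are simplified and
      should be read as assumptions):
      
       int thaw_bdev(struct block_device *bdev)
       {
               struct super_block *sb;
               int error = -EINVAL;
      
               mutex_lock(&bdev->bd_fsfreeze_mutex);
               if (!bdev->bd_fsfreeze_count)
                       goto out;
      
               error = 0;
               if (--bdev->bd_fsfreeze_count > 0)
                       goto out;
      
               sb = bdev->bd_fsfreeze_sb;
               if (!sb)
                       goto out;
      
               if (sb->s_op->thaw_super)
                       error = sb->s_op->thaw_super(sb);
               else
                       error = thaw_super(sb);
               if (error)
                       bdev->bd_fsfreeze_count++;
               else
                       /* The fix: don't leave a stale pointer once unfrozen. */
                       bdev->bd_fsfreeze_sb = NULL;
       out:
               mutex_unlock(&bdev->bd_fsfreeze_mutex);
               return error;
       }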
      
      Fixes: 040f04bd ("fs: simplify freeze_bdev/thaw_bdev")
      Signed-off-by: Satya Tangirala <satyat@google.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  24. 30 Dec, 2020 · 1 commit
  25. 22 Dec, 2020 · 1 commit