Commits · 48b4b4ff1ee044a977929bcf80e79f8212f756b4 · openeuler / Kernel

27 Jan, 2020 1 commit

block: allow partitions on host aware zone devices · b7205307

Committed by Christoph Hellwig on Jan 26, 2020

Host-aware SMR drives can be used with the commands to explicitly manage
zone state, but they can also be used as normal disks. In the latter
case it makes perfect sense to allow partitions on them; in the former
it does not, just like for host-managed devices. Add a check to
add_partition to allow partitions on host-aware devices, but give
up any zone management capabilities in that case, which also catches
the previously missed case of adding a partition vs. just scanning it.

Because sd can rescan the attribute at runtime it needs to check if
a disk has partitions, for which a new helper is added to genhd.h.

Fixes: 5eac3eb3 ("block: Remove partition support for zoned block devices")
Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b7205307

24 Jan, 2020 1 commit

partitions/ldm: fix spelling mistake "to" -> "too" · 5336da37

Committed by Colin Ian King on Jan 23, 2020

There is a spelling mistake in an ldm_error message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5336da37

23 Jan, 2020 2 commits

block, bfq: improve arithmetic division in bfq_delta() · 554d21ef

Committed by Wen Yang on Jan 20, 2020

do_div() does a 64-by-32 division. Use div64_ul() instead
if the divisor is unsigned long, to avoid truncation to 32 bits.
As a nice side effect, this also cleans up the function a bit.

Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

554d21ef
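
The truncation that the bfq_delta() commit above avoids can be illustrated in user space. This is only a sketch: `div_64_by_32` and `div_64_by_64` are hypothetical helpers that model the two kinds of division, not the kernel's do_div() or div64_ul() themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Models a 64-by-32 division: the divisor is silently cut to 32 bits. */
static inline uint64_t div_64_by_32(uint64_t dividend, uint64_t divisor)
{
	uint32_t narrow = (uint32_t)divisor;	/* upper 32 bits are lost here */

	assert(narrow != 0);
	return dividend / narrow;
}

/* Models a full 64-by-64 division, which keeps every bit of the divisor. */
static inline uint64_t div_64_by_64(uint64_t dividend, uint64_t divisor)
{
	assert(divisor != 0);
	return dividend / divisor;
}
```

With a divisor of 2^32 + 2, the narrow version divides by 2 instead, so the two helpers return very different quotients for the same inputs.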

block/bfq: remove unused bfq_class_rt which never used · b7f22d99

Committed by Alex Shi on Jan 21, 2020

This macro has never been used since it was introduced by commit aee69d78
("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler").

Better to remove it.

Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b7f22d99

16 Jan, 2020 1 commit

block: fix an integer overflow in logical block size · ad6bf88a

Committed by Mikulas Patocka on Jan 15, 2020

Logical block size has type unsigned short. Since block sizes are powers
of two, that means it can be at most 32768. However, there are
architectures that can run with 64k pages (for example arm64), and on
these architectures it may be possible to create block devices with a
64k block size.

For example (run this on an architecture with 64k pages):

Mount will fail with this error because it tries to read the superblock using 2-sector
access:
  device-mapper: writecache: I/O is not aligned, sector 2, size 1024, block size 65536
  EXT4-fs (dm-0): unable to read superblock

This patch changes the logical block size from unsigned short to unsigned
int to avoid the overflow.

Cc: stable@vger.kernel.org
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ad6bf88a
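
The overflow described in the logical block size commit above can be reproduced with plain integer conversions. This is a sketch that assumes a 16-bit unsigned short, as on the platforms Linux supports; the helper names are hypothetical.

```c
#include <assert.h>

/* Stores a block size the way the old unsigned short field did. */
static inline unsigned int store_as_ushort(unsigned int size)
{
	unsigned short field = (unsigned short)size;	/* keeps only the low 16 bits */

	return field;
}

/* Stores a block size in an unsigned int field, as after the patch. */
static inline unsigned int store_as_uint(unsigned int size)
{
	unsigned int field = size;

	return field;
}
```

A 64k block size (65536) does not fit in the narrow field and comes back as 0, while 32768 still round-trips.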

15 Jan, 2020 1 commit

block: fix get_max_segment_size() overflow on 32bit arch · 4a2f704e

Committed by Ming Lei on Jan 11, 2020

Commit 429120f3 started taking the segment's start DMA address into
account when computing the max segment size, using the 'unsigned long'
data type to do so. However, the segment mask may be 0xffffffff, so
the computed segment size may overflow when the physical address is
zero on a 32-bit arch.

Fix the issue by returning queue_max_segment_size() directly when that
happens.

Fixes: 429120f3 ("block: fix splitting segments on boundary masks")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Christoph Hellwig <hch@lst.de>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4a2f704e
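
The 32-bit overflow from the commit above can be modelled with fixed-width integers. In this sketch uint32_t stands in for 'unsigned long' on a 32-bit architecture, and both helper names are hypothetical; falling back to the queue limit when the computed room is zero is one way to express "return queue_max_segment_size() directly".

```c
#include <assert.h>
#include <stdint.h>

/* Smaller of two values, treating zero as "no value". */
static inline uint32_t pick_nonzero_min(uint32_t a, uint32_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/*
 * Bytes available from 'phys' up to the segment boundary, capped by the
 * queue's maximum segment size.
 */
static inline uint32_t max_segment_size_32(uint32_t phys, uint32_t mask,
					   uint32_t queue_max)
{
	uint32_t offset = phys & mask;
	/* Wraps to 0 when offset == 0 and mask == 0xffffffff. */
	uint32_t room = mask - offset + 1;

	return pick_nonzero_min(room, queue_max);
}
```

With a zero physical address and a mask of 0xffffffff the room wraps to 0, so the queue limit is returned instead of a zero-length segment.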

09 Jan, 2020 2 commits

fs: move guard_bio_eod() after bio_set_op_attrs · 83c9c547

Committed by Ming Lei on Jan 05, 2020

Commit 85a8ce62 ("block: add bio_truncate to fix guard_bio_eod")
adds bio_truncate() for handling bio EOD. However, bio_truncate()
doesn't use the 'op' parameter passed by guard_bio_eod()'s callers.

So bio_truncate() may retrieve the wrong 'op', and zeroing pages may
not be done for a READ bio.

Fix this issue by moving guard_bio_eod() after bio_set_op_attrs()
in submit_bh_wbc() so that bio_truncate() can always retrieve the
correct op info.

Meanwhile, remove the 'op' parameter from guard_bio_eod() because it
isn't used any more.

Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Fixes: 85a8ce62 ("block: add bio_truncate to fix guard_bio_eod")
Signed-off-by: Ming Lei <ming.lei@redhat.com>

Fold in kerneldoc and bio_op() change.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

83c9c547

block: mark zone-mgmt bios with REQ_SYNC · 8e42d239

Committed by Chaitanya Kulkarni on Jan 07, 2020

In the current implementation, the final zone-mgmt request is issued with
submit_bio_wait(), which marks the bio REQ_SYNC. This is needed since
immediate action is expected for zone-mgmt requests, as these are
blocking operations. This also bypasses the scheduler in
blk_mq_make_request() and dispatches the request directly into the
hw ctx.

This patch marks all the chained bios REQ_SYNC so that we get the
above-mentioned behavior for non-final bios as well.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

8e42d239

07 Jan, 2020 2 commits

blk-mq: Document functions for sending request · 105663f7

Committed by André Almeida on Jan 06, 2020

Add or improve documentation for functions involved in creating and
sending IO requests to the hardware.

Signed-off-by: André Almeida <andrealmeid@collabora.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

105663f7

block: Allow t10-pi to be modular · a754bd5f

Committed by Herbert Xu on Dec 23, 2019

Currently t10-pi can only be built into the block layer which via
crc-t10dif pulls in a whole chunk of the Crypto API.  In fact all
users of t10-pi work as modules and there is no reason for it to
always be built-in.

This patch adds a new hidden option for t10-pi that is selected
automatically based on BLK_DEV_INTEGRITY and whether the users
of t10-pi are built-in or not.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

a754bd5f

30 Dec, 2019 1 commit

block: fix splitting segments on boundary masks · 429120f3

Committed by Ming Lei on Dec 29, 2019

We ran into a problem with an mpt3sas based controller, where we would
see random (and hard to reproduce) file corruption. The issue seemed
specific to this controller, but wasn't specific to the file system.
After a lot of debugging, we found out that it's caused by segments
spanning a 4G memory boundary. This shouldn't happen, as the default
setting for segment boundary masks is 4G.

Turns out there are two issues in get_max_segment_size():

1) The default segment boundary mask is bypassed

2) The segment start address isn't taken into account when checking
   segment boundary limit

Fix these two issues by removing the bypass of the segment boundary
check even if the mask is set to the default value, and taking into
account the actual start address of the request when checking if a
segment needs splitting.

Cc: stable@vger.kernel.org # v5.1+
Reviewed-by: Chris Mason <clm@fb.com>
Tested-by: Chris Mason <clm@fb.com>
Fixes: dcebd755 ("block: use bio_for_each_bvec() to compute multi-page bvec count")
Signed-off-by: Ming Lei <ming.lei@redhat.com>

Dropped const on the page pointer, ppc page_to_phys() doesn't mark the
page as const...
Signed-off-by: Jens Axboe <axboe@kernel.dk>

429120f3
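
The second issue in the commit above, ignoring the segment start address, can be sketched with two small helpers. The names are hypothetical and the arithmetic is a user-space model of a boundary check, not the kernel's get_max_segment_size().

```c
#include <assert.h>
#include <stdint.h>

/* Bytes from 'start' up to and including the last byte before the next boundary. */
static inline uint64_t bytes_to_boundary(uint64_t start, uint64_t mask)
{
	return mask - (start & mask) + 1;
}

/* A segment needs splitting when it is longer than the room left before the boundary. */
static inline int crosses_boundary(uint64_t start, uint64_t len, uint64_t mask)
{
	return len > bytes_to_boundary(start, mask);
}
```

An 8 KiB segment starting 4 KiB below a 4G boundary crosses it, while the same segment starting at address 0 does not. A check that ignores the start address cannot tell the two apart.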

29 Dec, 2019 1 commit

block: add bio_truncate to fix guard_bio_eod · 85a8ce62

Committed by Ming Lei on Dec 28, 2019

Some filesystems, such as vfat, may send a bio which crosses the device
boundary, and worse, an IO request starting within device boundaries
can contain more than one segment past EOD.

Commit dce30ca9 ("fs: fix guard_bio_eod to check for real EOD errors")
tries to fix this issue by returning -EIO for this situation. However,
this way fs user code loses the chance to handle -EIO, and
sync_inodes_sb() may then hang forever.

Also, the current truncation of the last segment is dangerous because it
updates the last bvec: the bvec table is no longer immutable, and fs bio
users may not retrieve the truncated pages via
bio_for_each_segment_all() in their .end_io callback.

Fix this issue by supporting multi-segment truncation. The approach is
also simpler:

- just update the bio size, since the block layer can build the correct
  bvec from the updated bio size. Then the bvec table becomes really
  immutable.

- zero all truncated segments for a read bio

Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Fixed-by: dce30ca9 ("fs: fix guard_bio_eod to check for real EOD errors")
Reported-by: syzbot+2b9e54155c8c25d8d165@syzkaller.appspotmail.com
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

85a8ce62
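
The two-step approach listed in the commit above (shrink the size, zero what a read will not fill) can be modelled on a flat buffer. `struct toy_bio` and `toy_bio_truncate` are hypothetical stand-ins for illustration; the real bio_truncate() walks bvec segments rather than a single array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a bio: one flat buffer plus a byte count. */
struct toy_bio {
	unsigned char *data;
	size_t size;	/* bytes the I/O is supposed to cover */
	int is_read;
};

/* Shrink the I/O to new_size; for reads, zero the part that will not be filled. */
static inline void toy_bio_truncate(struct toy_bio *bio, size_t new_size)
{
	if (new_size >= bio->size)
		return;
	if (bio->is_read)
		memset(bio->data + new_size, 0, bio->size - new_size);
	bio->size = new_size;	/* only the size changes, not the buffer layout */
}
```

Only the size field is updated, which mirrors the commit's point that the segment table itself stays untouched.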

21 Dec, 2019 7 commits

compat_ioctl: block: handle Persistent Reservations · b2c0fcd2

Committed by Arnd Bergmann on Nov 29, 2019

These were added to blkdev_ioctl() in v4.4 but not
blkdev_compat_ioctl, so add them now.

Cc: <stable@vger.kernel.org> # v4.4+
Fixes: bbd3e064 ("block: add an API for Persistent Reservations")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Fold in followup patch from Arnd with missing pr.h header include.
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b2c0fcd2

compat_ioctl: block: handle add zone open, close and finish ioctl · 4b43f31d

Committed by Arnd Bergmann on Nov 29, 2019

These were added to blkdev_ioctl() in linux-5.5 but not
blkdev_compat_ioctl, so add them now.

Fixes: e876df1f ("block: add zone open, close and finish ioctl support")
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

4b43f31d

compat_ioctl: block: handle BLKGETZONESZ/BLKGETNRZONES · 21d37340

Committed by Arnd Bergmann on Nov 29, 2019

These were added to blkdev_ioctl() in v4.20 but not blkdev_compat_ioctl,
so add them now.

Cc: <stable@vger.kernel.org> # v4.20+
Fixes: 72cd8757 ("block: Introduce BLKGETZONESZ ioctl")
Fixes: 65e4e3ee ("block: Introduce BLKGETNRZONES ioctl")
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

21d37340

compat_ioctl: block: handle BLKREPORTZONE/BLKRESETZONE · 673bdf8c

Committed by Arnd Bergmann on Nov 29, 2019

These were added to blkdev_ioctl() but not blkdev_compat_ioctl,
so add them now.

Cc: <stable@vger.kernel.org> # v4.10+
Fixes: 3ed05a98 ("blk-zoned: implement ioctls")
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

673bdf8c

block: fix memleak when __blk_rq_map_user_iov() is failed · 3b7995a9

Committed by Yang Yingliang on Dec 18, 2019

While fuzz testing, I got the following memleak report:

BUG: memory leak
unreferenced object 0xffff88837af80000 (size 4096):
  comm "memleak", pid 3557, jiffies 4294817681 (age 112.499s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    20 00 00 00 10 01 00 00 00 00 00 00 01 00 00 00   ...............
  backtrace:
    [<000000001c894df8>] bio_alloc_bioset+0x393/0x590
    [<000000008b139a3c>] bio_copy_user_iov+0x300/0xcd0
    [<00000000a998bd8c>] blk_rq_map_user_iov+0x2f1/0x5f0
    [<000000005ceb7f05>] blk_rq_map_user+0xf2/0x160
    [<000000006454da92>] sg_common_write.isra.21+0x1094/0x1870
    [<00000000064bb208>] sg_write.part.25+0x5d9/0x950
    [<000000004fc670f6>] sg_write+0x5f/0x8c
    [<00000000b0d05c7b>] __vfs_write+0x7c/0x100
    [<000000008e177714>] vfs_write+0x1c3/0x500
    [<0000000087d23f34>] ksys_write+0xf9/0x200
    [<000000002c8dbc9d>] do_syscall_64+0x9f/0x4f0
    [<00000000678d8e9a>] entry_SYSCALL_64_after_hwframe+0x49/0xbe

If __blk_rq_map_user_iov() fails in blk_rq_map_user_iov(),
the bio(s) allocated before the failure will leak. The
refcount of each bio is initialized to 1 and increased to 2 by calling
bio_get(), but __blk_rq_unmap_user() only decreases it to 1, so
the bio cannot be freed. Fix it by calling blk_rq_unmap_user().

Reviewed-by: Bob Liu <bob.liu@oracle.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3b7995a9
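
The reference-count arithmetic behind the leak described above can be followed with a toy object. Everything here is hypothetical (`struct toy_obj`, `toy_get`, `toy_put`); it only models "starts at 1, raised to 2, dropped once", not the real bio API.

```c
#include <assert.h>

/* Hypothetical reference-counted object standing in for a bio. */
struct toy_obj {
	int refs;
	int freed;
};

static inline void toy_init(struct toy_obj *o)
{
	o->refs = 1;
	o->freed = 0;
}

static inline void toy_get(struct toy_obj *o)
{
	o->refs++;
}

/* Drop one reference; the object is released only when the count hits zero. */
static inline void toy_put(struct toy_obj *o)
{
	if (--o->refs == 0)
		o->freed = 1;
}
```

After one init, one get and a single put the count is still 1 and the object is never released. A second put is what finally frees it.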

block: Fix a lockdep complaint triggered by request queue flushing · b3c6a599

Committed by Bart Van Assche on Dec 17, 2019

Avoid that running test nvme/012 from the blktests suite triggers the
following false positive lockdep complaint:

============================================
WARNING: possible recursive locking detected
5.0.0-rc3-xfstests-00015-g1236f7d60242 #841 Not tainted
--------------------------------------------
ksoftirqd/1/16 is trying to acquire lock:
000000000282032e (&(&fq->mq_flush_lock)->rlock){..-.}, at: flush_end_io+0x4e/0x1d0

but task is already holding lock:
00000000cbadcbc2 (&(&fq->mq_flush_lock)->rlock){..-.}, at: flush_end_io+0x4e/0x1d0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&fq->mq_flush_lock)->rlock);
  lock(&(&fq->mq_flush_lock)->rlock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

1 lock held by ksoftirqd/1/16:
 #0: 00000000cbadcbc2 (&(&fq->mq_flush_lock)->rlock){..-.}, at: flush_end_io+0x4e/0x1d0

stack backtrace:
CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.0.0-rc3-xfstests-00015-g1236f7d60242 #841
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 dump_stack+0x67/0x90
 __lock_acquire.cold.45+0x2b4/0x313
 lock_acquire+0x98/0x160
 _raw_spin_lock_irqsave+0x3b/0x80
 flush_end_io+0x4e/0x1d0
 blk_mq_complete_request+0x76/0x110
 nvmet_req_complete+0x15/0x110 [nvmet]
 nvmet_bio_done+0x27/0x50 [nvmet]
 blk_update_request+0xd7/0x2d0
 blk_mq_end_request+0x1a/0x100
 blk_flush_complete_seq+0xe5/0x350
 flush_end_io+0x12f/0x1d0
 blk_done_softirq+0x9f/0xd0
 __do_softirq+0xca/0x440
 run_ksoftirqd+0x24/0x50
 smpboot_thread_fn+0x113/0x1e0
 kthread+0x121/0x140
 ret_from_fork+0x3a/0x50

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b3c6a599

block: Fix the type of 'sts' in bsg_queue_rq() · c44a4edb

Committed by Bart Van Assche on Dec 17, 2019

This patch fixes the following sparse warnings:

block/bsg-lib.c:269:19: warning: incorrect type in initializer (different base types)
block/bsg-lib.c:269:19:    expected int sts
block/bsg-lib.c:269:19:    got restricted blk_status_t [usertype]
block/bsg-lib.c:286:16: warning: incorrect type in return expression (different base types)
block/bsg-lib.c:286:16:    expected restricted blk_status_t
block/bsg-lib.c:286:16:    got int [assigned] sts

Cc: Martin Wilck <mwilck@suse.com>
Fixes: d46fe2cb ("block: drop device references in bsg_queue_rq()")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c44a4edb

19 Dec, 2019 2 commits

blk-mq: optimise blk_mq_flush_plug_list() · 95ed0c5b

Committed by Pavel Begunkov on Nov 29, 2019

Instead of using list_del_init() in a loop, which generates a lot of
unnecessary memory reads and writes, iterate from the first request of a
batch and cut out a sublist with list_cut_before().

Apart from removing the list node initialisation part, this is more
register-friendly, and the assembly uses the stack less intensively.

The list_empty() check at the beginning is done in the hope that the
compiler can optimise out the same check in the following
list_splice_init().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

95ed0c5b

blk-mq: optimise rq sort function · 7d30a621

Committed by Pavel Begunkov on Nov 29, 2019

Check "!=" in multi-layer comparisons. The same memory usage, fewer
instructions, and 2 of 4 jumps are replaced with SETcc.

Note that list_sort() doesn't distinguish between 0 and <0.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

7d30a621
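
The "!=" style of multi-layer comparison from the commit above might look like the following sketch. `struct toy_rq` and its fields are hypothetical simplifications; the point is only that each level tests inequality first and then returns a plain 0 or 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified request: two grouping keys and a sector position. */
struct toy_rq {
	uintptr_t ctx;
	uintptr_t hctx;
	uint64_t pos;
};

/*
 * Returns nonzero when 'a' should sort after 'b'. The sort only checks
 * whether the result is greater than zero, so 0 covers both "equal"
 * and "less".
 */
static inline int toy_rq_cmp(const struct toy_rq *a, const struct toy_rq *b)
{
	if (a->ctx != b->ctx)
		return a->ctx > b->ctx;
	if (a->hctx != b->hctx)
		return a->hctx > b->hctx;
	return a->pos > b->pos;
}
```

Because the comparator never needs to produce a negative value, each level reduces to one inequality test plus one boolean result.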

18 Dec, 2019 1 commit

block: end bio with BLK_STS_AGAIN in case of non-mq devs and REQ_NOWAIT · c58c1f83

Committed by Roman Penyaev on Dec 17, 2019

Non-mq devices do not honor REQ_NOWAIT, so give the caller a chance to
retry the request gracefully on an -EAGAIN error.

The problem is easily reproduced using io_uring:

   mkfs.ext4 /dev/ram0
   mount /dev/ram0 /mnt

   # Preallocate a file
   dd if=/dev/zero of=/mnt/file bs=1M count=1

   # Start fio with io_uring and get -EIO
   fio --rw=write --ioengine=io_uring --size=1M --direct=1 --name=job --filename=/mnt/file

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c58c1f83

17 Dec, 2019 1 commit

iocost: over-budget forced IOs should schedule async delay · d7bd15a1

Committed by Tejun Heo on Dec 16, 2019

When over-budget IOs are force-issued through root cgroup,
iocg_kick_delay() adjusts the async delay accordingly but doesn't
actually schedule async throttle for the issuing task.  This bug is
pretty well masked because sooner or later the offending threads are
gonna get directly throttled on regular IOs or have async delay
scheduled by mem_cgroup_throttle_swaprate().

However, it can affect control quality on filesystem metadata heavy
operations.  Let's fix it by invoking blkcg_schedule_throttle() when
iocg_kick_delay() says async delay is needed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 7caa4715 ("blkcg: implement blk-iocost")
Cc: stable@vger.kernel.org
Reported-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

d7bd15a1

13 Dec, 2019 1 commit

blk-cgroup: remove blkcg_drain_queue · 5addeae1

Committed by Guoqing Jiang on Dec 12, 2019

Since blk_drain_queue has already been removed, this function
is not needed anymore.

Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5addeae1

12 Dec, 2019 1 commit

block: fix NULL pointer dereference in account statistics with IDE · ecb6186c

Committed by Logan Gunthorpe on Dec 10, 2019

The IDE driver creates some passthru requests which never get
submitted to the block layer in such a way that blk_account_io_start()
gets called. However, the driver still calls __blk_mq_end_request() in
ide_end_rq(), which calls blk_account_io_completion(), which in turn
tries to dereference req->part, which is never set. See ide_prep_sense()
for an example of where these requests come from.

To fix this, blk_account_io_completion() and blk_account_io_done()
should do nothing if req->part is not set.

The back trace of this bug is:

    BUG: kernel NULL pointer dereference, address: 000002ac
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    *pde = 00000000
    Oops: 0002 [#1]
    CPU: 0 PID: 237 Comm: kworker/0:1H Not tainted
    5.4.0-rc2-00011-g48d9b0d4 #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1
    04/01/2014
    Workqueue: kblockd drive_rq_insert_work
    EIP: blk_account_io_completion+0x7a/0xf0
    Code: 89 54 24 08 31 d2 89 4c 24 04 31 c9 c7 04 24 02 00 00 00 c1 ee
    09 e8 f5 21 a6 ff e8 70 5c a7 ff 8b 53 60 8d 04 bd 00 00 00 00 <01> b4
    02 ac 02 00 00 8b 9a 88 02 00 00 85 db 74 11 85 d2 74 51 8b
    EAX: 00000000 EBX: f5b80000 ECX: 00000000 EDX: 00000000
    ESI: 00000000 EDI: 00000000 EBP: f3031e70 ESP: f3031e54
    DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010046
    CR0: 80050033 CR2: 000002ac CR3: 03c25000 CR4: 000406d0
    Call Trace:
     <IRQ>
      blk_update_request+0x85/0x420
      ide_end_rq+0x38/0xa0
      ide_complete_rq+0x3d/0x70
      cdrom_newpc_intr+0x258/0xba0
      ide_intr+0x135/0x250
      __handle_irq_event_percpu+0x3e/0x250
      handle_irq_event_percpu+0x1f/0x50
      handle_irq_event+0x32/0x60
      handle_level_irq+0x6c/0x110
      handle_irq+0x72/0xa0
      </IRQ>
      do_IRQ+0x45/0xad
      common_interrupt+0x115/0x11c

Fixes: 48d9b0d4 ("block: account statistics for passthrough requests")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ecb6186c

10 Dec, 2019 2 commits

block: fix "check bi_size overflow before merge" · cc90bc68

Committed by Andreas Gruenbacher on Dec 09, 2019

This partially reverts commit e3a5d8e3.

Commit e3a5d8e3 ("check bi_size overflow before merge") adds a bio_full
check to __bio_try_merge_page.  This will cause __bio_try_merge_page to fail
when the last bi_io_vec has been reached.  Instead, what we want here is only
the bi_size overflow check.

Fixes: e3a5d8e3 ("block: check bi_size overflow before merge")
Cc: stable@vger.kernel.org # v5.4+
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

cc90bc68
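
A size-overflow check of the kind the commit above wants to keep can be written without ever performing the overflowing addition. This sketch assumes the size counter is an unsigned int; `size_would_overflow` is a hypothetical helper name.

```c
#include <assert.h>
#include <limits.h>

/* True when adding 'len' bytes would overflow an unsigned int size counter. */
static inline int size_would_overflow(unsigned int cur_size, unsigned int len)
{
	/* Compare against the remaining headroom instead of computing cur_size + len. */
	return cur_size > UINT_MAX - len;
}
```

The check depends only on the byte counts, so it stays independent of how many vector slots are still free.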

treewide: Use sizeof_field() macro · c593642c

Committed by Pankaj Bharadiya on Dec 09, 2019

Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
at places where these are defined. Later patches will remove the unused
definition of FIELD_SIZEOF().

This patch is generated using the following script:

EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"

git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
do

	if [[ "$file" =~ $EXCLUDE_FILES ]]; then
		continue
	fi
	sed -i  -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
done

Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David Miller <davem@davemloft.net> # for net

c593642c

06 Dec, 2019 1 commit

block: fix memleak of bio integrity data · ece841ab

Committed by Justin Tee on Dec 05, 2019

Commit 7c20f116 ("bio-integrity: stop abusing bi_end_io") moves
bio_integrity_free from bio_uninit() to bio_integrity_verify_fn()
and bio_endio(). This looks wrong because a bio may be freed
without calling bio_endio(); for example, blk_rq_unprep_clone() is
called from dm_mq_queue_rq() when the underlying queue of dm-mpath
is busy.

So commit 7c20f116 causes a memory leak of bio integrity data.

Fix this issue by re-adding bio_integrity_free() to bio_uninit().

Fixes: 7c20f116 ("bio-integrity: stop abusing bi_end_io")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>

Add commit log, and simplify/fix the original patch written by Justin.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ece841ab

05 Dec, 2019 1 commit

bfq-iosched: Ensure bio->bi_blkg is valid before using it · 08802ed6

Committed by Hou Tao on Dec 05, 2019

bio->bi_blkg will be NULL when the request has been issued in a way
that bypasses the block layer, as shown in the following oops:

 Internal error: Oops: 96000005 [#1] SMP
 CPU: 17 PID: 2996 Comm: scsi_id Not tainted 5.4.0 #4
 Call trace:
  percpu_counter_add_batch+0x38/0x4c8
  bfqg_stats_update_legacy_io+0x9c/0x280
  bfq_insert_requests+0xbac/0x2190
  blk_mq_sched_insert_request+0x288/0x670
  blk_execute_rq_nowait+0x140/0x178
  blk_execute_rq+0x8c/0x140
  sg_io+0x604/0x9c0
  scsi_cmd_ioctl+0xe38/0x10a8
  scsi_cmd_blk_ioctl+0xac/0xe8
  sd_ioctl+0xe4/0x238
  blkdev_ioctl+0x590/0x20e0
  block_ioctl+0x60/0x98
  do_vfs_ioctl+0xe0/0x1b58
  ksys_ioctl+0x80/0xd8
  __arm64_sys_ioctl+0x40/0x78
  el0_svc_handler+0xc4/0x270

so ensure its validity before using it.

Fixes: fd41e603 ("bfq-iosched: stop using blkg->stat_bytes and ->stat_ios")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

08802ed6

04 Dec, 2019 1 commit

block: set the zone size in blk_revalidate_disk_zones atomically · 6c6b3549

Committed by Christoph Hellwig on Dec 03, 2019

The current zone revalidation code has a major problem in that it
doesn't update the zone size and q->nr_zones atomically, leading
to a short window where an out of bounds access to the zone arrays
is possible.

To fix this, move the setting of the zone size into the critical
section of blk_revalidate_disk_zones so that it gets updated together
with the zone bitmaps and q->nr_zones.  This also slightly simplifies
the caller, as the zone size is deduced from the report_zones.

This change also allows checking for a power-of-two zone size in generic
code.

Reported-by: Hans Holmberg <hans@owltronix.com>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6c6b3549
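
The power-of-two zone size check mentioned in the commit above is commonly done with a single bit trick. This is a user-space sketch with a hypothetical helper name, not the generic-code check the patch adds.

```c
#include <assert.h>
#include <stdint.h>

/* True when n has exactly one bit set, i.e. n is a power of two. */
static inline int is_pow2(uint64_t n)
{
	/* n & (n - 1) clears the lowest set bit; zero is excluded explicitly. */
	return n != 0 && (n & (n - 1)) == 0;
}
```

A zone size of 524288 sectors passes the check, while 0 or 524289 would be rejected.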

03 Dec, 2019 5 commits

block: don't handle bio based drivers in blk_revalidate_disk_zones · ae58954d

Committed by Christoph Hellwig on Dec 03, 2019

bio based drivers only need to update q->nr_zones.  Do that manually
instead of overloading blk_revalidate_disk_zones to keep that function
simpler for the next round of changes that will rely even more on the
request based functionality.

Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ae58954d

block: allocate the zone bitmaps lazily · e94f5819

Committed by Christoph Hellwig on Dec 03, 2019

Allocate the conventional zone bitmap and the sequential zone locking
bitmap only when we find a zone of the respective type.  This avoids
wasting memory on the conventional zone bitmap for devices that only
have sequential zones, and will also prepare for other future changes.

Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e94f5819

block: replace seq_zones_bitmap with conv_zones_bitmap · f216fdd7

Committed by Christoph Hellwig on Dec 03, 2019

Invert the meaning of seq_zones_bitmap by keeping a bitmap of
conventional zones.  This allows not having a bitmap for devices
that do not have conventional zones.

Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f216fdd7
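
The inverted bitmap from the commit above lets "no bitmap at all" stand for "every zone is sequential". A sketch under that assumption, with a hypothetical `zone_is_seq` helper and a plain array of unsigned long as the bitmap:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * conv_bitmap has one bit per zone, set for conventional zones. It may be
 * NULL, which is taken to mean the device has no conventional zones.
 */
static inline int zone_is_seq(const unsigned long *conv_bitmap, unsigned int zno)
{
	if (!conv_bitmap)
		return 1;
	return !((conv_bitmap[zno / BITS_PER_WORD] >> (zno % BITS_PER_WORD)) & 1UL);
}
```

Devices that only have sequential zones can then skip the allocation entirely and still get the right answer for every zone number.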

block: simplify blkdev_nr_zones · 9b38bb4b

Committed by Christoph Hellwig on Dec 03, 2019

Simplify the arguments to blkdev_nr_zones by passing a gendisk instead
of the block_device and capacity.  This also removes the need for
__blkdev_nr_zones as all callers are outside the fast path and can
deal with the additional branch.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

9b38bb4b
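
For context on what a zone-count helper like the one above computes, here is a sketch that assumes the count is the disk capacity divided by the zone size, rounded up so that a smaller trailing zone still counts. The function name and the flat arguments are hypothetical; the commit's version takes a gendisk instead.

```c
#include <assert.h>
#include <stdint.h>

/* Number of zones covering 'capacity' sectors; a trailing partial zone counts as one. */
static inline uint64_t toy_nr_zones(uint64_t capacity, uint64_t zone_sectors)
{
	/* Zone sizes are assumed to be a nonzero power of two. */
	assert(zone_sectors != 0 && (zone_sectors & (zone_sectors - 1)) == 0);
	return (capacity + zone_sectors - 1) / zone_sectors;
}
```

Passing one object that already carries both values, as the commit does, removes the chance of a caller supplying a capacity that does not match the device.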

block: remove the empty line at the end of blk-zoned.c · bb556282

Committed by Christoph Hellwig on Dec 03, 2019

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

bb556282

22 Nov, 2019 2 commits

Revert "block: split bio if the only bvec's length is > SZ_4K" · 1e279153

Committed by Jens Axboe on Nov 21, 2019

We really don't need this, as the slow path will do the right thing
anyway.

This reverts commit 6952a7f8.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

1e279153

block: add iostat counters for flush requests · b6866318

Committed by Konstantin Khlebnikov on Nov 21, 2019

Requests that trigger flushing of the volatile writeback cache to disk
(barriers) have a significant effect on overall performance.

The block layer has a sophisticated engine for combining several flush
requests into one, but there are no statistics for the actual flushes
executed by the disk. Requests which trigger flushes are usually
barriers, that is, zero-size writes.

This patch adds two iostat counters to /sys/class/block/$dev/stat and
/proc/diskstats: the count of completed flush requests and their total time.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b6866318

21 Nov, 2019 1 commit

block,bfq: Skip tracing hooks if possible · 40d47c15

Committed by Dmitry Monakhov on Nov 01, 2019

In most cases blk_tracing is not active, but the bfq_log_bfqq macro
generates pid_str unconditionally, which results in significant overhead.

## Test
modprobe null_blk
echo bfq > /sys/block/nullb0/queue/scheduler
fio --name=t --ioengine=libaio --direct=1 --filename=/dev/nullb0 \
   --runtime=30 --time_based=1 --rw=write --iodepth=128 --bs=4k

# Results
|        | baseline | w/ patch | gain |
| iops   | 113.19K  | 126.42K  | +11% |

Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Dmitry Monakhov <dmonakhov@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

40d47c15

19 Nov, 2019 1 commit

block: sed-opal: Introduce SUM_SET_LIST parameter and append it using 'add_token_u64' · c6da429e

Committed by Revanth Rajashekar on Nov 08, 2019

In function 'activate_lsp', rather than hard-coding the short atom
header (0x83), we need to let the function 'add_short_atom_header' append
the header based on the parameter being appended.

The parameter has been defined in Section 3.1.2.1 of
https://trustedcomputinggroup.org/wp-content/uploads/TCG_Storage-Opal_Feature_Set_Single_User_Mode_v1-00_r1-00-Final.pdf

Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c6da429e

18 Nov, 2019 1 commit

block: Don't disable interrupts in trigger_softirq() · de678bc6

Committed by Sebastian Andrzej Siewior on Nov 18, 2019

trigger_softirq() is always invoked as an SMP function call, which is
always invoked with interrupts disabled.

Don't disable interrupts in trigger_softirq() because interrupts are
already disabled.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

de678bc6

openeuler / Kernel synced successfully almost 2 years ago
