Commits · c6a564ffadc9105880329710164ee493f0de103c · openeuler / Kernel

25 Mar, 2020 1 commit

block: move the part_stat* helpers from genhd.h to a new header · c6a564ff

Authored by Christoph Hellwig on Mar 25, 2020

These macros are just used by a few files.  Move them out of genhd.h,
which is included everywhere, into a new standalone header.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c6a564ff

19 Mar, 2020 2 commits

xen-blkfront.c: Convert to use set_capacity_revalidate_and_notify · 3cbc28bb

Authored by Balbir Singh on Mar 13, 2020

block/genhd provides set_capacity_revalidate_and_notify() for
sending RESIZE notifications via uevents.
Signed-off-by: Balbir Singh <sblbir@amazon.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3cbc28bb

virtio_blk.c: Convert to use set_capacity_revalidate_and_notify · 662155e2

Authored by Balbir Singh on Mar 13, 2020

block/genhd provides set_capacity_revalidate_and_notify() for sending RESIZE
notifications via uevents.
Signed-off-by: Balbir Singh <sblbir@amazon.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

662155e2

10 Mar, 2020 5 commits

null_blk: Add support for init_hctx() fault injection · 596444e7

Authored by Bart Van Assche on Mar 09, 2020

This makes it possible to test the error path in blk_mq_realloc_hw_ctxs()
and also several error paths in null_blk.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

596444e7

null_blk: Handle null_add_dev() failures properly · 9b03b713

Authored by Bart Van Assche on Mar 09, 2020

If null_add_dev() fails then null_del_dev() is called with a NULL argument.
Make null_del_dev() handle this scenario correctly. This patch fixes the
following KASAN complaint:

null-ptr-deref in null_del_dev+0x28/0x280 [null_blk]
Read of size 8 at addr 0000000000000000 by task find/1062

Call Trace:
 dump_stack+0xa5/0xe6
 __kasan_report.cold+0x65/0x99
 kasan_report+0x16/0x20
 __asan_load8+0x58/0x90
 null_del_dev+0x28/0x280 [null_blk]
 nullb_group_drop_item+0x7e/0xa0 [null_blk]
 client_drop_item+0x53/0x80 [configfs]
 configfs_rmdir+0x395/0x4e0 [configfs]
 vfs_rmdir+0xb6/0x220
 do_rmdir+0x238/0x2c0
 __x64_sys_unlinkat+0x75/0x90
 do_syscall_64+0x6f/0x2f0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

9b03b713

null_blk: Fix the null_add_dev() error path · 2004bfde

Authored by Bart Van Assche on Mar 09, 2020

If null_add_dev() fails, clear dev->nullb.
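
The idea of clearing the back-pointer on failure, together with a delete routine that tolerates NULL, can be sketched in userspace C. This is only an illustration: the struct layouts, the `fail` switch, and the function bodies are simplified assumptions, not the driver's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct nullb_device;

/* Simplified stand-ins for the driver's structures (assumed layout). */
struct nullb {
	struct nullb_device *dev;
};

struct nullb_device {
	struct nullb *nullb;	/* back-pointer, valid only while the device exists */
};

/* Tolerate a NULL argument so callers may run after a failed add. */
static void null_del_dev(struct nullb *nullb)
{
	if (!nullb)
		return;
	nullb->dev->nullb = NULL;
	free(nullb);
}

/* 'fail' is a hypothetical switch that simulates a late setup error. */
static int null_add_dev(struct nullb_device *dev, bool fail)
{
	struct nullb *nullb;

	assert(dev != NULL);
	nullb = calloc(1, sizeof(*nullb));
	if (!nullb)
		return -ENOMEM;
	nullb->dev = dev;
	dev->nullb = nullb;

	if (fail)
		goto out_free;
	return 0;

out_free:
	free(nullb);
	dev->nullb = NULL;	/* do not leave a dangling pointer behind */
	return -EINVAL;
}
```

Without the assignment in the `out_free` path, a later reader of `dev->nullb` would dereference freed memory, which is the kind of access the KASAN report describes.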

This patch fixes the following KASAN complaint:

BUG: KASAN: use-after-free in nullb_device_submit_queues_store+0xcf/0x160 [null_blk]
Read of size 8 at addr ffff88803280fc30 by task check/8409

Call Trace:
 dump_stack+0xa5/0xe6
 print_address_description.constprop.0+0x26/0x260
 __kasan_report.cold+0x7b/0x99
 kasan_report+0x16/0x20
 __asan_load8+0x58/0x90
 nullb_device_submit_queues_store+0xcf/0x160 [null_blk]
 configfs_write_file+0x1c4/0x250 [configfs]
 __vfs_write+0x4c/0x90
 vfs_write+0x145/0x2c0
 ksys_write+0xd7/0x180
 __x64_sys_write+0x47/0x50
 do_syscall_64+0x6f/0x2f0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7ff370926317
Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007fff2dd2da48 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007ff370926317
RDX: 0000000000000002 RSI: 0000559437ef23f0 RDI: 0000000000000001
RBP: 0000559437ef23f0 R08: 000000000000000a R09: 0000000000000001
R10: 0000559436703471 R11: 0000000000000246 R12: 0000000000000002
R13: 00007ff370a006a0 R14: 00007ff370a014a0 R15: 00007ff370a008a0

Allocated by task 8409:
 save_stack+0x23/0x90
 __kasan_kmalloc.constprop.0+0xcf/0xe0
 kasan_kmalloc+0xd/0x10
 kmem_cache_alloc_node_trace+0x129/0x4c0
 null_add_dev+0x24a/0xe90 [null_blk]
 nullb_device_power_store+0x1b6/0x270 [null_blk]
 configfs_write_file+0x1c4/0x250 [configfs]
 __vfs_write+0x4c/0x90
 vfs_write+0x145/0x2c0
 ksys_write+0xd7/0x180
 __x64_sys_write+0x47/0x50
 do_syscall_64+0x6f/0x2f0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 8409:
 save_stack+0x23/0x90
 __kasan_slab_free+0x112/0x160
 kasan_slab_free+0x12/0x20
 kfree+0xdf/0x250
 null_add_dev+0xaf3/0xe90 [null_blk]
 nullb_device_power_store+0x1b6/0x270 [null_blk]
 configfs_write_file+0x1c4/0x250 [configfs]
 __vfs_write+0x4c/0x90
 vfs_write+0x145/0x2c0
 ksys_write+0xd7/0x180
 __x64_sys_write+0x47/0x50
 do_syscall_64+0x6f/0x2f0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 2984c868 ("nullb: factor disk parameters")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2004bfde

null_blk: Fix changing the number of hardware queues · 78b10be2

Authored by Bart Van Assche on Mar 09, 2020

Instead of initializing null_blk hardware queues explicitly after the
request queue has been created, provide .init_hctx() and .exit_hctx()
callback functions. The latter functions are not only called during
request queue allocation but also when the number of hardware queues
changes. Allocate nr_cpu_ids queues during initialization to support
increasing the number of hardware queues above the initial hardware
queue count.

This change fixes increasing the number of hardware queues above the
initial number of hardware queues and also keeps nullb->nr_queues in
sync with the number of hardware queues.

Fixes: 45919fbf ("null_blk: Enable modifying 'submit_queues' after an instance has been configured")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

78b10be2

null_blk: Suppress an UBSAN complaint triggered when setting 'memory_backed' · b9853b4d

Authored by Bart Van Assche on Mar 09, 2020

Although it is not clear to me why UBSAN complains when 'memory_backed'
is set, this patch suppresses the UBSAN complaint that is triggered when
setting that configfs attribute.

UBSAN: Undefined behaviour in drivers/block/null_blk_main.c:327:1
load of value 16 is not a valid value for type '_Bool'
CPU: 2 PID: 8396 Comm: check Not tainted 5.6.0-rc1-dbg+ #14
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
 dump_stack+0xa5/0xe6
 ubsan_epilogue+0x9/0x26
 __ubsan_handle_load_invalid_value+0x6d/0x76
 nullb_device_memory_backed_store.cold+0x2c/0x38 [null_blk]
 configfs_write_file+0x1c4/0x250 [configfs]
 __vfs_write+0x4c/0x90
 vfs_write+0x145/0x2c0
 ksys_write+0xd7/0x180
 __x64_sys_write+0x47/0x50
 do_syscall_64+0x6f/0x2f0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b9853b4d

05 Mar, 2020 1 commit

xen/blkfront: fix ring info addressing · 4ab50af6

Authored by Juergen Gross on Mar 05, 2020

Commit 0265d6e8 ("xen/blkfront: limit allocated memory size to
actual use case") made struct blkfront_ring_info size dynamic. This is
fine when running with only one queue, but with multiple queues the
addressing of the single queues has to be adapted as the structs are
allocated in an array.
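
The addressing problem can be illustrated with a small userspace sketch. When each array element ends in a run-time sized array, element `i` has to be located with a run-time stride rather than with plain indexing. The struct and helper names below are hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical ring structure ending in a flexible array member. */
struct ring_info {
	unsigned int id;
	unsigned int nr_slots;
	int shadow[];		/* sized at run time */
};

/* Size of one array element for a given slot count. */
static size_t ring_size(unsigned int nr_slots)
{
	return sizeof(struct ring_info) + nr_slots * sizeof(int);
}

/* Allocate nr_rings variable-sized structures back to back. */
static void *alloc_rings(unsigned int nr_rings, unsigned int nr_slots)
{
	return calloc(nr_rings, ring_size(nr_slots));
}

/*
 * Address ring i by its run-time stride. Plain base[i] would use
 * sizeof(struct ring_info), which excludes the trailing array.
 */
static struct ring_info *get_ring(void *base, unsigned int nr_slots,
				  unsigned int i)
{
	assert(base != NULL);
	return (struct ring_info *)((char *)base + i * ring_size(nr_slots));
}
```

With a single queue the wrong stride never matters because only element 0 is used, which matches the observation that the problem shows up only with multiple queues.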

Fixes: 0265d6e8 ("xen/blkfront: limit allocated memory size to actual use case")
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Link: https://lore.kernel.org/r/20200305155129.28326-1-jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

4ab50af6

26 Feb, 2020 1 commit

null_blk: remove unused fields in 'nullb_cmd' · 93d7c318

Authored by Dongli Zhang on Feb 24, 2020

'list', 'll_list' and 'csd' are no longer used.

The 'list' has not been used since it was introduced by commit f2298c04
("null_blk: multi queue aware block test driver").

The 'll_list' is no longer used since commit 3c395a96 ("null_blk: set a
separate timer for each command").

The 'csd' is no longer used since commit ce2c350b ("null_blk: use
blk_complete_request and blk_mq_complete_request").
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

93d7c318

25 Feb, 2020 2 commits

scsi: compat_ioctl: cdrom: Replace .ioctl with .compat_ioctl in four appropriate places · 03264ddd

Authored by Adam Williamson on Feb 19, 2020

Arnd Bergmann inadvertently typoed these in d320a955 and 64cbfa96;
they seem to be the cause of
https://bugzilla.redhat.com/show_bug.cgi?id=1801353 , invalid SCSI commands
when udev tries to query a DVD drive.

[arnd] Found another instance of the same bug, also introduced in my
compat_ioctl series.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=1801353
Link: https://lore.kernel.org/r/20200219165139.3467320-1-arnd@arndb.de
Fixes: c103d6ee ("compat_ioctl: ide: floppy: add handler")
Fixes: 64cbfa96 ("compat_ioctl: move cdrom commands into cdrom.c")
Fixes: d320a955 ("compat_ioctl: scsi: move ioctl handling into drivers")
Bisected-by: Chris Murphy <bugzilla@colorremedies.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Adam Williamson <awilliam@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

03264ddd

floppy: check FDC index for errors before assigning it · 2e90ca68

Authored by Linus Torvalds on Feb 21, 2020

Jordy Zomer reported a KASAN out-of-bounds read in the floppy driver in
wait_til_ready().

Which on the face of it can't happen, since as Willy Tarreau points out,
the function does no particular memory access.  Except through the FDCS
macro, which just indexes a static allocation through the current fdc,
which is always checked against N_FDC.

Except the checking happens after we've already assigned the value.

The floppy driver is a disgrace (a lot of it going back to my original
horrid "design"), and has no real maintainer.  Nobody has the hardware,
and nobody really cares.  But it still gets used in virtual environments
because it's one of those things that everybody supports.

The whole thing should be re-written, or at least parts of it should be
seriously cleaned up.  The 'current fdc' index, which is used by the
FDCS macro, and which is often shadowed by a local 'fdc' variable, is a
prime example of how not to write code.

But because nobody has the hardware or the motivation, let's just fix up
the immediate problem with a nasty band-aid: test the fdc index before
actually assigning it to the static 'fdc' variable.
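
The ordering issue described here is easy to show in isolation. The sketch below is a userspace illustration with hypothetical names (`select_fdc`, `fdc_state`) and an assumed table size; it only demonstrates testing an index before storing it in a global that later lookups trust.

```c
#include <assert.h>
#include <stdbool.h>

#define N_FDC 2			/* number of controller slots (assumed value) */

static int fdc_state[N_FDC];	/* stand-in for the static per-FDC table */
static int current_fdc;		/* stand-in for the driver's global index */

/* Validate first, assign second, so the global never holds a bad index. */
static bool select_fdc(int new_fdc)
{
	if (new_fdc < 0 || new_fdc >= N_FDC)
		return false;
	current_fdc = new_fdc;
	return true;
}

/* Any lookup through the global index is now in bounds. */
static int current_fdc_state(void)
{
	assert(current_fdc >= 0 && current_fdc < N_FDC);
	return fdc_state[current_fdc];
}
```

If the assignment came before the range check, a rejected index would still be left in `current_fdc`, and the next table lookup would read out of bounds.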
Reported-by: Jordy Zomer <jordy@simplyhacker.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2e90ca68

08 Feb, 2020 5 commits

fs_parse: fold fs_parameter_desc/fs_parameter_spec · d7167b14

Authored by Al Viro on Sep 07, 2019

The former contains nothing but a pointer to an array of the latter...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d7167b14

fs_parser: remove fs_parameter_description name field · 96cafb9c

Authored by Eric Sandeen on Dec 06, 2019

Unused now.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

96cafb9c

new primitive: __fs_parse() · 7f5d3814

Authored by Al Viro on Dec 20, 2019

fs_parse() analogue taking p_log instead of fs_context.
fs_parse() turned into a wrapper, callers in ceph_common and rbd
switched to __fs_parse().

As a result, fs_parse() never gets a NULL fs_context and neither
do the fs_context-based logging primitives.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7f5d3814

switch rbd and libceph to p_log-based primitives · 2c3f3dc3

Authored by Al Viro on Dec 20, 2019

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

2c3f3dc3

struct p_log, variants of warnf() et.al. taking that one instead · 3fbb8d55

Authored by Al Viro on Dec 20, 2019

primitives for prefixed logging
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3fbb8d55

07 Feb, 2020 1 commit

Pass consistent param->type to fs_parse() · 0f89589a

Authored by Al Viro on Dec 17, 2019

As it is, vfs_parse_fs_string() makes "foo" and "foo=" indistinguishable;
both get fs_value_is_string for ->type and NULL for ->string.  To make
it even more unpleasant, that combination is impossible to produce with
fsconfig().

Much saner rules would be
        "foo"           => fs_value_is_flag, NULL
        "foo="          => fs_value_is_string, ""
        "foo=bar"       => fs_value_is_string, "bar"
All cases are distinguishable, all results are expressible by fsconfig(),
->has_value checks are much simpler that way (to the point of the field
being useless) and quite a few regressions go away (gfs2 has no business
accepting -o nodebug=, for example).
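
The three proposed cases can be expressed as a tiny classifier. This is a userspace sketch only: `parse_option`, `struct parsed_param`, and the enum names are hypothetical stand-ins, not the kernel's `fs_parameter` types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum value_type { VALUE_IS_FLAG, VALUE_IS_STRING };

struct parsed_param {
	enum value_type type;
	const char *string;	/* NULL for a flag, otherwise text after '=' */
	size_t key_len;		/* length of the key part */
};

/* Split "key", "key=" or "key=value" according to the proposed rules. */
static struct parsed_param parse_option(const char *opt)
{
	struct parsed_param p;
	const char *eq;

	assert(opt != NULL);
	eq = strchr(opt, '=');
	if (!eq) {
		p.type = VALUE_IS_FLAG;
		p.string = NULL;
		p.key_len = strlen(opt);
	} else {
		p.type = VALUE_IS_STRING;
		p.string = eq + 1;	/* "" when nothing follows '=' */
		p.key_len = (size_t)(eq - opt);
	}
	return p;
}
```

Because a bare key yields a flag with a NULL string and `key=` yields an empty string, a caller can tell the two apart without any extra field.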

Partially based upon patches from Miklos.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0f89589a

06 Feb, 2020 1 commit

virtio-blk: remove VIRTIO_BLK_F_SCSI support · 782e067d

Authored by Christoph Hellwig on Dec 12, 2019

Since the need for a special flag to support SCSI passthrough on a
block device was added in May 2017, the SCSI passthrough support in
virtio-blk has been disabled.  It has always been a bad idea
(just ask the original author..) and we have virtio-scsi for proper
passthrough.  The feature also never made it into the virtio 1.0
or later specifications.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

782e067d

04 Feb, 2020 3 commits

brd: check and limit max_part par · c8ab4225

Authored by Zhiqiang Liu on Feb 04, 2020

In brd_init(), rd_nr brd_device instances are first allocated and added
to brd_devices; brd_devices is then traversed and each brd_device is
registered by calling add_disk().  When a brd_device is allocated,
disk->first_minor is set to i * max_part.  If rd_nr * max_part is
larger than MINORMASK, two different brd_devices may get the same devt,
and only one of them can be added successfully.  Running rmmod brd.ko
then causes an oops in brd_exit().

To reproduce, follow these steps:
  # modprobe brd rd_nr=3 rd_size=102400 max_part=1048576
  # rmmod brd
The oops then appears.

Oops log:
[  726.613722] Call trace:
[  726.614175]  kernfs_find_ns+0x24/0x130
[  726.614852]  kernfs_find_and_get_ns+0x44/0x68
[  726.615749]  sysfs_remove_group+0x38/0xb0
[  726.616520]  blk_trace_remove_sysfs+0x1c/0x28
[  726.617320]  blk_unregister_queue+0x98/0x100
[  726.618105]  del_gendisk+0x144/0x2b8
[  726.618759]  brd_exit+0x68/0x560 [brd]
[  726.619501]  __arm64_sys_delete_module+0x19c/0x2a0
[  726.620384]  el0_svc_common+0x78/0x130
[  726.621057]  el0_svc_handler+0x38/0x78
[  726.621738]  el0_svc+0x8/0xc
[  726.622259] Code: aa0203f6 aa0103f7 aa1e03e0 d503201f (7940e260)

Here, we add the brd_check_and_reset_par() function to check and limit the
max_part parameter.
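
The collision in the reproduction steps can be worked through in userspace. The sketch below copies the shape of the kernel's `MKDEV()` macro and uses 20 minor bits; `brd_devt()` and `clamp_max_part()` are hypothetical helpers, and the `DISK_MAX_PARTS` value of 256 is an assumption, so this is not the body of `brd_check_and_reset_par()`.

```c
#include <assert.h>

#define MINORBITS	20		/* width of the minor number field */
#define MKDEV(ma, mi)	(((ma) << MINORBITS) | (mi))
#define RAMDISK_MAJOR	1U
#define DISK_MAX_PARTS	256U		/* assumed upper bound */

/* Device number of ramdisk i when first_minor = i * max_part. */
static unsigned int brd_devt(unsigned int i, unsigned int max_part)
{
	assert(max_part > 0);
	return MKDEV(RAMDISK_MAJOR, i * max_part);
}

/* Hypothetical clamp in the spirit of the check described above. */
static unsigned int clamp_max_part(unsigned int max_part)
{
	if (max_part == 0)
		return 1;
	if (max_part > DISK_MAX_PARTS)
		return DISK_MAX_PARTS;
	return max_part;
}
```

With `max_part=1048576`, which is `1 << 20`, the first minor of ramdisk 1 no longer fits in the minor field, and ramdisk 1 ends up with the same device number as ramdisk 0. After clamping, consecutive ramdisks get distinct numbers again.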

--
V5->V6:
 - remove useless code

V4->V5:(suggested by Ming Lei)
 - make sure max_part is not larger than DISK_MAX_PARTS

V3->V4:(suggested by Ming Lei)
 - remove useless change
 - add one limit of max_part

V2->V3: (suggested by Ming Lei)
 - clear .minors when running out of consecutive minor space in brd_alloc
 - remove limit of rd_nr

V1->V2:
 - add more checks in brd_check_par_valid as suggested by Ming Lei.
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

c8ab4225

drivers/block/null_blk_main.c: fix uninitialized var warnings · 046755a2

Authored by Andrew Morton on Feb 03, 2020

With gcc-7.2, many instances of

drivers/block/null_blk_main.c: In function ‘nullb_device_zone_nr_conv_store’:
drivers/block/null_blk_main.c:291:12: warning: ‘new_value’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  dev->NAME = new_value;      \
            ^
drivers/block/null_blk_main.c:279:7: note: ‘new_value’ was declared here
  TYPE new_value;       \
       ^

Presumably notabug, so use uninitialized_var() to suppress them.

Cc: Shaohua Li <shli@fb.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

046755a2

drivers/block/null_blk_main.c: fix layout · ca0a95a6

Authored by Andrew Morton on Feb 03, 2020

Each line here overflows 80 cols by exactly one character.  Delete one tab
per line to fix.

Cc: Shaohua Li <shli@fb.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ca0a95a6

01 Feb, 2020 2 commits

drivers/block/zram/zram_drv.c: fix error return codes not being returned in writeback_store · 3b82a051

Authored by Colin Ian King on Jan 30, 2020

Currently, when an error code -EIO or -ENOSPC occurs in the for-loop of
writeback_store, the error code is overwritten by a ret = len
assignment at the end of the function and the error code is
lost.  Fix this by assigning ret = len at the start of the function and
removing the assignment from the end, hence allowing ret to be preserved
when error codes are assigned to it.
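
The pattern can be sketched outside the kernel. In the hypothetical helper below, `results` stands in for the per-page outcomes inside the loop; the function name, its parameters, and the choice to stop on -ENOSPC are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical loop: 'results' holds one status per page, 0 for success
 * or a negative errno. Returns 'len' on success, else the recorded error.
 */
static long writeback_all(const int *results, size_t nr_pages, long len)
{
	long ret = len;		/* success value assigned once, up front */
	size_t i;

	assert(results != NULL || nr_pages == 0);
	for (i = 0; i < nr_pages; i++) {
		if (results[i] == -ENOSPC) {
			ret = -ENOSPC;	/* no room left: stop */
			break;
		}
		if (results[i] < 0)
			ret = results[i];	/* record and move on */
	}
	/* no 'ret = len' here, so a recorded error survives */
	return ret;
}
```

Had `ret = len` been placed after the loop instead, every call would report success regardless of what happened inside the loop.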

Addresses Coverity ("Unused value")

Link: http://lkml.kernel.org/r/20191128122958.178290-1-colin.king@canonical.com
Fixes: a939888e ("zram: support idle/huge page writeback")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3b82a051

zram: try to avoid worst-case scenario on same element pages · 90f82cbf

Authored by Taejoon Song on Jan 30, 2020

The worst-case scenario when finding same-element pages is that almost all
elements look the same at first glance but only the last few elements are
different.

Since the same element tends to be grouped from the beginning of the
pages, if we check the first element with the last element before
looping through all elements, we might have some chances to quickly
detect non-same element pages.
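
A userspace sketch of the last-element check follows. The page is modelled as an array of 512 words, which matches the 64-bit case discussed in this message; the constant and the exact signature are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_WORDS 512	/* 4096-byte page / 8-byte word (assumed sizes) */

/* Return true and store the repeated word if every word is identical. */
static bool page_same_filled(const unsigned long *page, unsigned long *element)
{
	const size_t last = PAGE_WORDS - 1;
	unsigned long val;
	size_t pos;

	assert(page != NULL && element != NULL);
	val = page[0];

	/* Cheap early exit: compare the first word with the last one. */
	if (val != page[last])
		return false;

	/* Words 0 and 'last' are already known to match. */
	for (pos = 1; pos < last; pos++) {
		if (val != page[pos])
			return false;
	}
	*element = val;
	return true;
}
```

A page that differs only near its end is rejected after a single comparison instead of after almost a full scan, while the full scan is still needed to confirm a truly same-filled page.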

 1. Test is done under LG webOS TV (64-bit arch)
 2. Dump the swap-out pages (~819200 pages)
 3. Analyze the pages with simple test script which counts the iteration
    number and measures the speed at off-line

Under 64-bit arch, the worst iteration count is PAGE_SIZE / 8 bytes =
512.  The speed is based on the time to consume page_same_filled()
function only.  The result, on average, is listed as below:

                                     Num of Iter    Speed(MB/s)
  Looping-Forward (Orig)                 38            99265
  Looping-Backward                       36           102725
  Last-element-check (This Patch)        33           125072

The result shows that the average iteration count decreases by 13% and
the speed increases by 25% with this patch.  This patch does not
increase the overall time complexity, though.

I also ran a simpler version which uses a backward loop.  Just looping
backward also makes some improvement, but less than this patch.

[taejoon.song@lge.com: fix off-by-one]
  Link: http://lkml.kernel.org/r/1578642001-11765-1-git-send-email-taejoon.song@lge.com
Link: http://lkml.kernel.org/r/1575424418-16119-1-git-send-email-taejoon.song@lge.com
Signed-off-by: Taejoon Song <taejoon.song@lge.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

90f82cbf

30 Jan, 2020 3 commits

xen/blkfront: limit allocated memory size to actual use case · 0265d6e8

Authored by Juergen Gross on Jan 17, 2020

Today the Xen blkfront driver allocates memory for one struct
blkfront_ring_info for each communication ring. This structure is
statically sized for the maximum supported configuration resulting
in a size of more than 90 kB.

As the main size contributor is one array inside the struct, the
memory allocation can easily be limited by moving this array to be
the last structure element and to allocate only the memory for the
actually needed array size.
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0265d6e8

nbd: add a flush_workqueue in nbd_start_device · 5c0dd228

Authored by Sun Ke on Jan 22, 2020

When kzalloc fails, this may cause an attempt to destroy the
workqueue from inside the workqueue.

Suppose num_connections is m (2 < m), the kzalloc calls NO.1 ~ NO.n
(1 < n < m) succeed, and kzalloc NO.(n + 1) fails. Then
nbd_start_device will return ENOMEM to nbd_start_device_ioctl, and
nbd_start_device_ioctl will return immediately without running
flush_workqueue. However, we still have n recv threads. If nbd_release
runs first, the recv threads may have to drop the last config_refs and
try to destroy the workqueue from inside the workqueue.

To fix it, add a flush_workqueue in nbd_start_device.

Fixes: e9e006f5 ("nbd: fix max number of supported devs")
Signed-off-by: Sun Ke <sunke32@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

5c0dd228

drbd: fifo_alloc() should use struct_size · 6a365874

Authored by Stephen Kitt on Jan 24, 2020

Switching to struct_size for the allocation in fifo_alloc avoids
hard-coding the type of fifo_buffer.values in fifo_alloc. It also
provides overflow protection; to avoid pessimistic code being
generated by the compiler as a result, this patch also switches
fifo_size to unsigned, propagating the change as appropriate.
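
The sizing idea behind this change can be approximated in userspace. The sketch below is not the kernel's `struct_size()` helper: `fifo_bytes()` is a hypothetical analogue that derives the element size from the member itself and reports overflow by returning 0, and the struct fields are assumed for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for the drbd structure (assumed fields). */
struct fifo_buffer {
	unsigned int head_index;
	unsigned int size;
	int total;
	int values[];		/* flexible array member */
};

/*
 * Userspace analogue of the sizing helper: header plus n elements, with
 * the element type taken from the member itself. Returns 0 on overflow.
 */
static size_t fifo_bytes(unsigned int n)
{
	const size_t elem = sizeof(((struct fifo_buffer *)0)->values[0]);

	if (n > (SIZE_MAX - sizeof(struct fifo_buffer)) / elem)
		return 0;
	return sizeof(struct fifo_buffer) + n * elem;
}

static struct fifo_buffer *fifo_alloc(unsigned int fifo_size)
{
	size_t bytes = fifo_bytes(fifo_size);
	struct fifo_buffer *fb;

	if (!bytes)
		return NULL;
	fb = calloc(1, bytes);
	if (fb)
		fb->size = fifo_size;
	return fb;
}

/* Read slot i; the index must lie inside the allocated array. */
static int fifo_get(const struct fifo_buffer *fb, unsigned int i)
{
	assert(i < fb->size);
	return fb->values[i];
}
```

Taking the count as an unsigned value removes the negative case from the size computation, which is in line with the switch of fifo_size to unsigned described above.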
Reviewed-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Stephen Kitt <steve@sk2.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

6a365874

29 Jan, 2020 3 commits

xen/blkback: Consistently insert one empty line between functions · 8557bbe5

Authored by SeongJae Park on Jan 27, 2020

The number of empty lines between functions in xenbus.c is
inconsistent.  This trivial style cleanup commit fixes the file to
consistently place only one empty line.
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

8557bbe5

xen/blkback: Remove unnecessary static variable name prefixes · 823f2091

Authored by SeongJae Park on Jan 27, 2020

A few static variables in blkback have the 'xen_blkif_' prefix, though it
is unnecessary for static variables. This commit removes such prefixes.
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

823f2091

xen/blkback: Squeeze page pools if a memory pressure is detected · cb9369bd

Authored by SeongJae Park on Jan 27, 2020

Each `blkif` has a free pages pool for the grant mapping.  The size of
the pool starts from zero and is increased on demand while processing
the I/O requests.  If current I/O requests handling is finished or 100
milliseconds have passed since last I/O requests handling, it checks and
shrinks the pool to not exceed the size limit, `max_buffer_pages`.

Therefore, host administrators can cause memory pressure in blkback by
attaching a large number of block devices and inducing I/O.  Such
problematic situations can be avoided by limiting the maximum number of
devices that can be attached, but finding the optimal limit is not so
easy.  An improperly set limit can result in memory pressure or
resource underutilization.  This commit avoids such problematic
situations by squeezing the pools (returning every free page in the pool
to the system) for a while (users can set this duration via a module
parameter) if memory pressure is detected.

Discussions
===========

The `blkback`'s original shrinking mechanism returns only pages in the
pool which are not currently being used by `blkback` to the system.  In
other words, the pages that are not mapped with granted pages.  Because
this commit changes only the shrink limit but still uses the same
freeing mechanism, it does not touch pages which are currently mapping
grants.

Once memory pressure is detected, this commit keeps the squeezing limit
for a user-specified time duration.  The duration should be neither too
long nor too short.  If it is too long, the overhead incurred by the
squeezing can reduce the I/O performance.  If it is too short, `blkback`
will not free enough pages to reduce the memory pressure.  This commit
sets the value as `10 milliseconds` by default because it is a short time
in terms of I/O while it is a long time in terms of memory operations.
Also, as the original shrinking mechanism works for at least every 100
milliseconds, this could be a somewhat reasonable choice.  I also tested
other durations (refer to the below section for more details) and
confirmed that 10 milliseconds is the one that works best with the test.
That said, the proper duration depends on actual configurations and
workloads.  That's why this commit allows users to set the duration as a
module parameter.

Memory Pressure Test
====================

To show how this commit fixes the memory pressure situation well, I
configured a test environment on a xen-running virtualization system.
On the `blkfront` running guest instances, I attach a large number of
network-backed volume devices and induce I/O to those.  Meanwhile, I
measure the number of pages that swapped in (pswpin) and out (pswpout)
on the `blkback` running guest.  The test ran twice, once for the
`blkback` before this commit and once for that after this commit.  As
shown below, this commit has dramatically reduced the memory pressure:

                pswpin  pswpout
    before      76,672  185,799
    after          867    3,967

Optimal Aggressive Shrinking Duration
-------------------------------------

To find the best squeezing duration, I repeated the test with three
different durations (1ms, 10ms, and 100ms).  The results are as below:

    duration    pswpin  pswpout
    1           707     5,095
    10          867     3,967
    100         362     3,348

As expected, the memory pressure decreases as the duration increases,
but the reduction becomes slow from `10ms`.  Based on these results, I
chose the default duration as 10ms.

Performance Overhead Test
=========================

This commit could incur I/O performance degradation under severe memory
pressure because the squeezing will require more page allocations per
I/O.  To show the overhead, I artificially made a worst-case squeezing
situation and measured the I/O performance of a `blkfront` running
guest.

For the artificial squeezing, I set the `blkback.max_buffer_pages` using
the `/sys/module/xen_blkback/parameters/max_buffer_pages` file.  In this
test, I set the value to `1024` and `0`.  The `1024` is the default
value.  Setting the value to `0` is the same as a situation that always
does the squeezing (worst-case).

If the underlying block device is slow enough, the squeezing overhead
could be hidden.  For that reason, I use a fast block device, namely the
brd ramdisk[1]:

    # xl block-attach guest phy:/dev/ram0 xvdb w

For the I/O performance measurement, I run a simple `dd` command 5 times
directly to the device as below and collect the 'MB/s' results.

    $ for i in {1..5}; do dd if=/dev/zero of=/dev/xvdb \
                             bs=4k count=$((256*512)); sync; done

The results are as below.  'max_pgs' represents the value of the
`blkback.max_buffer_pages` parameter.

    max_pgs   Min       Max       Median     Avg    Stddev
    0         417       423       420        419.4  2.5099801
    1024      414       425       416        417.8  4.4384682
    No difference proven at 95.0% confidence

In short, even worst-case squeezing on a ramdisk-based fast block device
causes no visible performance degradation.  Please note that this is just
a very simple and minimal test.  On systems using super-fast block
devices and a special I/O workload, the results might be different.  If
you have any doubt, test on your machine with your workload to find the
optimal squeezing duration for you.

[1] https://www.kernel.org/doc/html/latest/admin-guide/blockdev/ramdisk.html
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

cb9369bd

27 Jan, 2020 2 commits

rbd: set the 'device' link in sysfs · 3325322f

Authored by Hannes Reinecke on Jan 23, 2020

The rbd driver already provides additional information in sysfs
under /sys/bus/rbd, so we should set the 'device' link in the block
device to reference this information.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

3325322f

rbd: work around -Wuninitialized warning · a55e601b

Authored by Arnd Bergmann on Jan 07, 2020

gcc -O3 warns about a dummy variable that is passed
down into rbd_img_fill_nodata without being initialized:

drivers/block/rbd.c: In function 'rbd_img_fill_nodata':
drivers/block/rbd.c:2573:13: error: 'dummy' is used uninitialized in this function [-Werror=uninitialized]
  fctx->iter = *fctx->pos;

Since this is a dummy, I assume the warning is harmless, but
it's better to initialize it anyway and avoid the warning.

Fixes: mmtom ("init/Kconfig: enable -O3 for all arches")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

a55e601b

15 Jan, 2020 1 commit

null_blk: Fix zone write handling · 16c731fe

Authored by Damien Le Moal on Jan 09, 2020

null_zone_write() only allows writing empty and implicitly opened zones.
Writing to closed and explicitly opened zones must also be allowed and
the zone condition must be transitioned to implicit open if the zone
is not explicitly opened already.

Fixes: da644b2c ("null_blk: add zone open, close, and finish support")
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

16c731fe

06 Jan, 2020 1 commit

remove ioremap_nocache and devm_ioremap_nocache · 4bdc0d67

Authored by Christoph Hellwig on Jan 06, 2020

ioremap has provided non-cached semantics by default since the Linux 2.6
days, so remove the additional ioremap_nocache interface.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>

4bdc0d67

03 Jan, 2020 6 commits

compat_ioctl: move cdrom commands into cdrom.c · 64cbfa96

Authored by Arnd Bergmann on Nov 28, 2019

There is no need for the special cases for the cdrom ioctls any more now,
so make sure that each cdrom driver has a .compat_ioctl() callback and
calls cdrom_compat_ioctl() directly there.
Reviewed-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

64cbfa96

compat_ioctl: scsi: move ioctl handling into drivers · d320a955

Authored by Arnd Bergmann on Mar 15, 2019

Each driver calling scsi_ioctl() gets an equivalent compat_ioctl()
handler that implements the same commands by calling scsi_compat_ioctl().

The scsi_cmd_ioctl() and scsi_cmd_blk_ioctl() functions are compatible
at this point, so any driver that calls those can do so for both native
and compat mode, with the argument passed through compat_ptr().

With this, we can remove the entries from fs/compat_ioctl.c.  The new
code is larger, but should be easier to maintain and keep updated with
newly added commands.
Reviewed-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

d320a955

compat_ioctl: block: handle cdrom compat ioctl in non-cdrom drivers · 9452b1a3

Authored by Arnd Bergmann on Nov 28, 2019

Various block drivers implement the CDROMMULTISESSION,
CDROM_GET_CAPABILITY, and CDROMEJECT ioctl commands, relying on the
block layer to handle compat_ioctl mode for them.

Move this into the drivers directly as a preparation for simplifying
the block layer later.

When only integer arguments or no arguments are passed, the
same handler can be used for .ioctl and .compat_ioctl, and
when only pointer arguments are passed, the newly added
blkdev_compat_ptr_ioctl can be used.
Reviewed-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

9452b1a3

compat_ioctl: cdrom: handle CDROM_LAST_WRITTEN · ab8bc541

Authored by Arnd Bergmann on Dec 09, 2019

This is the only ioctl command that does not have a proper
compat handler. Making the normal implementation do the
right thing is actually very simple, so just do that by
using an in_compat_syscall() check to avoid the special
case in the pktcdvd driver.
Reviewed-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

ab8bc541

compat_ioctl: move CDROM_SEND_PACKET handling into scsi · f3ee6e63

Authored by Arnd Bergmann on Nov 28, 2019

There is only one implementation of this ioctl, so move the handling out
of the common block layer code into the place where it's actually needed.

It also gets called indirectly through pktcdvd, which needs to be aware
of this change.

As I noticed, the old implementation of the compat handler failed to
convert the structure on the way out, so the updated fields never got
written back to user space. This is either not important, or it has
never worked and should be fixed now.
Reviewed-by: NBen Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: NArnd Bergmann <arnd@arndb.de>

f3ee6e63

compat_ioctl: ubd, aoe: use blkdev_compat_ptr_ioctl · ab0cf1e4

Authored by Arnd Bergmann on Nov 30, 2019

These drivers implement the HDIO_GET_IDENTITY and CDROMVOLREAD ioctl
commands, which are compatible between 32-bit and 64-bit user space and
traditionally handled by compat_blkdev_driver_ioctl().

As a prerequisite to removing that function, make both drivers use
blkdev_compat_ptr_ioctl() as their .compat_ioctl callback.
Reviewed-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

ab0cf1e4

openeuler / Kernel: last synced successfully almost 2 years ago