1. 25 Mar, 2020 1 commit
  2. 19 Mar, 2020 2 commits
  3. 10 Mar, 2020 5 commits
    • null_blk: Add support for init_hctx() fault injection · 596444e7
      Bart Van Assche committed
      This makes it possible to test the error path in blk_mq_realloc_hw_ctxs()
      and also several error paths in null_blk.
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Cc: Johannes Thumshirn <jth@kernel.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • null_blk: Handle null_add_dev() failures properly · 9b03b713
      Bart Van Assche committed
      If null_add_dev() fails then null_del_dev() is called with a NULL argument.
      Make null_del_dev() handle this scenario correctly. This patch fixes the
      following KASAN complaint:
      
      null-ptr-deref in null_del_dev+0x28/0x280 [null_blk]
      Read of size 8 at addr 0000000000000000 by task find/1062
      
      Call Trace:
       dump_stack+0xa5/0xe6
       __kasan_report.cold+0x65/0x99
       kasan_report+0x16/0x20
       __asan_load8+0x58/0x90
       null_del_dev+0x28/0x280 [null_blk]
       nullb_group_drop_item+0x7e/0xa0 [null_blk]
       client_drop_item+0x53/0x80 [configfs]
       configfs_rmdir+0x395/0x4e0 [configfs]
       vfs_rmdir+0xb6/0x220
       do_rmdir+0x238/0x2c0
       __x64_sys_unlinkat+0x75/0x90
       do_syscall_64+0x6f/0x2f0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Cc: Johannes Thumshirn <jth@kernel.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • null_blk: Fix the null_add_dev() error path · 2004bfde
      Bart Van Assche committed
      If null_add_dev() fails, clear dev->nullb.
      
      This patch fixes the following KASAN complaint:
      
      BUG: KASAN: use-after-free in nullb_device_submit_queues_store+0xcf/0x160 [null_blk]
      Read of size 8 at addr ffff88803280fc30 by task check/8409
      
      Call Trace:
       dump_stack+0xa5/0xe6
       print_address_description.constprop.0+0x26/0x260
       __kasan_report.cold+0x7b/0x99
       kasan_report+0x16/0x20
       __asan_load8+0x58/0x90
       nullb_device_submit_queues_store+0xcf/0x160 [null_blk]
       configfs_write_file+0x1c4/0x250 [configfs]
       __vfs_write+0x4c/0x90
       vfs_write+0x145/0x2c0
       ksys_write+0xd7/0x180
       __x64_sys_write+0x47/0x50
       do_syscall_64+0x6f/0x2f0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x7ff370926317
      Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
      RSP: 002b:00007fff2dd2da48 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007ff370926317
      RDX: 0000000000000002 RSI: 0000559437ef23f0 RDI: 0000000000000001
      RBP: 0000559437ef23f0 R08: 000000000000000a R09: 0000000000000001
      R10: 0000559436703471 R11: 0000000000000246 R12: 0000000000000002
      R13: 00007ff370a006a0 R14: 00007ff370a014a0 R15: 00007ff370a008a0
      
      Allocated by task 8409:
       save_stack+0x23/0x90
       __kasan_kmalloc.constprop.0+0xcf/0xe0
       kasan_kmalloc+0xd/0x10
       kmem_cache_alloc_node_trace+0x129/0x4c0
       null_add_dev+0x24a/0xe90 [null_blk]
       nullb_device_power_store+0x1b6/0x270 [null_blk]
       configfs_write_file+0x1c4/0x250 [configfs]
       __vfs_write+0x4c/0x90
       vfs_write+0x145/0x2c0
       ksys_write+0xd7/0x180
       __x64_sys_write+0x47/0x50
       do_syscall_64+0x6f/0x2f0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 8409:
       save_stack+0x23/0x90
       __kasan_slab_free+0x112/0x160
       kasan_slab_free+0x12/0x20
       kfree+0xdf/0x250
       null_add_dev+0xaf3/0xe90 [null_blk]
       nullb_device_power_store+0x1b6/0x270 [null_blk]
       configfs_write_file+0x1c4/0x250 [configfs]
       __vfs_write+0x4c/0x90
       vfs_write+0x145/0x2c0
       ksys_write+0xd7/0x180
       __x64_sys_write+0x47/0x50
       do_syscall_64+0x6f/0x2f0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 2984c868 ("nullb: factor disk parameters")
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Cc: Johannes Thumshirn <jth@kernel.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • null_blk: Fix changing the number of hardware queues · 78b10be2
      Bart Van Assche committed
      Instead of initializing null_blk hardware queues explicitly after the
      request queue has been created, provide .init_hctx() and .exit_hctx()
      callback functions. These callbacks are called not only during
      request queue allocation but also when the number of hardware queues
      changes. Allocate nr_cpu_ids queues during initialization to support
      increasing the number of hardware queues above the initial hardware
      queue count.
      
      This change fixes increasing the number of hardware queues above the
      initial number of hardware queues and also keeps nullb->nr_queues in
      sync with the number of hardware queues.
      
      Fixes: 45919fbf ("null_blk: Enable modifying 'submit_queues' after an instance has been configured")
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Cc: Johannes Thumshirn <jth@kernel.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • null_blk: Suppress an UBSAN complaint triggered when setting 'memory_backed' · b9853b4d
      Bart Van Assche committed
      Although it is not clear to me why UBSAN complains when 'memory_backed'
      is set, this patch suppresses the UBSAN complaint that is triggered when
      setting that configfs attribute.
      
      UBSAN: Undefined behaviour in drivers/block/null_blk_main.c:327:1
      load of value 16 is not a valid value for type '_Bool'
      CPU: 2 PID: 8396 Comm: check Not tainted 5.6.0-rc1-dbg+ #14
      Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      Call Trace:
       dump_stack+0xa5/0xe6
       ubsan_epilogue+0x9/0x26
       __ubsan_handle_load_invalid_value+0x6d/0x76
       nullb_device_memory_backed_store.cold+0x2c/0x38 [null_blk]
       configfs_write_file+0x1c4/0x250 [configfs]
       __vfs_write+0x4c/0x90
       vfs_write+0x145/0x2c0
       ksys_write+0xd7/0x180
       __x64_sys_write+0x47/0x50
       do_syscall_64+0x6f/0x2f0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Cc: Johannes Thumshirn <jth@kernel.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. 05 Mar, 2020 1 commit
  5. 26 Feb, 2020 1 commit
  6. 25 Feb, 2020 2 commits
  7. 08 Feb, 2020 5 commits
  8. 07 Feb, 2020 1 commit
    • Pass consistent param->type to fs_parse() · 0f89589a
      Al Viro committed
      As it is, vfs_parse_fs_string() makes "foo" and "foo=" indistinguishable;
      both get fs_value_is_string for ->type and NULL for ->string.  To make
      it even more unpleasant, that combination is impossible to produce with
      fsconfig().
      
      Much saner rules would be
              "foo"           => fs_value_is_flag, NULL
              "foo="          => fs_value_is_string, ""
              "foo=bar"       => fs_value_is_string, "bar"
      All cases are distinguishable, all results are expressible by fsconfig(),
      ->has_value checks are much simpler that way (to the point of the field
      being useless) and quite a few regressions go away (gfs2 has no business
      accepting -o nodebug=, for example).
      
      Partially based upon patches from Miklos.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
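The three rules proposed in this commit message can be sketched as a small user-space model. This is Python rather than kernel C, and `classify_option` together with the two string constants are hypothetical names chosen for illustration; it is not the kernel's `vfs_parse_fs_string()` implementation.

```python
from typing import Optional, Tuple

# Hypothetical stand-ins for the kernel's fs_value_is_flag / fs_value_is_string.
FS_VALUE_IS_FLAG = "fs_value_is_flag"
FS_VALUE_IS_STRING = "fs_value_is_string"


def classify_option(option: str) -> Tuple[str, str, Optional[str]]:
    """Split one mount option into (key, type, value) per the proposed rules."""
    key, sep, value = option.partition("=")
    if not sep:
        # "foo": no '=' at all, so it is a bare flag with no value (NULL).
        return key, FS_VALUE_IS_FLAG, None
    # "foo=" yields an empty string, "foo=bar" yields "bar".
    return key, FS_VALUE_IS_STRING, value


for opt in ("foo", "foo=", "foo=bar"):
    print(opt, "->", classify_option(opt))
```

Under this model the three spellings map to three distinct results, which is the property the commit message asks for.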
  9. 06 Feb, 2020 1 commit
  10. 04 Feb, 2020 3 commits
    • brd: check and limit max_part par · c8ab4225
      Zhiqiang Liu committed
      In brd_init(), rd_nr brd_device instances are first allocated and
      added to brd_devices; brd_devices is then traversed and each
      brd_device is registered by calling add_disk(). When a brd_device is
      allocated, disk->first_minor is set to i * max_part. If rd_nr * max_part
      is larger than MINORMASK, two different brd_device instances may get
      the same devt, and only one of them can be added successfully.
      Removing the module (rmmod brd.ko) then causes an oops in brd_exit().
      
      Steps to reproduce:
        # modprobe brd rd_nr=3 rd_size=102400 max_part=1048576
        # rmmod brd
      The oops then appears.
      
      Oops log:
      [  726.613722] Call trace:
      [  726.614175]  kernfs_find_ns+0x24/0x130
      [  726.614852]  kernfs_find_and_get_ns+0x44/0x68
      [  726.615749]  sysfs_remove_group+0x38/0xb0
      [  726.616520]  blk_trace_remove_sysfs+0x1c/0x28
      [  726.617320]  blk_unregister_queue+0x98/0x100
      [  726.618105]  del_gendisk+0x144/0x2b8
      [  726.618759]  brd_exit+0x68/0x560 [brd]
      [  726.619501]  __arm64_sys_delete_module+0x19c/0x2a0
      [  726.620384]  el0_svc_common+0x78/0x130
      [  726.621057]  el0_svc_handler+0x38/0x78
      [  726.621738]  el0_svc+0x8/0xc
      [  726.622259] Code: aa0203f6 aa0103f7 aa1e03e0 d503201f (7940e260)
      
      This patch adds brd_check_and_reset_par() to check and limit the
      max_part parameter.
      
      --
      V5->V6:
       - remove useless code
      
      V4->V5:(suggested by Ming Lei)
       - make sure max_part is not larger than DISK_MAX_PARTS
      
      V3->V4:(suggested by Ming Lei)
       - remove useless change
       - add one limit of max_part
      
      V2->V3: (suggested by Ming Lei)
       - clear .minors when running out of consecutive minor space in brd_alloc
       - remove limit of rd_nr
      
      V1->V2:
       - add more checks in brd_check_par_valid as suggested by Ming Lei.
      Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Reviewed-by: Bob Liu <bob.liu@oracle.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
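The devt collision described in this commit can be checked with plain arithmetic. The sketch below is a Python model, not driver code: `MINORBITS = 20` and `RAMDISK_MAJOR = 1` follow the kernel headers, while `mkdev`, `brd_devts` and `has_collision` are hypothetical helpers that mimic `MKDEV()` and the `first_minor = i * max_part` assignment.

```python
MINORBITS = 20
MINORMASK = (1 << MINORBITS) - 1
RAMDISK_MAJOR = 1


def mkdev(major: int, minor: int) -> int:
    """Model of MKDEV(): an oversized minor spills into the major bits."""
    return ((major << MINORBITS) | minor) & 0xFFFFFFFF


def brd_devts(rd_nr: int, max_part: int) -> list:
    """devt of each ramdisk when first_minor is set to i * max_part."""
    return [mkdev(RAMDISK_MAJOR, i * max_part) for i in range(rd_nr)]


def has_collision(rd_nr: int, max_part: int) -> bool:
    """True if at least two ramdisks end up with the same devt."""
    devts = brd_devts(rd_nr, max_part)
    return len(set(devts)) != len(devts)


# Parameters from the reproducer: modprobe brd rd_nr=3 max_part=1048576
print([hex(d) for d in brd_devts(3, 1048576)])
print("collision:", has_collision(3, 1048576))
```

With the reproducer's parameters the minor of the second ramdisk is exactly `1 << MINORBITS`, so it lands on the major bit that is already set and the first two devices share one devt in this model.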
    • drivers/block/null_blk_main.c: fix uninitialized var warnings · 046755a2
      Andrew Morton committed
      With gcc-7.2, many instances of
      
      drivers/block/null_blk_main.c: In function ‘nullb_device_zone_nr_conv_store’:
      drivers/block/null_blk_main.c:291:12: warning: ‘new_value’ may be used uninitialized in this function [-Wmaybe-uninitialized]
        dev->NAME = new_value;      \
                  ^
      drivers/block/null_blk_main.c:279:7: note: ‘new_value’ was declared here
        TYPE new_value;       \
             ^
      
      Presumably notabug, so use uninitialized_var() to suppress them.
      
      Cc: Shaohua Li <shli@fb.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers/block/null_blk_main.c: fix layout · ca0a95a6
      Andrew Morton committed
      Each line here overflows 80 cols by exactly one character.  Delete one tab
      per line to fix.
      
      Cc: Shaohua Li <shli@fb.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 01 Feb, 2020 2 commits
  12. 30 Jan, 2020 3 commits
  13. 29 Jan, 2020 3 commits
    • xen/blkback: Consistently insert one empty line between functions · 8557bbe5
      SeongJae Park committed
      The number of empty lines between functions in xenbus.c is
      inconsistent.  This trivial style cleanup commit fixes the file to
      consistently place only one empty line.
      Acked-by: Roger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: SeongJae Park <sjpark@amazon.de>
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    • xen/blkback: Remove unnecessary static variable name prefixes · 823f2091
      SeongJae Park committed
      A few static variables in blkback have the 'xen_blkif_' prefix, though it
      is unnecessary for static variables.  This commit removes such prefixes.
      Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: SeongJae Park <sjpark@amazon.de>
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
    • xen/blkback: Squeeze page pools if a memory pressure is detected · cb9369bd
      SeongJae Park committed
      Each `blkif` has a free pages pool for the grant mapping.  The size of
      the pool starts from zero and is increased on demand while processing
      the I/O requests.  When handling of the current I/O requests finishes, or
      100 milliseconds have passed since I/O requests were last handled, it
      checks and shrinks the pool to not exceed the size limit, `max_buffer_pages`.
      
      Therefore, host administrators can cause memory pressure in blkback by
      attaching a large number of block devices and inducing I/O.  Such
      problematic situations can be avoided by limiting the maximum number of
      devices that can be attached, but finding the optimal limit is not so
      easy.  An improperly set limit can result in memory pressure or
      resource underutilization.  This commit avoids such problematic
      situations by squeezing the pools (returns every free page in the pool
      to the system) for a while (users can set this duration via a module
      parameter) if memory pressure is detected.
      
      Discussions
      ===========
      
      The `blkback`'s original shrinking mechanism returns only pages in the
      pool which are not currently being used by `blkback` to the system.  In
      other words, the pages that are not mapped with granted pages.  Because
      this commit is changing only the shrink limit but still uses the same
      freeing mechanism it does not touch pages which are currently mapping
      grants.
      
      Once memory pressure is detected, this commit keeps the squeezing limit
      for a user-specified time duration.  The duration should be neither too
      long nor too short.  If it is too long, the overhead incurred by squeezing
      can reduce the I/O performance.  If it is too short, `blkback` will not
      free enough pages to reduce the memory pressure.  This commit sets the
      value as `10 milliseconds` by default because it is a short time in
      terms of I/O while it is a long time in terms of memory operations.
      Also, as the original shrinking mechanism runs at least every 100
      milliseconds, this could be a somewhat reasonable choice.  I also tested
      other durations (refer to the below section for more details) and
      confirmed that 10 milliseconds is the one that works best with the test.
      That said, the proper duration depends on actual configurations and
      workloads.  That's why this commit allows users to set the duration as a
      module parameter.
      
      Memory Pressure Test
      ====================
      
      To show how this commit fixes the memory pressure situation well, I
      configured a test environment on a xen-running virtualization system.
      On the `blkfront` running guest instances, I attach a large number of
      network-backed volume devices and induce I/O to those.  Meanwhile, I
      measure the number of pages that swapped in (pswpin) and out (pswpout)
      on the `blkback` running guest.  The test ran twice, once for the
      `blkback` before this commit and once for that after this commit.  As
      shown below, this commit has dramatically reduced the memory pressure:
      
                      pswpin  pswpout
          before      76,672  185,799
          after          867    3,967
      
      Optimal Aggressive Shrinking Duration
      -------------------------------------
      
      To find the best squeezing duration, I repeated the test with three
      different durations (1ms, 10ms, and 100ms).  The results are as below:
      
          duration    pswpin  pswpout
          1           707     5,095
          10          867     3,967
          100         362     3,348
      
      As expected, the memory pressure decreases as the duration increases,
      but the reduction slows down from `10ms` onward.  Based on these results, I
      chose 10ms as the default duration.
      
      Performance Overhead Test
      =========================
      
      This commit could incur I/O performance degradation under severe memory
      pressure because the squeezing will require more page allocations per
      I/O.  To show the overhead, I artificially made a worst-case squeezing
      situation and measured the I/O performance of a `blkfront` running
      guest.
      
      For the artificial squeezing, I set the `blkback.max_buffer_pages` using
      the `/sys/module/xen_blkback/parameters/max_buffer_pages` file.  In this
      test, I set the value to `1024` and `0`.  The `1024` is the default
      value.  Setting the value to `0` is equivalent to squeezing all the
      time (worst case).
      
      If the underlying block device is slow enough, the squeezing overhead
      could be hidden.  For that reason, I use a fast block device, namely the
      brd[1]:
      
          # xl block-attach guest phy:/dev/ram0 xvdb w
      
      For the I/O performance measurement, I run a simple `dd` command 5 times
      directly to the device as below and collect the 'MB/s' results.
      
          $ for i in {1..5}; do dd if=/dev/zero of=/dev/xvdb \
                                   bs=4k count=$((256*512)); sync; done
      
      The results are as below.  'max_pgs' represents the value of the
      `blkback.max_buffer_pages` parameter.
      
          max_pgs   Min       Max       Median     Avg    Stddev
          0         417       423       420        419.4  2.5099801
          1024      414       425       416        417.8  4.4384682
          No difference proven at 95.0% confidence
      
      In short, even worst-case squeezing on a ramdisk-based fast block device
      causes no visible performance degradation.  Please note that this is just
      a very simple and minimal test.  On systems using super-fast block
      devices and a special I/O workload, the results might be different.  If
      you have any doubt, test on your machine with your workload to find the
      optimal squeezing duration for you.
      
      [1] https://www.kernel.org/doc/html/latest/admin-guide/blockdev/ramdisk.html
      
      Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: SeongJae Park <sjpark@amazon.de>
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
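The squeezing behaviour described in this commit message can be illustrated with a toy model. This is a Python sketch under stated assumptions, not the blkback code: `PagePool` and its method names are hypothetical, time is passed in explicitly as milliseconds, and only free pages are modelled (pages that currently map grants are never touched, as the message explains). The defaults of 1024 pages and 10 ms are the ones quoted in the message.

```python
class PagePool:
    """Toy model of a blkif free-page pool with a time-limited squeeze."""

    def __init__(self, max_buffer_pages: int = 1024, squeeze_ms: int = 10):
        self.max_buffer_pages = max_buffer_pages
        self.squeeze_ms = squeeze_ms
        self.free_pages = 0
        self.squeeze_until_ms = 0  # squeezing is active while now < this

    def on_memory_pressure(self, now_ms: int) -> None:
        """Start (or extend) the squeeze window when pressure is detected."""
        self.squeeze_until_ms = now_ms + self.squeeze_ms

    def shrink_limit(self, now_ms: int) -> int:
        """Limit is 0 while squeezing, otherwise max_buffer_pages."""
        return 0 if now_ms < self.squeeze_until_ms else self.max_buffer_pages

    def shrink(self, now_ms: int) -> int:
        """Release free pages above the current limit; return how many."""
        released = max(0, self.free_pages - self.shrink_limit(now_ms))
        self.free_pages -= released
        return released


pool = PagePool()
pool.free_pages = 2000
print("normal shrink released:", pool.shrink(now_ms=0))
pool.on_memory_pressure(now_ms=5)
print("squeeze released:", pool.shrink(now_ms=10))
```

The only thing that changes under pressure in this model is the limit passed to the existing shrink step, which mirrors the message's point that the freeing mechanism itself stays the same.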
  14. 27 Jan, 2020 2 commits
  15. 15 Jan, 2020 1 commit
  16. 06 Jan, 2020 1 commit
  17. 03 Jan, 2020 6 commits