1. 27 May 2020 (1 commit)
  2. 19 May 2020 (1 commit)
  3. 14 May 2020 (1 commit)
    • block: Inline encryption support for blk-mq · a892c8d5
      Satya Tangirala authored
      We must have some way of letting a storage device driver know what
      encryption context it should use for en/decrypting a request. However,
      it's the upper layers (like the filesystem/fscrypt) that know about and
      manage encryption contexts. As such, when the upper layer submits a bio
      to the block layer, and this bio eventually reaches a device driver with
      support for inline encryption, the device driver will need to have been
      told the encryption context for that bio.
      
      We want to communicate the encryption context from the upper layer to the
      storage device along with the bio, when the bio is submitted to the block
      layer. To do this, we add a struct bio_crypt_ctx to struct bio, which can
      represent an encryption context (note that we can't use the bi_private
      field in struct bio for this, because that field is not meant to pass
      information across layers in the storage stack). We also introduce various
      functions to manipulate the bio_crypt_ctx and make the bio/request merging
      logic aware of the bio_crypt_ctx.
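
      For reference, a minimal sketch of that per-bio context, with field
      names matching the upstream struct added by this patch:

      /* Sketch only; the real definition lives with the blk-crypto headers. */
      struct bio_crypt_ctx {
              const struct blk_crypto_key *bc_key;    /* key + crypto mode */
              u64 bc_dun[BLK_CRYPTO_DUN_ARRAY_SIZE];  /* data unit number */
      };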
      
      We also make changes to blk-mq to make it handle bios with encryption
      contexts. blk-mq can merge many bios into the same request. These bios need
      to have contiguous data unit numbers (the necessary changes to blk-merge
      are also made to ensure this) - as such, it suffices to keep the data unit
      number of just the first bio, since that's all a storage driver needs to
      infer the data unit number to use for each data block in each bio in a
      request. blk-mq keeps track of the encryption context to be used for all
      the bios in a request with the request's rq_crypt_ctx. When the first bio
      is added to an empty request, blk-mq will program the encryption context
      of that bio into the request_queue's keyslot manager, and store the
      returned keyslot in the request's rq_crypt_ctx. All the functions to
      operate on encryption contexts are in blk-crypto.c.
      
      Upper layers only need to call bio_crypt_set_ctx with the encryption key,
      algorithm and data_unit_num; they don't have to worry about getting a
      keyslot for each encryption context, as blk-mq/blk-crypto handles that.
      Blk-crypto also makes it possible for request-based layered devices like
      dm-rq to make use of inline encryption hardware by cloning the
      rq_crypt_ctx and programming a keyslot in the new request_queue when
      necessary.
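
      A hedged usage sketch of that call (names from this series; key setup
      and error handling are omitted, and 'first_dun' is a hypothetical value):

      u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE] = { first_dun };  /* hypothetical */

      bio_crypt_set_ctx(bio, key, dun, GFP_NOIO);  /* attach the context */
      submit_bio(bio);  /* blk-mq/blk-crypto handle keyslots from here on */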
      
      Note that any user of the block layer can submit bios with an
      encryption context, such as filesystems, device-mapper targets, etc.

      Signed-off-by: Satya Tangirala <satyat@google.com>
      Reviewed-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. 13 May 2020 (3 commits)
  5. 28 March 2020 (1 commit)
  6. 25 March 2020 (3 commits)
    • block: move guard_bio_eod to bio.c · 29125ed6
      Christoph Hellwig authored
      This is bio layer functionality and not related to buffer heads.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block/diskstats: replace time_in_queue with sum of request times · 8cd5b8fc
      Konstantin Khlebnikov authored
      Column "time_in_queue" in diskstats is supposed to show the total waiting
      time of all requests, i.e. its value should equal the sum of the times in
      the other columns. But this is not the case, because "time_in_queue" is
      counted separately, in jiffies rather than in nanoseconds like the other
      times.

      This patch removes the redundant counter for "time_in_queue" and instead
      shows the total time of read, write, discard and flush requests, as
      sketched below.
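
      A sketch of the new derivation (assumed field names, following the
      commit description; the sums are kept in nanoseconds and converted to
      milliseconds when /proc/diskstats is printed):

      u64 time_in_queue_ms = div_u64(stat.nsecs[STAT_READ] +
                                     stat.nsecs[STAT_WRITE] +
                                     stat.nsecs[STAT_DISCARD] +
                                     stat.nsecs[STAT_FLUSH], NSEC_PER_MSEC);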
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block/diskstats: more accurate approximation of io_ticks for slow disks · 2b8bd423
      Konstantin Khlebnikov authored
      Currently io_ticks is approximated by adding one at each start and end of
      a request if the jiffies counter has changed. This works perfectly for
      requests shorter than a jiffy, or when at least one request starts or
      ends in every jiffy.

      If the disk executes just one request at a time and requests are longer
      than two jiffies, then only the first and last jiffies are accounted.
      
      The fix is simple: at the end of a request, add to io_ticks the jiffies
      that passed since the last update, rather than just one jiffy.
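
      A simplified sketch of the fixed accounting (close to update_io_ticks()
      in blk-core.c of that era, with the walk up to the whole-disk partition
      and the likely()/unlikely() annotations omitted):

      static void update_io_ticks(struct hd_struct *part, unsigned long now,
                                  bool end)
      {
              unsigned long stamp = READ_ONCE(part->stamp);

              /* On completion ("end"), credit every jiffy since the last
               * update instead of a single jiffy; on start, keep adding 1. */
              if (stamp != now && cmpxchg(&part->stamp, stamp, now) == stamp)
                      __part_stat_add(part, io_ticks, end ? now - stamp : 1);
      }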
      
      Example: a common HDD executes random 4k read requests in around 12ms.
      
      fio --name=test --filename=/dev/sdb --rw=randread --direct=1 --runtime=30 &
      iostat -x 10 sdb
      
      Note the change in iostat's "%util" from 8,43% to 99,99% before/after the patch:
      
      Before:
      
      Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
      sdb               0,00     0,00   82,60    0,00   330,40     0,00     8,00     0,96   12,09   12,09    0,00   1,02   8,43
      
      After:
      
      Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
      sdb               0,00     0,00   82,50    0,00   330,00     0,00     8,00     1,00   12,10   12,10    0,00  12,12  99,99
      
      Now io_ticks does not lose time between the start and end of requests,
      but for queue depth > 1 some I/O time between adjacent starts might be
      lost.

      For load estimation "%util" is not as useful as the average queue length,
      but it clearly shows how often the disk queue is completely empty.
      
      Fixes: 5b18b5a7 ("block: delete part_round_stats and switch to less precise counting")
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  7. 24 March 2020 (1 commit)
  8. 18 March 2020 (1 commit)
  9. 09 January 2020 (1 commit)
    • fs: move guard_bio_eod() after bio_set_op_attrs · 83c9c547
      Ming Lei authored
      Commit 85a8ce62 ("block: add bio_truncate to fix guard_bio_eod")
      adds bio_truncate() for handling bio EOD. However, bio_truncate()
      doesn't use the 'op' parameter passed in by guard_bio_eod's callers.

      So bio_truncate() may retrieve the wrong 'op', and zeroing pages may
      not be done for a READ bio.
      
      Fix this issue by moving guard_bio_eod() after bio_set_op_attrs()
      in submit_bh_wbc() so that bio_truncate() can always retrieve the
      correct op info, as sketched below.

      Meanwhile, remove the 'op' parameter from guard_bio_eod() because it
      isn't used any more.
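
      A sketch of the corrected ordering in submit_bh_wbc() (surrounding bio
      setup omitted):

      bio_set_op_attrs(bio, op, op_flags);  /* set bio_op() first ... */
      guard_bio_eod(bio);   /* ... so bio_truncate() can zero READ pages */
      submit_bio(bio);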
      
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: linux-fsdevel@vger.kernel.org
      Fixes: 85a8ce62 ("block: add bio_truncate to fix guard_bio_eod")
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      
      Fold in kerneldoc and bio_op() change.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  10. 29 December 2019 (1 commit)
    • block: add bio_truncate to fix guard_bio_eod · 85a8ce62
      Ming Lei authored
      Some filesystems, such as vfat, may send a bio that crosses the device
      boundary, and what's worse, an IO request starting within device
      boundaries can contain more than one segment past EOD.
      
      Commit dce30ca9 ("fs: fix guard_bio_eod to check for real EOD errors")
      tried to fix this issue by returning -EIO in this situation. However,
      that approach gives fs user code no chance to handle the -EIO, and then
      sync_inodes_sb() may hang forever.
      
      Also, the current truncation of the last segment is dangerous: it updates
      the last bvec, so the bvec table is no longer immutable, and fs bio users
      may fail to retrieve the truncated pages via bio_for_each_segment_all()
      in their .end_io callbacks.
      
      Fix this issue by supporting multi-segment truncation. The approach is
      simpler (see the sketch after this list):

      - just update the bio size, since the block layer can build correct
      bvecs from the updated bio size; the bvec table then stays truly
      immutable

      - zero all truncated segments for a read bio
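
      A hedged sketch of that approach (simplified from the bio_truncate()
      this patch adds; per-page bounds handling is condensed):

      void bio_truncate(struct bio *bio, unsigned int new_size)
      {
              struct bio_vec bv;
              struct bvec_iter iter;
              unsigned int done = 0;

              if (new_size >= bio->bi_iter.bi_size)
                      return;

              /* Zero the truncated part of a READ so stale data never
               * reaches the caller's pages. */
              if (bio_op(bio) == REQ_OP_READ) {
                      bio_for_each_segment(bv, bio, iter) {
                              if (done + bv.bv_len > new_size) {
                                      unsigned int off = new_size > done ?
                                                         new_size - done : 0;
                                      zero_user(bv.bv_page,
                                                bv.bv_offset + off,
                                                bv.bv_len - off);
                              }
                              done += bv.bv_len;
                      }
              }

              /* Only shrink bi_size; the bvec table stays immutable and
               * the iterator simply stops at the new EOD. */
              bio->bi_iter.bi_size = new_size;
      }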
      
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: linux-fsdevel@vger.kernel.org
      Fixes: dce30ca9 ("fs: fix guard_bio_eod to check for real EOD errors")
      Reported-by: syzbot+2b9e54155c8c25d8d165@syzkaller.appspotmail.com
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  11. 10 December 2019 (1 commit)
  12. 06 December 2019 (1 commit)
    • block: fix memleak of bio integrity data · ece841ab
      Justin Tee authored
      Commit 7c20f116 ("bio-integrity: stop abusing bi_end_io") moved
      bio_integrity_free from bio_uninit() to bio_integrity_verify_fn()
      and bio_endio(). This is wrong because a bio may be freed without
      bio_endio() ever being called: for example, blk_rq_unprep_clone() is
      called from dm_mq_queue_rq() when the underlying queue of dm-mpath
      is busy.

      So commit 7c20f116 causes a memory leak of the bio integrity data.
      
      Fix this issue by re-adding bio_integrity_free() to bio_uninit(), as
      sketched below.
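
      A sketch of the fixed teardown (close to the 5.5-era bio_uninit() in
      block/bio.c):

      void bio_uninit(struct bio *bio)
      {
              bio_disassociate_blkg(bio);

              /* Free integrity metadata whenever the bio is torn down,
               * not only on the bio_endio() path. */
              if (bio_integrity(bio))
                      bio_integrity_free(bio);
      }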
      
      Fixes: 7c20f116 ("bio-integrity: stop abusing bi_end_io")
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Justin Tee <justin.tee@broadcom.com>
      
      Add commit log, and simplify/fix the original patch written by Justin.
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  13. 12 November 2019 (1 commit)
    • block: check bi_size overflow before merge · e3a5d8e3
      Junichi Nomura authored
      __bio_try_merge_page() may merge a page into a bio without the bio_full()
      check and cause a bi_size overflow.

      The overflow typically ends up with sd_init_command() warning about a
      zero-segment request, with a call trace like this:
      
          ------------[ cut here ]------------
          WARNING: CPU: 2 PID: 1986 at drivers/scsi/scsi_lib.c:1025 scsi_init_io+0x156/0x180
          CPU: 2 PID: 1986 Comm: kworker/2:1H Kdump: loaded Not tainted 5.4.0-rc7 #1
          Workqueue: kblockd blk_mq_run_work_fn
          RIP: 0010:scsi_init_io+0x156/0x180
          RSP: 0018:ffffa11487663bf0 EFLAGS: 00010246
          RAX: 00000000002be0a0 RBX: ffff8e6e9ff30118 RCX: 0000000000000000
          RDX: 00000000ffffffe1 RSI: 0000000000000000 RDI: ffff8e6e9ff30118
          RBP: ffffa11487663c18 R08: ffffa11487663d28 R09: ffff8e6e9ff30150
          R10: 0000000000000001 R11: 0000000000000000 R12: ffff8e6e9ff30000
          R13: 0000000000000001 R14: ffff8e74a1cf1800 R15: ffff8e6e9ff30000
          FS:  0000000000000000(0000) GS:ffff8e6ea7680000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 00007fff18cf0fe8 CR3: 0000000659f0a001 CR4: 00000000001606e0
          Call Trace:
           sd_init_command+0x326/0xb40 [sd_mod]
           scsi_queue_rq+0x502/0xaa0
           ? blk_mq_get_driver_tag+0xe7/0x120
           blk_mq_dispatch_rq_list+0x256/0x5a0
           ? elv_rb_del+0x24/0x30
           ? deadline_remove_request+0x7b/0xc0
           blk_mq_do_dispatch_sched+0xa3/0x140
           blk_mq_sched_dispatch_requests+0xfb/0x170
           __blk_mq_run_hw_queue+0x81/0x130
           blk_mq_run_work_fn+0x1b/0x20
           process_one_work+0x179/0x390
           worker_thread+0x4f/0x3e0
           kthread+0x105/0x140
           ? max_active_store+0x80/0x80
           ? kthread_bind+0x20/0x20
           ret_from_fork+0x35/0x40
          ---[ end trace f9036abf5af4a4d3 ]---
          blk_update_request: I/O error, dev sdd, sector 2875552 op 0x1:(WRITE) flags 0x0 phys_seg 0 prio class 0
          XFS (sdd1): writeback error on sector 2875552
      
      __bio_try_merge_page() should check for the overflow before actually
      doing the merge, as sketched below.
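
      A minimal sketch of the guard this patch adds at the top of
      __bio_try_merge_page() (the rest of the function is unchanged):

      if (bio->bi_iter.bi_size > UINT_MAX - len)
              return false;  /* adding len would overflow 32-bit bi_size */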
      
      Fixes: 07173c3e ("block: enable multipage bvecs")
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  14. 22 August 2019 (3 commits)
  15. 14 August 2019 (1 commit)
  16. 06 August 2019 (1 commit)
  17. 05 August 2019 (1 commit)
  18. 12 July 2019 (1 commit)
  19. 01 July 2019 (1 commit)
    • block: fix .bi_size overflow · 79d08f89
      Ming Lei authored
      'bio->bi_iter.bi_size' is an 'unsigned int', which can hold at most
      4G - 1 bytes.

      Before 07173c3e ("block: enable multipage bvecs"), one bio could include
      only a limited number of pages, usually at most 256, so the fs bio size
      wouldn't be bigger than 1M bytes most of the time.

      Since we now support multi-page bvecs, in theory more than 1M pages can
      be added to one fs bio, especially in the case of hugepages, or big
      writeback with many dirty pages. Then there is a chance that .bi_size
      overflows.
      
      Fix this issue by using bio_full() to check whether the added segment
      would overflow .bi_size, as sketched below.
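
      A sketch of the check (close to the bio_full() helper after this patch):

      static inline bool bio_full(struct bio *bio, unsigned int len)
      {
              if (bio->bi_vcnt >= bio->bi_max_vecs)
                      return true;  /* bvec table is full */
              if (bio->bi_iter.bi_size > UINT_MAX - len)
                      return true;  /* bi_size would overflow */
              return false;
      }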
      
      Cc: Liu Yiding <liuyd.fnst@cn.fujitsu.com>
      Cc: kernel test robot <rong.a.chen@intel.com>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: linux-xfs@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Fixes: 07173c3e ("block: enable multipage bvecs")
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  20. 29 June 2019 (5 commits)
  21. 27 June 2019 (1 commit)
  22. 21 June 2019 (1 commit)
    • block: remove the bi_phys_segments field in struct bio · 14ccb66b
      Christoph Hellwig authored
      We only need the number of segments in the blk-mq submission path.
      Remove the field from struct bio and instead return the count from a
      variant of blk_queue_split, so that it can be passed as an argument to
      the functions that need the value.

      This also means we stop recounting segments except for cloning
      and partial segments.

      To keep the number of arguments in this hot path down, remove the
      pointless struct request_queue argument from any of the functions
      that had it and grew a nr_segs argument.
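
      The assumed shape of the resulting interface (names as described above;
      the segment count now travels as an argument instead of living in the
      bio):

      void __blk_queue_split(struct request_queue *q, struct bio **bio,
                             unsigned int *nr_segs);

      /* blk-mq submission path, sketched: */
      unsigned int nr_segs;

      __blk_queue_split(q, &bio, &nr_segs);
      /* nr_segs is then passed on to the merge/allocation helpers. */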
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  23. 17 June 2019 (2 commits)
  24. 15 June 2019 (1 commit)
    • block: bio: Use struct_size() in kmalloc() · f1f8f292
      Gustavo A. R. Silva authored
      One of the more common cases of allocation size calculations is finding
      the size of a structure that has a zero-sized array at the end, along
      with memory for some number of elements for that array. For example:
      
      struct bio_map_data {
      	...
              struct iovec iov[];
      };
      
      instance = kmalloc(sizeof(struct bio_map_data) + sizeof(struct iovec) *
                         count, GFP_KERNEL);
      
      Instead of leaving these open-coded and prone to type mistakes, we can
      now use the new struct_size() helper:
      
      instance = kmalloc(struct_size(instance, iov, count), GFP_KERNEL);
      
      This code was detected with the help of Coccinelle.
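
      Beyond readability, struct_size() (from include/linux/overflow.h) also
      guards against multiplication overflow: on overflow it saturates to
      SIZE_MAX, so the allocation fails cleanly instead of being undersized.
      A brief sketch of the pattern:

      instance = kmalloc(struct_size(instance, iov, count), GFP_KERNEL);
      if (!instance)
              return -ENOMEM;  /* a huge 'count' saturates to SIZE_MAX */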
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  25. 01 May 2019 (1 commit)
  26. 30 April 2019 (4 commits)