1. 28 Mar, 2020 2 commits
  2. 25 Mar, 2020 1 commit
  3. 12 Mar, 2020 1 commit
  4. 10 Mar, 2020 2 commits
    • B
      blk-mq: Fix a recently introduced regression in blk_mq_realloc_hw_ctxs() · d0930bb8
      Bart Van Assche authored
      q->nr_hw_queues must only be updated once it is known that
      blk_mq_realloc_hw_ctxs() has succeeded. Otherwise it can happen that
      reallocation fails and that q->nr_hw_queues is larger than the number of
      allocated hardware queues. This patch fixes the following crash if
      increasing the number of hardware queues fails:
      
      BUG: KASAN: null-ptr-deref in blk_mq_map_swqueue+0x775/0x810
      Write of size 8 at addr 0000000000000118 by task check/977
      
      CPU: 3 PID: 977 Comm: check Not tainted 5.6.0-rc1-dbg+ #8
      Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      Call Trace:
       dump_stack+0xa5/0xe6
       __kasan_report.cold+0x65/0x99
       kasan_report+0x16/0x20
       check_memory_region+0x140/0x1b0
       memset+0x28/0x40
       blk_mq_map_swqueue+0x775/0x810
       blk_mq_update_nr_hw_queues+0x468/0x710
       nullb_device_submit_queues_store+0xf7/0x1a0 [null_blk]
       configfs_write_file+0x1c4/0x250 [configfs]
       __vfs_write+0x4c/0x90
       vfs_write+0x145/0x2c0
       ksys_write+0xd7/0x180
       __x64_sys_write+0x47/0x50
       do_syscall_64+0x6f/0x2f0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: ac0d6b92 ("block: Reduce the amount of memory required per request queue")
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Cc: Keith Busch <kbusch@kernel.org>
      Cc: Johannes Thumshirn <jth@kernel.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      d0930bb8
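The ordering rule from this commit message (publish the new queue count only once reallocation is known to have succeeded) can be sketched in userspace C. This is a minimal illustration, not the kernel code: `struct toy_queue`, `toy_realloc_hw_ctxs()` and the `fail_at` failure-injection parameter are all hypothetical names made up for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a request queue; not the kernel's struct. */
struct toy_queue {
    void **hw_ctx;       /* one slot per hardware queue */
    unsigned int nr_hw;  /* number of valid slots in hw_ctx */
};

/*
 * Resize the queue to new_nr hardware contexts. fail_at injects an
 * allocation failure at that index (pass a value >= new_nr for none).
 * Returns 0 on success and -1 on failure. On failure the queue is left
 * exactly as it was, including nr_hw.
 */
int toy_realloc_hw_ctxs(struct toy_queue *q, unsigned int new_nr,
                        unsigned int fail_at)
{
    void **ctxs;
    unsigned int i, keep;

    assert(new_nr > 0);
    ctxs = calloc(new_nr, sizeof(*ctxs));
    if (!ctxs)
        return -1;

    /* Carry over the contexts that survive the resize. */
    keep = q->nr_hw < new_nr ? q->nr_hw : new_nr;
    for (i = 0; i < keep; i++)
        ctxs[i] = q->hw_ctx[i];

    /* Allocate the additional contexts, if any. */
    for (i = keep; i < new_nr; i++) {
        ctxs[i] = (i == fail_at) ? NULL : malloc(1);
        if (!ctxs[i])
            goto undo;
    }

    /* Success: drop contexts beyond new_nr and publish the new state. */
    for (i = new_nr; i < q->nr_hw; i++)
        free(q->hw_ctx[i]);
    free(q->hw_ctx);
    q->hw_ctx = ctxs;
    q->nr_hw = new_nr;  /* updated only after everything succeeded */
    return 0;

undo:
    /* Free only what this call allocated; old contexts stay owned by q. */
    while (i-- > keep)
        free(ctxs[i]);
    free(ctxs);
    return -1;
}
```

Because `nr_hw` is written last, a caller that iterates over `nr_hw` slots after a failed resize never touches an unallocated entry.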
    • B
      blk-mq: Keep set->nr_hw_queues and set->map[].nr_queues in sync · 6e66b493
      Bart Van Assche authored
      blk_mq_map_queues() and multiple .map_queues() implementations expect that
      set->map[HCTX_TYPE_DEFAULT].nr_queues is set to the number of hardware
      queues. Hence set .nr_queues before calling these functions. This patch
      fixes the following kernel warning:
      
      WARNING: CPU: 0 PID: 2501 at include/linux/cpumask.h:137
      Call Trace:
       blk_mq_run_hw_queue+0x19d/0x350 block/blk-mq.c:1508
       blk_mq_run_hw_queues+0x112/0x1a0 block/blk-mq.c:1525
       blk_mq_requeue_work+0x502/0x780 block/blk-mq.c:775
       process_one_work+0x9af/0x1740 kernel/workqueue.c:2269
       worker_thread+0x98/0xe40 kernel/workqueue.c:2415
       kthread+0x361/0x430 kernel/kthread.c:255
      
      Fixes: ed76e329 ("blk-mq: abstract out queue map") # v5.0
      Reported-by: syzbot+d44e1b26ce5c3e77458d@syzkaller.appspotmail.com
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Cc: Johannes Thumshirn <jth@kernel.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6e66b493
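A rough userspace sketch of the invariant described in this commit message: the per-map queue count is set to the number of hardware queues before the mapping function runs. All names (`toy_set`, `toy_map`, `toy_map_queues()`, `TOY_NR_CPUS`) are hypothetical, and the round-robin mapping is a simplification chosen for the example rather than the kernel's actual mapping policy.

```c
#include <assert.h>

#define TOY_NR_CPUS 4  /* hypothetical CPU count for the sketch */

struct toy_map {
    unsigned int nr_queues;            /* queues this map spreads over */
    unsigned int mq_map[TOY_NR_CPUS];  /* CPU index -> queue index */
};

struct toy_set {
    unsigned int nr_hw_queues;
    struct toy_map map;
};

/* Spread CPUs round-robin over map->nr_queues queues. */
void toy_map_queues(struct toy_map *m)
{
    assert(m->nr_queues > 0);
    for (unsigned int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        m->mq_map[cpu] = cpu % m->nr_queues;
}

/* Change the queue count and remap, keeping both counters in sync. */
void toy_update_nr_hw_queues(struct toy_set *set, unsigned int nr)
{
    set->nr_hw_queues = nr;
    set->map.nr_queues = nr;  /* sync before calling the mapper */
    toy_map_queues(&set->map);
}
```

If `map.nr_queues` were left at a stale, larger value, `toy_map_queues()` would hand out queue indices that no longer exist, which is the kind of mismatch the commit guards against.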
  5. 27 Feb, 2020 1 commit
  6. 25 Feb, 2020 1 commit
    • M
      blk-mq: insert passthrough request into hctx->dispatch directly · 01e99aec
      Ming Lei authored
      A device may enter a state in which it cannot handle FS requests, so
      STS_RESOURCE is always returned and the FS request is added to
      hctx->dispatch. A passthrough request may be needed at that point to
      fix the problem. If the passthrough request is added to the scheduler
      queue, blk-mq never gets a chance to dispatch it, because requests in
      hctx->dispatch are prioritized. The FS request then never completes,
      and the IO hangs.
      
      So passthrough requests have to be added to hctx->dispatch directly
      to fix the IO hang.
      
      Fix this issue by inserting passthrough requests into hctx->dispatch
      directly, and by adding the FS request to the tail of hctx->dispatch
      in blk_mq_dispatch_rq_list(). FS requests are already added to the
      tail of hctx->dispatch by default, see blk_mq_request_bypass_insert().
      
      This makes the behavior consistent with the original legacy IO request
      path, in which passthrough requests are always added to q->queue_head.
      
      Cc: Dongli Zhang <dongli.zhang@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ewan D. Milne <emilne@redhat.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      01e99aec
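The starvation argument in this commit message can be illustrated with a small userspace model: a dispatch list that is always drained before a scheduler queue. Everything here (`toy_hctx`, `toy_fifo`, `toy_insert()`, `toy_next()`) is a hypothetical simplification for the example, not the blk-mq data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX 8  /* hypothetical fixed capacity for the sketch */

struct toy_rq {
    int id;
    bool passthrough;
};

struct toy_fifo {
    struct toy_rq *rq[TOY_MAX];
    size_t head, tail;
};

struct toy_hctx {
    struct toy_fifo dispatch;  /* always served first */
    struct toy_fifo sched;     /* served only when dispatch is empty */
};

void toy_push(struct toy_fifo *f, struct toy_rq *rq)
{
    assert(f->tail < TOY_MAX);
    f->rq[f->tail++] = rq;
}

struct toy_rq *toy_pop(struct toy_fifo *f)
{
    return f->head < f->tail ? f->rq[f->head++] : NULL;
}

/* Passthrough requests bypass the scheduler queue entirely. */
void toy_insert(struct toy_hctx *h, struct toy_rq *rq)
{
    toy_push(rq->passthrough ? &h->dispatch : &h->sched, rq);
}

/* Pick the next request: the dispatch list wins over the scheduler. */
struct toy_rq *toy_next(struct toy_hctx *h)
{
    struct toy_rq *rq = toy_pop(&h->dispatch);

    return rq ? rq : toy_pop(&h->sched);
}
```

In this model a passthrough request placed on `sched` would wait behind every entry on `dispatch`; putting it on `dispatch` lets it be reached even while FS requests keep landing there.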
  7. 07 Jan, 2020 1 commit
  8. 19 Dec, 2019 2 commits
  9. 14 Nov, 2019 1 commit
  10. 01 Nov, 2019 1 commit
  11. 26 Oct, 2019 4 commits
    • A
      blk-mq: remove needless goto from blk_mq_get_driver_tag · 1fead718
      André Almeida authored
      The only use of the label "done" is when (rq->tag != -1) at the
      beginning of the function. Rather than jumping to the label, we can
      remove it and execute the code at the "if". The code after the label
      "done" returns the logical expression (rq->tag != -1), but since we
      are already inside the if, we know that this is true. Remove the label
      and replace the goto with the result the label would have produced.
      Signed-off-by: André Almeida <andrealmeid@collabora.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      1fead718
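The refactoring described in this commit message can be shown on a toy function. This sketch is not the body of blk_mq_get_driver_tag(); `toy_rq`, `toy_alloc_tag()` and both `toy_get_tag*` functions are hypothetical and only reproduce the control-flow shape.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_rq {
    int tag;  /* -1 means "no tag assigned yet" */
};

/* Hypothetical allocator for the sketch: always hands out tag 7. */
int toy_alloc_tag(void)
{
    return 7;
}

/* Before: the early exit jumps to a label that re-evaluates the test. */
bool toy_get_tag_goto(struct toy_rq *rq)
{
    if (rq->tag != -1)
        goto done;
    rq->tag = toy_alloc_tag();
done:
    return rq->tag != -1;
}

/* After: inside the if, (rq->tag != -1) is known to be true. */
bool toy_get_tag(struct toy_rq *rq)
{
    if (rq->tag != -1)
        return true;
    rq->tag = toy_alloc_tag();
    return rq->tag != -1;
}
```

Both versions return the same value for every input; the second simply states the early-exit result directly.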
    • B
      block: Reduce the amount of memory used for tag sets · f7e76dbc
      Bart Van Assche authored
      Instead of allocating an array of size nr_cpu_ids for set->tags, allocate
      an array of size set->nr_hw_queues. This patch improves behavior that was
      introduced by commit 868f2f0b ("blk-mq: dynamic h/w context count").
      
      Reallocating tag sets from inside __blk_mq_update_nr_hw_queues() is safe
      because:
      - All request queues that share the tag sets are frozen before the tag sets
        are reallocated.
      - blk_mq_queue_tag_busy_iter() holds q->q_usage_counter while active and
        hence is serialized against __blk_mq_update_nr_hw_queues().
      
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f7e76dbc
    • B
      block: Reduce the amount of memory required per request queue · ac0d6b92
      Bart Van Assche authored
      Instead of always allocating at least nr_cpu_ids hardware queues per request
      queue, reallocate q->queue_hw_ctx if it has to grow. This patch improves
      behavior that was introduced by commit 868f2f0b ("blk-mq: dynamic h/w
      context count").
      
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ac0d6b92
    • B
      block: Remove the synchronize_rcu() call from __blk_mq_update_nr_hw_queues() · a9a80808
      Bart Van Assche authored
      Since the blk_mq_{,un}freeze_queue() calls in __blk_mq_update_nr_hw_queues()
      already serialize __blk_mq_update_nr_hw_queues() against
      blk_mq_queue_tag_busy_iter(), the synchronize_rcu() call in
      __blk_mq_update_nr_hw_queues() is not necessary. Hence remove it.
      
      Note: the synchronize_rcu() call in __blk_mq_update_nr_hw_queues() was
      introduced by commit f5bbbbe4 ("blk-mq: sync the update nr_hw_queues with
      blk_mq_queue_tag_busy_iter"). Commit 530ca2c9 ("blk-mq: Allow blocking
      queue tag iter callbacks") removed the rcu_read_{,un}lock() calls that
      correspond to the synchronize_rcu() call in __blk_mq_update_nr_hw_queues().
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      a9a80808
  12. 07 Oct, 2019 4 commits
  13. 28 Sep, 2019 2 commits
  14. 27 Sep, 2019 1 commit
    • Y
      block: fix null pointer dereference in blk_mq_rq_timed_out() · 8d699663
      Yufen Yu authored
      We got a NULL pointer dereference BUG in blk_mq_rq_timed_out(), as
      follows:
      
      [  108.825472] BUG: kernel NULL pointer dereference, address: 0000000000000040
      [  108.827059] PGD 0 P4D 0
      [  108.827313] Oops: 0000 [#1] SMP PTI
      [  108.827657] CPU: 6 PID: 198 Comm: kworker/6:1H Not tainted 5.3.0-rc8+ #431
      [  108.829503] Workqueue: kblockd blk_mq_timeout_work
      [  108.829913] RIP: 0010:blk_mq_check_expired+0x258/0x330
      [  108.838191] Call Trace:
      [  108.838406]  bt_iter+0x74/0x80
      [  108.838665]  blk_mq_queue_tag_busy_iter+0x204/0x450
      [  108.839074]  ? __switch_to_asm+0x34/0x70
      [  108.839405]  ? blk_mq_stop_hw_queue+0x40/0x40
      [  108.839823]  ? blk_mq_stop_hw_queue+0x40/0x40
      [  108.840273]  ? syscall_return_via_sysret+0xf/0x7f
      [  108.840732]  blk_mq_timeout_work+0x74/0x200
      [  108.841151]  process_one_work+0x297/0x680
      [  108.841550]  worker_thread+0x29c/0x6f0
      [  108.841926]  ? rescuer_thread+0x580/0x580
      [  108.842344]  kthread+0x16a/0x1a0
      [  108.842666]  ? kthread_flush_work+0x170/0x170
      [  108.843100]  ret_from_fork+0x35/0x40
      
      The bug is caused by a race between the timeout handler and the
      completion of a flush request.
      
      When the timeout handler blk_mq_rq_timed_out() tries to read
      'req->q->mq_ops', 'req' may already have completed and been
      reinitialized by the next flush request, which calls blk_rq_init() to
      zero 'req'.
      
      After commit 12f5b931 ("blk-mq: Remove generation seqeunce"), the
      lifetime of normal requests is protected by a refcount. A request can
      only be freed once 'rq->ref' drops to zero, so these requests cannot
      be reused before the timeout handler finishes.
      
      However, a flush request defines .end_io, and rq->end_io() is still
      called even if 'rq->ref' has not dropped to zero. After that, the
      'flush_rq' can be reused by the next flush request, resulting in the
      NULL pointer dereference.
      
      We fix this problem by covering the flush request with 'rq->ref'.
      If the refcount is not zero, flush_end_io() returns and waits for the
      last holder to call it again. To record the request status, we add a
      new entry 'rq_status', which is used in flush_end_io().
      
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: stable@vger.kernel.org # v4.18+
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Yufen Yu <yuyufen@huawei.com>
      
      -------
      v2:
       - move rq_status from struct request to struct blk_flush_queue
      v3:
       - remove unnecessary '{}' pair.
      v4:
       - let spinlock to protect 'fq->rq_status'
      v5:
       - move rq_status after flush_running_idx member of struct blk_flush_queue
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      8d699663
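The fix outlined in this commit message (defer the real completion until the last reference holder calls flush_end_io() again, and remember the status in the meantime) can be sketched as a single-threaded userspace model. The structures and `toy_flush_end_io()` are hypothetical, plain integers stand in for the kernel's refcount type, and the spinlock mentioned in the v4 note is left out.

```c
#include <assert.h>

struct toy_flush_queue {
    int rq_status;    /* status recorded while other holders exist */
    int completed;    /* set once the flush has really finished */
    int final_error;  /* error reported at the real completion */
};

struct toy_rq {
    int ref;  /* number of holders, e.g. completion plus timeout path */
    struct toy_flush_queue *fq;
};

/*
 * Called by every holder when it is done with the flush request.
 * Only the call that drops the last reference completes the flush.
 */
void toy_flush_end_io(struct toy_rq *rq, int error)
{
    struct toy_flush_queue *fq = rq->fq;

    assert(rq->ref > 0);
    if (--rq->ref != 0) {
        /* Someone else still uses rq: record the status and wait. */
        fq->rq_status = error;
        return;
    }

    /* Last holder: prefer a status recorded by an earlier caller. */
    if (fq->rq_status != 0) {
        error = fq->rq_status;
        fq->rq_status = 0;
    }
    fq->completed = 1;
    fq->final_error = error;
}
```

With two holders, the first call only records its status; the request becomes reusable after the second call, so the timeout path never sees a request that has been reinitialized underneath it.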
  15. 18 Sep, 2019 1 commit
  16. 16 Sep, 2019 2 commits
  17. 06 Sep, 2019 3 commits
    • D
      block: Delay default elevator initialization · 737eb78e
      Damien Le Moal authored
      When elevator_init_mq() is called from blk_mq_init_allocated_queue(),
      the only information known about the device is the number of hardware
      queues, because for most drivers the block device scan by the device
      driver has not completed yet. The device type and the elevator
      required features are not set yet, which prevents selecting the
      default elevator most suitable for the device.
      
      This currently affects all multi-queue zoned block devices which default
      to the "none" elevator instead of the required "mq-deadline" elevator.
      These drives currently include host-managed SMR disks connected to a
      smartpqi HBA and null_blk block devices with zoned mode enabled.
      Upcoming NVMe Zoned Namespace devices will also be affected.
      
      Fix this by adding the boolean elevator_init argument to
      blk_mq_init_allocated_queue() to control the execution of
      elevator_init_mq(). Two cases exist:
      1) elevator_init = false is used for calls to
         blk_mq_init_allocated_queue() within blk_mq_init_queue(). In this
         case, a call to elevator_init_mq() is added to __device_add_disk(),
         resulting in the delayed initialization of the queue elevator
         after the device driver finished probing the device information. This
         effectively allows elevator_init_mq() access to more information
         about the device.
      2) elevator_init = true preserves the current behavior of initializing
         the elevator directly from blk_mq_init_allocated_queue(). This case
         is used for the special request based DM devices where the device
         gendisk is created before the queue initialization and device
         information (e.g. queue limits) is already known when the queue
         initialization is executed.
      
      Additionally, to make sure that the elevator initialization is never
      done while requests are in-flight (there should be none when the device
      driver calls device_add_disk()), freeze and quiesce the device request
      queue before calling blk_mq_init_sched() in elevator_init_mq().
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      737eb78e
    • D
      block: Change elevator_init_mq() to always succeed · 954b4a5c
      Damien Le Moal authored
      If the default elevator chosen is mq-deadline, elevator_init_mq() may
      return an error if mq-deadline initialization fails, leading to
      blk_mq_init_allocated_queue() returning an error, which in turn will
      cause the block device initialization to fail and the device not being
      exposed.
      
      Instead of taking such an extreme measure, handle mq-deadline
      initialization failures in the same manner as when mq-deadline is not
      available (no module to load), that is, default to the "none" scheduler.
      With this change, elevator_init_mq() return type can be changed to void.
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      954b4a5c
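The "always succeed" behavior described in this commit message amounts to treating a scheduler initialization failure like a missing scheduler. A minimal userspace sketch, assuming a hypothetical init callback type and a `toy_queue` whose `elevator` pointer is NULL when the "none" scheduler is in effect:

```c
#include <assert.h>
#include <stddef.h>

struct toy_queue {
    const char *elevator;  /* NULL stands for the "none" scheduler */
};

/* Hypothetical init hook: returns 0 on success, negative on error. */
typedef int (*toy_sched_init_fn)(struct toy_queue *q);

int toy_deadline_init_ok(struct toy_queue *q)
{
    q->elevator = "mq-deadline";
    return 0;
}

int toy_deadline_init_fail(struct toy_queue *q)
{
    q->elevator = "mq-deadline";  /* partially set up, then fails */
    return -12;
}

/*
 * Never reports an error: a missing scheduler (init == NULL) and a
 * failing scheduler both leave the queue on "none".
 */
void toy_elevator_init(struct toy_queue *q, toy_sched_init_fn init)
{
    assert(q != NULL);
    q->elevator = NULL;
    if (init && init(q) != 0)
        q->elevator = NULL;  /* fall back instead of propagating */
}
```

Since the function returns void, the caller's queue setup cannot fail because of the scheduler, so the device is still exposed.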
    • D
      block: Cleanup elevator_init_mq() use · 61db437d
      Damien Le Moal authored
      Instead of checking a queue tag_set BLK_MQ_F_NO_SCHED flag before
      calling elevator_init_mq() to make sure that the queue supports IO
      scheduling, use the elevator.c function elv_support_iosched() in
      elevator_init_mq(). This does not introduce any functional change but
      ensures that elevator_init_mq() does the right thing based on the queue
      settings.
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      61db437d
  18. 29 Aug, 2019 1 commit
    • T
      blk-mq: add optional request->alloc_time_ns · 6f816b4b
      Tejun Heo authored
      There are currently two start timestamps - start_time_ns and
      io_start_time_ns.  The former marks request allocation and the latter
      the issue-to-device time.  The planned io.weight controller needs to
      measure the total time a bio takes to execute after it leaves rq_qos,
      including the time spent waiting for a request to become available,
      which can easily dominate on saturated devices.
      
      This patch adds request->alloc_time_ns which records when the request
      allocation attempt started.  As it isn't used for the usual stats,
      make it optional behind CONFIG_BLK_RQ_ALLOC_TIME and
      QUEUE_FLAG_RQ_ALLOC_TIME so that it can be compiled out when there are
      no users and it's active only on queues which need it even when
      compiled in.
      
      v2: s/pre_start_time/alloc_time/ and add CONFIG_BLK_RQ_ALLOC_TIME
          gating as suggested by Jens.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6f816b4b
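The two-level gating described in this commit message (a build-time option plus a per-queue flag) can be sketched as follows. `TOY_RQ_ALLOC_TIME` stands in for CONFIG_BLK_RQ_ALLOC_TIME and the `rq_alloc_time` field for QUEUE_FLAG_RQ_ALLOC_TIME; the structures and `toy_rq_init()` are hypothetical, and timestamps are passed in as plain numbers instead of being read from a clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_RQ_ALLOC_TIME 1  /* stands in for CONFIG_BLK_RQ_ALLOC_TIME */

struct toy_queue {
    bool rq_alloc_time;  /* stands in for QUEUE_FLAG_RQ_ALLOC_TIME */
};

struct toy_rq {
#if TOY_RQ_ALLOC_TIME
    uint64_t alloc_time_ns;  /* when the allocation attempt started */
#endif
    uint64_t start_time_ns;  /* when the request was allocated */
};

/*
 * Initialize the timestamps of a freshly allocated request.
 * alloc_started_ns is the time the allocation attempt began, now_ns the
 * time it finally succeeded.
 */
void toy_rq_init(const struct toy_queue *q, struct toy_rq *rq,
                 uint64_t alloc_started_ns, uint64_t now_ns)
{
    assert(now_ns >= alloc_started_ns);
#if TOY_RQ_ALLOC_TIME
    /* Compiled in, but recorded only on queues that ask for it. */
    rq->alloc_time_ns = q->rq_alloc_time ? alloc_started_ns : 0;
#else
    (void)q;
    (void)alloc_started_ns;
#endif
    rq->start_time_ns = now_ns;
}
```

The gap between the two timestamps is the time spent waiting for a request to become available, which is the quantity the commit message says can dominate on saturated devices.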
  19. 28 Aug, 2019 1 commit
    • M
      blk-mq: don't hold q->sysfs_lock in blk_mq_map_swqueue · c6ba9333
      Ming Lei authored
      blk_mq_map_swqueue() is called from blk_mq_init_allocated_queue()
      and blk_mq_update_nr_hw_queues(). For the former caller, the kobject
      isn't exposed to userspace yet. For the latter caller, hctx sysfs entries
      and debugfs are un-registered before updating nr_hw_queues.
      
      On the other hand, commit 2f8f1336 ("blk-mq: always free hctx after
      request queue is freed") moves freeing of the hctx into the queue's
      release handler, so there is no race with the queue release path either.
      
      So don't hold q->sysfs_lock in blk_mq_map_swqueue().
      
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      c6ba9333
  20. 16 Aug, 2019 1 commit
    • J
      block: remove REQ_NOWAIT_INLINE · 7b6620d7
      Jens Axboe authored
      We had a few issues with this code, and there's still a problem around
      how we deal with error handling for chained/split bios. For now, just
      revert the code and we'll try again with a thorough solution. This
      reverts commits:
      
      e15c2ffa ("block: fix O_DIRECT error handling for bio fragments")
      0eb6ddfb ("block: Fix __blkdev_direct_IO() for bio fragments")
      6a43074e ("block: properly handle IOCB_NOWAIT for async O_DIRECT IO")
      893a1c97 ("blk-mq: allow REQ_NOWAIT to return an error inline")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      7b6620d7
  21. 12 Aug, 2019 2 commits
  22. 05 Aug, 2019 2 commits
  23. 01 Aug, 2019 2 commits
  24. 31 Jul, 2019 1 commit