1. 01 Feb 2019, 15 commits
    • blk-mq: save queue mapping result into ctx directly · 8ccdf4a3
      Jianchao Wang authored
      Currently, the queue mapping result is saved in a two-dimensional
      array. In the hot path, getting a hctx requires the following lookup:
      
        q->queue_hw_ctx[q->tag_set->map[type].mq_map[cpu]]
      
      This isn't very efficient. We can instead save the queue mapping result
      directly in the ctx, indexed by hctx type:
      
        ctx->hctxs[type]
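      
      A minimal C sketch of the change (the hctxs[] field layout is an
      assumption; the two lookup expressions are taken from the message):
      
        /* Assumed: the per-CPU software context caches one hctx pointer
         * per mapping type, filled in when the mapping is computed. */
        struct blk_mq_ctx {
                /* ... */
                struct blk_mq_hw_ctx *hctxs[HCTX_MAX_TYPES];
        };
        
        /* Hot path before: two dependent loads through the tag set. */
        hctx = q->queue_hw_ctx[q->tag_set->map[type].mq_map[cpu]];
        
        /* Hot path after: a single indexed load from the ctx. */
        hctx = ctx->hctxs[type];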
      Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: fix in-service-queue check for queue merging · 058fdecc
      Paolo Valente authored
      When a new I/O request arrives for a bfq_queue, say Q, bfq checks
      whether that request is close to
      (a) the head request of some other queue waiting to be served, or
      (b) the last request dispatched for the in-service queue (in case Q
      itself is not the in-service queue).
      
      If a queue, say Q2, is found for which the above condition holds, then
      bfq merges Q and Q2, to hopefully get a more sequential I/O in the
      resulting merged queue, and thus a possibly higher throughput.
      
      Case (b) is checked by comparing the new request for Q with the last
      request dispatched, assuming that the latter necessarily belonged to the
      in-service queue. Unfortunately, this assumption is no longer always
      correct, since commit d0edc247 ("block, bfq: inject other-queue I/O
      into seeky idle queues on NCQ flash").
      
      When the assumption does not hold, queues that must not be merged may be
      merged, causing unexpected loss of control on per-queue service
      guarantees.
      
      This commit solves this problem by adding an extra field, which stores
      the actual last request dispatched for the in-service queue, and by
      using this new field to correctly check case (b).
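      
      A hedged sketch of the fix; the field name in_serv_last_pos and the
      threshold macro are assumptions:
      
        struct bfq_data {
                /* ... */
                /* position of the last request dispatched for the
                 * in-service queue */
                sector_t in_serv_last_pos;
        };
        
        /* Case (b) now compares against the in-service queue's own last
         * dispatched request, not the globally last dispatched one. */
        static bool bfq_rq_close_to_in_serv(struct bfq_data *bfqd,
                                            struct request *rq)
        {
                return abs(blk_rq_pos(rq) - bfqd->in_serv_last_pos) <=
                        BFQQ_CLOSE_THR;
        }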
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: do not overcharge writes in asymmetric scenarios · 02a6d787
      Paolo Valente authored
      Writes tend to starve reads. bfq counters this problem by overcharging
      writes with an inflated service w.r.t. the actual service (number of
      sectors written) they receive.
      
      Yet this overcharging is useless, and actually causes unfairness in the
      opposite direction, when bfq happens to be enforcing strong I/O control.
      bfq enforces such control when the scenario is asymmetric, i.e., when some
      bfq_queue or group of bfq_queues is to be granted a different bandwidth
      than some other bfq_queue or group of bfq_queues. So, in such a
      scenario, this commit disables write overcharging.
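      
      A sketch of the resulting charging rule (the condition and the
      overcharge factor below are assumptions, not the verified diff):
      
        static unsigned long bfq_serv_to_charge(struct request *rq,
                                                struct bfq_queue *bfqq)
        {
                /* Charge the actual service for sync I/O, and also for
                 * writes when strong I/O control is in place (weight
                 * raising or an asymmetric scenario). */
                if (bfq_bfqq_sync(bfqq) || bfqq->wr_coeff > 1 ||
                    !bfq_symmetric_scenario(bfqq->bfqd))
                        return blk_rq_sectors(rq);
        
                /* Otherwise keep overcharging writes. */
                return blk_rq_sectors(rq) * bfq_async_charge_factor;
        }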
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: port commit "cfq-iosched: improve hw_tag detection" · b3c34981
      Paolo Valente authored
      The original commit is commit 1a1238a7 ("cfq-iosched: improve hw_tag
      detection") and has the following commit message:
      
      If active queue hasn't enough requests and idle window opens, cfq will
      not dispatch sufficient requests to hardware. In such situation, current
      code will zero hw_tag. But this is because cfq doesn't dispatch enough
      requests instead of hardware queue doesn't work. Don't zero hw_tag in
      such case.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: reduce threshold for detecting command queueing · a3c92560
      Paolo Valente authored
      bfq uses a simple heuristic, inherited from cfq, for detecting whether
      the drive performs command queueing: check whether the average number
      of in-flight requests is above a given threshold. Unfortunately, this
      heuristic fails to detect queueing (on drives with queueing) if the
      processes doing I/O are few and issue I/O at a low depth.
      
      To reduce false negatives, this commit lowers the threshold.
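      
      A sketch of the heuristic; the concrete threshold values are
      assumptions:
      
        /* Lowered by this commit so that queueing drives driven at low
         * depth are still detected (assumed: 4 before, 3 after). */
        #define BFQ_HW_QUEUE_THRESHOLD  3
        
        /* The drive is deemed to do command queueing once the number of
         * requests observed in the driver exceeds the threshold. */
        if (bfqd->max_rq_in_driver > BFQ_HW_QUEUE_THRESHOLD)
                bfqd->hw_tag = 1;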
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: fix queue removal from weights tree · 9dee8b3b
      Paolo Valente authored
      bfq maintains an ordered list, through a red-black tree, of unique
      weights of active bfq_queues. This list is used to detect whether there
      are active queues with differentiated weights. The weight of a queue is
      removed from the list when both the following two conditions become
      true:
      
      (1) the bfq_queue is flagged as inactive;
      (2) the queue no longer has any in-flight requests.
      
      Unfortunately, in the rare cases where condition (2) becomes true before
      condition (1), the removal fails, because the function to remove the
      weight of the queue (bfq_weights_tree_remove) is rightly invoked in the
      path that deactivates the bfq_queue, but mistakenly invoked *before* the
      function that actually performs the deactivation (bfq_deactivate_bfqq).
      
      This commit moves the invocation of bfq_weights_tree_remove for
      condition (1) to after bfq_deactivate_bfqq. As a consequence of this
      move, it is necessary to add a further reference to the queue when the
      weight of a queue is added, because the queue might otherwise be freed
      before bfq_weights_tree_remove is invoked. This commit adds this
      reference and makes all related modifications.
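      
      A pseudo-diff sketch of the reordering (signatures simplified; the
      placement of the extra reference is an assumption):
      
        /* Before: the weight is removed while bfqq still counts as
         * active, so the removal silently fails. */
        bfq_weights_tree_remove(bfqd, bfqq);
        bfq_deactivate_bfqq(bfqd, bfqq);
        
        /* After: deactivate first, then remove the weight. A reference
         * taken when the weight was added keeps bfqq alive until here. */
        bfq_deactivate_bfqq(bfqd, bfqq);
        bfq_weights_tree_remove(bfqd, bfqq);
        bfq_put_queue(bfqq);    /* drops the extra reference */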
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: fix sequential rq detection in rate estimation · d87447d8
      Paolo Valente authored
      In bfq_update_peak_rate, to check whether an I/O request rq is
      sequential, only the seek distance of rq w.r.t. the last request
      dispatched is checked. This is not sufficient for non-rotational
      storage, where the size of rq is at least as relevant. This commit adds
      the missing check.
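      
      A hedged sketch of the strengthened check (the helper name and the
      non-rotational size threshold are assumptions):
      
        static bool bfq_rq_is_sequential(struct bfq_data *bfqd,
                                         struct request *rq)
        {
                if (abs(blk_rq_pos(rq) - bfqd->last_position) >
                    BFQQ_SEEK_THR)
                        return false;
        
                /* On non-rotational storage a small request breaks a
                 * rate-meaningful sequential stream even when its seek
                 * distance is small. */
                return !blk_queue_nonrot(bfqd->queue) ||
                        blk_rq_sectors(rq) >= BFQQ_SECT_THR_NONROT;
        }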
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: unconditionally plug I/O in asymmetric scenarios · 530c4cbb
      Paolo Valente authored
      bfq detects the creation of multiple bfq_queues shortly after each
      other, namely a burst of queue creations in the terminology used in the
      code. If the burst is large, then no queue in the burst is granted
      - either I/O-dispatch plugging when the queue remains temporarily idle
        while in service;
      - or weight raising, because it causes even longer plugging.
      
      In fact, such a plugging tends to lower throughput, while these bursts
      are typically due to applications or services that spawn multiple
      processes, to reach a common goal as soon as possible. Examples are a
      "git grep" or the booting of a system.
      
      Unfortunately, disabling plugging may cause a loss of service guarantees
      in asymmetric scenarios, i.e., if queue weights are differentiated or if
      more than one group is active.
      
      This commit addresses this issue by no longer disabling I/O-dispatch
      plugging for queues in large bursts.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: do not plug I/O of in-service queue when harmful · ac8b0cb4
      Paolo Valente authored
      If the in-service bfq_queue is sync and remains temporarily idle, then
      I/O dispatching (from other queues) may be plugged. This may be done for
      two reasons: either to boost throughput, or to preserve the bandwidth
      share of the in-service queue. In the first case, if the I/O of the
      in-service queue, when it finally arrives, consists only of one small
      I/O request, then it makes sense to plug even the I/O of the in-service
      queue. In fact, serving such a small request immediately is likely to
      lower throughput instead of boosting it, whereas waiting a little bit is
      likely to let that request grow, thanks to request merging, and become
      more profitable in terms of throughput (this is likely to happen exactly
      because the I/O of the queue has been detected to boost throughput).
      
      On the opposite end, if I/O dispatching is being plugged only to
      preserve the bandwidth of the in-service queue, then it would be better
      not to plug also the I/O of the in-service queue, because such a
      plugging is likely to cause only loss of bandwidth for the queue.
      
      Unfortunately, no distinction is made between the two cases, and the I/O
      of the in-service queue is always plugged in case just a small I/O
      request arrives. This commit draws this missing distinction and does not
      perform harmful plugging.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: split function bfq_better_to_idle · 05c2f5c3
      Paolo Valente authored
      This is a preparatory commit for commits that need to check only one of
      the two main reasons for idling. This change should also improve the
      quality of the code a little bit, by splitting a function that contained
      very long, non-trivial and only loosely related comments.
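      
      A sketch of the resulting structure; the two helper names are
      assumptions based on the two idling reasons:
      
        /* Reason 1: idling keeps the drive busy with profitable I/O. */
        static bool idling_boosts_thr_without_issues(struct bfq_data *bfqd,
                                                     struct bfq_queue *bfqq);
        
        /* Reason 2: idling is needed to preserve service guarantees. */
        static bool idling_needed_for_service_guarantees(struct bfq_data *bfqd,
                                                         struct bfq_queue *bfqq);
        
        static bool bfq_better_to_idle(struct bfq_queue *bfqq)
        {
                struct bfq_data *bfqd = bfqq->bfqd;
        
                return idling_boosts_thr_without_issues(bfqd, bfqq) ||
                        idling_needed_for_service_guarantees(bfqd, bfqq);
        }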
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: consider also ioprio classes in symmetry detection · 73d58118
      Paolo Valente authored
      In asymmetric scenarios, i.e., when some bfq_queue or bfq_group needs to
      be guaranteed a different bandwidth than other bfq_queues or bfq_groups,
      these service guarantees can be provided only by plugging I/O dispatch,
      completely or partially, when the queue in service remains temporarily
      empty. A case where asymmetry is particularly strong is when some active
      bfq_queues belong to a higher-priority class than some other active
      bfq_queues. Unfortunately, this important case is not considered at all
      in the code for detecting asymmetric scenarios. This commit adds the
      missing logic.
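      
      A hedged sketch of the extended detection; the per-class counters
      and both existing-check helpers below are placeholder names:
      
        /* Assumed: bfqd->busy_queues[] counts busy queues per ioprio
         * class (RT, BE, IDLE). */
        static bool bfq_asymmetric_scenario(struct bfq_data *bfqd)
        {
                bool multiple_classes_busy =
                        (bfqd->busy_queues[0] && bfqd->busy_queues[1]) ||
                        (bfqd->busy_queues[0] && bfqd->busy_queues[2]) ||
                        (bfqd->busy_queues[1] && bfqd->busy_queues[2]);
        
                /* New: active queues in different ioprio classes make the
                 * scenario asymmetric, in addition to the existing checks
                 * on differentiated weights and multiple active groups. */
                return multiple_classes_busy ||
                        bfq_varied_queue_weights(bfqd) ||
                        bfq_multiple_groups_active(bfqd);
        }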
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: remove case of redirected bic from insert_request · 03e565e4
      Paolo Valente authored
      Before commit 18e5a57d ("block, bfq: postpone rq preparation to
      insert or merge"), the destination queue for a request was chosen by a
      different hook than the one that then inserted the request. So, between
      the execution of the two hooks, the bic of the process generating the
      request could happen to be redirected to a different bfq_queue. As a
      consequence, the destination bfq_queue stored in the request could be
      wrong. Such an event no longer needs to be handled.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: make sure queue budgets are not below service received · f3218ad8
      Paolo Valente authored
      With some unlucky sequences of events, the function bfq_updated_next_req
      updates the current budget of a bfq_queue to a lower value than the
      service received by the queue using such a budget. Unfortunately, if
      this happens, then the return value of the function bfq_bfqq_budget_left
      becomes inconsistent. This commit solves this problem by lower-bounding
      the budget computed in bfq_updated_next_req to the service currently
      charged to the queue.
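      
      A sketch of the lower-bounding in bfq_updated_next_req (the exact
      expression is an assumption):
      
        /* Never let the new budget drop below the service already
         * charged to the queue, so that bfq_bfqq_budget_left() stays
         * consistent. */
        new_budget = max_t(unsigned long,
                           max_t(unsigned long, bfqq->max_budget,
                                 bfq_serv_to_charge(next_rq, bfqq)),
                           entity->service);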
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: avoid selecting a queue w/o budget · 218cb897
      Paolo Valente authored
      To boost throughput on devices with internal queueing and in scenarios
      where device idling is not strictly needed, bfq immediately starts
      serving a new bfq_queue if the in-service bfq_queue remains without
      pending I/O, even if new I/O may arrive soon for the latter queue. Then,
      if such I/O actually arrives soon, bfq preempts the new in-service
      bfq_queue so as to give the previous queue a chance to go on being
      served (in case the previous queue should actually be the one to be
      served, according to its timestamps).
      
      However, the in-service bfq_queue, say Q, may also run out of budget
      while it still has pending I/O. Since bfq changes budgets
      dynamically to fit the needs of bfq_queues, this happens more often than
      one may expect. If this happens, then there is no point in trying to go
      on serving Q when new I/O arrives for it soon: Q would be expired
      immediately after being selected for service. This would only cause
      useless overhead. This commit avoids such a useless selection.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: do not consider interactive queues in srt filtering · 20cd3245
      Paolo Valente authored
      The speed at which a bfq_queue receives I/O is one of the parameters by
      which bfq decides whether the queue is soft real-time (i.e., whether the
      queue contains the I/O of a soft real-time application). In particular,
      when a bfq_queue remains without outstanding I/O requests, bfq computes
      the minimum time instant, named soft_rt_next_start, at which the next
      request of the queue may arrive for the queue to be deemed as soft real
      time.
      
      Unfortunately this filtering may cause problems with a queue in
      interactive weight raising. In fact, such a queue may be conveying the
      I/O needed to load a soft real-time application. The latter will
      actually exhibit a soft real-time I/O pattern after it finally starts
      doing its job. But, if soft_rt_next_start is updated for an interactive
      bfq_queue, and the queue has received a lot of service before remaining
      with no outstanding request (likely to happen on a fast device), then
      soft_rt_next_start is assigned such a high value that, for a very long
      time, the queue is prevented from being possibly considered as soft real
      time.
      
      This commit removes the updating of soft_rt_next_start for bfq_queues in
      interactive weight raising.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 27 Jan 2019, 1 commit
  3. 25 Jan 2019, 1 commit
    • blk-wbt: Declare local functions static · c83f536a
      Bart Van Assche authored
      This patch avoids that sparse and gcc report the following warnings:
      
        CHECK   block/blk-wbt.c
      block/blk-wbt.c:600:6: warning: symbol 'wbt_issue' was not declared. Should it be static?
      block/blk-wbt.c:620:6: warning: symbol 'wbt_requeue' was not declared. Should it be static?
        CC      block/blk-wbt.o
      block/blk-wbt.c:600:6: warning: no previous prototype for wbt_issue [-Wmissing-prototypes]
       void wbt_issue(struct rq_qos *rqos, struct request *rq)
            ^~~~~~~~~
      block/blk-wbt.c:620:6: warning: no previous prototype for wbt_requeue [-Wmissing-prototypes]
       void wbt_requeue(struct rq_qos *rqos, struct request *rq)
            ^~~~~~~~~~~
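      
      The fix follows directly from the warnings: both helpers are used
      only inside blk-wbt.c, so they get internal linkage (sketch):
      
        static void wbt_issue(struct rq_qos *rqos, struct request *rq)
        {
                /* body unchanged */
        }
        
        static void wbt_requeue(struct rq_qos *rqos, struct request *rq)
        {
                /* body unchanged */
        }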
      Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. 24 Jan 2019, 1 commit
  5. 23 Jan 2019, 1 commit
    • block: cover another queue enter recursion via BIO_QUEUE_ENTERED · 698cef17
      Ming Lei authored
      In addition to blk_queue_split(), bio_split() is also used for splitting
      bios, and the remaining bio is often resubmitted to the queue via
      generic_make_request(). So the same queue enter recursion exists in this
      case too. Unfortunately, commit cd4a4ae4 doesn't help this case.
      
      This patch covers the above case by setting BIO_QUEUE_ENTERED before
      calling q->make_request_fn.
      
      In theory, the per-bio flag is used to simulate a stack variable, so it
      is fine to clear it after q->make_request_fn returns, especially since
      the same bio can't be submitted from another context.
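      
      A minimal sketch of the described change (the exact call site in
      generic_make_request() is an assumption):
      
        /* Mark the bio before entering the driver, so that a split bio
         * resubmitted from inside ->make_request_fn doesn't block on
         * queue entry again; clear the flag once the call returns. */
        bio_set_flag(bio, BIO_QUEUE_ENTERED);
        ret = q->make_request_fn(q, bio);
        bio_clear_flag(bio, BIO_QUEUE_ENTERED);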
      
      Fixes: cd4a4ae4 ("block: don't use blocking queue entered for recursive bio submits")
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: NeilBrown <neilb@suse.com>
      Reviewed-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  6. 18 Jan 2019, 1 commit
    • block: Cleanup license notice · 38197ca1
      Thomas Gleixner authored
      Remove the imprecise and sloppy:
      
        "This files is licensed under the GPL."
      
      license notice in the top level comment.
      
      1) The file already contains a SPDX license identifier which clearly
         states that the license of the file is GPL V2 only
      
      2) The notice resolves to GPL v1 or later for scanners, which is just
         contrary to the intent of SPDX identifiers to provide clear and
         non-ambiguous license information. Aside from that, the value added
         by this notice is below zero.
      
      Cc: Damien Le Moal <damien.lemoal@wdc.com>
      Cc: Matias Bjorling <mb@lightnvm.io>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Fixes: 6a5ac984 ("block: Make struct request_queue smaller for CONFIG_BLK_DEV_ZONED=n")
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  7. 16 Jan 2019, 1 commit
  8. 14 Jan 2019, 1 commit
    • block, bfq: fix comments on __bfq_deactivate_entity · 5bf85908
      Paolo Valente authored
      The comments on function __bfq_deactivate_entity contain two imprecise
      or wrong statements:
      1) The function performs the deactivation of the entity.
      2) The function must be invoked only if the entity is on a service tree.
      
      This commit replaces both statements with the correct ones:
      1) The function updates sched_data and service trees for the entity,
      so as to represent the entity as inactive (which is only part of the
      steps needed for the deactivation of the entity).
      2) The function must be invoked on every entity being deactivated.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  9. 10 Jan 2019, 1 commit
  10. 09 Jan 2019, 1 commit
  11. 21 Dec 2018, 5 commits
  12. 20 Dec 2018, 1 commit
    • block: save irq state in blkg_lookup_create() · 3a762de5
      Ming Lei authored
      blkg_lookup_create() may be called from pool_map(), in which the irq
      state is saved, so we have to save and restore the irq state in
      blkg_lookup_create() as well.
      
      Otherwise, the following lockdep warning can be triggered:
      
      [  104.258537] ================================
      [  104.259129] WARNING: inconsistent lock state
      [  104.259725] 4.20.0-rc6+ #545 Not tainted
      [  104.260268] --------------------------------
      [  104.260865] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      [  104.261727] swapper/49/0 [HC0[0]:SC1[1]:HE0:SE0] takes:
      [  104.262444] 00000000db365b5d (&(&pool->lock)->rlock#3){+.?.}, at: thin_endio+0xcf/0x2a3 [dm_thin_pool]
      [  104.263747] {SOFTIRQ-ON-W} state was registered at:
      [  104.264417]   _raw_spin_unlock_irq+0x29/0x4c
      [  104.265014]   blkg_lookup_create+0xdc/0xe6
      [  104.265609]   bio_associate_blkg_from_css+0xd3/0x13f
      [  104.266312]   bio_associate_blkg+0x15a/0x1bb
      [  104.266913]   pool_map+0xe8/0x103 [dm_thin_pool]
      [  104.267572]   __map_bio+0x98/0x29c [dm_mod]
      [  104.268162]   __split_and_process_non_flush+0x29e/0x306 [dm_mod]
      [  104.269003]   __split_and_process_bio+0x16a/0x25b [dm_mod]
      [  104.269971]   __dm_make_request.isra.14+0xdc/0x124 [dm_mod]
      [  104.270973]   generic_make_request+0x3f5/0x68b
      [  104.271676]   process_prepared_mapping+0x166/0x1ef [dm_thin_pool]
      [  104.272531]   schedule_zero+0x239/0x273 [dm_thin_pool]
      [  104.273245]   process_cell+0x60c/0x6f1 [dm_thin_pool]
      [  104.273967]   do_worker+0x60c/0xca8 [dm_thin_pool]
      [  104.274635]   process_one_work+0x4eb/0x834
      [  104.275203]   worker_thread+0x318/0x484
      [  104.275740]   kthread+0x1d1/0x1e1
      [  104.276203]   ret_from_fork+0x3a/0x50
      [  104.276714] irq event stamp: 170003
      [  104.277201] hardirqs last  enabled at (170002): [<ffffffff81bcc33e>] _raw_spin_unlock_irqrestore+0x44/0x6b
      [  104.278535] hardirqs last disabled at (170003): [<ffffffff81bcc1ad>] _raw_spin_lock_irqsave+0x20/0x55
      [  104.280273] softirqs last  enabled at (169978): [<ffffffff810d13d4>] irq_enter+0x4c/0x73
      [  104.281617] softirqs last disabled at (169979): [<ffffffff810d1479>] irq_exit+0x7e/0x11d
      [  104.282744]
      [  104.282744] other info that might help us debug this:
      [  104.283640]  Possible unsafe locking scenario:
      [  104.283640]
      [  104.284452]        CPU0
      [  104.284803]        ----
      [  104.285150]   lock(&(&pool->lock)->rlock#3);
      [  104.285762]   <Interrupt>
      [  104.286130]     lock(&(&pool->lock)->rlock#3);
      [  104.286750]
      [  104.286750]  *** DEADLOCK ***
      [  104.286750]
      [  104.287564] no locks held by swapper/49/0.
      [  104.288129]
      [  104.288129] stack backtrace:
      [  104.288738] CPU: 49 PID: 0 Comm: swapper/49 Not tainted 4.20.0-rc6+ #545
      [  104.289700] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
      [  104.290858] Call Trace:
      [  104.291204]  <IRQ>
      [  104.291502]  dump_stack+0x9a/0xe6
      [  104.291968]  mark_lock+0x56c/0x7a6
      [  104.292442]  ? check_usage_backwards+0x209/0x209
      [  104.293086]  __lock_acquire+0x400/0x15bf
      [  104.293662]  ? check_chain_key+0x150/0x1aa
      [  104.294236]  lock_acquire+0x1a6/0x1e3
      [  104.294768]  ? thin_endio+0xcf/0x2a3 [dm_thin_pool]
      [  104.295444]  ? _raw_spin_unlock_irqrestore+0x44/0x6b
      [  104.296143]  ? process_prepared_discard_fail+0x36/0x36 [dm_thin_pool]
      [  104.297031]  _raw_spin_lock_irqsave+0x46/0x55
      [  104.297659]  ? thin_endio+0xcf/0x2a3 [dm_thin_pool]
      [  104.298335]  thin_endio+0xcf/0x2a3 [dm_thin_pool]
      [  104.298997]  ? process_prepared_discard_fail+0x36/0x36 [dm_thin_pool]
      [  104.299886]  ? check_flags+0x20a/0x20a
      [  104.300408]  ? lock_acquire+0x1a6/0x1e3
      [  104.300954]  ? process_prepared_discard_fail+0x36/0x36 [dm_thin_pool]
      [  104.301865]  clone_endio+0x1bb/0x22d [dm_mod]
      [  104.302491]  ? disable_write_zeroes+0x20/0x20 [dm_mod]
      [  104.303200]  ? bio_disassociate_blkg+0xc6/0x15f
      [  104.303836]  ? bio_endio+0x2b2/0x2da
      [  104.304349]  clone_endio+0x1f3/0x22d [dm_mod]
      [  104.304978]  ? disable_write_zeroes+0x20/0x20 [dm_mod]
      [  104.305709]  ? bio_disassociate_blkg+0xc6/0x15f
      [  104.306333]  ? bio_endio+0x2b2/0x2da
      [  104.306853]  clone_endio+0x1f3/0x22d [dm_mod]
      [  104.307476]  ? disable_write_zeroes+0x20/0x20 [dm_mod]
      [  104.308185]  ? bio_disassociate_blkg+0xc6/0x15f
      [  104.308817]  ? bio_endio+0x2b2/0x2da
      [  104.309319]  blk_update_request+0x2de/0x4cc
      [  104.309927]  blk_mq_end_request+0x2a/0x183
      [  104.310498]  blk_done_softirq+0x16a/0x1a6
      [  104.311051]  ? blk_softirq_cpu_dead+0xe2/0xe2
      [  104.311653]  ? __lock_is_held+0x2a/0x87
      [  104.312186]  __do_softirq+0x250/0x4e8
      [  104.312705]  irq_exit+0x7e/0x11d
      [  104.313157]  call_function_single_interrupt+0xf/0x20
      [  104.313860]  </IRQ>
      [  104.314163] RIP: 0010:native_safe_halt+0x2/0x3
      [  104.314792] Code: 63 02 df f0 83 44 24 fc 00 48 89 df e8 cc 3f 7a ff 48 8b 03 a8 08 74 0b 65 81 25 9d 31 45 7e ff ff ff 7f 5b 5d 41 5c c3 fb f4 <c3> f4 c3 0f 1f 44 00 00 41 56 41 55 41 54 55 53 e8 a2 0d 5c ff e8
      [  104.317339] RSP: 0018:ffff888106c9fdc0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04
      [  104.318390] RAX: 1ffff11020d92100 RBX: 0000000000000000 RCX: ffffffff81159ac7
      [  104.319366] RDX: 1ffffffff05d5e69 RSI: 0000000000000007 RDI: ffff888106c90d1c
      [  104.320339] RBP: 0000000000000000 R08: dffffc0000000000 R09: 0000000000000001
      [  104.321313] R10: ffffed1025d57ba0 R11: ffffed1025d57b9f R12: 1ffff11020d93fbf
      [  104.322328] R13: 0000000000000031 R14: ffff888106c90040 R15: 0000000000000000
      [  104.323307]  ? lockdep_hardirqs_on+0x26b/0x278
      [  104.323927]  default_idle+0xd9/0x1a8
      [  104.324427]  do_idle+0x162/0x2b2
      [  104.324891]  ? arch_cpu_idle_exit+0x28/0x28
      [  104.325467]  ? mark_held_locks+0x28/0x7f
      [  104.326031]  ? _raw_spin_unlock_irqrestore+0x44/0x6b
      [  104.326719]  cpu_startup_entry+0x1d/0x1f
      [  104.327261]  start_secondary+0x2cb/0x308
      [  104.327806]  ? set_cpu_sibling_map+0x8a3/0x8a3
      [  104.328421]  secondary_startup_64+0xa4/0xb0
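      
      A hedged sketch of the fix (the function body and lock are
      assumptions; only the switch from the _irq to the _irqsave spinlock
      variants follows from the trace above):
      
        struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
                                            struct request_queue *q)
        {
                struct blkcg_gq *blkg = blkg_lookup(blkcg, q);
                unsigned long flags;
        
                if (blkg)
                        return blkg;
        
                /* Was spin_lock_irq()/spin_unlock_irq(), which re-enabled
                 * interrupts even when the caller had them disabled. */
                spin_lock_irqsave(q->queue_lock, flags);
                blkg = __blkg_lookup_create(blkcg, q);
                spin_unlock_irqrestore(q->queue_lock, flags);
        
                return blkg;
        }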
      
      Fixes: b978962a ("blkcg: update blkg_lookup_create() to do locking")
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Dennis Zhou <dennis@kernel.org>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  13. 19 Dec 2018, 2 commits
  14. 18 Dec 2018, 6 commits
    • blk-mq: enable IO poll if .nr_queues of type poll > 0 · cd19181b
      Ming Lei authored
      The queue mapping of type poll only exists when
      set->map[HCTX_TYPE_POLL].nr_queues is bigger than zero, so strengthen
      the check by testing .nr_queues of the poll type before enabling IO
      poll.
      
      Otherwise an IO race & timeout can be observed when running block/007.
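      
      A sketch of the strengthened condition (placement at queue
      initialization time is an assumption):
      
        /* Only advertise poll support when a poll map exists AND it
         * actually contains hardware queues. */
        if (set->nr_maps > HCTX_TYPE_POLL &&
            set->map[HCTX_TYPE_POLL].nr_queues)
                blk_queue_flag_set(QUEUE_FLAG_POLL, q);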
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight() · 3c94d83c
      Jens Axboe authored
      There's a single user of this function, dm, and dm just wants
      to check if IO is inflight, not that it's just allocated.
      
      This fixes a hang with srp/002 in blktests with dm, where it tries
      to suspend but waits for inflight IO to finish first. As it checks
      for just allocated requests, this fails.
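      
      A hedged sketch of the renamed check (the tag-iterator callback
      shape is an assumption):
      
        static void blk_mq_rq_inflight(struct blk_mq_hw_ctx *hctx,
                                       struct request *rq,
                                       void *priv, bool reserved)
        {
                bool *busy = priv;
        
                /* Count only requests the driver has actually started;
                 * merely allocated requests must not block dm suspend. */
                if (rq->state == MQ_RQ_IN_FLIGHT && rq->q == hctx->queue)
                        *busy = true;
        }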
      Tested-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: skip zero-queue maps in blk_mq_map_swqueue · e5edd5f2
      Ming Lei authored
      Since commit 7e849dd9 ("nvme-pci: don't share queue maps"), the mapping
      table won't actually be initialized if map->nr_queues is zero, so we
      can't use blk_mq_map_queue_type() to retrieve a hctx any more.
      
      This may still cause a broken mapping; fix it by skipping zero-queue
      maps in blk_mq_map_swqueue().
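      
      A sketch of the skip (the loop shape inside blk_mq_map_swqueue() is
      an assumption):
      
        for (j = 0; j < set->nr_maps; j++) {
                /* An unused map type has no initialized mq_map[], so
                 * looking up a hctx through it would be bogus. */
                if (!set->map[j].nr_queues)
                        continue;
        
                hctx = blk_mq_map_queue_type(q, j, i);
                /* ... */
        }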
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: fix blk-iolatency accounting underflow · 13369816
      Dennis Zhou authored
      The blk-iolatency controller measures the time from rq_qos_throttle() to
      rq_qos_done_bio() and attributes this time to the first bio that needs
      to create the request. This means if a bio is plug-mergeable or
      bio-mergeable, it gets to bypass the blk-iolatency controller.
      
      The recent series [1], which tags all bios w/ blkgs, undermined how
      iolatency was determining which bios it was charging and should process
      in rq_qos_done_bio(). Because all bios are being tagged, this caused the
      atomic_t inflight count in struct rq_wait to underflow and result in a
      stall.
      
      This patch adds a new flag BIO_TRACKED to let controllers know that a
      bio is going through the rq_qos path. blk-iolatency now checks if this
      flag is set to see if it should process the bio in rq_qos_done_bio().
      
      Overloading BIO_QUEUE_ENTERED works, but makes the flag rules confusing.
      BIO_THROTTLED was another candidate, but the flag is set for all bios
      that have gone through blk-throttle code. Overloading a flag comes with
      the burden of making sure that when either implementation changes, a
      change in setting rules for one doesn't cause a bug in the other. So
      here, we unfortunately opt for adding a new flag.
      
      [1] https://lore.kernel.org/lkml/20181205171039.73066-1-dennis@kernel.org/
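      
      A minimal sketch of the flag's use (the exact hook placement is an
      assumption):
      
        /* rq_qos_throttle(): mark the bio as having entered the rq_qos
         * path, so completion knows it was actually charged. */
        bio_set_flag(bio, BIO_TRACKED);
        
        /* blk-iolatency's done_bio hook: ignore bios that never went
         * through the throttle path, so the rq_wait inflight counter
         * can no longer underflow. */
        if (!bio_flagged(bio, BIO_TRACKED))
                return;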
      
      Fixes: 5cdf2e3f ("blkcg: associate blkg when associating a device")
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: fix dispatch from sw queue · c16d6b5a
      Ming Lei authored
      When a request is added to the rq list of a sw queue (ctx), the rq may
      be for a different type of hctx, especially after multi queue mapping
      is introduced.
      
      So when dispatching requests from a sw queue via
      blk_mq_flush_busy_ctxs() or blk_mq_dequeue_from_ctx(), a request
      belonging to a different hctx type can be dispatched to the current
      hctx when the read or poll queue is enabled.
      
      This patch fixes the issue by introducing per-queue-type lists.
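      
      A sketch of the data-structure change (the rq_lists[] shape is an
      assumption):
      
        struct blk_mq_ctx {
                /* One sw-queue list per hctx type (default/read/poll),
                 * kept in the same cacheline as the single list and lock
                 * used before. */
                spinlock_t              lock;
                struct list_head        rq_lists[HCTX_MAX_TYPES];
                /* ... */
        };
        
        /* Dispatch then drains only the list matching the hctx type: */
        list_splice_init(&ctx->rq_lists[hctx->type], &rq_list);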
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      
      Changed by me to not use separately cacheline-aligned lists; just
      place them all in the same cacheline where we had just the one list
      and lock before.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: mq-deadline: Fix write completion handling · 7211aef8
      Damien Le Moal authored
      For a zoned block device using mq-deadline, if a write request for a
      zone is received while another write was already dispatched for the same
      zone, dd_dispatch_request() will return NULL and the newly inserted
      write request is kept in the scheduler queue waiting for the ongoing
      zone write to complete. With this behavior, when no other request has
      been dispatched, rq_list in blk_mq_sched_dispatch_requests() is empty
      and blk_mq_sched_mark_restart_hctx() is not called. This in turn leads
      to the blk_mq_sched_restart() call made from __blk_mq_free_request()
      not running the queue when the already dispatched write request
      completes. The newly inserted request stays stuck in the scheduler
      queue until eventually another request is submitted.
      
      This problem does not affect SCSI disks, as the SCSI stack handles
      queue restarts on request completion. However, the problem can be
      triggered with the null_blk driver with zoned mode enabled.
      
      Fix this by always requesting a queue restart in dd_dispatch_request()
      if no request was dispatched while WRITE requests are queued.
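      
      A hedged sketch of the fix in dd_dispatch_request() (condition and
      placement are assumptions based on the description above):
      
        rq = __dd_dispatch_request(dd);
        
        /* For zoned devices a write may be held back because another
         * write to the same zone is in flight. If nothing was dispatched
         * while writes are queued, make sure the completion of the
         * in-flight write re-runs the queue. */
        if (!rq && blk_queue_is_zoned(hctx->queue) &&
            !list_empty(&dd->fifo_list[WRITE]))
                blk_mq_sched_mark_restart_hctx(hctx);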
      
      Fixes: 5700f691 ("mq-deadline: Introduce zone locking support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      
      Add missing export of blk_mq_sched_restart().
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  15. 17 Dec 2018, 2 commits