1. 25 1月, 2021 3 次提交
  2. 06 1月, 2021 1 次提交
    • J
      bfq: Fix computation of shallow depth · 6d4d2735
      Jan Kara 提交于
      BFQ computes number of tags it allows to be allocated for each request type
      based on tag bitmap. However it uses 1 << bitmap.shift as number of
      available tags which is wrong. 'shift' is just an internal bitmap value
      containing logarithm of how many bits bitmap uses in each bitmap word.
      Thus number of tags allowed for some request types can be far to low.
      Use proper bitmap.depth which has the number of tags instead.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      6d4d2735
  3. 09 9月, 2020 1 次提交
    • O
      block: only call sched requeue_request() for scheduled requests · e8a8a185
      Omar Sandoval 提交于
      Yang Yang reported the following crash caused by requeueing a flush
      request in Kyber:
      
        [    2.517297] Unable to handle kernel paging request at virtual address ffffffd8071c0b00
        ...
        [    2.517468] pc : clear_bit+0x18/0x2c
        [    2.517502] lr : sbitmap_queue_clear+0x40/0x228
        [    2.517503] sp : ffffff800832bc60 pstate : 00c00145
        ...
        [    2.517599] Process ksoftirqd/5 (pid: 51, stack limit = 0xffffff8008328000)
        [    2.517602] Call trace:
        [    2.517606]  clear_bit+0x18/0x2c
        [    2.517619]  kyber_finish_request+0x74/0x80
        [    2.517627]  blk_mq_requeue_request+0x3c/0xc0
        [    2.517637]  __scsi_queue_insert+0x11c/0x148
        [    2.517640]  scsi_softirq_done+0x114/0x130
        [    2.517643]  blk_done_softirq+0x7c/0xb0
        [    2.517651]  __do_softirq+0x208/0x3bc
        [    2.517657]  run_ksoftirqd+0x34/0x60
        [    2.517663]  smpboot_thread_fn+0x1c4/0x2c0
        [    2.517667]  kthread+0x110/0x120
        [    2.517669]  ret_from_fork+0x10/0x18
      
      This happens because Kyber doesn't track flush requests, so
      kyber_finish_request() reads a garbage domain token. Only call the
      scheduler's requeue_request() hook if RQF_ELVPRIV is set (like we do for
      the finish_request() hook in blk_mq_free_request()). Now that we're
      handling it in blk-mq, also remove the check from BFQ.
      Reported-by: NYang Yang <yang.yang@vivo.com>
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      e8a8a185
  4. 04 9月, 2020 2 次提交
  5. 24 8月, 2020 1 次提交
  6. 01 8月, 2020 1 次提交
  7. 30 5月, 2020 1 次提交
  8. 10 5月, 2020 1 次提交
  9. 22 3月, 2020 2 次提交
    • P
      block, bfq: turn put_queue into release_process_ref in __bfq_bic_change_cgroup · c8997736
      Paolo Valente 提交于
      A bfq_put_queue() may be invoked in __bfq_bic_change_cgroup(). The
      goal of this put is to release a process reference to a bfq_queue. But
      process-reference releases may trigger also some extra operation, and,
      to this goal, are handled through bfq_release_process_ref(). So, turn
      the invocation of bfq_put_queue() into an invocation of
      bfq_release_process_ref().
      
      Tested-by: cki-project@redhat.com
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      c8997736
    • Z
      block, bfq: fix use-after-free in bfq_idle_slice_timer_body · 2f95fa5c
      Zhiqiang Liu 提交于
      In bfq_idle_slice_timer func, bfqq = bfqd->in_service_queue is
      not in bfqd-lock critical section. The bfqq, which is not
      equal to NULL in bfq_idle_slice_timer, may be freed after passing
      to bfq_idle_slice_timer_body. So we will access the freed memory.
      
      In addition, considering the bfqq may be in race, we should
      firstly check whether bfqq is in service before doing something
      on it in bfq_idle_slice_timer_body func. If the bfqq in race is
      not in service, it means the bfqq has been expired through
      __bfq_bfqq_expire func, and wait_request flags has been cleared in
      __bfq_bfqd_reset_in_service func. So we do not need to re-clear the
      wait_request of bfqq which is not in service.
      
      KASAN log is given as follows:
      [13058.354613] ==================================================================
      [13058.354640] BUG: KASAN: use-after-free in bfq_idle_slice_timer+0xac/0x290
      [13058.354644] Read of size 8 at addr ffffa02cf3e63f78 by task fork13/19767
      [13058.354646]
      [13058.354655] CPU: 96 PID: 19767 Comm: fork13
      [13058.354661] Call trace:
      [13058.354667]  dump_backtrace+0x0/0x310
      [13058.354672]  show_stack+0x28/0x38
      [13058.354681]  dump_stack+0xd8/0x108
      [13058.354687]  print_address_description+0x68/0x2d0
      [13058.354690]  kasan_report+0x124/0x2e0
      [13058.354697]  __asan_load8+0x88/0xb0
      [13058.354702]  bfq_idle_slice_timer+0xac/0x290
      [13058.354707]  __hrtimer_run_queues+0x298/0x8b8
      [13058.354710]  hrtimer_interrupt+0x1b8/0x678
      [13058.354716]  arch_timer_handler_phys+0x4c/0x78
      [13058.354722]  handle_percpu_devid_irq+0xf0/0x558
      [13058.354731]  generic_handle_irq+0x50/0x70
      [13058.354735]  __handle_domain_irq+0x94/0x110
      [13058.354739]  gic_handle_irq+0x8c/0x1b0
      [13058.354742]  el1_irq+0xb8/0x140
      [13058.354748]  do_wp_page+0x260/0xe28
      [13058.354752]  __handle_mm_fault+0x8ec/0x9b0
      [13058.354756]  handle_mm_fault+0x280/0x460
      [13058.354762]  do_page_fault+0x3ec/0x890
      [13058.354765]  do_mem_abort+0xc0/0x1b0
      [13058.354768]  el0_da+0x24/0x28
      [13058.354770]
      [13058.354773] Allocated by task 19731:
      [13058.354780]  kasan_kmalloc+0xe0/0x190
      [13058.354784]  kasan_slab_alloc+0x14/0x20
      [13058.354788]  kmem_cache_alloc_node+0x130/0x440
      [13058.354793]  bfq_get_queue+0x138/0x858
      [13058.354797]  bfq_get_bfqq_handle_split+0xd4/0x328
      [13058.354801]  bfq_init_rq+0x1f4/0x1180
      [13058.354806]  bfq_insert_requests+0x264/0x1c98
      [13058.354811]  blk_mq_sched_insert_requests+0x1c4/0x488
      [13058.354818]  blk_mq_flush_plug_list+0x2d4/0x6e0
      [13058.354826]  blk_flush_plug_list+0x230/0x548
      [13058.354830]  blk_finish_plug+0x60/0x80
      [13058.354838]  read_pages+0xec/0x2c0
      [13058.354842]  __do_page_cache_readahead+0x374/0x438
      [13058.354846]  ondemand_readahead+0x24c/0x6b0
      [13058.354851]  page_cache_sync_readahead+0x17c/0x2f8
      [13058.354858]  generic_file_buffered_read+0x588/0xc58
      [13058.354862]  generic_file_read_iter+0x1b4/0x278
      [13058.354965]  ext4_file_read_iter+0xa8/0x1d8 [ext4]
      [13058.354972]  __vfs_read+0x238/0x320
      [13058.354976]  vfs_read+0xbc/0x1c0
      [13058.354980]  ksys_read+0xdc/0x1b8
      [13058.354984]  __arm64_sys_read+0x50/0x60
      [13058.354990]  el0_svc_common+0xb4/0x1d8
      [13058.354994]  el0_svc_handler+0x50/0xa8
      [13058.354998]  el0_svc+0x8/0xc
      [13058.354999]
      [13058.355001] Freed by task 19731:
      [13058.355007]  __kasan_slab_free+0x120/0x228
      [13058.355010]  kasan_slab_free+0x10/0x18
      [13058.355014]  kmem_cache_free+0x288/0x3f0
      [13058.355018]  bfq_put_queue+0x134/0x208
      [13058.355022]  bfq_exit_icq_bfqq+0x164/0x348
      [13058.355026]  bfq_exit_icq+0x28/0x40
      [13058.355030]  ioc_exit_icq+0xa0/0x150
      [13058.355035]  put_io_context_active+0x250/0x438
      [13058.355038]  exit_io_context+0xd0/0x138
      [13058.355045]  do_exit+0x734/0xc58
      [13058.355050]  do_group_exit+0x78/0x220
      [13058.355054]  __wake_up_parent+0x0/0x50
      [13058.355058]  el0_svc_common+0xb4/0x1d8
      [13058.355062]  el0_svc_handler+0x50/0xa8
      [13058.355066]  el0_svc+0x8/0xc
      [13058.355067]
      [13058.355071] The buggy address belongs to the object at ffffa02cf3e63e70#012 which belongs to the cache bfq_queue of size 464
      [13058.355075] The buggy address is located 264 bytes inside of#012 464-byte region [ffffa02cf3e63e70, ffffa02cf3e64040)
      [13058.355077] The buggy address belongs to the page:
      [13058.355083] page:ffff7e80b3cf9800 count:1 mapcount:0 mapping:ffff802db5c90780 index:0xffffa02cf3e606f0 compound_mapcount: 0
      [13058.366175] flags: 0x2ffffe0000008100(slab|head)
      [13058.370781] raw: 2ffffe0000008100 ffff7e80b53b1408 ffffa02d730c1c90 ffff802db5c90780
      [13058.370787] raw: ffffa02cf3e606f0 0000000000370023 00000001ffffffff 0000000000000000
      [13058.370789] page dumped because: kasan: bad access detected
      [13058.370791]
      [13058.370792] Memory state around the buggy address:
      [13058.370797]  ffffa02cf3e63e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fb fb
      [13058.370801]  ffffa02cf3e63e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [13058.370805] >ffffa02cf3e63f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [13058.370808]                                                                 ^
      [13058.370811]  ffffa02cf3e63f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [13058.370815]  ffffa02cf3e64000: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      [13058.370817] ==================================================================
      [13058.370820] Disabling lock debugging due to kernel taint
      
      Here, we directly pass the bfqd to bfq_idle_slice_timer_body func.
      --
      V2->V3: rewrite the comment as suggested by Paolo Valente
      V1->V2: add one comment, and add Fixes and Reported-by tag.
      
      Fixes: aee69d78 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler")
      Acked-by: NPaolo Valente <paolo.valente@linaro.org>
      Reported-by: NWang Wang <wangwang2@huawei.com>
      Signed-off-by: NZhiqiang Liu <liuzhiqiang26@huawei.com>
      Signed-off-by: NFeilong Lin <linfeilong@huawei.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      2f95fa5c
  10. 03 2月, 2020 5 次提交
  11. 23 1月, 2020 1 次提交
  12. 14 11月, 2019 1 次提交
  13. 08 11月, 2019 1 次提交
    • T
      bfq-iosched: stop using blkg->stat_bytes and ->stat_ios · fd41e603
      Tejun Heo 提交于
      When used on cgroup1, bfq uses the blkg->stat_bytes and ->stat_ios
      from blk-cgroup core to populate six stat knobs.  blk-cgroup core is
      moving away from blkg_rwstat to improve scalability and won't be able
      to support this usage.
      
      It isn't like the sharing gains all that much.  Let's break it out to
      dedicated rwstat counters which are updated when on cgroup1.  This
      makes use of bfqg_*rwstat*() helpers outside of
      CONFIG_BFQ_CGROUP_DEBUG.  Move them out.
      
      v2: Compile fix when !CONFIG_BFQ_CGROUP_DEBUG.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      fd41e603
  14. 18 9月, 2019 4 次提交
  15. 08 8月, 2019 3 次提交
  16. 18 7月, 2019 1 次提交
    • P
      block, bfq: check also in-flight I/O in dispatch plugging · b5e02b48
      Paolo Valente 提交于
      Consider a sync bfq_queue Q that remains empty while in service, and
      suppose that, when this happens, there is a fair amount of already
      in-flight I/O not belonging to Q. In such a situation, I/O dispatching
      may need to be plugged (until new I/O arrives for Q), for the
      following reason.
      
      The drive may decide to serve in-flight non-Q's I/O requests before
      Q's ones, thereby delaying the arrival of new I/O requests for Q
      (recall that Q is sync). If I/O-dispatching is not plugged, then,
      while Q remains empty, a basically uncontrolled amount of I/O from
      other queues may be dispatched too, possibly causing the service of
      Q's I/O to be delayed even longer in the drive. This problem gets more
      and more serious as the speed and the queue depth of the drive grow,
      because, as these two quantities grow, the probability to find no
      queue busy but many requests in flight grows too.
      
      If Q has the same weight and priority as the other queues, then the
      above delay is unlikely to cause any issue, because all queues tend to
      undergo the same treatment. So, since not plugging I/O dispatching is
      convenient for throughput, it is better not to plug. Things change in
      case Q has a higher weight or priority than some other queue, because
      Q's service guarantees may simply be violated. For this reason,
      commit 1de0c4cd ("block, bfq: reduce idling only in symmetric
      scenarios") does plug I/O in such an asymmetric scenario. Plugging
      minimizes the delay induced by already in-flight I/O, and enables Q to
      recover the bandwidth it may lose because of this delay.
      
      Yet the above commit does not cover the case of weight-raised queues,
      for efficiency concerns. For weight-raised queues, I/O-dispatch
      plugging is activated simply if not all bfq_queues are
      weight-raised. But this check does not handle the case of in-flight
      requests, because a bfq_queue may become non busy *before* all its
      in-flight requests are completed.
      
      This commit performs I/O-dispatch plugging for weight-raised queues if
      there are some in-flight requests.
      
      As a practical example of the resulting recover of control, under
      write load on a Samsung SSD 970 PRO, gnome-terminal starts in 1.5
      seconds after this fix, against 15 seconds before the fix (as a
      reference, gnome-terminal takes about 35 seconds to start with any of
      the other I/O schedulers).
      
      Fixes: 1de0c4cd ("block, bfq: reduce idling only in symmetric scenarios")
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      b5e02b48
  17. 15 7月, 2019 1 次提交
  18. 28 6月, 2019 1 次提交
    • D
      block, bfq: NULL out the bic when it's no longer valid · dbc3117d
      Douglas Anderson 提交于
      In reboot tests on several devices we were seeing a "use after free"
      when slub_debug or KASAN was enabled.  The kernel complained about:
      
        Unable to handle kernel paging request at virtual address 6b6b6c2b
      
      ...which is a classic sign of use after free under slub_debug.  The
      stack crawl in kgdb looked like:
      
       0  test_bit (addr=<optimized out>, nr=<optimized out>)
       1  bfq_bfqq_busy (bfqq=<optimized out>)
       2  bfq_select_queue (bfqd=<optimized out>)
       3  __bfq_dispatch_request (hctx=<optimized out>)
       4  bfq_dispatch_request (hctx=<optimized out>)
       5  0xc056ef00 in blk_mq_do_dispatch_sched (hctx=0xed249440)
       6  0xc056f728 in blk_mq_sched_dispatch_requests (hctx=0xed249440)
       7  0xc0568d24 in __blk_mq_run_hw_queue (hctx=0xed249440)
       8  0xc0568d94 in blk_mq_run_work_fn (work=<optimized out>)
       9  0xc024c5c4 in process_one_work (worker=0xec6d4640, work=0xed249480)
       10 0xc024cff4 in worker_thread (__worker=0xec6d4640)
      
      Digging in kgdb, it could be found that, though bfqq looked fine,
      bfqq->bic had been freed.
      
      Through further digging, I postulated that perhaps it is illegal to
      access a "bic" (AKA an "icq") after bfq_exit_icq() had been called
      because the "bic" can be freed at some point in time after this call
      is made.  I confirmed that there certainly were cases where the exact
      crashing code path would access the "bic" after bfq_exit_icq() had
      been called.  Sspecifically I set the "bfqq->bic" to (void *)0x7 and
      saw that the bic was 0x7 at the time of the crash.
      
      To understand a bit more about why this crash was fairly uncommon (I
      saw it only once in a few hundred reboots), you can see that much of
      the time bfq_exit_icq_fbqq() fully frees the bfqq and thus it can't
      access the ->bic anymore.  The only case it doesn't is if
      bfq_put_queue() sees a reference still held.
      
      However, even in the case when bfqq isn't freed, the crash is still
      rare.  Why?  I tracked what happened to the "bic" after the exit
      routine.  It doesn't get freed right away.  Rather,
      put_io_context_active() eventually called put_io_context() which
      queued up freeing on a workqueue.  The freeing then actually happened
      later than that through call_rcu().  Despite all these delays, some
      extra debugging showed that all the hoops could be jumped through in
      time and the memory could be freed causing the original crash.  Phew!
      
      To make a long story short, assuming it truly is illegal to access an
      icq after the "exit_icq" callback is finished, this patch is needed.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: NPaolo Valente <paolo.valente@unimore.it>
      Signed-off-by: NDouglas Anderson <dianders@chromium.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      dbc3117d
  19. 27 6月, 2019 1 次提交
  20. 26 6月, 2019 1 次提交
  21. 25 6月, 2019 7 次提交
    • P
      block, bfq: re-schedule empty queues if they deserve I/O plugging · 3726112e
      Paolo Valente 提交于
      Consider, on one side, a bfq_queue Q that remains empty while in
      service, and, on the other side, the pending I/O of bfq_queues that,
      according to their timestamps, have to be served after Q.  If an
      uncontrolled amount of I/O from the latter bfq_queues were dispatched
      while Q is waiting for its new I/O to arrive, then Q's bandwidth
      guarantees would be violated. To prevent this, I/O dispatch is plugged
      until Q receives new I/O (except for a properly controlled amount of
      injected I/O). Unfortunately, preemption breaks I/O-dispatch plugging,
      for the following reason.
      
      Preemption is performed in two steps. First, Q is expired and
      re-scheduled. Second, the new bfq_queue to serve is chosen. The first
      step is needed by the second, as the second can be performed only
      after Q's timestamps have been properly updated (done in the
      expiration step), and Q has been re-queued for service. This
      dependency is a consequence of the way how BFQ's scheduling algorithm
      is currently implemented.
      
      But Q is not re-scheduled at all in the first step, because Q is
      empty. As a consequence, an uncontrolled amount of I/O may be
      dispatched until Q becomes non empty again. This breaks Q's service
      guarantees.
      
      This commit addresses this issue by re-scheduling Q even if it is
      empty. This in turn breaks the assumption that all scheduled queues
      are non empty. Then a few extra checks are now needed.
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      3726112e
    • P
      block, bfq: preempt lower-weight or lower-priority queues · 96a291c3
      Paolo Valente 提交于
      BFQ enqueues the I/O coming from each process into a separate
      bfq_queue, and serves bfq_queues one at a time. Each bfq_queue may be
      served for at most timeout_sync milliseconds (default: 125 ms). This
      service scheme is prone to the following inaccuracy.
      
      While a bfq_queue Q1 is in service, some empty bfq_queue Q2 may
      receive I/O, and, according to BFQ's scheduling policy, may become the
      right bfq_queue to serve, in place of the currently in-service
      bfq_queue. In this respect, postponing the service of Q2 to after the
      service of Q1 finishes may delay the completion of Q2's I/O, compared
      with an ideal service in which all non-empty bfq_queues are served in
      parallel, and every non-empty bfq_queue is served at a rate
      proportional to the bfq_queue's weight. This additional delay is equal
      at most to the time Q1 may unjustly remain in service before switching
      to Q2.
      
      If Q1 and Q2 have the same weight, then this time is most likely
      negligible compared with the completion time to be guaranteed to Q2's
      I/O. In addition, first, one of the reasons why BFQ may want to serve
      Q1 for a while is that this boosts throughput and, second, serving Q1
      longer reduces BFQ's overhead. As a conclusion, it is usually better
      not to preempt Q1 if both Q1 and Q2 have the same weight.
      
      In contrast, as Q2's weight or priority becomes higher and higher
      compared with that of Q1, the above delay becomes larger and larger,
      compared with the I/O completion times that have to be guaranteed to
      Q2 according to Q2's weight. So reducing this delay may be more
      important than avoiding the costs of preempting Q1.
      
      Accordingly, this commit preempts Q1 if Q2 has a higher weight or a
      higher priority than Q1. Preemption causes Q1 to be re-scheduled, and
      triggers a new choice of the next bfq_queue to serve. If Q2 really is
      the next bfq_queue to serve, then Q2 will be set in service
      immediately.
      
      This change reduces the component of the I/O latency caused by the
      above delay by about 80%. For example, on an (old) PLEXTOR PX-256M5
      SSD, the maximum latency reported by fio drops from 15.1 to 3.2 ms for
      a process doing sporadic random reads while another process is doing
      continuous sequential reads.
      Signed-off-by: NNicola Bottura <bottura.nicola95@gmail.com>
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      96a291c3
    • P
      block, bfq: detect wakers and unconditionally inject their I/O · 13a857a4
      Paolo Valente 提交于
      A bfq_queue Q may happen to be synchronized with another
      bfq_queue Q2, i.e., the I/O of Q2 may need to be completed for Q to
      receive new I/O. We call Q2 "waker queue".
      
      If I/O plugging is being performed for Q, and Q is not receiving any
      more I/O because of the above synchronization, then, thanks to BFQ's
      injection mechanism, the waker queue is likely to get served before
      the I/O-plugging timeout fires.
      
      Unfortunately, this fact may not be sufficient to guarantee a high
      throughput during the I/O plugging, because the inject limit for Q may
      be too low to guarantee a lot of injected I/O. In addition, the
      duration of the plugging, i.e., the time before Q finally receives new
      I/O, may not be minimized, because the waker queue may happen to be
      served only after other queues.
      
      To address these issues, this commit introduces the explicit detection
      of the waker queue, and the unconditional injection of a pending I/O
      request of the waker queue on each invocation of
      bfq_dispatch_request().
      
      One may be concerned that this systematic injection of I/O from the
      waker queue delays the service of Q's I/O. Fortunately, it doesn't. On
      the contrary, next Q's I/O is brought forward dramatically, for it is
      not blocked for milliseconds.
      Reported-by: NSrivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
      Tested-by: NSrivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      13a857a4
    • P
      block, bfq: bring forward seek&think time update · a3f9bce3
      Paolo Valente 提交于
      Until the base value for request service times gets finally computed
      for a bfq_queue, the inject limit for that queue does depend on the
      think-time state (short|long) of the queue. A timely update of the
      think time then guarantees a quicker activation or deactivation of the
      injection. Fortunately, the think time of a bfq_queue is updated in
      the same code path as the inject limit; but after the inject limit.
      
      This commits moves the update of the think time before the update of
      the inject limit. For coherence, it moves the update of the seek time
      too.
      Reported-by: NSrivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
      Tested-by: NSrivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      a3f9bce3
    • P
      block, bfq: update base request service times when possible · 24792ad0
      Paolo Valente 提交于
      I/O injection gets reduced if it increases the request service times
      of the victim queue beyond a certain threshold.  The threshold, in its
      turn, is computed as a function of the base service time enjoyed by
      the queue when it undergoes no injection.
      
      As a consequence, for injection to work properly, the above base value
      has to be accurate. In this respect, such a value may vary over
      time. For example, it varies if the size or the spatial locality of
      the I/O requests in the queue change. It is then important to update
      this value whenever possible. This commit performs this update.
      Reported-by: NSrivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
      Tested-by: NSrivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      24792ad0
    • P
      block, bfq: fix rq_in_driver check in bfq_update_inject_limit · db599f9e
      Paolo Valente 提交于
      One of the cases where the parameters for injection may be updated is
      when there are no more in-flight I/O requests. The number of in-flight
      requests is stored in the field bfqd->rq_in_driver of the descriptor
      bfqd of the device. So, the controlled condition is
      bfqd->rq_in_driver == 0.
      
      Unfortunately, this is wrong because, the instruction that checks this
      condition is in the code path that handles the completion of a
      request, and, in particular, the instruction is executed before
      bfqd->rq_in_driver is decremented in such a code path.
      
      This commit fixes this issue by just replacing 0 with 1 in the
      comparison.
      Reported-by: NSrivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
      Tested-by: NSrivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      db599f9e
    • P
      block, bfq: reset inject limit when think-time state changes · 766d6141
      Paolo Valente 提交于
      Until the base value of the request service times gets finally
      computed for a bfq_queue, the inject limit does depend on the
      think-time state (short|long). The limit must be 0 or 1 if the think
      time is deemed, respectively, as short or long. However, such a check
      and possible limit update is performed only periodically, once per
      second. So, to make the injection mechanism much more reactive, this
      commit performs the update also every time the think-time state
      changes.
      
      In addition, in the following special case, this commit lets the
      inject limit of a bfq_queue bfqq remain equal to 1 even if bfqq's
      think time is short: bfqq's I/O is synchronized with that of some
      other queue, i.e., bfqq may receive new I/O only after the I/O of the
      other queue is completed. Keeping the inject limit to 1 allows the
      blocking I/O to be served while bfqq is in service. And this is very
      convenient both for bfqq and for the total throughput, as explained
      in detail in the comments in bfq_update_has_short_ttime().
      Reported-by: NSrivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
      Tested-by: NSrivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      766d6141