1. 26 Jan 2021 (5 commits)
  2. 25 Jan 2021 (7 commits)
    • Revert "blk-mq, elevator: Count requests per hctx to improve performance" · 5ac83c64
      Authored by Jan Kara
      This reverts commit b445547e.
      
      Since both mq-deadline and BFQ completely ignore the hctx passed to
      their dispatch functions and dispatch whatever request they deem fit,
      checking whether any request for a particular hctx is queued is
      pointless: we will very likely get a request from a different hctx
      anyway. A following commit deals with lock contention in these
      IO schedulers in the presence of multiple HW queues in a different way.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: do not expire a queue when it is the only busy one · 2391d13e
      Authored by Paolo Valente
      This commit preserves I/O-dispatch plugging for a special symmetric
      case that may suddenly turn asymmetric: the case where only one
      bfq_queue, say bfqq, is busy. In this case, not expiring bfqq does not
      cause any harm to any other queue in terms of service guarantees. In
      contrast, it avoids the following unlucky sequence of events: (1) bfqq
      is expired, (2) a new queue with a lower weight than bfqq becomes busy
      (or more queues), (3) the new queue is served until a new request
      arrives for bfqq, (4) when bfqq is finally served, there are so many
      requests of the new queue in the drive that the pending requests for
      bfqq take a long time to be served. In particular, event (2) may
      cause even already-dispatched requests of bfqq to be delayed inside
      the drive. So, to avoid this series of events, the scenario is
      preventively declared asymmetric also when bfqq is the only busy
      queue. By doing so, I/O-dispatch plugging is performed for bfqq.
      Tested-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: avoid spurious switches to soft_rt of interactive queues · 3c337690
      Authored by Paolo Valente
      BFQ tags some bfq_queues as interactive or soft_rt if it deems that
      these bfq_queues contain the I/O of, respectively, interactive or soft
      real-time applications. BFQ privileges both these special types of
      bfq_queues over normal bfq_queues. To privilege a bfq_queue, BFQ
      mainly raises the weight of the bfq_queue. In particular, soft_rt
      bfq_queues get a higher weight than interactive bfq_queues.
      
      A bfq_queue may turn from interactive to soft_rt. And this leads to a
      tricky issue. Soft real-time applications usually start with an
      I/O-bound, interactive phase, in which they load themselves into main
      memory. BFQ correctly detects this phase, and keeps the bfq_queues
      associated with the application in interactive mode for a
      while. Problems arise when the I/O pattern of the application finally
      switches to soft real-time. One of the conditions for a bfq_queue to
      be deemed as soft_rt is that the bfq_queue does not consume too much
      bandwidth. But the bfq_queues associated with a soft real-time
      application consume as much bandwidth as they can in the loading phase
      of the application. So, after the application becomes truly soft
      real-time, a lot of time should pass before the average bandwidth
      consumed by its bfq_queues finally drops to a value acceptable for
      soft_rt bfq_queues. As a consequence, there might be a time gap during
      which the application is not privileged at all, because its bfq_queues
      are not interactive any longer, but cannot be deemed as soft_rt yet.
      
      To avoid this problem, BFQ pretends that an interactive bfq_queue
      consumes zero bandwidth, and thereby allows an interactive bfq_queue
      to switch to soft_rt. Yet this fake zero-bandwidth consumption easily
      causes the bfq_queue to switch to soft_rt deceptively during its
      loading phase: once in soft_rt mode, the bfq_queue's bandwidth is
      computed correctly, so it soon switches back to interactive, then
      switches to soft_rt again, and so on. These spurious fluctuations
      usually cause losses of throughput, because they defeat BFQ's
      mechanisms for boosting throughput (injection, I/O-plugging
      avoidance, ...).
      
      This commit addresses this issue as follows:
      1) It does compute actual bandwidth consumption also for interactive
         bfq_queues. This avoids the above false positives.
      2) When a bfq_queue switches from interactive to normal mode, the
         consumed bandwidth is reset (forgotten). This allows the
         bfq_queue to enjoy soft_rt very quickly. In particular, two
         alternatives are possible in this switch:
          - the bfq_queue still has backlog, and therefore there is a budget
            already scheduled to serve the bfq_queue; in this case, the
            scheduling of the current budget of the bfq_queue is not
            hindered, because only the scheduling of the next budget will
            be affected by the weight drop. After that, if the bfq_queue is
            actually in a soft_rt phase, and becomes empty during the
            service of its current budget, which is the natural behavior of
            a soft_rt bfq_queue, then the bfq_queue will be considered as
            soft_rt when its next I/O arrives. If, in contrast, the
            bfq_queue remains constantly non-empty, then its next budget
            will be scheduled with a low weight, which is the natural
            treatment for an I/O-bound (non soft_rt) bfq_queue.
          - the bfq_queue is empty; in this case, the bfq_queue may be
            considered unjustly soft_rt when its new I/O arrives. Yet
            the problem is now much smaller than before, because it is
            unlikely that more than one spurious fluctuation occurs.
      Tested-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: do not raise non-default weights · 91b896f6
      Authored by Paolo Valente
      BFQ's heuristics try to detect interactive I/O, and raise the weight
      of the queues containing such I/O. Yet, if the user also changes the
      weight of a queue (i.e., changes the ioprio of the process associated
      with that queue), then it is most likely better to prevent BFQ's
      heuristics from silently overriding that weight.
      Tested-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: increase time window for waker detection · ab1fb47e
      Authored by Paolo Valente
      Tests on slower machines showed the current window to be far too
      small. This commit increases it.
      Tested-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: set next_rq to waker_bfqq->next_rq in waker injection · d4fc3640
      Authored by Jia Cheng Hu
      Since commit c5089591c3ba ("block, bfq: detect wakers and
      unconditionally inject their I/O"), when the in-service bfq_queue, say
      Q, is temporarily empty, BFQ checks whether there are I/O requests to
      inject (also) from the waker bfq_queue of Q. To this end, the request
      pointed to by bfqq->waker_bfqq->next_rq must be examined. However, the
      current implementation mistakenly looks at bfqq->next_rq, which
      instead points to the next request of the currently served queue.
      
      This mistake evidently causes losses of throughput in scenarios with
      waker bfq_queues.
      
      This commit corrects this mistake.
      
      Fixes: c5089591c3ba ("block, bfq: detect wakers and unconditionally inject their I/O")
      Signed-off-by: Jia Cheng Hu <jia.jiachenghu@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: use half slice_idle as a threshold to check short ttime · b5f74eca
      Authored by Paolo Valente
      The value of the I/O-plugging (idling) timeout is also used as the
      think-time threshold to decide whether a process has a short think
      time. In this respect, a good value of this timeout for rotational
      drives is on the order of several ms, yet this is often too long a
      time interval to be effective as a think-time threshold. This commit
      mitigates this problem (by a lot, according to tests) by halving the
      threshold.
      Tested-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 06 Jan 2021 (1 commit)
    • bfq: Fix computation of shallow depth · 6d4d2735
      Authored by Jan Kara
      BFQ computes the number of tags it allows to be allocated for each
      request type based on the tag bitmap. However, it uses 1 << bitmap.shift
      as the number of available tags, which is wrong: 'shift' is just an
      internal bitmap value containing the logarithm of how many bits the
      bitmap uses in each bitmap word. Thus the number of tags allowed for
      some request types can be far too low. Use the proper bitmap.depth,
      which holds the actual number of tags, instead.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. 09 Sep 2020 (1 commit)
    • block: only call sched requeue_request() for scheduled requests · e8a8a185
      Authored by Omar Sandoval
      Yang Yang reported the following crash caused by requeueing a flush
      request in Kyber:
      
        [    2.517297] Unable to handle kernel paging request at virtual address ffffffd8071c0b00
        ...
        [    2.517468] pc : clear_bit+0x18/0x2c
        [    2.517502] lr : sbitmap_queue_clear+0x40/0x228
        [    2.517503] sp : ffffff800832bc60 pstate : 00c00145
        ...
        [    2.517599] Process ksoftirqd/5 (pid: 51, stack limit = 0xffffff8008328000)
        [    2.517602] Call trace:
        [    2.517606]  clear_bit+0x18/0x2c
        [    2.517619]  kyber_finish_request+0x74/0x80
        [    2.517627]  blk_mq_requeue_request+0x3c/0xc0
        [    2.517637]  __scsi_queue_insert+0x11c/0x148
        [    2.517640]  scsi_softirq_done+0x114/0x130
        [    2.517643]  blk_done_softirq+0x7c/0xb0
        [    2.517651]  __do_softirq+0x208/0x3bc
        [    2.517657]  run_ksoftirqd+0x34/0x60
        [    2.517663]  smpboot_thread_fn+0x1c4/0x2c0
        [    2.517667]  kthread+0x110/0x120
        [    2.517669]  ret_from_fork+0x10/0x18
      
      This happens because Kyber doesn't track flush requests, so
      kyber_finish_request() reads a garbage domain token. Only call the
      scheduler's requeue_request() hook if RQF_ELVPRIV is set (like we do for
      the finish_request() hook in blk_mq_free_request()). Now that we're
      handling it in blk-mq, also remove the check from BFQ.
      Reported-by: Yang Yang <yang.yang@vivo.com>
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. 04 Sep 2020 (2 commits)
  6. 24 Aug 2020 (1 commit)
  7. 01 Aug 2020 (1 commit)
  8. 30 May 2020 (1 commit)
  9. 10 May 2020 (1 commit)
  10. 22 Mar 2020 (2 commits)
    • block, bfq: turn put_queue into release_process_ref in __bfq_bic_change_cgroup · c8997736
      Authored by Paolo Valente
      A bfq_put_queue() may be invoked in __bfq_bic_change_cgroup(). The
      goal of this put is to release a process reference to a bfq_queue. But
      a process-reference release may also trigger some extra operations;
      to this end, such releases are handled through
      bfq_release_process_ref(). So, turn the invocation of bfq_put_queue()
      into an invocation of bfq_release_process_ref().
      
      Tested-by: cki-project@redhat.com
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: fix use-after-free in bfq_idle_slice_timer_body · 2f95fa5c
      Authored by Zhiqiang Liu
      In bfq_idle_slice_timer(), bfqq = bfqd->in_service_queue is read
      outside the bfqd->lock critical section. This bfqq, even if non-NULL
      in bfq_idle_slice_timer(), may be freed before being passed to
      bfq_idle_slice_timer_body(), so we may end up accessing freed memory.

      In addition, since bfqq may be subject to such a race, we should
      first check whether bfqq is still in service before doing anything
      with it in bfq_idle_slice_timer_body(). If the racing bfqq is no
      longer in service, it has already been expired through
      __bfq_bfqq_expire(), and its wait_request flag has been cleared in
      __bfq_bfqd_reset_in_service(), so there is no need to clear the
      wait_request flag of a bfqq that is not in service.
      
      KASAN log is given as follows:
      [13058.354613] ==================================================================
      [13058.354640] BUG: KASAN: use-after-free in bfq_idle_slice_timer+0xac/0x290
      [13058.354644] Read of size 8 at addr ffffa02cf3e63f78 by task fork13/19767
      [13058.354646]
      [13058.354655] CPU: 96 PID: 19767 Comm: fork13
      [13058.354661] Call trace:
      [13058.354667]  dump_backtrace+0x0/0x310
      [13058.354672]  show_stack+0x28/0x38
      [13058.354681]  dump_stack+0xd8/0x108
      [13058.354687]  print_address_description+0x68/0x2d0
      [13058.354690]  kasan_report+0x124/0x2e0
      [13058.354697]  __asan_load8+0x88/0xb0
      [13058.354702]  bfq_idle_slice_timer+0xac/0x290
      [13058.354707]  __hrtimer_run_queues+0x298/0x8b8
      [13058.354710]  hrtimer_interrupt+0x1b8/0x678
      [13058.354716]  arch_timer_handler_phys+0x4c/0x78
      [13058.354722]  handle_percpu_devid_irq+0xf0/0x558
      [13058.354731]  generic_handle_irq+0x50/0x70
      [13058.354735]  __handle_domain_irq+0x94/0x110
      [13058.354739]  gic_handle_irq+0x8c/0x1b0
      [13058.354742]  el1_irq+0xb8/0x140
      [13058.354748]  do_wp_page+0x260/0xe28
      [13058.354752]  __handle_mm_fault+0x8ec/0x9b0
      [13058.354756]  handle_mm_fault+0x280/0x460
      [13058.354762]  do_page_fault+0x3ec/0x890
      [13058.354765]  do_mem_abort+0xc0/0x1b0
      [13058.354768]  el0_da+0x24/0x28
      [13058.354770]
      [13058.354773] Allocated by task 19731:
      [13058.354780]  kasan_kmalloc+0xe0/0x190
      [13058.354784]  kasan_slab_alloc+0x14/0x20
      [13058.354788]  kmem_cache_alloc_node+0x130/0x440
      [13058.354793]  bfq_get_queue+0x138/0x858
      [13058.354797]  bfq_get_bfqq_handle_split+0xd4/0x328
      [13058.354801]  bfq_init_rq+0x1f4/0x1180
      [13058.354806]  bfq_insert_requests+0x264/0x1c98
      [13058.354811]  blk_mq_sched_insert_requests+0x1c4/0x488
      [13058.354818]  blk_mq_flush_plug_list+0x2d4/0x6e0
      [13058.354826]  blk_flush_plug_list+0x230/0x548
      [13058.354830]  blk_finish_plug+0x60/0x80
      [13058.354838]  read_pages+0xec/0x2c0
      [13058.354842]  __do_page_cache_readahead+0x374/0x438
      [13058.354846]  ondemand_readahead+0x24c/0x6b0
      [13058.354851]  page_cache_sync_readahead+0x17c/0x2f8
      [13058.354858]  generic_file_buffered_read+0x588/0xc58
      [13058.354862]  generic_file_read_iter+0x1b4/0x278
      [13058.354965]  ext4_file_read_iter+0xa8/0x1d8 [ext4]
      [13058.354972]  __vfs_read+0x238/0x320
      [13058.354976]  vfs_read+0xbc/0x1c0
      [13058.354980]  ksys_read+0xdc/0x1b8
      [13058.354984]  __arm64_sys_read+0x50/0x60
      [13058.354990]  el0_svc_common+0xb4/0x1d8
      [13058.354994]  el0_svc_handler+0x50/0xa8
      [13058.354998]  el0_svc+0x8/0xc
      [13058.354999]
      [13058.355001] Freed by task 19731:
      [13058.355007]  __kasan_slab_free+0x120/0x228
      [13058.355010]  kasan_slab_free+0x10/0x18
      [13058.355014]  kmem_cache_free+0x288/0x3f0
      [13058.355018]  bfq_put_queue+0x134/0x208
      [13058.355022]  bfq_exit_icq_bfqq+0x164/0x348
      [13058.355026]  bfq_exit_icq+0x28/0x40
      [13058.355030]  ioc_exit_icq+0xa0/0x150
      [13058.355035]  put_io_context_active+0x250/0x438
      [13058.355038]  exit_io_context+0xd0/0x138
      [13058.355045]  do_exit+0x734/0xc58
      [13058.355050]  do_group_exit+0x78/0x220
      [13058.355054]  __wake_up_parent+0x0/0x50
      [13058.355058]  el0_svc_common+0xb4/0x1d8
      [13058.355062]  el0_svc_handler+0x50/0xa8
      [13058.355066]  el0_svc+0x8/0xc
      [13058.355067]
      [13058.355071] The buggy address belongs to the object at ffffa02cf3e63e70#012 which belongs to the cache bfq_queue of size 464
      [13058.355075] The buggy address is located 264 bytes inside of#012 464-byte region [ffffa02cf3e63e70, ffffa02cf3e64040)
      [13058.355077] The buggy address belongs to the page:
      [13058.355083] page:ffff7e80b3cf9800 count:1 mapcount:0 mapping:ffff802db5c90780 index:0xffffa02cf3e606f0 compound_mapcount: 0
      [13058.366175] flags: 0x2ffffe0000008100(slab|head)
      [13058.370781] raw: 2ffffe0000008100 ffff7e80b53b1408 ffffa02d730c1c90 ffff802db5c90780
      [13058.370787] raw: ffffa02cf3e606f0 0000000000370023 00000001ffffffff 0000000000000000
      [13058.370789] page dumped because: kasan: bad access detected
      [13058.370791]
      [13058.370792] Memory state around the buggy address:
      [13058.370797]  ffffa02cf3e63e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fb fb
      [13058.370801]  ffffa02cf3e63e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [13058.370805] >ffffa02cf3e63f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [13058.370808]                                                                 ^
      [13058.370811]  ffffa02cf3e63f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [13058.370815]  ffffa02cf3e64000: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      [13058.370817] ==================================================================
      [13058.370820] Disabling lock debugging due to kernel taint
      
      Here, we directly pass bfqd to bfq_idle_slice_timer_body().
      --
      V2->V3: rewrite the comment as suggested by Paolo Valente
      V1->V2: add one comment, and add Fixes and Reported-by tag.
      
      Fixes: aee69d78 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler")
      Acked-by: Paolo Valente <paolo.valente@linaro.org>
      Reported-by: Wang Wang <wangwang2@huawei.com>
      Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Signed-off-by: Feilong Lin <linfeilong@huawei.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  11. 03 Feb 2020 (5 commits)
  12. 23 Jan 2020 (1 commit)
  13. 14 Nov 2019 (1 commit)
  14. 08 Nov 2019 (1 commit)
    • bfq-iosched: stop using blkg->stat_bytes and ->stat_ios · fd41e603
      Authored by Tejun Heo
      When used on cgroup1, bfq uses the blkg->stat_bytes and ->stat_ios
      from blk-cgroup core to populate six stat knobs.  blk-cgroup core is
      moving away from blkg_rwstat to improve scalability and won't be able
      to support this usage.
      
      It isn't like the sharing gains all that much.  Let's break it out to
      dedicated rwstat counters which are updated when on cgroup1.  This
      makes use of bfqg_*rwstat*() helpers outside of
      CONFIG_BFQ_CGROUP_DEBUG.  Move them out.
      
      v2: Compile fix when !CONFIG_BFQ_CGROUP_DEBUG.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  15. 18 Sep 2019 (4 commits)
  16. 08 Aug 2019 (3 commits)
  17. 18 Jul 2019 (1 commit)
    • block, bfq: check also in-flight I/O in dispatch plugging · b5e02b48
      Authored by Paolo Valente
      Consider a sync bfq_queue Q that remains empty while in service, and
      suppose that, when this happens, there is a fair amount of already
      in-flight I/O not belonging to Q. In such a situation, I/O dispatching
      may need to be plugged (until new I/O arrives for Q), for the
      following reason.
      
      The drive may decide to serve in-flight non-Q's I/O requests before
      Q's ones, thereby delaying the arrival of new I/O requests for Q
      (recall that Q is sync). If I/O-dispatching is not plugged, then,
      while Q remains empty, a basically uncontrolled amount of I/O from
      other queues may be dispatched too, possibly causing the service of
      Q's I/O to be delayed even longer in the drive. This problem gets more
      and more serious as the speed and the queue depth of the drive grow,
      because, as these two quantities grow, the probability to find no
      queue busy but many requests in flight grows too.
      
      If Q has the same weight and priority as the other queues, then the
      above delay is unlikely to cause any issue, because all queues tend to
      undergo the same treatment. So, since not plugging I/O dispatching is
      convenient for throughput, it is better not to plug. Things change in
      case Q has a higher weight or priority than some other queue, because
      Q's service guarantees may simply be violated. For this reason,
      commit 1de0c4cd ("block, bfq: reduce idling only in symmetric
      scenarios") does plug I/O in such an asymmetric scenario. Plugging
      minimizes the delay induced by already in-flight I/O, and enables Q to
      recover the bandwidth it may lose because of this delay.
      
      Yet the above commit does not cover the case of weight-raised queues,
      for efficiency concerns. For weight-raised queues, I/O-dispatch
      plugging is activated simply if not all bfq_queues are
      weight-raised. But this check does not handle the case of in-flight
      requests, because a bfq_queue may become non busy *before* all its
      in-flight requests are completed.
      
      This commit performs I/O-dispatch plugging for weight-raised queues if
      there are some in-flight requests.
      
      As a practical example of the resulting recover of control, under
      write load on a Samsung SSD 970 PRO, gnome-terminal starts in 1.5
      seconds after this fix, against 15 seconds before the fix (as a
      reference, gnome-terminal takes about 35 seconds to start with any of
      the other I/O schedulers).
      
      Fixes: 1de0c4cd ("block, bfq: reduce idling only in symmetric scenarios")
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  18. 15 Jul 2019 (1 commit)
  19. 28 Jun 2019 (1 commit)
    • block, bfq: NULL out the bic when it's no longer valid · dbc3117d
      Authored by Douglas Anderson
      In reboot tests on several devices we were seeing a "use after free"
      when slub_debug or KASAN was enabled.  The kernel complained about:
      
        Unable to handle kernel paging request at virtual address 6b6b6c2b
      
      ...which is a classic sign of use after free under slub_debug.  The
      stack crawl in kgdb looked like:
      
       0  test_bit (addr=<optimized out>, nr=<optimized out>)
       1  bfq_bfqq_busy (bfqq=<optimized out>)
       2  bfq_select_queue (bfqd=<optimized out>)
       3  __bfq_dispatch_request (hctx=<optimized out>)
       4  bfq_dispatch_request (hctx=<optimized out>)
       5  0xc056ef00 in blk_mq_do_dispatch_sched (hctx=0xed249440)
       6  0xc056f728 in blk_mq_sched_dispatch_requests (hctx=0xed249440)
       7  0xc0568d24 in __blk_mq_run_hw_queue (hctx=0xed249440)
       8  0xc0568d94 in blk_mq_run_work_fn (work=<optimized out>)
       9  0xc024c5c4 in process_one_work (worker=0xec6d4640, work=0xed249480)
       10 0xc024cff4 in worker_thread (__worker=0xec6d4640)
      
      Digging in kgdb, it could be found that, though bfqq looked fine,
      bfqq->bic had been freed.
      
      Through further digging, I postulated that perhaps it is illegal to
      access a "bic" (AKA an "icq") after bfq_exit_icq() has been called,
      because the "bic" can be freed at some point in time after this call
      is made.  I confirmed that there certainly were cases where the exact
      crashing code path would access the "bic" after bfq_exit_icq() had
      been called.  Specifically, I set "bfqq->bic" to (void *)0x7 and
      saw that the bic was 0x7 at the time of the crash.
      
      To understand a bit more about why this crash was fairly uncommon (I
      saw it only once in a few hundred reboots), note that much of the
      time bfq_exit_icq_bfqq() fully frees the bfqq and thus it can't
      access the ->bic anymore.  The only case it doesn't is if
      bfq_put_queue() sees a reference still held.
      
      However, even in the case when bfqq isn't freed, the crash is still
      rare.  Why?  I tracked what happened to the "bic" after the exit
      routine.  It doesn't get freed right away.  Rather,
      put_io_context_active() eventually called put_io_context() which
      queued up freeing on a workqueue.  The freeing then actually happened
      later than that through call_rcu().  Despite all these delays, some
      extra debugging showed that all the hoops could be jumped through in
      time and the memory could be freed causing the original crash.  Phew!
      
      To make a long story short, assuming it truly is illegal to access an
      icq after the "exit_icq" callback is finished, this patch is needed.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Paolo Valente <paolo.valente@unimore.it>
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>