1. 25 9月, 2018 6 次提交
  2. 22 9月, 2018 11 次提交
  3. 21 9月, 2018 1 次提交
  4. 20 9月, 2018 1 次提交
  5. 15 9月, 2018 3 次提交
    • P
      blok, bfq: do not plug I/O if all queues are weight-raised · c8765de0
      Paolo Valente 提交于
      To reduce latency for interactive and soft real-time applications, bfq
      privileges the bfq_queues containing the I/O of these
      applications. These privileged queues, referred-to as weight-raised
      queues, get a much higher share of the device throughput
      w.r.t. non-privileged queues. To preserve this higher share, the I/O
      of any non-weight-raised queue must be plugged whenever a sync
      weight-raised queue, while being served, remains temporarily empty. To
      attain this goal, bfq simply plugs any I/O (from any queue), if a sync
      weight-raised queue remains empty while in service.
      
      Unfortunately, this plugging typically lowers throughput with random
      I/O, on devices with internal queueing (because it reduces the filling
      level of the internal queues of the device).
      
      This commit addresses this issue by restricting the cases where
      plugging is performed: if a sync weight-raised queue remains empty
      while in service, then I/O plugging is performed only if some of the
      active bfq_queues are *not* weight-raised (which is actually the only
      circumstance where plugging is needed to preserve the higher share of
      the throughput of weight-raised queues). This restriction proved able
      to boost throughput in really many use cases needing only maximum
      throughput.
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      c8765de0
    • P
      block, bfq: inject other-queue I/O into seeky idle queues on NCQ flash · d0edc247
      Paolo Valente 提交于
      The Achilles' heel of BFQ is its failing to reach a high throughput
      with sync random I/O on flash storage with internal queueing, in case
      the processes doing I/O have differentiated weights.
      
      The cause of this failure is as follows. If at least two processes do
      sync I/O, and have a different weight from each other, then BFQ plugs
      I/O dispatching every time one of these processes, while it is being
      served, remains temporarily without pending I/O requests. This
      plugging is necessary to guarantee that every process enjoys a
      bandwidth proportional to its weight; but it empties the internal
      queue(s) of the drive. And this kills throughput with random I/O. So,
      if some processes have differentiated weights and do both sync and
      random I/O, the end result is a throughput collapse.
      
      This commit tries to counter this problem by injecting the service of
      other processes, in a controlled way, while the process in service
      happens to have no I/O. This injection is performed only if the medium
      is non rotational and performs internal queueing, and the process in
      service does random I/O (service injection might be beneficial for
      sequential I/O too, we'll work on that).
      
      As an example of the benefits of this commit, on a PLEXTOR PX-256M5S
      SSD, and with five processes having differentiated weights and doing
      sync random 4KB I/O, this commit makes the throughput with bfq grow by
      400%, from 25 to 100MB/s. This higher throughput is 10MB/s lower than
      that reached with none. As some less random I/O is added to the mix,
      the throughput becomes equal to or higher than that with none.
      
      This commit is a very first attempt to recover throughput without
      losing control, and certainly has many limitations. One is, e.g., that
      the processes whose service is injected are not chosen so as to
      distribute the extra bandwidth they receive in accordance to their
      weights. Thus there might be loss of weighted fairness in some
      cases. Anyway, this loss concerns extra service, which would not have
      been received at all without this commit. Other limitations and issues
      will probably show up with usage.
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      d0edc247
    • P
      block, bfq: correctly charge and reset entity service in all cases · cbeb869a
      Paolo Valente 提交于
      BFQ schedules entities (which represent either per-process queues or
      groups of queues) as a function of their timestamps. In particular, as
      a function of their (virtual) finish times. The finish time of an
      entity is computed as a function of the budget assigned to the entity,
      assuming, tentatively, that the entity, once in service, will receive
      an amount of service equal to its budget. Then, when the entity is
      expired because it finishes to be served, this finish time is updated
      as a function of the actual service received by the entity. This
      allows the entity to be correctly charged with only the service
      received, and then to be correctly re-scheduled.
      
      Yet an entity may receive service also while not being the entity in
      service (in the scheduling environment of its parent entity), for
      several reasons. If the entity remains with no backlog while receiving
      this 'unofficial' service, then it is expired. Also on such an
      expiration, the finish time of the entity should be updated to account
      for only the service actually received by the entity. Unfortunately,
      such an update is not performed for an entity expiring without being
      the entity in service.
      
      In a similar vein, the service counter of the entity in service is
      reset when the entity is expired, to be ready to be used for next
      service cycle. This reset too should be performed also in case an
      entity is expired because it remains empty after receiving service
      while not being the entity in service. But in this case the reset is
      not performed.
      
      This commit performs the above update of the finish time and reset of
      the service received, also for an entity expiring while not being the
      entity in service.
      Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      cbeb869a
  6. 14 9月, 2018 1 次提交
  7. 07 9月, 2018 2 次提交
  8. 06 9月, 2018 1 次提交
  9. 01 9月, 2018 3 次提交
    • D
      blkcg: use tryget logic when associating a blkg with a bio · 31118850
      Dennis Zhou (Facebook) 提交于
      There is a very small change a bio gets caught up in a really
      unfortunate race between a task migration, cgroup exiting, and itself
      trying to associate with a blkg. This is due to css offlining being
      performed after the css->refcnt is killed which triggers removal of
      blkgs that reach their blkg->refcnt of 0.
      
      To avoid this, association with a blkg should use tryget and fallback to
      using the root_blkg.
      
      Fixes: 08e18eab ("block: add bi_blkg to the bio for cgroups")
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NDennis Zhou <dennisszhou@gmail.com>
      Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      31118850
    • D
      blkcg: delay blkg destruction until after writeback has finished · 59b57717
      Dennis Zhou (Facebook) 提交于
      Currently, blkcg destruction relies on a sequence of events:
        1. Destruction starts. blkcg_css_offline() is called and blkgs
           release their reference to the blkcg. This immediately destroys
           the cgwbs (writeback).
        2. With blkgs giving up their reference, the blkcg ref count should
           become zero and eventually call blkcg_css_free() which finally
           frees the blkcg.
      
      Jiufei Xue reported that there is a race between blkcg_bio_issue_check()
      and cgroup_rmdir(). To remedy this, blkg destruction becomes contingent
      on the completion of all writeback associated with the blkcg. A count of
      the number of cgwbs is maintained and once that goes to zero, blkg
      destruction can follow. This should prevent premature blkg destruction
      related to writeback.
      
      The new process for blkcg cleanup is as follows:
        1. Destruction starts. blkcg_css_offline() is called which offlines
           writeback. Blkg destruction is delayed on the cgwb_refcnt count to
           avoid punting potentially large amounts of outstanding writeback
           to root while maintaining any ongoing policies. Here, the base
           cgwb_refcnt is put back.
        2. When the cgwb_refcnt becomes zero, blkcg_destroy_blkgs() is called
           and handles destruction of blkgs. This is where the css reference
           held by each blkg is released.
        3. Once the blkcg ref count goes to zero, blkcg_css_free() is called.
           This finally frees the blkg.
      
      It seems in the past blk-throttle didn't do the most understandable
      things with taking data from a blkg while associating with current. So,
      the simplification and unification of what blk-throttle is doing caused
      this.
      
      Fixes: 08e18eab ("block: add bi_blkg to the bio for cgroups")
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NDennis Zhou <dennisszhou@gmail.com>
      Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      59b57717
    • D
      Revert "blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()" · 6b065462
      Dennis Zhou (Facebook) 提交于
      This reverts commit 4c699480.
      
      Destroying blkgs is tricky because of the nature of the relationship. A
      blkg should go away when either a blkcg or a request_queue goes away.
      However, blkg's pin the blkcg to ensure they remain valid. To break this
      cycle, when a blkcg is offlined, blkgs put back their css ref. This
      eventually lets css_free() get called which frees the blkcg.
      
      The above commit (4c699480) breaks this order of events by trying to
      destroy blkgs in css_free(). As the blkgs still hold references to the
      blkcg, css_free() is never called.
      
      The race between blkcg_bio_issue_check() and cgroup_rmdir() will be
      addressed in the following patch by delaying destruction of a blkg until
      all writeback associated with the blkcg has been finished.
      
      Fixes: 4c699480 ("blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()")
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NDennis Zhou <dennisszhou@gmail.com>
      Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      6b065462
  10. 28 8月, 2018 5 次提交
  11. 23 8月, 2018 4 次提交
  12. 21 8月, 2018 2 次提交
    • J
      blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter · f5bbbbe4
      Jianchao Wang 提交于
      For blk-mq, part_in_flight/rw will invoke blk_mq_in_flight/rw to
      account the inflight requests. It will access the queue_hw_ctx and
      nr_hw_queues w/o any protection. When updating nr_hw_queues and
      blk_mq_in_flight/rw occur concurrently, panic comes up.
      
      Before update nr_hw_queues, the q will be frozen. So we could use
      q_usage_counter to avoid the race. percpu_ref_is_zero is used here
      so that we will not miss any in-flight request. The access to
      nr_hw_queues and queue_hw_ctx in blk_mq_queue_tag_busy_iter are
      under rcu critical section, __blk_mq_update_nr_hw_queues could use
      synchronize_rcu to ensure the zeroed q_usage_counter to be globally
      visible.
      Signed-off-by: NJianchao Wang <jianchao.w.wang@oracle.com>
      Reviewed-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      f5bbbbe4
    • J
      blk-mq: init hctx sched after update ctx and hctx mapping · d48ece20
      Jianchao Wang 提交于
      Currently, when update nr_hw_queues, IO scheduler's init_hctx will
      be invoked before the mapping between ctx and hctx is adapted
      correctly by blk_mq_map_swqueue. The IO scheduler init_hctx (kyber)
      may depend on this mapping and get wrong result and panic finally.
      A simply way to fix this is that switch the IO scheduler to 'none'
      before update the nr_hw_queues, and then switch it back after
      update nr_hw_queues. blk_mq_sched_init_/exit_hctx are removed due
      to nobody use them any more.
      Signed-off-by: NJianchao Wang <jianchao.w.wang@oracle.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      d48ece20