1. 01 Feb, 2019 5 commits
    • block, bfq: consider also ioprio classes in symmetry detection · 73d58118
      Paolo Valente committed
      In asymmetric scenarios, i.e., when some bfq_queue or bfq_group needs to
      be guaranteed a different bandwidth than other bfq_queues or bfq_groups,
      these service guarantees can be provided only by plugging I/O dispatch,
      completely or partially, when the queue in service remains temporarily
      empty. A case where asymmetry is particularly strong is when some active
      bfq_queues belong to a higher-priority class than some other active
      bfq_queues. Unfortunately, this important case is not considered at all
      in the code for detecting asymmetric scenarios. This commit adds the
      missing logic.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
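      The class check described above can be sketched as follows. This is
      only an illustration: the per-class counter array busy_queues[] and
      the index names are assumptions made for this example, not
      identifiers confirmed by the commit message.

```c
#include <stdbool.h>

/* Hypothetical indices for the three I/O-priority classes. */
enum { CLASS_RT = 0, CLASS_BE = 1, CLASS_IDLE = 2, NUM_CLASSES = 3 };

/*
 * Returns true if the active queues span at least two priority classes,
 * which this sketch treats as an asymmetric scenario.
 * busy_queues[i] is the assumed number of active queues in class i.
 */
bool multiple_classes_busy(const unsigned int busy_queues[NUM_CLASSES])
{
	int busy_classes = 0;

	for (int i = 0; i < NUM_CLASSES; i++)
		if (busy_queues[i] > 0)
			busy_classes++;

	return busy_classes >= 2;
}
```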
    • block, bfq: remove case of redirected bic from insert_request · 03e565e4
      Paolo Valente committed
      Before commit 18e5a57d ("block, bfq: postpone rq preparation to
      insert or merge"), the destination queue for a request was chosen by a
      different hook than the one that then inserted the request. So, between
      the execution of the two hooks, the bic of the process generating the
      request could happen to be redirected to a different bfq_queue. As a
      consequence, the destination bfq_queue stored in the request could be
      wrong. Such an event does not need to be handled any longer.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: make sure queue budgets are not below service received · f3218ad8
      Paolo Valente committed
      With some unlucky sequences of events, the function bfq_updated_next_req
      updates the current budget of a bfq_queue to a lower value than the
      service received by the queue using such a budget. Unfortunately, if
      this happens, then the return value of the function bfq_bfqq_budget_left
      becomes inconsistent. This commit solves this problem by lower-bounding
      the budget computed in bfq_updated_next_req to the service currently
      charged to the queue.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
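      A minimal sketch of the lower bound described above. The struct and
      function names are hypothetical stand-ins for the per-queue
      accounting, not the actual BFQ definitions.

```c
/* Hypothetical, simplified view of the per-queue accounting. */
struct queue_acct {
	int budget;   /* current budget, in sectors */
	int service;  /* service already charged against that budget */
};

/* Install a new budget, but never one below the service already charged. */
void set_budget(struct queue_acct *q, int new_budget)
{
	q->budget = new_budget > q->service ? new_budget : q->service;
}

/* Remaining budget; cannot be negative right after set_budget(). */
int budget_left(const struct queue_acct *q)
{
	return q->budget - q->service;
}
```

      For example, with 60 sectors already charged, a request for a budget
      of 40 is raised to 60, so the remaining budget is 0 rather than -20.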
    • block, bfq: avoid selecting a queue w/o budget · 218cb897
      Paolo Valente committed
      To boost throughput on devices with internal queueing and in scenarios
      where device idling is not strictly needed, bfq immediately starts
      serving a new bfq_queue if the in-service bfq_queue remains without
      pending I/O, even if new I/O may arrive soon for the latter queue. Then,
      if such I/O actually arrives soon, bfq preempts the new in-service
      bfq_queue so as to give the previous queue a chance to go on being
      served (in case the previous queue should actually be the one to be
      served, according to its timestamps).
      
      However, the in-service bfq_queue, say Q, may also have no budget left
      when it remains without pending I/O. Since bfq changes budgets
      dynamically to fit the needs of bfq_queues, this happens more often than
      one may expect. If this happens, then there is no point in trying to go
      on serving Q when new I/O arrives for it soon: Q would be expired
      immediately after being selected for service. This would only cause
      useless overhead. This commit avoids such a useless selection.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
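      The decision described above can be sketched as a simple predicate.
      The struct, its fields, and the function name are assumptions made
      for illustration only.

```c
#include <stdbool.h>

/* Hypothetical per-queue state. */
struct queue_state {
	int budget;   /* budget assigned to the queue */
	int service;  /* service received so far against that budget */
};

/*
 * Called for the queue Q that was in service and ran out of pending I/O.
 * Returns true if it is worth going back to Q when new I/O arrives for it
 * soon: with no budget left, Q would be expired right after being selected.
 */
bool worth_resuming(const struct queue_state *q)
{
	return q->budget - q->service > 0;
}
```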
    • block, bfq: do not consider interactive queues in srt filtering · 20cd3245
      Paolo Valente committed
      The speed at which a bfq_queue receives I/O is one of the parameters by
      which bfq decides whether the queue is soft real-time (i.e., whether the
      queue contains the I/O of a soft real-time application). In particular,
      when a bfq_queue remains without outstanding I/O requests, bfq computes
      the minimum time instant, named soft_rt_next_start, at which the next
      request of the queue may arrive for the queue to be deemed as soft real
      time.
      
      Unfortunately this filtering may cause problems with a queue in
      interactive weight raising. In fact, such a queue may be conveying the
      I/O needed to load a soft real-time application. The latter will
      actually exhibit a soft real-time I/O pattern after it finally starts
      doing its job. But, if soft_rt_next_start is updated for an interactive
      bfq_queue, and the queue has received a lot of service before remaining
      with no outstanding request (likely to happen on a fast device), then
      soft_rt_next_start is assigned such a high value that, for a very long
      time, the queue is prevented from being considered as soft real
      time.
      
      This commit removes the updating of soft_rt_next_start for bfq_queues in
      interactive weight raising.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
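      A sketch of the resulting update rule, under the assumption that a
      queue carries a flag telling whether it is in interactive weight
      raising. The struct, the flag, and the function name are
      hypothetical; only soft_rt_next_start is named in the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-queue state. */
struct srt_queue {
	bool interactive_wr;          /* in interactive weight raising? */
	uint64_t soft_rt_next_start;  /* earliest arrival time still compatible
	                                 with a soft real-time pattern */
};

/*
 * Called when the queue remains without outstanding I/O requests.
 * The update is skipped for queues in interactive weight raising.
 */
void update_soft_rt_next_start(struct srt_queue *q, uint64_t candidate)
{
	if (q->interactive_wr)
		return;

	q->soft_rt_next_start = candidate;
}
```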
  2. 08 Dec, 2018 1 commit
    • blkcg: fix ref count issue with bio_blkcg() using task_css · 0fe061b9
      Dennis Zhou committed
      The bio_blkcg() function turns out to be inconsistent and consequently
      dangerous to use. The first part returns a blkcg where a reference is
      owned by the bio meaning it does not need to be rcu protected. However,
      the third case, the last line, is problematic:
      
      	return css_to_blkcg(task_css(current, io_cgrp_id));
      
      This can race against task migration and the cgroup dying. It is also
      semantically different as it must be called rcu protected and is
      susceptible to failure when trying to get a reference to it.
      
      This patch adds association ahead of calling bio_blkcg() rather than
      after. This makes association a required and explicit step along the
      code paths for calling bio_blkcg(). In blk-iolatency, association is
      moved above the bio_blkcg() call to ensure it will not return %NULL.
      
      BFQ uses the old bio_blkcg() function, but I do not want to address it
      in this series due to the complexity. I have created a private version
      documenting the inconsistency and noting not to use it.
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 07 Dec, 2018 1 commit
    • block, bfq: fix decrement of num_active_groups · ba7aeae5
      Paolo Valente committed
      Since commit '2d29c9f8 ("block, bfq: improve asymmetric scenarios
      detection")', if there are process groups with I/O requests waiting for
      completion, then BFQ tags the scenario as 'asymmetric'. This detection
      is needed for preserving service guarantees (for details, see comments
      on the computation of the variable asymmetric_scenario in the
      function bfq_better_to_idle).
      
      Unfortunately, commit '2d29c9f8 ("block, bfq: improve asymmetric
      scenarios detection")' contains an error exactly in the updating of
      the number of groups with I/O requests waiting for completion: if a
      group has more than one descendant process, then the above number of
      groups, which is renamed from num_active_groups to a more appropriate
      num_groups_with_pending_reqs by this commit, may happen to be wrongly
      decremented multiple times, namely every time one of the descendant
      processes gets all its pending I/O requests completed.
      
      A correct, complete solution should work as follows. Consider a group
      that is inactive, i.e., that has no descendant process with pending
      I/O inside BFQ queues. Then suppose that num_groups_with_pending_reqs
      is still accounting for this group, because the group still has some
      descendant process with some I/O request still in
      flight. num_groups_with_pending_reqs should be decremented when the
      in-flight request of the last descendant process is finally completed
      (assuming that nothing else has changed for the group in the meantime,
      in terms of composition of the group and active/inactive state of
      child groups and processes). To accomplish this, an additional
      pending-request counter must be added to entities, and must be
      updated correctly.
      
      To avoid this additional field and operations, this commit resorts to
      the following tradeoff between simplicity and accuracy: for an
      inactive group that is still counted in num_groups_with_pending_reqs,
      this commit decrements num_groups_with_pending_reqs when the first
      descendant process of the group remains with no request waiting for
      completion.
      
      This simplified scheme provides a fix to the unbalanced decrements
      introduced by 2d29c9f8. Since this error was also caused by lack
      of comments on this non-trivial issue, this commit also adds related
      comments.
      
      Fixes: 2d29c9f8 ("block, bfq: improve asymmetric scenarios detection")
      Reported-by: Steven Barrett <steven@liquorix.net>
      Tested-by: Steven Barrett <steven@liquorix.net>
      Tested-by: Lucjan Lucjanov <lucjan.lucjanov@gmail.com>
      Reviewed-by: Federico Motta <federico@willer.it>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
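      The simplified scheme can be sketched as a decrement guarded by a
      per-group flag, so that the counter drops at most once per group.
      The flag name "counted" and the struct layouts are assumptions made
      for this example; only num_groups_with_pending_reqs comes from the
      text above.

```c
#include <stdbool.h>

/* Hypothetical scheduler-wide data. */
struct sched_data {
	unsigned int num_groups_with_pending_reqs;
};

/* Hypothetical per-group state. */
struct group {
	bool counted;  /* is this group included in the counter above? */
};

/*
 * Called each time a descendant process of an inactive group remains with
 * no request waiting for completion. Only the first call for a counted
 * group decrements the counter; later calls are no-ops.
 */
void descendant_has_no_pending_reqs(struct sched_data *d, struct group *g)
{
	if (!g->counted)
		return;

	g->counted = false;
	d->num_groups_with_pending_reqs--;
}
```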
  4. 16 Nov, 2018 1 commit
  5. 08 Nov, 2018 2 commits
  6. 02 Nov, 2018 1 commit
  7. 14 Oct, 2018 1 commit
    • block, bfq: improve asymmetric scenarios detection · 2d29c9f8
      Federico Motta committed
      bfq defines as asymmetric a scenario where an active entity, say E
      (representing either a single bfq_queue or a group of other entities),
      has a higher weight than some other entities.  If the entity E does sync
      I/O in such a scenario, then bfq plugs the dispatch of the I/O of the
      other entities in the following situation: E is in service but
      temporarily has no pending I/O request.  In fact, without this plugging,
      every time E stops being temporarily idle, it may find the
      internal queues of the storage device already filled with an
      out-of-control number of extra requests, from other entities. So E may
      have to wait for the service of these extra requests, before finally
      having its own requests served. This may easily break service
      guarantees, with E getting less than its fair share of the device
      throughput.  Usually, the end result is that E gets the same fraction of
      the throughput as the other entities, instead of getting more, according
      to its higher weight.
      
      Yet there are two other more subtle cases where E, even if its weight is
      actually equal to or even lower than the weight of any other active
      entities, may get less than its fair share of the throughput in case the
      above I/O plugging is not performed:
      1. other entities issue larger requests than E;
      2. other entities contain more active child entities than E (or in
         general tend to have more backlog than E).
      
      In the first case, other entities may get more service than E because
      larger requests than those of E are served for them during the temporary
      idle periods of E.  In the second case, other entities get more service
      because, by having many child entities, they have many requests ready
      for dispatching while E is temporarily idle.
      
      This commit addresses this issue by extending the definition of
      asymmetric scenario: a scenario is asymmetric when
      - active entities representing bfq_queues have differentiated weights,
        as in the original definition
      or (inclusive)
      - one or more entities representing groups of entities are active.
      
      This broader definition makes sure that I/O plugging will be performed
      in all the above cases, provided that there is at least one active
      group.  Of course, this definition is very coarse, so it will trigger
      I/O plugging also in cases where it is not needed, such as, e.g.,
      multiple active entities with just one child each, and all with the same
      I/O-request size.  The reason for this coarse definition is just that a
      finer-grained definition would be rather heavy to compute.
      
      On the other hand, even this new definition never triggers I/O
      plugging in cases where there is no active group and all bfq_queues
      have the same weight.  So, in these cases some unfairness may occur if
      there are asymmetries in I/O-request sizes.  We made this choice because
      I/O plugging may lower throughput, and probably a user that has not
      created any group cares more about throughput than about perfect
      fairness.  At any rate, as for possible applications that may care about
      service guarantees, bfq already guarantees a high responsiveness and a
      low latency to soft real-time applications automatically.
      Signed-off-by: Federico Motta <federico@willer.it>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
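      The extended definition amounts to an inclusive OR of two
      conditions. A minimal sketch, with a hypothetical function name and
      parameters standing in for the real bookkeeping:

```c
#include <stdbool.h>

/*
 * queue_weights_differ: active entities representing bfq_queues have
 *                       differentiated weights (the original definition).
 * num_active_groups:    number of active entities representing groups.
 */
bool asymmetric_scenario(bool queue_weights_differ,
			 unsigned int num_active_groups)
{
	return queue_weights_differ || num_active_groups > 0;
}
```

      Note that equal-weight queues with no active group evaluate to
      false, which is the case where the message accepts some unfairness
      in exchange for throughput.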
  8. 22 Sep, 2018 1 commit
    • blkcg: fix ref count issue with bio_blkcg using task_css · 27e6fa99
      Dennis Zhou (Facebook) committed
      The accessor function bio_blkcg either returns the blkcg associated with
      the bio or finds one in the current context. This can cause an issue
      when trying to associate a bio with a blkcg. Particularly, it's the
      third case that is problematic:
      
      	return css_to_blkcg(task_css(current, io_cgrp_id));
      
      As the above may race against task migration and the cgroup exiting, it
      is not always ok to take a reference on the blkcg returned from
      bio_blkcg.
      
      This patch adds association ahead of calling bio_blkcg rather than
      after. This makes association a required and explicit step along the
      code paths for calling bio_blkcg. blk_get_rl is modified as well to get
      a reference to the blkcg it may use and blk_put_rl will always put the
      reference back. Association is also moved above the bio_blkcg call to
      ensure it will not return NULL in blk-iolatency.
      
      BFQ and CFQ utilize this flaw, but due to the complexity, I do not want
      to address this in this series. I've created a private version of the
      function with notes not to use it describing the flaw. Hopefully soon,
      that code can be cleaned up.
      Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  9. 15 Sep, 2018 2 commits
    • block, bfq: do not plug I/O if all queues are weight-raised · c8765de0
      Paolo Valente committed
      To reduce latency for interactive and soft real-time applications, bfq
      privileges the bfq_queues containing the I/O of these
      applications. These privileged queues, referred to as weight-raised
      queues, get a much higher share of the device throughput
      w.r.t. non-privileged queues. To preserve this higher share, the I/O
      of any non-weight-raised queue must be plugged whenever a sync
      weight-raised queue, while being served, remains temporarily empty. To
      attain this goal, bfq simply plugs any I/O (from any queue), if a sync
      weight-raised queue remains empty while in service.
      
      Unfortunately, this plugging typically lowers throughput with random
      I/O, on devices with internal queueing (because it reduces the filling
      level of the internal queues of the device).
      
      This commit addresses this issue by restricting the cases where
      plugging is performed: if a sync weight-raised queue remains empty
      while in service, then I/O plugging is performed only if some of the
      active bfq_queues are *not* weight-raised (which is actually the only
      circumstance where plugging is needed to preserve the higher share of
      the throughput of weight-raised queues). This restriction proved able
      to boost throughput in a great many use cases needing only maximum
      throughput.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
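      The restricted condition can be sketched as follows. It covers only
      the weight-raising reason for plugging discussed in this message;
      the function name and the two counters are hypothetical.

```c
#include <stdbool.h>

/*
 * Returns true if I/O must be plugged to protect the share of a sync
 * weight-raised queue that is in service but temporarily empty.
 * Plugging is needed only if at least one active queue is NOT
 * weight-raised.
 */
bool plug_for_wr_queue(bool in_service_is_sync_wr, bool in_service_is_empty,
		       unsigned int active_queues,
		       unsigned int active_wr_queues)
{
	if (!in_service_is_sync_wr || !in_service_is_empty)
		return false;

	return active_wr_queues < active_queues;
}
```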
    • block, bfq: inject other-queue I/O into seeky idle queues on NCQ flash · d0edc247
      Paolo Valente committed
      The Achilles' heel of BFQ is its failing to reach a high throughput
      with sync random I/O on flash storage with internal queueing, in case
      the processes doing I/O have differentiated weights.
      
      The cause of this failure is as follows. If at least two processes do
      sync I/O, and have a different weight from each other, then BFQ plugs
      I/O dispatching every time one of these processes, while it is being
      served, remains temporarily without pending I/O requests. This
      plugging is necessary to guarantee that every process enjoys a
      bandwidth proportional to its weight; but it empties the internal
      queue(s) of the drive. And this kills throughput with random I/O. So,
      if some processes have differentiated weights and do both sync and
      random I/O, the end result is a throughput collapse.
      
      This commit tries to counter this problem by injecting the service of
      other processes, in a controlled way, while the process in service
      happens to have no I/O. This injection is performed only if the medium
      is non-rotational and performs internal queueing, and the process in
      service does random I/O (service injection might be beneficial for
      sequential I/O too, we'll work on that).
      
      As an example of the benefits of this commit, on a PLEXTOR PX-256M5S
      SSD, and with five processes having differentiated weights and doing
      sync random 4KB I/O, this commit makes the throughput with bfq grow
      fourfold, from 25 to 100MB/s. This higher throughput is 10MB/s lower than
      that reached with none. As some less random I/O is added to the mix,
      the throughput becomes equal to or higher than that with none.
      
      This commit is a very first attempt to recover throughput without
      losing control, and certainly has many limitations. One is, e.g., that
      the processes whose service is injected are not chosen so as to
      distribute the extra bandwidth they receive in accordance with their
      weights. Thus there might be loss of weighted fairness in some
      cases. Anyway, this loss concerns extra service, which would not have
      been received at all without this commit. Other limitations and issues
      will probably show up with usage.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
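      The conditions under which injection is attempted can be sketched
      as a predicate. All names here are hypothetical, and the sketch
      deliberately leaves out how much service is injected, since the
      message only says this is done "in a controlled way".

```c
#include <stdbool.h>

/* Hypothetical description of the storage device. */
struct device_info {
	bool rotational;         /* true for HDDs, false for flash */
	bool internal_queueing;  /* e.g., an NCQ-capable drive */
};

/*
 * Returns true if I/O of other queues may be injected while the
 * in-service process has no I/O of its own.
 */
bool may_inject(const struct device_info *dev, bool in_service_has_io,
		bool in_service_is_random)
{
	return !dev->rotational && dev->internal_queueing &&
	       !in_service_has_io && in_service_is_random;
}
```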
  10. 17 Aug, 2018 2 commits
    • block, bfq: reduce write overcharge · d5801088
      Paolo Valente committed
      When a sync request is dispatched, the queue that contains that
      request, and all the ancestor entities of that queue, are charged with
      the number of sectors of the request. In contrast, if the request is
      async, then the queue and its ancestor entities are charged with the
      number of sectors of the request, multiplied by an overcharge
      factor. This throttles the bandwidth for async I/O, w.r.t. sync
      I/O, and it is done to counter the tendency of async writes to steal
      I/O throughput from reads.
      
      On the other hand, the lower this factor, the more stable I/O
      control is, in the following respect.  The lower this factor is, the
      less the bandwidth enjoyed by a group decreases
      - when the group does writes, w.r.t. when it does reads;
      - when other groups do reads, w.r.t. when they do writes.
      
      The fixes "block, bfq: always update the budget of an entity when
      needed" and "block, bfq: readd missing reset of parent-entity service"
      improved I/O control in bfq to such an extent that it has been
      possible to revise this overcharge factor downwards.  This commit
      introduces the resulting, new value.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
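      The charging rule can be sketched as below. The message does not
      state the old or new value of the overcharge factor, so it is left
      as a parameter; the function name is hypothetical.

```c
/*
 * Service charged to a queue (and to its ancestor entities) for one
 * dispatched request: the plain number of sectors for sync requests,
 * that number multiplied by the overcharge factor for async ones.
 */
unsigned long charge_for_request(unsigned long sectors, int is_sync,
				 unsigned int async_overcharge_factor)
{
	return is_sync ? sectors : sectors * async_overcharge_factor;
}
```

      With a factor of 3 chosen purely for illustration, an 8-sector sync
      request is charged 8 sectors and an 8-sector async request 24.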
    • block, bfq: readd missing reset of parent-entity service · 8a511ba5
      Paolo Valente committed
      The received-service counter needs to be equal to 0 when an entity is
      set in service. Unfortunately, commit "block, bfq: fix service being
      wrongly set to zero in case of preemption" mistakenly removed the
      resetting of this counter for the parent entities of the bfq_queue
      being set in service. This commit fixes this issue by resetting
      service for parent entities, directly on the expiration of the
      in-service bfq_queue.
      
      Fixes: 9fae8dd5 ("block, bfq: fix service being wrongly set to zero in case of preemption")
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  11. 09 Jul, 2018 4 commits
    • block, bfq: give a better name to bfq_bfqq_may_idle · 277a4a9b
      Paolo Valente committed
      The actual goal of the function bfq_bfqq_may_idle is to tell whether
      it is better to perform device idling (more precisely: I/O-dispatch
      plugging) for the input bfq_queue, either to boost throughput or to
      preserve service guarantees. This commit improves the name of the
      function accordingly.
      Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: fix service being wrongly set to zero in case of preemption · 9fae8dd5
      Paolo Valente committed
      If
      - a bfq_queue Q preempts another queue, because one request of Q
      arrives in time,
      - but, after this preemption, Q is not the queue that is set in service,
      then Q->entity.service is set to 0 when Q is eventually set in
      service. But Q should have continued receiving service with its old
      budget (which is why preemption has occurred) and its old service.
      
      This commit addresses this issue by resetting service on queue real
      expiration.
      Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: do not expire a queue that will deserve dispatch plugging · 4420b095
      Paolo Valente committed
      For some bfq_queues, BFQ plugs I/O dispatching when the queue becomes
      idle, and keeps the plug until a new request of the queue arrives, or
      a timeout fires. BFQ does so either to boost throughput or to preserve
      service guarantees for the queue.
      
      More precisely, for such a queue, plugging starts when the queue
      happens to have either no request enqueued, or no request in flight,
      that is, no request already dispatched but not yet completed.
      
      On the opposite end, BFQ may happen to expire a queue with no request
      enqueued, without doing any plugging, if the queue still has some
      request in flight. Unfortunately, such a premature expiration causes
      the queue to lose its chance to enjoy dispatch plugging a moment
      later, i.e., when its in-flight requests finally get completed. This
      breaks service guarantees for the queue.
      
      This commit prevents BFQ from expiring an empty queue if the latter
      still has in-flight requests.
      Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
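      A sketch of the new rule as a predicate. The function and parameter
      names are hypothetical; "plugging_candidate" stands for whatever
      test decides that the queue deserves dispatch plugging.

```c
#include <stdbool.h>

/*
 * Returns true if expiration of the queue must be deferred: the queue is
 * one for which plugging would be done, it has no request enqueued, but
 * some of its requests are still in flight.
 */
bool defer_expiration(bool plugging_candidate, unsigned int queued,
		      unsigned int in_flight)
{
	return plugging_candidate && queued == 0 && in_flight > 0;
}
```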
    • block, bfq: add/remove entity weights correctly · 0471559c
      Paolo Valente committed
      To keep I/O throughput high as often as possible, BFQ performs
      I/O-dispatch plugging (aka device idling) only when strictly beneficial
      for throughput, or when needed for service guarantees (low latency,
      fairness). An important case where the latter condition holds is when
      the scenario is 'asymmetric' in terms of weights: i.e., when some
      bfq_queue or whole group of queues has a higher weight, and thus has
      to receive more service, than other queues or groups. Without dispatch
      plugging, lower-weight queues/groups may unjustly steal bandwidth from
      higher-weight queues/groups.
      
      To detect asymmetric scenarios, BFQ checks some sufficient
      conditions. One of these conditions is that active groups have
      different weights. BFQ checks this condition by maintaining a
      special set of unique weights of active groups
      (group_weights_tree). To this end, in the functions
      bfq_active_insert/bfq_active_extract BFQ adds/removes the weight of a
      group to/from this set.
      
      Unfortunately, the function bfq_active_extract may happen to be
      invoked also for a group that is still active (to preserve the correct
      update of the next queue to serve, see comments in function
      bfq_no_longer_next_in_service() for details). In this case, removing
      the weight of the group makes the set group_weights_tree
      inconsistent. Service-guarantee violations follow.
      
      This commit addresses this issue by moving group_weights_tree
      insertions from their previous location (in bfq_active_insert) into
      the function __bfq_activate_entity, and by moving group_weights_tree
      extractions from bfq_active_extract to when the entity that represents
      a group remains thoroughly idle, i.e., with no request either enqueued
      or dispatched.
      Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  12. 31 May, 2018 7 commits
    • block, bfq: prevent soft_rt_next_start from being stuck at infinity · f6c3ca0e
      Davide Sapienza committed
      BFQ can deem a bfq_queue as soft real-time only if the queue
      - periodically becomes completely idle, i.e., empty and with
        no still-outstanding I/O request;
      - after becoming idle, gets new I/O only after a special reference
        time soft_rt_next_start.
      
      In this respect, after commit "block, bfq: consider also past I/O in
      soft real-time detection", the value of soft_rt_next_start can never
      decrease. This causes a problem with the following special updating
      case for soft_rt_next_start: to prevent queues that are not completely
      idle from being wrongly detected as soft real-time (when they become
      non-empty again), soft_rt_next_start is temporarily set to infinity
      for empty queues with still outstanding I/O requests. But, if such an
      update is actually performed, then, because of the above commit,
      soft_rt_next_start will be stuck at infinity forever, and the queue
      will have no more chance to be considered soft real-time.
      
      On slow systems, this problem does cause actual soft real-time
      applications to be occasionally not detected as such.
      
      This commit addresses this issue by eliminating the pushing of
      soft_rt_next_start to infinity, and by changing the way non-empty
      queues are prevented from being wrongly detected as soft
      real-time. Simply, a queue that becomes non-empty again can now be
      detected as soft real-time only if it has no outstanding I/O request.
      Signed-off-by: Davide Sapienza <sapienza.dav@gmail.com>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
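      The detection rule after this change can be sketched as a check
      made when a queue becomes non-empty again. Time is modelled as a
      plain 64-bit counter and the function name is hypothetical; only
      soft_rt_next_start is named in the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if a queue that has just received new I/O may be deemed
 * soft real-time: it must have no outstanding I/O request, and the new
 * I/O must arrive after the reference time soft_rt_next_start.
 */
bool soft_rt_on_new_io(unsigned int outstanding_reqs, uint64_t now,
		       uint64_t soft_rt_next_start)
{
	return outstanding_reqs == 0 && now > soft_rt_next_start;
}
```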
    • block, bfq: increase weight-raising duration for interactive apps · d450542e
      Davide Sapienza committed
      The maximum possible duration of the weight-raising period for
      interactive applications is limited to 13 seconds, as this is the time
      needed to load the largest application that we considered when tuning
      weight raising. Unfortunately, in such an evaluation, we did not
      consider the case of very slow virtual machines.
      
      For example, on a QEMU/KVM virtual machine
      - running in a slow PC;
      - with a virtual disk stacked on a slow low-end 5400rpm HDD;
      - serving a heavy I/O workload, such as the sequential reading of
      several files;
      mplayer takes 23 seconds to start, if constantly weight-raised.
      
      To address this issue, this commit conservatively sets the upper limit
      for weight-raising duration to 25 seconds.
      Signed-off-by: Davide Sapienza <sapienza.dav@gmail.com>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
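      The new upper limit can be sketched as a simple clamp. The macro
      and function names are hypothetical, and the sketch works in
      milliseconds for readability; only the 25-second limit comes from
      the message.

```c
/* Upper limit for the weight-raising duration: 25 seconds. */
#define MAX_WR_DURATION_MS 25000u

/*
 * Clamp the automatically computed weight-raising duration for
 * interactive applications to the upper limit.
 */
unsigned int wr_duration_ms(unsigned int computed_ms)
{
	return computed_ms > MAX_WR_DURATION_MS ? MAX_WR_DURATION_MS
						: computed_ms;
}
```

      Under this limit, the 23 seconds needed by mplayer in the example
      above fit within a single weight-raising period.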
    • block, bfq: remove slow-system class · e24f1c24
      Paolo Valente committed
      BFQ computes the duration of weight raising for interactive
      applications automatically, using some reference parameters. In
      particular, BFQ uses the best durations (see comments in the code for
      how these durations have been assessed) for two classes of systems:
      slow and fast ones. Examples of slow systems are old phones or systems
      using micro HDDs. Fast systems are all the remaining ones. Using these
      parameters, BFQ computes the actual duration of the weight raising,
      for the system at hand, as a function of the relative speed of the
      system w.r.t. the speed of a reference system, belonging to the same
      class of systems as the system at hand.
      
      This slow vs fast differentiation proved to be useful in the past, but
      happens to have little meaning with current hardware. Even worse, it
      does cause problems in virtual systems, where the speed of the system
      can vary frequently, and so widely as to confuse the class-detection
      mechanism and, as we have verified experimentally, to cause BFQ to
      compute nonsensical weight-raising durations.
      
      This commit addresses this issue by removing the slow class and the
      class-detection mechanism.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: add description of weight-raising heuristics · 4029eef1
      Paolo Valente committed
      A description of how weight raising works is missing in BFQ
      sources. In addition, the code for handling weight raising is
      scattered across a few functions. This makes it rather hard to
      understand the mechanism and its rationale. This commit adds such a
      description at the beginning of the main source file.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: remove the removal of 'next' rq in bfq_requests_merged · ac857e0d
      Filippo Muzzini committed
      Since bfq_finish_request() is always called on the request 'next'
      after bfq_requests_merged() has finished, and bfq_finish_request()
      removes 'next' from its bfq_queue if needed, it isn't necessary to do
      such a removal in advance in bfq_requests_merged().
      
      This commit removes such a useless 'next' removal.
      Signed-off-by: Filippo Muzzini <filippo.muzzini@outlook.it>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ac857e0d
    • P
      block, bfq: remove wrong check in bfq_requests_merged · 8abfa4d6
      Paolo Valente committed
      The request rq passed to the function bfq_requests_merged is always in
      a bfq_queue, so the check !RB_EMPTY_NODE(&rq->rb_node) at the
      beginning of bfq_requests_merged always succeeds, and the control
      flow systematically skips to the end of the function.  This implies
      that the body of the function is never executed, i.e., the
      repositioning of rq is never performed.
      
      Conversely, a check is missing in the body of the function:
      'next' must be removed only if it is inside a bfq_queue.
      
      This commit removes the wrong check on rq, and adds the missing check
      on 'next'. In addition, this commit adds comments on
      bfq_requests_merged.
      Signed-off-by: Filippo Muzzini <filippo.muzzini@outlook.it>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      8abfa4d6
    • F
      block, bfq: remove wrong lock in bfq_requests_merged · a12bffeb
      Filippo Muzzini committed
      In bfq_requests_merged(), there is a deadlock: bfqq->bfqd->lock is
      already held by the calling function, but the code of this function
      tries to grab the lock again.
      
      This deadlock is currently hidden by another bug (fixed by the next
      commit for this source file), which causes the body of
      bfq_requests_merged() never to be executed.
      
      This commit removes the deadlock by removing the lock/unlock pair.
      Signed-off-by: Filippo Muzzini <filippo.muzzini@outlook.it>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      a12bffeb
  13. 11 May, 2018 5 commits
  14. 09 May, 2018 1 commit
    • O
      block: consolidate struct request timestamp fields · 522a7775
      Omar Sandoval committed
      Currently, struct request has four timestamp fields:
      
      - A start time, set at get_request time, in jiffies, used for iostats
      - An I/O start time, set at start_request time, in ktime nanoseconds,
        used for blk-stats (i.e., wbt, kyber, hybrid polling)
      - Another start time and another I/O start time, used for cfq and bfq
      
      These can all be consolidated into one start time and one I/O start
      time, both in ktime nanoseconds, shaving off up to 16 bytes from struct
      request depending on the kernel config.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      522a7775
  15. 18 Apr, 2018 1 commit
  16. 27 Mar, 2018 1 commit
  17. 08 Feb, 2018 1 commit
    • P
      block, bfq: add requeue-request hook · a7877390
      Paolo Valente committed
      Commit 'a6a252e6 ("blk-mq-sched: decide how to handle flush rq via
      RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
      be re-inserted into the active I/O scheduler for that device. As a
      consequence, I/O schedulers may get the same request inserted again,
      even several times, without a finish_request invoked on that request
      before each re-insertion.
      
      This fact is the cause of the failure reported in [1]. For an I/O
      scheduler, every re-insertion of the same re-prepared request is
      equivalent to the insertion of a new request. For schedulers like
      mq-deadline or kyber, this fact causes no harm. In contrast, it
      confuses a stateful scheduler like BFQ, which keeps state for an I/O
      request, until the finish_request hook is invoked on the request. In
      particular, BFQ may get stuck, waiting forever for the number of
      dispatches of the same request to be balanced by an equal number of
      completions (while there will be only one completion for that
      request). In this state, BFQ may refuse to serve I/O requests
      from other bfq_queues. The hang reported in [1] then follows.
      
      However, the above re-prepared requests undergo a requeue, thus the
      requeue_request hook of the active elevator is invoked for these
      requests, if set. This commit then addresses the above issue by
      properly implementing the hook requeue_request in BFQ.
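
      The imbalance described above can be pictured with a toy model.
      The sketch below is not BFQ code: the struct and function names are
      hypothetical, and the only state kept is a count of dispatches that
      the scheduler still expects to be balanced.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy scheduler state (hypothetical, for illustration only). */
struct sched_sketch {
    unsigned int dispatched; /* dispatches not yet balanced */
};

/* A request is handed to the device. */
static void sketch_dispatch(struct sched_sketch *s)
{
    s->dispatched++;
}

/* The request completes: one dispatch is balanced. */
static void sketch_finish(struct sched_sketch *s)
{
    assert(s->dispatched > 0);
    s->dispatched--;
}

/*
 * The request is requeued before completing. Without this hook the
 * dispatch would never be balanced, because the request is dispatched
 * again later and completes only once.
 */
static void sketch_requeue(struct sched_sketch *s)
{
    assert(s->dispatched > 0);
    s->dispatched--;
}

/* True when no dispatch is left waiting for a completion. */
static bool sketch_balanced(const struct sched_sketch *s)
{
    return s->dispatched == 0;
}
```

      In this model, a request that is dispatched, requeued, dispatched
      again and then completed leaves the counter at zero, whereas the
      same sequence without the requeue step leaves it stuck at one.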
      
      [1] https://marc.info/?l=linux-block&m=151211117608676

      Reported-by: Ivan Kozik <ivan@ludios.org>
      Reported-by: Alban Browaeys <alban.browaeys@gmail.com>
      Tested-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Serena Ziviani <ziviani.serena@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      a7877390
  18. 18 Jan, 2018 2 commits
    • P
      block, bfq: limit sectors served with interactive weight raising · 8a8747dc
      Paolo Valente committed
      To maximise responsiveness, BFQ raises the weight, and performs device
      idling, for bfq_queues associated with processes deemed as
      interactive. In particular, weight raising has a maximum duration,
      equal to the time needed to start a large application. If a
      weight-raised process goes on doing I/O beyond this maximum duration,
      it loses weight-raising.
      
      This mechanism is evidently vulnerable to the following false
      positives: I/O-bound applications that will go on doing I/O for much
      longer than the duration of weight-raising. These applications get
      basically no benefit from being weight-raised at the beginning of
      their I/O. On the contrary, while being weight-raised, these
      applications
      a) unjustly steal throughput from applications that may truly need
         low latency;
      b) make BFQ uselessly perform device idling; device idling results
         in loss of device throughput with most flash-based storage, and
         may increase latencies when used purposelessly.
      
      This commit adds a countermeasure to reduce both of the above
      problems. To introduce this countermeasure, we provide the following
      extra piece of information (full details in the comments added by this
      commit). During the start-up of the large application used as a
      reference to set the duration of weight-raising, the processes
      involved transfer at most ~110K sectors each. Accordingly, a process
      initially deemed interactive has no right to be weight-raised any
      longer once it has transferred 110K sectors or more.
      
      Based on this consideration, this commit ends weight-raising early
      for a bfq_queue if the latter happens to have received an amount of
      service at least equal to 110K sectors (actually, a little bit more,
      to keep a safety margin). I/O-bound applications that reach a high
      throughput, such as file copy, get to this threshold well before the
      allowed weight-raising period finishes. Thus this early ending of
      weight-raising reduces the amount of time during which these
      applications cause the problems described above.
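
      A minimal sketch of the resulting check follows, assuming a
      hypothetical per-queue record and an illustrative threshold of
      120000 sectors (110K plus a margin; the exact value used by the
      commit is not stated in this message).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative threshold: ~110K sectors plus a safety margin (assumed). */
#define WR_MAX_SECTORS_SKETCH 120000ULL

/* Hypothetical per-queue weight-raising state. */
struct wr_queue_sketch {
    uint64_t wr_start_ms;        /* when weight-raising started */
    uint64_t wr_max_duration_ms; /* maximum weight-raising duration */
    uint64_t sectors_served;     /* service received since wr_start_ms */
};

/*
 * Weight-raising ends when either the time limit expires or the queue
 * has already received as much service as a large application needs
 * to start, whichever comes first.
 */
static bool wr_should_end(const struct wr_queue_sketch *q, uint64_t now_ms)
{
    assert(now_ms >= q->wr_start_ms);
    if (now_ms - q->wr_start_ms >= q->wr_max_duration_ms)
        return true;
    return q->sectors_served >= WR_MAX_SECTORS_SKETCH;
}
```

      A file copy that moves data quickly trips the second condition long
      before the first, which is the early ending described above.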
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      8a8747dc
    • P
      block, bfq: limit tags for writes and async I/O · a52a69ea
      Paolo Valente committed
      Asynchronous I/O can easily starve synchronous I/O (both sync reads
      and sync writes), by consuming all request tags. Similarly, storms of
      synchronous writes, such as those that sync(2) may trigger, can starve
      synchronous reads. In turn, these two problems may also cause
      BFQ to lose control of latency for interactive and soft real-time
      applications. For example, on a PLEXTOR PX-256M5S SSD, LibreOffice
      Writer takes 0.6 seconds to start if the device is idle, but it takes
      more than 45 seconds (!) if there are sequential writes in the
      background.
      
      This commit addresses this issue by limiting the maximum percentage of
      tags that asynchronous I/O requests and synchronous write requests can
      consume. In particular, this commit grants a higher threshold to
      synchronous writes, to prevent the latter from being starved by
      asynchronous I/O.
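
      The policy can be sketched as a per-class cap on usable tags. The
      percentages below (50% for asynchronous I/O, 75% for synchronous
      writes) and all names are assumptions chosen for illustration; the
      message above states only that synchronous writes get the higher
      threshold.

```c
#include <assert.h>

/* Hypothetical request classes for this sketch. */
enum io_kind_sketch { IO_SYNC_READ, IO_SYNC_WRITE, IO_ASYNC };

/*
 * Return how many of total_tags a request of the given kind may
 * consume. Sync reads are never limited; sync writes get a higher
 * cap than async I/O so that async I/O cannot starve them.
 */
static unsigned int tag_limit_sketch(unsigned int total_tags,
                                     enum io_kind_sketch kind)
{
    unsigned int limit;

    assert(total_tags > 0);
    switch (kind) {
    case IO_ASYNC:
        limit = total_tags / 2;     /* assumed 50% */
        break;
    case IO_SYNC_WRITE:
        limit = total_tags * 3 / 4; /* assumed 75% */
        break;
    default:
        limit = total_tags;         /* sync reads: no cap */
        break;
    }
    return limit > 0 ? limit : 1;   /* always allow at least one tag */
}
```

      Because each cap is below the total, some tags always remain
      available for synchronous reads regardless of the background load.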
      
      In the above test, LibreOffice Writer now starts in about
      1.2 seconds on average, regardless of the background workload, and
      apart from rare outliers. To check this improvement, run, e.g.,
      sudo ./comm_startup_lat.sh bfq 5 5 seq 10 "lowriter --terminate_after_init"
      for the comm_startup_lat benchmark in the S suite [1].
      
      [1] https://github.com/Algodev-github/S

      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      a52a69ea
  19. 10 Jan, 2018 1 commit