1. 20 Nov 2019, 18 commits
  2. 19 Nov 2019, 11 commits
  3. 12 Nov 2019, 1 commit
  4. 07 Nov 2019, 10 commits
    • iocost: don't nest spin_lock_irq in ioc_weight_write() · e0d40c03
      Committed by Dan Carpenter
      This code causes a static analysis warning:
      
          block/blk-iocost.c:2113 ioc_weight_write() error: double lock 'irq'
      
      We disable IRQs in blkg_conf_prep() and re-enable them in
      blkg_conf_finish().  IRQ disable/enable should not be nested because
      that means the IRQs will be enabled at the first unlock instead of the
      second one.
      
      Fixes: 7caa47151ab2 ("blkcg: implement blk-iocost")
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    • iocost: fix a deadlock in ioc_rqos_throttle() · 4fee61a5
      Committed by Jiufei Xue
      Function ioc_rqos_throttle() may be called while holding the
      queue_lock.  We should unlock the queue_lock before entering sleep.
      Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    • blkcg: Fix multiple bugs in blkcg_activate_policy() · cfde0248
      Committed by Tejun Heo
      commit 9d179b865449b351ad5cb76dbea480c9170d4a27 upstream.
      
      blkcg_activate_policy() has the following bugs.
      
      * cf09a8ee19ad ("blkcg: pass @q and @blkcg into
        blkcg_pol_alloc_pd_fn()") added @blkcg to ->pd_alloc_fn(); however,
        blkcg_activate_policy() ends up using pd's allocated for the root
        blkcg for all preallocations, so ->pd_init_fn() for non-root blkcgs
        can be passed in pd's which are allocated for the root blkcg.
      
        For blk-iocost, this means that ->pd_init_fn() can write beyond the
        end of the allocated object as it determines the length of the flex
        array at the end based on the blkcg's nesting level.
      
      * Each pd is initialized as it gets allocated.  If an alloc fails,
        the policy will get freed with pd's already initialized on it.
      
      * After the above partial failure, the partial pds are not freed.
      
      This patch fixes all the above issues by
      
      * Restructuring blkcg_activate_policy() so that alloc and init passes
        are separate.  Init takes place only after all allocs succeeded and
        on failure all allocated pds are freed.
      
      * Unifying and fixing the cleanup of the remaining pd_prealloc.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: cf09a8ee19ad ("blkcg: pass @q and @blkcg into blkcg_pol_alloc_pd_fn()")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    • blkcg: blkcg_activate_policy() should initialize ancestors first · b16e0acf
      Committed by Tejun Heo
      commit 71c814077de60b2e7415dac6f5c4e98f59d521fd upstream.
      
      When blkcg_activate_policy() is creating blkg_policy_data for existing
      blkgs, it did so in the wrong order - descendants first.  Fix it.
      None of the existing controllers seem affected by this.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    • iocost: don't let vrate run wild while there's no saturation signal · 4c6a0b7a
      Committed by Tejun Heo
      When the QoS targets are met and nothing is being throttled, there's
      no way to tell how saturated the underlying device is - it could be
      almost entirely idle, at the cusp of saturation, or anywhere in between.
      Given that there's no information, it's best to keep vrate as-is in
      this state.  Before 7cd806a9a953 ("iocost: improve nr_lagging
      handling"), this was the case - if the device isn't missing QoS
      targets and nothing is being throttled, busy_level was reset to zero.
      
      While fixing nr_lagging handling, 7cd806a9a953 ("iocost: improve
      nr_lagging handling") broke this.  Now, while the device is hitting
      QoS targets and nothing is being throttled, vrate keeps getting
      adjusted according to the existing busy_level.
      
      This led to vrate climbing until it hit max when there was an IO
      issuer with limited request concurrency and vrate started low.
      vrate gets adjusted upwards until the issuer can issue IOs w/o
      being throttled.  From then on, QoS targets keep getting met,
      nothing on the system needs throttling, and vrate keeps getting
      increased due to the existing busy_level.
      
      This patch makes the following changes to the busy_level logic.
      
      * Reset busy_level if nr_shortages is zero to avoid the above
        scenario.
      
      * Make non-zero nr_lagging block lowering busy_level but still
        clear positive busy_level if there's a clear non-saturation
        signal - QoS targets are met and nr_shortages is zero.
        nr_lagging's role is to prevent adjusting vrate upwards while
        there are long-running commands, and it shouldn't keep
        busy_level positive while there's a clear non-saturation signal.
      
      * Restructure code for clarity and add comments.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Andy Newell <newella@fb.com>
      Fixes: 7cd806a9a953 ("iocost: improve nr_lagging handling")
      Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    • iocost: bump up default latency targets for hard disks · 2a3e5884
      Committed by Tejun Heo
      commit 7afcccafa59fb63b58f863a6c5e603a34625955b upstream.
      
      The default hard disk param sets latency targets at 50ms.  As the
      default target percentiles are zero, these don't directly regulate
      vrate; however, they're still used to calculate the period length -
      100ms in this case.
      
      This is excessively low.  A SATA drive with QD32 saturated with random
      IOs can easily reach avg completion latency of several hundred msecs.
      A period duration which is substantially lower than avg completion
      latency can lead to wildly fluctuating vrate.
      
      Let's bump up the default latency targets to 250ms so that the period
      duration is sufficiently long.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    • iocost: improve nr_lagging handling · 51789009
      Committed by Tejun Heo
      commit 7cd806a9a953f234b9865c30028f47fd738ce375 upstream.
      
      Some IOs may span multiple periods.  As latencies are collected on
      completion, the in-between periods won't register them and may
      incorrectly decide to increase vrate.  nr_lagging tracks these IOs to
      avoid those situations.  Currently, whenever there are IOs which are
      spanning from the previous period, busy_level is reset to 0 if
      negative, thus suppressing vrate increases.
      
      This has the following two problems.
      
      * When latency target percentiles aren't set, vrate adjustment should
        only be governed by queue depth depletion; however, the current code
        keeps nr_lagging active which pulls in latency results and can keep
        down vrate unexpectedly.
      
      * When a lagging condition is detected, it resets the entire negative
        busy_level.  This turned out to be way too aggressive on some
        devices which sometimes experience extended latencies on a small
        subset of commands.  In addition, a lagging IO will be accounted as
        latency target miss on completion anyway and resetting busy_level
        amplifies its impact unnecessarily.
      
      This patch fixes the above two problems by disabling nr_lagging
      counting when latency target percentiles aren't set and blocking vrate
      increases when there are lagging IOs while leaving busy_level as-is.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    • iocost: better trace vrate changes · 84b9c65f
      Committed by Tejun Heo
      commit 25d41e4aadb0788b4fae8a8fca90f437b9ebd727 upstream.
      
      vrate_adj tracepoint traces vrate changes; however, it does so only
      when busy_level is non-zero.  busy_level turning to zero can sometimes
      be as interesting an event.  This patch also enables vrate_adj
      tracepoint on other vrate related events - busy_level changes and
      non-zero nr_lagging.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    • iocost: fix NULL pointer dereference in ioc_rqos_throttle · 45c5aa58
      Committed by Jiufei Xue
      Bios are not associated with a blkg before entering the iocost
      controller, so do the association in ioc_rqos_throttle() as well
      as in ioc_rqos_merge().
      
      Considering that there are many chances to create the blkg before
      ioc_rqos_merge(), we just look up the blkg here and, if it does
      not exist, return rather than create it.
      Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    • iocost: add legacy interface file · 63d9b2ea
      Committed by Jiufei Xue
      To support cgroup v1.
      Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>