1. 09 Apr, 2010 3 commits
    • blkio: Add io_queued and avg_queue_size stats · cdc1184c
      Committed by Divyesh Shah
      These stats are useful for getting a feel for the queue depth of the cgroup,
      i.e., how full its queues are at a given instant and over the lifetime of
      the cgroup. This is useful when debugging problems in the wild, as it
      helps in understanding the application's IO pattern without having to read
      through the userspace code (which may be tedious or simply not available) or
      to run blktrace (since you may not have root access and/or may not want to
      disturb performance).
      
      Signed-off-by: Divyesh Shah <dpshah@google.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • blkio: Add io_merged stat · 812d4026
      Committed by Divyesh Shah
      This includes both the number of bios merged into requests belonging to this
      cgroup as well as the number of requests merged together.
      In the past, we've observed different merging behavior across upstream kernels,
      some by design, some due to actual bugs. This stat helps a lot in debugging such
      problems when applications report decreased throughput with a new kernel
      version.
      
      This needed adding an extra elevator function to capture bios being merged as I
      did not want to pollute elevator code with blkiocg knowledge and hence needed
      the accounting invocation to come from CFQ.
      
      Signed-off-by: Divyesh Shah <dpshah@google.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • blkio: Changes to IO controller additional stats patches · 84c124da
      Committed by Divyesh Shah
      These changes include some minor fixes and address all review comments.
      
      Changelog: (mostly based on Vivek Goyal's comments)
      o renamed blkiocg_reset_write to blkiocg_reset_stats
      o more clarification in the documentation on io_service_time and io_wait_time
      o Initialize blkg->stats_lock
      o rename io_add_stat to blkio_add_stat and declare it static
      o use bool for direction and sync
      o derive direction and sync info from existing rq methods
      o use 12 for major:minor string length
      o define io_service_time better to cover the NCQ case
      o add a separate reset_stats interface
      o make the indexed stats a 2d array to simplify macro and function pointer code
      o blkio.time now exports in jiffies as before
      o Added stats description in patch description and
        Documentation/cgroup/blkio-controller.txt
      o Prefix all stats functions with blkio and make them static as applicable
      o replace IO_TYPE_MAX with IO_TYPE_TOTAL
      o Moved #define constant to top of blk-cgroup.c
      o Pass dev_t around instead of char *
      o Add note to documentation file about resetting stats
      o use BLK_CGROUP_MODULE in addition to BLK_CGROUP config option in #ifdef
        statements
      o Avoid struct request specific knowledge in blk-cgroup. blk-cgroup.h now has
        rq_direction() and rq_sync() functions which are used by CFQ and when using
        io-controller at a higher level, bio_* functions can be added.
      
      Signed-off-by: Divyesh Shah <dpshah@google.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  2. 02 Apr, 2010 3 commits
  3. 25 Mar, 2010 2 commits
    • cfq-iosched: Do not merge queues of BE and IDLE classes · 39c01b21
      Committed by Divyesh Shah
      Even if they are found to be co-operating.
      
      The prio_trees do not have any IDLE cfqqs on them. cfq_close_cooperator()
      is called from cfq_select_queue() and cfq_completed_request(). The latter
      ensures that the close cooperator code does not get invoked if the current
      cfqq is of class IDLE but the former doesn't seem to have any such checks.
      So an IDLE cfqq may get merged with a BE cfqq from the same group which
      should be avoided.
      
      Signed-off-by: Divyesh Shah <dpshah@google.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • cfq-iosched: Add additional blktrace log messages in CFQ for easier debugging · b1ffe737
      Committed by Divyesh Shah
      These have helped us debug some issues we've noticed in earlier IO
      controller versions and should be useful now as well. The extra logging
      covers:
      - idling behavior. Since there are so many conditions based on which we decide
      to idle or not, this patch adds a log message for some conditions that we've
      found useful.
      - workload slices and current prio and workload type
      
      Changelog from v1:
      o moved log message from cfq_set_active_queue() to __cfq_set_active_queue()
      o changed queue_count to st->count
      
      Signed-off-by: Divyesh Shah <dpshah@google.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  4. 19 Mar, 2010 1 commit
  5. 01 Mar, 2010 5 commits
    • cfq: remove 8 bytes of padding from cfq_rb_root on 64 bit builds · 73e9ffdd
      Committed by Richard Kennedy
      Reorder cfq_rb_root to remove 8 bytes of padding on 64 bit builds.
      
      This consequently removes 56 bytes from cfq_group and 64 bytes from
      cfq_data.
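A minimal sketch of why reordering members can remove padding, written in Python with `ctypes` so it can be run anywhere. The field names and types below are hypothetical and are not the real cfq_rb_root layout; they only show the general effect of placing 4-byte members on both sides of an 8-byte-aligned pointer.

```python
import ctypes

# Hypothetical layout, NOT the real cfq_rb_root: a 4-byte member on each side
# of a pointer forces alignment padding on 64-bit builds.
class Padded(ctypes.Structure):
    _fields_ = [
        ("count", ctypes.c_uint32),
        ("left", ctypes.c_void_p),
        ("total_weight", ctypes.c_uint32),
    ]

# Same members, pointer first: the two 4-byte members now share one 8-byte slot.
class Reordered(ctypes.Structure):
    _fields_ = [
        ("left", ctypes.c_void_p),
        ("count", ctypes.c_uint32),
        ("total_weight", ctypes.c_uint32),
    ]

def padding_saved():
    """Bytes saved by the reordered layout on the current platform."""
    return ctypes.sizeof(Padded) - ctypes.sizeof(Reordered)

print("saved bytes:", padding_saved())
```

Because the structure is embedded in larger structures, a saving in one small struct is multiplied in every struct that contains several copies of it, which would be consistent with the larger savings reported for cfq_group and cfq_data.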
      Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • cfq-iosched: quantum check tweak · abc3c744
      Committed by Shaohua Li
      Currently a queue can only dispatch up to 4 requests if there are other queues.
      This isn't optimal, since the device can handle more requests; for example, AHCI
      can handle 31 requests. I understand that the limit exists for fairness, but we
      could make a tweak: if the queue still has a lot of slice left, it seems we could
      ignore the limit. Tests show this boosts my workload (two-thread randread on
      an SSD) from 78m/s to 100m/s.
      Thanks to Corrado and Vivek for their suggestions for the patch.
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • cfq-iosched: requests "in flight" vs "in driver" clarification · 53c583d2
      Committed by Corrado Zoccolo
      Counters for requests "in flight" and "in driver" are used asymmetrically
      in cfq_may_dispatch, and have slightly different meanings.
      We split the rq_in_flight counter (was sync_flight) to count both sync
      and async requests, in order to use this one, which is more accurate in
      some corner cases.
      The rq_in_driver counter is coalesced, since individual sync/async counts
      are not used any more.
      Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • cfq-iosched: rethink seeky detection for SSDs · 41647e7a
      Committed by Corrado Zoccolo
      CFQ currently applies the same logic of detecting seeky queues and
      grouping them together for rotational disks as well as SSDs.
      For SSDs, the time to complete a request doesn't depend on the
      request location, but only on the size.
      This patch therefore changes the criterion to group queues by
      request size in case of SSDs, in order to achieve better fairness.
      Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • cfq-iosched: rework seeky detection · 3dde36dd
      Committed by Corrado Zoccolo
      Current seeky detection is based on average seek length.
      This is suboptimal, since the average will not distinguish between:
      * a process doing medium sized seeks
      * a process doing some sequential requests interleaved with larger seeks
      and even a medium seek can take a lot of time, if the requested sector
      happens to be behind the disk head in the rotation (50% probability).
      
      Therefore, we change the seeky queue detection to work as follows:
      * each request can be classified as sequential if it is very close to
        the current head position, i.e. it is likely in the disk cache (disks
        usually read more data than requested, and put it in cache for
        subsequent reads). Otherwise, the request is classified as seeky.
      * a history window of the last 32 requests is kept, storing the
        classification result.
      * A queue is marked as seeky if more than 1/8 of the last 32 requests
        were seeky.
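The three rules above can be sketched as a 32-bit shift register. This is an illustrative Python model, not the kernel code: the class name, the `classify` helper and its `near_threshold` parameter are assumptions, while the window size of 32 and the 1/8 ratio come from the commit message.

```python
HISTORY_BITS = 32                     # window size stated in the commit message
SEEKY_THRESHOLD = HISTORY_BITS // 8   # "more than 1/8 of the last 32 requests"
MASK = (1 << HISTORY_BITS) - 1

def classify(distance, near_threshold):
    """Return True (seeky) if the request is not very close to the head.

    near_threshold is a hypothetical tunable; the commit does not give a value.
    """
    return distance > near_threshold

class SeekHistory:
    """Keeps one bit per request: 1 = seeky, 0 = sequential."""

    def __init__(self):
        self.bits = 0

    def record(self, is_seeky):
        # Shift the window by one request and drop the oldest entry.
        self.bits = ((self.bits << 1) | int(is_seeky)) & MASK

    def seeky_count(self):
        return bin(self.bits).count("1")

    def is_seeky(self):
        return self.seeky_count() > SEEKY_THRESHOLD
```

With this model a queue needs at least 5 seeky requests among its last 32 before it is treated as seeky, so a few larger seeks interleaved with sequential requests no longer flip the classification.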
      
      This patch fixes a regression reported by Yanmin, on mmap 64k random
      reads.
      Reported-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  6. 26 Feb, 2010 1 commit
  7. 22 Feb, 2010 1 commit
  8. 05 Feb, 2010 1 commit
  9. 03 Feb, 2010 1 commit
    • cfq-iosched: Do not idle on async queues · 1efe8fe1
      Committed by Vivek Goyal
      A few weeks back, Shaohua Li posted a similar patch. I am reposting it
      with more test results.
      
      This patch does two things.
      
      - Do not idle on async queues.
      
      - It also changes the write queue depth CFQ drives (cfq_may_dispatch()).
        Currently, we seem to always drive a queue depth of 1 for WRITES. This is
        true even if there is only one write queue in the system, and all the logic
        of infinite queue depth in case of a single busy queue, as well as slowly
        increasing queue depth based on the last delayed sync request, does not seem
        to be kicking in at all.
      
      This patch will allow deeper WRITE queue depths (subject to the other
      WRITE queue depth constraints like cfq_quantum and last delayed sync
      request).
      
      Shaohua Li had reported getting more out of his SSD. In my case, I have
      one LUN exported from an HP EVA, and when pure buffered writes are running, I
      can get more out of the system. The following are test results of pure
      buffered writes (with end_fsync=1) with the vanilla and patched kernels. These
      results are the average of 3 sets of runs with an increasing number of threads.
      
      AVERAGE[bufwfs][vanilla]
      -------
      job       Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)
      ---       --- --  ------------   -----------    -------------  -----------
      bufwfs    3   1   0              0              95349          474141
      bufwfs    3   2   0              0              100282         806926
      bufwfs    3   4   0              0              109989         2.7301e+06
      bufwfs    3   8   0              0              116642         3762231
      bufwfs    3   16  0              0              118230         6902970
      
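      AVERAGE[bufwfs][patched kernel]
      -------
      job       Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)
      ---       --- --  ------------   -----------    -------------  -----------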
      bufwfs    3   1   0              0              270722         404352
      bufwfs    3   2   0              0              206770         1.06552e+06
      bufwfs    3   4   0              0              195277         1.62283e+06
      bufwfs    3   8   0              0              260960         2.62979e+06
      bufwfs    3   16  0              0              299260         1.70731e+06
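To make the two bufwfs tables easier to compare, the following Python snippet computes the write bandwidth ratio per thread count. The numbers are copied from the WriteBW(KB/s) columns above; the helper name is an assumption made for this illustration.

```python
# WriteBW (KB/s) keyed by number of threads (NR), copied from the tables above.
vanilla = {1: 95349, 2: 100282, 4: 109989, 8: 116642, 16: 118230}
patched = {1: 270722, 2: 206770, 4: 195277, 8: 260960, 16: 299260}

def speedups(before, after):
    """Return the patched/vanilla bandwidth ratio for each thread count."""
    return {nr: after[nr] / before[nr] for nr in before}

ratios = speedups(vanilla, patched)
for nr in sorted(ratios):
    print(f"NR={nr:2d}  patched/vanilla = {ratios[nr]:.2f}x")
```

On these figures the patched kernel writes roughly 1.8x to 2.8x faster for pure buffered writes, depending on the thread count.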
      
      I also ran buffered writes along with some sequential reads and some
      buffered reads on a SATA disk, because the potential risk is that driving
      a deeper write queue in the presence of sync IO could hurt the max
      completion latency (clat), which we want to keep low.
      
      With some random and sequential reads going on in the system on one SATA
      disk I did not see any significant increase in max clat. So it looks like
      other WRITE queue depth control logic is doing its job. Here are the
      results.
      
      AVERAGE[brr, bsr, bufw together] [vanilla]
      -------
      job       Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)
      ---       --- --  ------------   -----------    -------------  -----------
      brr       3   1   850            546345         0              0
      bsr       3   1   14650          729543         0              0
      bufw      3   1   0              0              23908          8274517
      
      brr       3   2   981.333        579395         0              0
      bsr       3   2   14149.7        1175689        0              0
      bufw      3   2   0              0              21921          1.28108e+07
      
      brr       3   4   898.333        1.75527e+06    0              0
      bsr       3   4   12230.7        1.40072e+06    0              0
      bufw      3   4   0              0              19722.3        2.4901e+07
      
      brr       3   8   900            3160594        0              0
      bsr       3   8   9282.33        1.91314e+06    0              0
      bufw      3   8   0              0              18789.3        23890622
      
      AVERAGE[brr, bsr, bufw mixed] [patched kernel]
      -------
      job       Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)
      ---       --- --  ------------   -----------    -------------  -----------
      brr       3   1   837            417973         0              0
      bsr       3   1   14357.7        591275         0              0
      bufw      3   1   0              0              24869.7        8910662
      
      brr       3   2   1038.33        543434         0              0
      bsr       3   2   13351.3        1205858        0              0
      bufw      3   2   0              0              18626.3        13280370
      
      brr       3   4   913            1.86861e+06    0              0
      bsr       3   4   12652.3        1430974        0              0
      bufw      3   4   0              0              15343.3        2.81305e+07
      
      brr       3   8   890            2.92695e+06    0              0
      bsr       3   8   9635.33        1.90244e+06    0              0
      bufw      3   8   0              0              17200.3        24424392
      
      So it looks like it might make sense to include this patch.
      
      Thanks
      Vivek
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  10. 11 Jan, 2010 1 commit
    • cfq-iosched: Respect ioprio_class when preempting · 875feb63
      Committed by Divyesh Shah
      In cfq_should_preempt(), we currently allow some cases where a non-RT request
      can preempt an ongoing RT cfqq timeslice. This should not happen.
      Examples include:
      
      o A sync_noidle wl type non-RT request pre-empting a sync_noidle wl type cfqq
        on which we are idling.
      o Once we have per-cgroup async queues, a non-RT sync request pre-empting a RT
        async cfqq.
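The rule described above can be sketched as a guard placed ahead of the other preemption checks. This is a simplified Python model and not the body of cfq_should_preempt(): the function name and the `other_rules_allow` parameter are assumptions standing in for the remaining preemption logic.

```python
from enum import IntEnum

class IoprioClass(IntEnum):
    RT = 1
    BE = 2
    IDLE = 3

def should_preempt(active_class, new_class, other_rules_allow):
    """Decide whether a new request may preempt the active queue.

    other_rules_allow is a hypothetical stand-in for every other condition
    (sync vs async, idling on a sync-noidle queue, and so on).
    """
    # A non-RT request must never cut short an ongoing RT timeslice.
    if active_class == IoprioClass.RT and new_class != IoprioClass.RT:
        return False
    return other_rules_allow
```

Checking the class first means that none of the later rules can accidentally grant preemption over an RT queue.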
      
      Signed-off-by: Divyesh Shah <dpshah@google.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  11. 28 Dec, 2009 1 commit
  12. 18 Dec, 2009 3 commits
    • cfq-iosched: Remove prio_change logic for workload selection · 65b32a57
      Committed by Vivek Goyal
      o CFQ now internally divides cfq queues into three workload categories:
        sync-idle, sync-noidle and async. Which workload to run depends primarily
        on the rb_key offset across the three service trees, which is a combination
        of multiple things, including the time the queue got queued on the service
        tree.
      
        There is one exception though. If we switched the prio class, say
        we served some RT tasks and then started serving the BE class again, then
        within the BE class we always started with the sync-noidle workload,
        irrespective of the rb_key offset in the service trees.
      
        This can provide better latencies for the sync-noidle workload in the
        presence of RT tasks.
      
      o This patch gets rid of that exception, so which workload to run within a
        class always depends on the lowest rb_key across service trees. The reason
        is that we now have multiple BE class groups, and if we always switch
        to the sync-noidle workload within a group, we can potentially starve a
        sync-idle workload within that group. The same is true for the async
        workload, which will be in the root group. Also, workload switching within
        a group would become very unpredictable, as it would depend on whether some
        RT workload was running in the system or not.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Reviewed-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
      Acked-by: Corrado Zoccolo <czoccolo@gmail.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • cfq-iosched: Get rid of nr_groups · fb104db4
      Committed by Vivek Goyal
      o Currently the code does not seem to be using cfqd->nr_groups. Get rid of it.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Reviewed-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • cfq-iosched: Remove the check for same cfq group from allow_merge · 1db32c40
      Committed by Vivek Goyal
      o allow_merge() already checks if submitting task is pointing to same cfqq
        as rq has been queued in. If everything is fine, we should not be having
        a task in one cgroup and having a pointer to cfqq in other cgroup.
      
        Well, I guess it can happen in some situations, namely when a random
        IO queue has been moved into the root cgroup for group_isolation=0. In
        this case, the task's cgroup/group is different from where the cfqq actually
        is, but this is intentional and in this case merging should be allowed.
      
        The second situation is where, due to the close cooperator patches, multiple
        processes can be sharing a cfqq. If everything is implemented right, we should
        not end up in a situation where tasks from different processes in different
        groups are sharing the same cfqq, as we allow merging of cooperating queues
        only if they are in the same group.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Reviewed-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  13. 15 Dec, 2009 1 commit
  14. 11 Dec, 2009 1 commit
    • Fix a CFQ crash in "for-2.6.33" branch of block tree · 82bbbf28
      Committed by Vivek Goyal
      I think my previous patch introduced a bug which can lead to CFQ hitting
      BUG_ON().
      
      The offending commit in the for-2.6.33 branch is:
      
      commit 7667aa06
      Author: Vivek Goyal <vgoyal@redhat.com>
      Date:   Tue Dec 8 17:52:58 2009 -0500
      
          cfq-iosched: Take care of corner cases of group losing share due to deletion
      
      While doing some stress testing on my box, I encountered the following.
      
      login: [ 3165.148841] BUG: scheduling while
      atomic: swapper/0/0x10000100
      [ 3165.149821] Modules linked in: cfq_iosched dm_multipath qla2xxx igb
      scsi_transport_fc dm_snapshot [last unloaded: scsi_wait_scan]
      [ 3165.149821] Pid: 0, comm: swapper Not tainted
      2.6.32-block-for-33-merged-new #3
      [ 3165.149821] Call Trace:
      [ 3165.149821]  <IRQ>  [<ffffffff8103fab8>] __schedule_bug+0x5c/0x60
      [ 3165.149821]  [<ffffffff8103afd7>] ? __wake_up+0x44/0x4d
      [ 3165.149821]  [<ffffffff8153a979>] schedule+0xe3/0x7bc
      [ 3165.149821]  [<ffffffff8103a796>] ? cpumask_next+0x1d/0x1f
      [ 3165.149821]  [<ffffffffa000b21d>] ? cfq_dispatch_requests+0x6ba/0x93e
      [cfq_iosched]
      [ 3165.149821]  [<ffffffff810422d8>] __cond_resched+0x2a/0x35
      [ 3165.149821]  [<ffffffffa000b21d>] ? cfq_dispatch_requests+0x6ba/0x93e
      [cfq_iosched]
      [ 3165.149821]  [<ffffffff8153b1ee>] _cond_resched+0x2c/0x37
      [ 3165.149821]  [<ffffffff8100e2db>] is_valid_bugaddr+0x16/0x2f
      [ 3165.149821]  [<ffffffff811e4161>] report_bug+0x18/0xac
      [ 3165.149821]  [<ffffffff8100f1fc>] die+0x39/0x63
      [ 3165.149821]  [<ffffffff8153cde1>] do_trap+0x11a/0x129
      [ 3165.149821]  [<ffffffff8100d470>] do_invalid_op+0x96/0x9f
      [ 3165.149821]  [<ffffffffa000b21d>] ? cfq_dispatch_requests+0x6ba/0x93e
      [cfq_iosched]
      [ 3165.149821]  [<ffffffff81034b4d>] ? enqueue_task+0x5c/0x67
      [ 3165.149821]  [<ffffffff8103ae83>] ? task_rq_unlock+0x11/0x13
      [ 3165.149821]  [<ffffffff81041aae>] ? try_to_wake_up+0x292/0x2a4
      [ 3165.149821]  [<ffffffff8100c935>] invalid_op+0x15/0x20
      [ 3165.149821]  [<ffffffffa000b21d>] ? cfq_dispatch_requests+0x6ba/0x93e
      [cfq_iosched]
      [ 3165.149821]  [<ffffffff810df5a6>] ? virt_to_head_page+0xe/0x2f
      [ 3165.149821]  [<ffffffff811d8c2a>] blk_peek_request+0x191/0x1a7
      [ 3165.149821]  [<ffffffff811e5b8d>] ? kobject_get+0x1a/0x21
      [ 3165.149821]  [<ffffffff812c8d4c>] scsi_request_fn+0x82/0x3df
      [ 3165.149821]  [<ffffffff8110b2de>] ? bio_fs_destructor+0x15/0x17
      [ 3165.149821]  [<ffffffff810df5a6>] ? virt_to_head_page+0xe/0x2f
      [ 3165.149821]  [<ffffffff811d931f>] __blk_run_queue+0x42/0x71
      [ 3165.149821]  [<ffffffff811d9403>] blk_run_queue+0x26/0x3a
      [ 3165.149821]  [<ffffffff812c8761>] scsi_run_queue+0x2de/0x375
      [ 3165.149821]  [<ffffffff812b60ac>] ? put_device+0x17/0x19
      [ 3165.149821]  [<ffffffff812c92d7>] scsi_next_command+0x3b/0x4b
      [ 3165.149821]  [<ffffffff812c9b9f>] scsi_io_completion+0x1c9/0x3f5
      [ 3165.149821]  [<ffffffff812c3c36>] scsi_finish_command+0xb5/0xbe
      
      I think I have hit the following BUG_ON() in cfq_dispatch_request().
      
      BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list));
      
      Please find attached the patch to fix it. I have done some stress testing
      with it and have not seen it happening again.
      
      o We should wait on a queue even after slice expiry only if it is empty. If
        queue is not empty then continue to expire it.
      
      o If we decide to keep the queue then make cfqq=NULL. Otherwise select_queue()
        will return a valid cfqq and cfq_dispatch_request() can hit the following
        BUG_ON().
      
        BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list))
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  15. 10 Dec, 2009 2 commits
  16. 09 Dec, 2009 4 commits
  17. 08 Dec, 2009 1 commit
  18. 06 Dec, 2009 1 commit
  19. 04 Dec, 2009 7 commits
    • blkio: Implement dynamic io controlling policy registration · 3e252066
      Committed by Vivek Goyal
      o One of the goals of the block IO controller is that it should be able to
        support multiple io control policies, some of which may be operational at
        a higher level in the storage hierarchy.
      
      o To begin with, we had one io controlling policy implemented by CFQ, and
        I hard coded the CFQ functions called by blkio. This created issues when
        CFQ is compiled as a module.
      
      o This patch implements a basic dynamic io controlling policy registration
        functionality in blkio. This is similar to elevator functionality where
        ioschedulers register the functions dynamically.
      
      o In the future, when more IO controlling policies are implemented, these
        can dynamically register with the block IO controller.
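The registration idea described above can be illustrated with a small callback registry. This is a toy Python model, not the blkio API: the class name, the dictionary-of-callbacks shape and the `update_group_weight` event are assumptions chosen for the example.

```python
class BlkioPolicyRegistry:
    """Toy registry: policies register callbacks instead of being hard coded."""

    def __init__(self):
        self._policies = []

    def register(self, policy):
        if policy not in self._policies:
            self._policies.append(policy)

    def unregister(self, policy):
        if policy in self._policies:
            self._policies.remove(policy)

    def update_group_weight(self, group, weight):
        # Fan the event out to every registered policy and report how many ran.
        for policy in self._policies:
            policy["update_group_weight"](group, weight)
        return len(self._policies)


# A hypothetical policy that just records the events it receives.
seen = []
cfq_policy = {"update_group_weight": lambda group, weight: seen.append((group, weight))}

registry = BlkioPolicyRegistry()
registry.register(cfq_policy)
registry.update_group_weight("grp1", 500)
```

Because the controller only calls through the registry, a policy that is built as a module can add itself at load time and remove itself at unload time without the core referencing its symbols directly.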
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • blkio: Export some symbols from blkio as its user CFQ can be a module · 9d6a986c
      Committed by Vivek Goyal
      o blkio controller is inside the kernel and cfq makes use of interfaces
        exported by blkio. CFQ can be a module too, hence export symbols used
        by CFQ.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • cfq-iosched: make nonrot check logic consistent · 3c764b7a
      Committed by Shaohua Li
      cfq_arm_slice_timer() has logic to disable the idle window for SSD devices. The
      same thing should be done in cfq_select_queue() too, otherwise we will still see
      the idle window. This makes the nonrot check logic consistent in cfq.
      In tests on an Intel SSD with the low_latency knob turned off, this patch can
      triple disk throughput for multi-thread sequential reads.
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • cfq-iosched: move IO controller declarations to a header file · f2eecb91
      Committed by Jens Axboe
      They should not be declared inside some other file that's not related
      to CFQ.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • blkio: Wait on sync-noidle queue even if rq_noidle = 1 · c04645e5
      Committed by Vivek Goyal
      o rq_noidle() is supposed to tell cfq not to expect a request after this
        one, and hence not to idle. But this does not seem to work very well. For
        example, for direct random readers, rq_noidle = 1 but there is a next request
        coming after this one. Not idling leads to a group not getting its share
        even if group_isolation=1.
      
      o The right solution for this issue is to scan the higher layers and set the
        right flag (WRITE_SYNC or WRITE_ODIRECT). For the time being, this single
        line fix helps. This should not have any significant impact when we are
        not using cgroups. I will later figure out the IO paths in the higher layers
        and fix it.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • blkio: Implement group_isolation tunable · ae30c286
      Committed by Vivek Goyal
      o If a group is running only a random reader, then it will not have enough
        traffic to keep disk busy and we will reduce overall throughput. This
        should result in better latencies for random reader though. If we don't
        idle on random reader service tree, then this random reader will experience
        large latencies if there are other groups present in system with sequential
        readers running in these.
      
      o One solution suggested by Corrado is to keep the random readers, or
        sync-noidle workload, in the root group by default, so that during one
        dispatch round we idle only once on the sync-noidle tree. This means that all
        the sync-idle workload queues will be in their respective groups and we will
        see service differentiation in those, but not in the sync-noidle workload.
      
      o Provide a tunable group_isolation. If set, this will make sure that even
        sync-noidle queues go in their respective group and we wait on these. This
        provides stronger isolation between groups but at the expense of throughput
        if group does not have enough traffic to keep the disk busy.
      
      o By default group_isolation = 0
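The placement rule for sync queues described above can be written as a small decision function. This is an illustrative Python sketch; the function name, the string labels and the `"root"` default are assumptions, and async queues (which the log places in the root group elsewhere) are deliberately left out.

```python
def target_group(workload, task_group, group_isolation, root_group="root"):
    """Pick the group whose service tree a sync queue is placed on.

    workload is "sync-idle" or "sync-noidle"; all names here are hypothetical.
    """
    if workload == "sync-noidle" and not group_isolation:
        # Default: pool random readers in the root group so that we idle
        # only once per dispatch round on the sync-noidle tree.
        return root_group
    # sync-idle queues, or any sync queue with group_isolation set, stay in
    # the task's own group and get per-group service differentiation.
    return task_group
```

Setting group_isolation trades throughput for isolation: each group's sync-noidle tree is then waited on separately, which costs idle time whenever a group cannot keep the disk busy.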
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • blkio: Determine async workload length based on total number of queues · f26bd1f0
      Committed by Vivek Goyal
      o Async queues are not per group. Instead these are system wide and maintained
        in root group. Hence their workload slice length should be calculated
        based on total number of queues in the system and not just queues in the
        root group.
      
      o As root group's default weight is 1000, make sure to charge async queue
        more in terms of vtime so that it does not get more time on disk because
        root group has higher weight.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>