1. 02 Apr, 2010 1 commit
  2. 25 Mar, 2010 2 commits
    • cfq-iosched: Do not merge queues of BE and IDLE classes · 39c01b21
      Committed by Divyesh Shah
      Even if they are found to be co-operating.
      
      The prio_trees do not have any IDLE cfqqs on them. cfq_close_cooperator()
      is called from cfq_select_queue() and cfq_completed_request(). The latter
      ensures that the close cooperator code does not get invoked if the current
      cfqq is of class IDLE but the former doesn't seem to have any such checks.
      So an IDLE cfqq may get merged with a BE cfqq from the same group which
      should be avoided.
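
A minimal sketch of the rule described above, in plain C. The types and the function name `may_merge_cooperators` are hypothetical stand-ins, not the kernel's definitions, and requiring the classes to match exactly is a simplifying assumption; the sketch only illustrates that an IDLE queue and a BE queue in the same group are not merged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the I/O priority classes. */
enum io_class { CLASS_RT, CLASS_BE, CLASS_IDLE };

/* Hypothetical, minimal queue descriptor; the real cfq_queue has many more fields. */
struct queue {
    enum io_class cls;
    int group_id;
};

/*
 * Two queues that look like close cooperators may merge only if they
 * belong to the same group and the same class, so an IDLE queue is
 * never merged with a BE queue.
 */
bool may_merge_cooperators(const struct queue *cur, const struct queue *other)
{
    assert(cur != NULL && other != NULL);
    if (cur->group_id != other->group_id)
        return false;
    return cur->cls == other->cls;
}
```

A caller would run this check before treating a nearby queue as a merge candidate, regardless of which code path found the candidate.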
      
      Signed-off-by: Divyesh Shah <dpshah@google.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • cfq-iosched: Add additional blktrace log messages in CFQ for easier debugging · b1ffe737
      Committed by Divyesh Shah
      These have helped us debug some issues we've noticed in earlier IO
      controller versions and should be useful now as well. The extra logging
      covers:
      - idling behavior. Since there are so many conditions based on which we decide
      to idle or not, this patch adds a log message for some conditions that we've
      found useful.
      - workload slices and current prio and workload type
      
      Changelog from v1:
      o moved log message from cfq_set_active_queue() to __cfq_set_active_queue()
      o changed queue_count to st->count
      
      Signed-off-by: Divyesh Shah <dpshah@google.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  3. 19 Mar, 2010 1 commit
  4. 01 Mar, 2010 5 commits
    • cfq: remove 8 bytes of padding from cfq_rb_root on 64 bit builds · 73e9ffdd
      Committed by Richard Kennedy
      Reorder cfq_rb_root to remove 8 bytes of padding on 64 bit builds.
      
      Consequently removing 56 bytes from cfq_group and 64 bytes from
      cfq_data.
      Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • cfq-iosched: quantum check tweak · abc3c744
      Committed by Shaohua Li
      Currently a queue can only dispatch up to 4 requests if there are other queues.
      This isn't optimal: the device can handle more requests; for example, AHCI can
      handle 31 requests. I understand the limit is there for fairness, but we could
      make a tweak: if the queue still has a lot of slice left, it seems we could
      ignore the limit. Testing shows this boosts my workload (two-thread randread on
      an SSD) from 78m/s to 100m/s.
      Thanks to Corrado and Vivek for their suggestions on the patch.
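
The tweak can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: the struct, the function names, and in particular the heuristic for "a lot of slice left" (outstanding requests times an estimated per-request cost, compared against the slice end) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-queue state; field names are illustrative only. */
struct queue_state {
    unsigned int dispatched;    /* requests dispatched and not yet completed */
    unsigned long slice_end;    /* tick at which the current time slice ends */
};

/*
 * Assumed heuristic: the slice is "used soon" if servicing the requests
 * already outstanding, at est_ticks_per_request each, would reach its end.
 */
bool slice_used_soon(const struct queue_state *q, unsigned long now,
                     unsigned long est_ticks_per_request)
{
    assert(q != NULL);
    return now + est_ticks_per_request * q->dispatched >= q->slice_end;
}

/* Decide whether the queue may dispatch one more request. */
bool may_dispatch(const struct queue_state *q, unsigned long now,
                  unsigned int quantum, unsigned int busy_queues,
                  unsigned long est_ticks_per_request)
{
    assert(q != NULL);
    if (busy_queues <= 1)
        return true;    /* no other queues, so the fairness limit does not apply */
    if (q->dispatched < quantum)
        return true;    /* still under the per-queue quantum */
    /* Over the quantum: allow more only while plenty of slice remains. */
    return !slice_used_soon(q, now, est_ticks_per_request);
}
```

With a quantum of 4, a queue early in its slice keeps dispatching past 4 requests, while a queue near the end of its slice is held to the limit.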
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • cfq-iosched: requests "in flight" vs "in driver" clarification · 53c583d2
      Committed by Corrado Zoccolo
      Counters for requests "in flight" and "in driver" are used asymmetrically
      in cfq_may_dispatch, and have slightly different meanings.
      We split the rq_in_flight counter (was sync_flight) to count both sync
      and async requests, in order to use this one, which is more accurate in
      some corner cases.
      The rq_in_driver counter is coalesced, since individual sync/async counts
      are not used any more.
      Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • cfq-iosched: rethink seeky detection for SSDs · 41647e7a
      Committed by Corrado Zoccolo
      CFQ currently applies the same logic of detecting seeky queues and
      grouping them together for rotational disks as well as SSDs.
      For SSDs, the time to complete a request doesn't depend on the
      request location, but only on the size.
      This patch therefore changes the criterion to group queues by
      request size in case of SSDs, in order to achieve better fairness.
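
A rough sketch of a per-request classification that switches criterion by device type. The function name and both thresholds are hypothetical, and treating small requests on an SSD as the "seeky" case is an assumption made for illustration; the message above only states that grouping is by request size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical thresholds, chosen only for illustration. */
#define SEEK_DISTANCE_THRESHOLD_SECTORS 16384u
#define SMALL_REQUEST_THRESHOLD_SECTORS 32u

/*
 * Classify one request. On rotational disks the cost is dominated by
 * the distance from the previous request; on non-rotational devices
 * (SSDs) location does not matter, so the request size is used instead.
 */
bool request_counts_as_seeky(bool non_rotational,
                             uint64_t seek_distance_sectors,
                             uint32_t request_sectors)
{
    assert(request_sectors > 0);    /* the sketch only covers data requests */
    if (non_rotational)
        return request_sectors < SMALL_REQUEST_THRESHOLD_SECTORS;
    return seek_distance_sectors > SEEK_DISTANCE_THRESHOLD_SECTORS;
}
```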
      Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • cfq-iosched: rework seeky detection · 3dde36dd
      Committed by Corrado Zoccolo
      Current seeky detection is based on average seek length.
      This is suboptimal, since the average will not distinguish between:
      * a process doing medium sized seeks
      * a process doing some sequential requests interleaved with larger seeks
      and even a medium seek can take a lot of time, if the requested sector
      happens to be behind the disk head in the rotation (50% probability).
      
      Therefore, we change the seeky queue detection to work as follows:
      * each request can be classified as sequential if it is very close to
        the current head position, i.e. it is likely in the disk cache (disks
        usually read more data than requested, and put it in cache for
        subsequent reads). Otherwise, the request is classified as seeky.
      * a history window of the last 32 requests is kept, storing the
        classification result.
      * A queue is marked as seeky if more than 1/8 of the last 32 requests
        were seeky.
      
      This patch fixes a regression reported by Yanmin, on mmap 64k random
      reads.
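
The history-window scheme described above can be sketched with a 32-bit shift register, one bit per request. The struct and function names are hypothetical and this is a userspace illustration, not the kernel code; only the window size of 32 and the 1/8 threshold come from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One bit per request: 1 = seeky, 0 = sequential. Bit 0 is the newest. */
struct seek_tracker {
    uint32_t history;
};

/* Shift the newest classification in; the 33rd-oldest bit falls off the top. */
void record_request(struct seek_tracker *t, bool seeky)
{
    assert(t != NULL);
    t->history = (t->history << 1) | (seeky ? 1u : 0u);
}

/* Count set bits by repeatedly clearing the lowest one. */
unsigned int count_bits(uint32_t v)
{
    unsigned int n = 0;
    while (v != 0) {
        v &= v - 1;
        n++;
    }
    return n;
}

/* The queue is seeky if more than 1/8 of the last 32 requests were seeky. */
bool queue_is_seeky(const struct seek_tracker *t)
{
    assert(t != NULL);
    return count_bits(t->history) > 32 / 8;
}
```

With this threshold, four seeky requests in the window are tolerated and the fifth marks the queue as seeky; the mark clears once enough sequential requests push the seeky bits out of the window.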
      Reported-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  5. 26 Feb, 2010 1 commit
  6. 22 Feb, 2010 1 commit
  7. 05 Feb, 2010 1 commit
  8. 03 Feb, 2010 1 commit
    • cfq-iosched: Do not idle on async queues · 1efe8fe1
      Committed by Vivek Goyal
      A few weeks back, Shaohua Li posted a similar patch. I am reposting it
      with more test results.
      
      This patch does two things.
      
      - Do not idle on async queues.
      
      - It also changes the write queue depth CFQ drives (cfq_may_dispatch()).
        Currently, we seem to always be driving a queue depth of 1 for WRITES. This is
        true even if there is only one write queue in the system, and none of the logic
        for infinite queue depth in the case of a single busy queue, or for slowly
        increasing queue depth based on the last delayed sync request, seems to
        kick in at all.
      
      This patch will allow deeper WRITE queue depths (subject to the other
      WRITE queue depth constraints like cfq_quantum and last delayed sync
      request).
      
      Shaohua Li had reported getting more out of his SSD. For me, I have got
      one LUN exported from an HP EVA and when pure buffered writes are on, I
      can get more out of the system. Following are test results of pure
      buffered writes (with end_fsync=1) with vanilla and patched kernels. These
      results are the average of 3 sets of runs with increasing numbers of threads.
      
      AVERAGE[bufwfs][vanilla]
      -------
      job       Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)
      ---       --- --  ------------   -----------    -------------  -----------
      bufwfs    3   1   0              0              95349          474141
      bufwfs    3   2   0              0              100282         806926
      bufwfs    3   4   0              0              109989         2.7301e+06
      bufwfs    3   8   0              0              116642         3762231
      bufwfs    3   16  0              0              118230         6902970
      
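      AVERAGE[bufwfs] [patched kernel]
      -------
      job       Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)
      ---       --- --  ------------   -----------    -------------  -----------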
      bufwfs    3   1   0              0              270722         404352
      bufwfs    3   2   0              0              206770         1.06552e+06
      bufwfs    3   4   0              0              195277         1.62283e+06
      bufwfs    3   8   0              0              260960         2.62979e+06
      bufwfs    3   16  0              0              299260         1.70731e+06
      
      I also ran buffered writes along with some sequential reads and some
      buffered reads going on in the system on a SATA disk, because the potential
      risk is that driving a higher queue depth in the presence of sync IO
      could hurt the max clat, which we want to keep low.
      
      With some random and sequential reads going on in the system on one SATA
      disk I did not see any significant increase in max clat. So it looks like
      other WRITE queue depth control logic is doing its job. Here are the
      results.
      
      AVERAGE[brr, bsr, bufw together] [vanilla]
      -------
      job       Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)
      ---       --- --  ------------   -----------    -------------  -----------
      brr       3   1   850            546345         0              0
      bsr       3   1   14650          729543         0              0
      bufw      3   1   0              0              23908          8274517
      
      brr       3   2   981.333        579395         0              0
      bsr       3   2   14149.7        1175689        0              0
      bufw      3   2   0              0              21921          1.28108e+07
      
      brr       3   4   898.333        1.75527e+06    0              0
      bsr       3   4   12230.7        1.40072e+06    0              0
      bufw      3   4   0              0              19722.3        2.4901e+07
      
      brr       3   8   900            3160594        0              0
      bsr       3   8   9282.33        1.91314e+06    0              0
      bufw      3   8   0              0              18789.3        23890622
      
      AVERAGE[brr, bsr, bufw mixed] [patched kernel]
      -------
      job       Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)
      ---       --- --  ------------   -----------    -------------  -----------
      brr       3   1   837            417973         0              0
      bsr       3   1   14357.7        591275         0              0
      bufw      3   1   0              0              24869.7        8910662
      
      brr       3   2   1038.33        543434         0              0
      bsr       3   2   13351.3        1205858        0              0
      bufw      3   2   0              0              18626.3        13280370
      
      brr       3   4   913            1.86861e+06    0              0
      bsr       3   4   12652.3        1430974        0              0
      bufw      3   4   0              0              15343.3        2.81305e+07
      
      brr       3   8   890            2.92695e+06    0              0
      bsr       3   8   9635.33        1.90244e+06    0              0
      bufw      3   8   0              0              17200.3        24424392
      
      So it looks like it might make sense to include this patch.
      
      Thanks
      Vivek
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  9. 11 Jan, 2010 1 commit
    • cfq-iosched: Respect ioprio_class when preempting · 875feb63
      Committed by Divyesh Shah
      In cfq_should_preempt(), we currently allow some cases where a non-RT request
      can preempt an ongoing RT cfqq timeslice. This should not happen.
      Examples include:
      
      o A sync_noidle wl type non-RT request pre-empting a sync_noidle wl type cfqq
        on which we are idling.
      o Once we have per-cgroup async queues, a non-RT sync request pre-empting an RT
        async cfqq.
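
The intended guard can be sketched as a class check that runs before any other preemption rule. The types and the name `class_allows_preempt` are hypothetical; the sketch covers only the class rule and leaves every other preemption condition to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the I/O priority classes. */
enum io_class { CLASS_RT, CLASS_BE, CLASS_IDLE };

/* Hypothetical, minimal queue descriptor. */
struct queue {
    enum io_class cls;
};

/*
 * Returns false when the class rule forbids preemption: a request from a
 * non-RT queue must never cut short the timeslice of an active RT queue.
 * A true result only means this rule does not object; other checks
 * (sync vs async, idling, and so on) would still apply.
 */
bool class_allows_preempt(const struct queue *incoming, const struct queue *active)
{
    assert(incoming != NULL && active != NULL);
    if (active->cls == CLASS_RT && incoming->cls != CLASS_RT)
        return false;
    return true;
}
```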
      
      Signed-off-by: Divyesh Shah <dpshah@google.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  10. 28 Dec, 2009 1 commit
  11. 18 Dec, 2009 3 commits
    • cfq-iosched: Remove prio_change logic for workload selection · 65b32a57
      Committed by Vivek Goyal
      o CFQ now internally divides cfq queues into three workload categories: sync-idle,
        sync-noidle and async. Which workload to run depends primarily on the rb_key
        offset across the three service trees, which is a combination of multiple things,
        including the time at which the queue got queued on the service tree.
      
        There is one exception though. If we switched the prio class, say
        we served some RT tasks and again started serving the BE class, then within
        the BE class we always started with the sync-noidle workload irrespective of
        the rb_key offset in the service trees.
      
        This can provide better latencies for the sync-noidle workload in the presence
        of RT tasks.
      
      o This patch gets rid of that exception, so which workload to run within a
        class always depends on the lowest rb_key across service trees. The reason
        is that we now have multiple BE class groups, and if we always switch
        to the sync-noidle workload within a group, we can potentially starve a sync-idle
        workload within that group. The same is true for the async workload, which will be in
        the root group. Also, workload switching within a group would become very
        unpredictable, as it would depend on whether some RT workload was running in
        the system or not.
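
Selecting the workload by lowest rb_key can be sketched as a minimum search over the three service trees. The struct, enum, and function names are hypothetical, and the plain `<` comparison is a simplification that ignores counter wraparound, which time-based keys would need to handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The three workload categories named in the message. */
enum workload { SYNC_IDLE = 0, SYNC_NOIDLE, ASYNC_WL, WORKLOAD_COUNT };

/* Hypothetical summary of one service tree. */
struct service_tree {
    bool has_queues;                /* false if the tree is empty */
    unsigned long lowest_rb_key;    /* key of its first queue, if any */
};

/*
 * Pick the workload whose service tree holds the lowest rb_key.
 * Returns -1 if every tree is empty. No workload gets special
 * treatment after a priority-class switch.
 */
int pick_workload(const struct service_tree trees[WORKLOAD_COUNT])
{
    int best = -1;

    assert(trees != NULL);
    for (int i = 0; i < WORKLOAD_COUNT; i++) {
        if (!trees[i].has_queues)
            continue;
        if (best < 0 || trees[i].lowest_rb_key < trees[best].lowest_rb_key)
            best = i;
    }
    return best;
}
```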
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Reviewed-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
      Acked-by: Corrado Zoccolo <czoccolo@gmail.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • cfq-iosched: Get rid of nr_groups · fb104db4
      Committed by Vivek Goyal
      o Currently code does not seem to be using cfqd->nr_groups. Get rid of it.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Reviewed-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • cfq-iosched: Remove the check for same cfq group from allow_merge · 1db32c40
      Committed by Vivek Goyal
      o allow_merge() already checks if submitting task is pointing to same cfqq
        as rq has been queued in. If everything is fine, we should not be having
        a task in one cgroup and having a pointer to cfqq in other cgroup.
      
        Well, I guess in some situations it can happen, namely when a random
        IO queue has been moved into the root cgroup for group_isolation=0. In
        this case, the task's cgroup/group is different from where the cfqq actually is,
        but this is intentional and in this case merging should be allowed.
      
        The second situation is where, due to the close cooperator patches, multiple
        processes can be sharing a cfqq. If everything is implemented correctly, we should
        not end up in a situation where tasks from different processes in different
        groups are sharing the same cfqq, as we allow merging of cooperating queues
        only if they are in the same group.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Reviewed-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  12. 15 Dec, 2009 1 commit
  13. 11 Dec, 2009 1 commit
    • Fix a CFQ crash in "for-2.6.33" branch of block tree · 82bbbf28
      Committed by Vivek Goyal
      I think my previous patch introduced a bug which can lead to CFQ hitting
      BUG_ON().
      
      The offending commit in the for-2.6.33 branch is:
      
      commit 7667aa06
      Author: Vivek Goyal <vgoyal@redhat.com>
      Date:   Tue Dec 8 17:52:58 2009 -0500
      
          cfq-iosched: Take care of corner cases of group losing share due to deletion
      
      While doing some stress testing on my box, I encountered the following.
      
      login: [ 3165.148841] BUG: scheduling while
      atomic: swapper/0/0x10000100
      [ 3165.149821] Modules linked in: cfq_iosched dm_multipath qla2xxx igb
      scsi_transport_fc dm_snapshot [last unloaded: scsi_wait_scan]
      [ 3165.149821] Pid: 0, comm: swapper Not tainted
      2.6.32-block-for-33-merged-new #3
      [ 3165.149821] Call Trace:
      [ 3165.149821]  <IRQ>  [<ffffffff8103fab8>] __schedule_bug+0x5c/0x60
      [ 3165.149821]  [<ffffffff8103afd7>] ? __wake_up+0x44/0x4d
      [ 3165.149821]  [<ffffffff8153a979>] schedule+0xe3/0x7bc
      [ 3165.149821]  [<ffffffff8103a796>] ? cpumask_next+0x1d/0x1f
      [ 3165.149821]  [<ffffffffa000b21d>] ? cfq_dispatch_requests+0x6ba/0x93e
      [cfq_iosched]
      [ 3165.149821]  [<ffffffff810422d8>] __cond_resched+0x2a/0x35
      [ 3165.149821]  [<ffffffffa000b21d>] ? cfq_dispatch_requests+0x6ba/0x93e
      [cfq_iosched]
      [ 3165.149821]  [<ffffffff8153b1ee>] _cond_resched+0x2c/0x37
      [ 3165.149821]  [<ffffffff8100e2db>] is_valid_bugaddr+0x16/0x2f
      [ 3165.149821]  [<ffffffff811e4161>] report_bug+0x18/0xac
      [ 3165.149821]  [<ffffffff8100f1fc>] die+0x39/0x63
      [ 3165.149821]  [<ffffffff8153cde1>] do_trap+0x11a/0x129
      [ 3165.149821]  [<ffffffff8100d470>] do_invalid_op+0x96/0x9f
      [ 3165.149821]  [<ffffffffa000b21d>] ? cfq_dispatch_requests+0x6ba/0x93e
      [cfq_iosched]
      [ 3165.149821]  [<ffffffff81034b4d>] ? enqueue_task+0x5c/0x67
      [ 3165.149821]  [<ffffffff8103ae83>] ? task_rq_unlock+0x11/0x13
      [ 3165.149821]  [<ffffffff81041aae>] ? try_to_wake_up+0x292/0x2a4
      [ 3165.149821]  [<ffffffff8100c935>] invalid_op+0x15/0x20
      [ 3165.149821]  [<ffffffffa000b21d>] ? cfq_dispatch_requests+0x6ba/0x93e
      [cfq_iosched]
      [ 3165.149821]  [<ffffffff810df5a6>] ? virt_to_head_page+0xe/0x2f
      [ 3165.149821]  [<ffffffff811d8c2a>] blk_peek_request+0x191/0x1a7
      [ 3165.149821]  [<ffffffff811e5b8d>] ? kobject_get+0x1a/0x21
      [ 3165.149821]  [<ffffffff812c8d4c>] scsi_request_fn+0x82/0x3df
      [ 3165.149821]  [<ffffffff8110b2de>] ? bio_fs_destructor+0x15/0x17
      [ 3165.149821]  [<ffffffff810df5a6>] ? virt_to_head_page+0xe/0x2f
      [ 3165.149821]  [<ffffffff811d931f>] __blk_run_queue+0x42/0x71
      [ 3165.149821]  [<ffffffff811d9403>] blk_run_queue+0x26/0x3a
      [ 3165.149821]  [<ffffffff812c8761>] scsi_run_queue+0x2de/0x375
      [ 3165.149821]  [<ffffffff812b60ac>] ? put_device+0x17/0x19
      [ 3165.149821]  [<ffffffff812c92d7>] scsi_next_command+0x3b/0x4b
      [ 3165.149821]  [<ffffffff812c9b9f>] scsi_io_completion+0x1c9/0x3f5
      [ 3165.149821]  [<ffffffff812c3c36>] scsi_finish_command+0xb5/0xbe
      
      I think I have hit the following BUG_ON() in cfq_dispatch_request().
      
      BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list));
      
      Please find attached the patch to fix it. I have done some stress testing
      with it and have not seen it happening again.
      
      o We should wait on a queue even after slice expiry only if it is empty. If
        queue is not empty then continue to expire it.
      
      o If we decide to keep the queue then make cfqq=NULL. Otherwise select_queue()
        will return a valid cfqq and cfq_dispatch_request() can hit following
        BUG_ON().
      
        BUG_ON(RB_EMPTY_ROOT(&cfqq->sort_list))
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  14. 10 Dec, 2009 2 commits
  15. 09 Dec, 2009 4 commits
  16. 08 Dec, 2009 1 commit
  17. 06 Dec, 2009 1 commit
  18. 04 Dec, 2009 12 commits