1. 06 Apr, 2010 (1 commit)
    • laptop-mode: Make flushes per-device · 31373d09
      Committed by Matthew Garrett
      One of the features of laptop-mode is that it forces a writeout of dirty
      pages if something else triggers a physical read or write from a device.
      The current implementation flushes pages on all devices, rather than only
      the one that triggered the flush. This patch alters the behaviour so that
      only the recently accessed block device is flushed, preventing other
      disks from being spun up for no terribly good reason (see the sketch
      after this entry).
      Signed-off-by: Matthew Garrett <mjg@redhat.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
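      A minimal sketch of the per-device idea (the struct and helper names
      here are illustrative; the real patch hangs the timer off each
      device's backing_dev_info rather than a standalone struct):

          /* Each device carries its own laptop-mode writeback timer. */
          struct disk_flush_state {
                  struct timer_list flush_timer;
          };

          /* Called from one device's request completion path. */
          static void laptop_io_completion_one_dev(struct disk_flush_state *ds,
                                                   unsigned long delay_jiffies)
          {
                  /* Re-arm only this device's timer, so the eventual
                   * flush never spins up the other idle disks. */
                  mod_timer(&ds->flush_timer, jiffies + delay_jiffies);
          }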
  2. 02 Apr, 2010 (4 commits)
  3. 25 Mar, 2010 (2 commits)
    • cfq-iosched: Do not merge queues of BE and IDLE classes · 39c01b21
      Committed by Divyesh Shah
      Do not merge them even if they are found to be co-operating.
      
      The prio_trees do not have any IDLE cfqqs on them. cfq_close_cooperator()
      is called from cfq_select_queue() and cfq_completed_request(). The latter
      ensures that the close-cooperator code does not get invoked if the current
      cfqq is of class IDLE, but the former doesn't seem to have any such check.
      So an IDLE cfqq may get merged with a BE cfqq from the same group, which
      should be avoided (see the sketch after this entry).
      
      Signed-off-by: Divyesh Shah <dpshah@google.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
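      A simplified sketch of the guard, reusing helper names from
      cfq-iosched.c of this era (the body is illustrative, not the exact
      patch):

          static struct cfq_queue *
          cfq_close_cooperator(struct cfq_data *cfqd, struct cfq_queue *cur_cfqq)
          {
                  struct cfq_queue *cfqq;

                  /* IDLE queues are never on the prio_trees, so never
                   * start a cooperator search from one. */
                  if (cfq_class_idle(cur_cfqq))
                          return NULL;

                  /* Position-based lookup on the prio_tree. */
                  cfqq = cfqq_close(cfqd, cur_cfqq);
                  if (!cfqq || cfq_class_idle(cfqq))
                          return NULL;

                  return cfqq;
          }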
    • cfq-iosched: Add additional blktrace log messages in CFQ for easier debugging · b1ffe737
      Committed by Divyesh Shah
      These have helped us debug some issues we've noticed in earlier IO
      controller versions and should be useful now as well. The extra logging
      covers:
      - idling behavior: since there are so many conditions on which we
        decide whether to idle, this patch adds a log message for the
        conditions we've found useful (see the sketch after this entry);
      - workload slices and the current priority and workload type.
      
      Changelog from v1:
      o moved log message from cfq_set_active_queue() to __cfq_set_active_queue()
      o changed queue_count to st->count
      
      Signed-off-by: Divyesh Shah <dpshah@google.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
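      An illustrative use of CFQ's blktrace logging macros (cfq_log_cfqq
      and cfq_log are real macros in cfq-iosched.c; the exact message
      strings this patch adds may differ, and sl and slice stand in for
      values computed by the surrounding functions):

          /* Record why the idle timer was armed for this queue. */
          cfq_log_cfqq(cfqd, cfqq, "arm_idle: %lu", sl);

          /* Record the chosen workload slice, priority class and type. */
          cfq_log(cfqd, "workload slice:%d prio:%d type:%d",
                  slice, cfqd->serving_prio, cfqd->serving_type);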
  4. 19 Mar, 2010 (1 commit)
  5. 16 Mar, 2010 (1 commit)
  6. 15 Mar, 2010 (2 commits)
  7. 13 Mar, 2010 (1 commit)
    • cgroups: blkio subsystem as module · 67523c48
      Committed by Ben Blum
      Modify the Block I/O cgroup subsystem so that it can be built as a
      module. As the CFQ disk scheduler optionally depends on blk-cgroup,
      config options in block/Kconfig, block/Kconfig.iosched, and
      block/blk-cgroup.h are enhanced to support the new module dependency
      (see the sketch after this entry).
      Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
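      A hedged sketch of the module wiring this enables, assuming the
      cgroup subsystem-loading interface from the same patch series
      (simplified; the real registration lives in block/blk-cgroup.c):

          static int __init blkio_init(void)
          {
                  /* Register the blkio subsystem with the cgroup core
                   * at module load time. */
                  return cgroup_load_subsys(&blkio_subsys);
          }

          static void __exit blkio_exit(void)
          {
                  cgroup_unload_subsys(&blkio_subsys);
          }

          module_init(blkio_init);
          module_exit(blkio_exit);
          MODULE_LICENSE("GPL");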
  8. 08 Mar, 2010 (1 commit)
  9. 01 Mar, 2010 (6 commits)
  10. 26 Feb, 2010 (6 commits)
  11. 23 Feb, 2010 (2 commits)
  12. 22 Feb, 2010 (1 commit)
  13. 09 Feb, 2010 (1 commit)
  14. 05 Feb, 2010 (1 commit)
  15. 03 Feb, 2010 (1 commit)
    • cfq-iosched: Do not idle on async queues · 1efe8fe1
      Committed by Vivek Goyal
      A few weeks back, Shaohua Li posted a similar patch. I am reposting
      it with more test results.
      
      This patch does two things.
      
      - Do not idle on async queues.
      
      - It also changes the write queue depth CFQ drives (cfq_may_dispatch()).
        Currently we seem to drive a queue depth of 1 for WRITES, even when
        there is only one write queue in the system; the logic for infinite
        queue depth with a single busy queue, and for slowly increasing the
        queue depth based on the last delayed sync request, does not seem
        to be kicking in at all.
      
      This patch allows deeper WRITE queue depths (subject to the other
      WRITE queue depth constraints such as cfq_quantum and the last
      delayed sync request); a sketch of the async-idling change follows.
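      A minimal sketch of the async-idling change, using real
      cfq-iosched.c helpers (illustrative; the actual patch touches the
      dispatch and idling paths in more places):

          static void cfq_arm_slice_timer(struct cfq_data *cfqd)
          {
                  struct cfq_queue *cfqq = cfqd->active_queue;

                  /* Never idle waiting for more IO on an async queue;
                   * buffered writes gain nothing from anticipation. */
                  if (!cfq_cfqq_sync(cfqq))
                          return;

                  /* ... existing idle-window logic for sync queues ... */
          }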
      
      Shaohua Li had reported getting more out of his SSD. For me, with
      one LUN exported from an HP EVA, pure buffered writes also get more
      out of the system. Below are test results for pure buffered writes
      (with end_fsync=1) on vanilla and patched kernels; these results are
      the average of 3 sets of runs with an increasing number of threads.
      
      AVERAGE[bufwfs][vanilla]
      -------
      job       Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)
      ---       --- --  ------------   -----------    -------------  -----------
      bufwfs    3   1   0              0              95349          474141
      bufwfs    3   2   0              0              100282         806926
      bufwfs    3   4   0              0              109989         2.7301e+06
      bufwfs    3   8   0              0              116642         3762231
      bufwfs    3   16  0              0              118230         6902970
      
      AVERAGE[bufwfs] [patched kernel]
      -------
      job       Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)
      ---       --- --  ------------   -----------    -------------  -----------
      bufwfs    3   1   0              0              270722         404352
      bufwfs    3   2   0              0              206770         1.06552e+06
      bufwfs    3   4   0              0              195277         1.62283e+06
      bufwfs    3   8   0              0              260960         2.62979e+06
      bufwfs    3   16  0              0              299260         1.70731e+06
      
      I also ran buffered writes together with some sequential reads and
      some buffered reads on a SATA disk, because the potential risk is
      that driving a deeper write queue depth in the presence of sync IO
      could push up the max completion latency.
      
      With some random and sequential reads running on one SATA disk, I
      did not see any significant increase in max clat, so it looks like
      the other WRITE queue depth control logic is doing its job. Here
      are the results.
      
      AVERAGE[brr, bsr, bufw together] [vanilla]
      -------
      job       Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)
      ---       --- --  ------------   -----------    -------------  -----------
      brr       3   1   850            546345         0              0
      bsr       3   1   14650          729543         0              0
      bufw      3   1   0              0              23908          8274517
      
      brr       3   2   981.333        579395         0              0
      bsr       3   2   14149.7        1175689        0              0
      bufw      3   2   0              0              21921          1.28108e+07
      
      brr       3   4   898.333        1.75527e+06    0              0
      bsr       3   4   12230.7        1.40072e+06    0              0
      bufw      3   4   0              0              19722.3        2.4901e+07
      
      brr       3   8   900            3160594        0              0
      bsr       3   8   9282.33        1.91314e+06    0              0
      bufw      3   8   0              0              18789.3        23890622
      
      AVERAGE[brr, bsr, bufw mixed] [patched kernel]
      -------
      job       Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)
      ---       --- --  ------------   -----------    -------------  -----------
      brr       3   1   837            417973         0              0
      bsr       3   1   14357.7        591275         0              0
      bufw      3   1   0              0              24869.7        8910662
      
      brr       3   2   1038.33        543434         0              0
      bsr       3   2   13351.3        1205858        0              0
      bufw      3   2   0              0              18626.3        13280370
      
      brr       3   4   913            1.86861e+06    0              0
      bsr       3   4   12652.3        1430974        0              0
      bufw      3   4   0              0              15343.3        2.81305e+07
      
      brr       3   8   890            2.92695e+06    0              0
      bsr       3   8   9635.33        1.90244e+06    0              0
      bufw      3   8   0              0              17200.3        24424392
      
      So it looks like it makes sense to include this patch.
      
      Thanks
      Vivek
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  16. 01 Feb, 2010 (1 commit)
    • blk-cgroup: Fix potential deadlock in blk-cgroup · bcf4dd43
      Committed by Gui Jianfeng
      I triggered the following lockdep warning.
      
      =======================================================
      [ INFO: possible circular locking dependency detected ]
      2.6.33-rc2 #1
      -------------------------------------------------------
      test_io_control/7357 is trying to acquire lock:
       (blkio_list_lock){+.+...}, at: [<c053a990>] blkiocg_weight_write+0x82/0x9e
      
      but task is already holding lock:
       (&(&blkcg->lock)->rlock){......}, at: [<c053a949>] blkiocg_weight_write+0x3b/0x9e
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #2 (&(&blkcg->lock)->rlock){......}:
             [<c04583b7>] validate_chain+0x8bc/0xb9c
             [<c0458dba>] __lock_acquire+0x723/0x789
             [<c0458eb0>] lock_acquire+0x90/0xa7
             [<c0692b0a>] _raw_spin_lock_irqsave+0x27/0x5a
             [<c053a4e1>] blkiocg_add_blkio_group+0x1a/0x6d
             [<c053cac7>] cfq_get_queue+0x225/0x3de
             [<c053eec2>] cfq_set_request+0x217/0x42d
             [<c052c8a6>] elv_set_request+0x17/0x26
             [<c0532a0f>] get_request+0x203/0x2c5
             [<c0532ae9>] get_request_wait+0x18/0x10e
             [<c0533470>] __make_request+0x2ba/0x375
             [<c0531985>] generic_make_request+0x28d/0x30f
             [<c0532da7>] submit_bio+0x8a/0x8f
             [<c04d827a>] submit_bh+0xf0/0x10f
             [<c04d91d2>] ll_rw_block+0xc0/0xf9
             [<f86e9705>] ext3_find_entry+0x319/0x544 [ext3]
             [<f86eae58>] ext3_lookup+0x2c/0xb9 [ext3]
             [<c04c3e1b>] do_lookup+0xd3/0x172
             [<c04c56c8>] link_path_walk+0x5fb/0x95c
             [<c04c5a65>] path_walk+0x3c/0x81
             [<c04c5b63>] do_path_lookup+0x21/0x8a
             [<c04c66cc>] do_filp_open+0xf0/0x978
             [<c04c0c7e>] open_exec+0x1b/0xb7
             [<c04c1436>] do_execve+0xbb/0x266
             [<c04081a9>] sys_execve+0x24/0x4a
             [<c04028a2>] ptregs_execve+0x12/0x18
      
      -> #1 (&(&q->__queue_lock)->rlock){..-.-.}:
             [<c04583b7>] validate_chain+0x8bc/0xb9c
             [<c0458dba>] __lock_acquire+0x723/0x789
             [<c0458eb0>] lock_acquire+0x90/0xa7
             [<c0692b0a>] _raw_spin_lock_irqsave+0x27/0x5a
             [<c053dd2a>] cfq_unlink_blkio_group+0x17/0x41
             [<c053a6eb>] blkiocg_destroy+0x72/0xc7
             [<c0467df0>] cgroup_diput+0x4a/0xb2
             [<c04ca473>] dentry_iput+0x93/0xb7
             [<c04ca4b3>] d_kill+0x1c/0x36
             [<c04cb5c5>] dput+0xf5/0xfe
             [<c04c6084>] do_rmdir+0x95/0xbe
             [<c04c60ec>] sys_rmdir+0x10/0x12
             [<c04027cc>] sysenter_do_call+0x12/0x32
      
      -> #0 (blkio_list_lock){+.+...}:
             [<c0458117>] validate_chain+0x61c/0xb9c
             [<c0458dba>] __lock_acquire+0x723/0x789
             [<c0458eb0>] lock_acquire+0x90/0xa7
             [<c06929fd>] _raw_spin_lock+0x1e/0x4e
             [<c053a990>] blkiocg_weight_write+0x82/0x9e
             [<c0467f1e>] cgroup_file_write+0xc6/0x1c0
             [<c04bd2f3>] vfs_write+0x8c/0x116
             [<c04bd7c6>] sys_write+0x3b/0x60
             [<c04027cc>] sysenter_do_call+0x12/0x32
      
      other info that might help us debug this:
      
      1 lock held by test_io_control/7357:
       #0:  (&(&blkcg->lock)->rlock){......}, at: [<c053a949>] blkiocg_weight_write+0x3b/0x9e
      stack backtrace:
      Pid: 7357, comm: test_io_control Not tainted 2.6.33-rc2 #1
      Call Trace:
       [<c045754f>] print_circular_bug+0x91/0x9d
       [<c0458117>] validate_chain+0x61c/0xb9c
       [<c0458dba>] __lock_acquire+0x723/0x789
       [<c0458eb0>] lock_acquire+0x90/0xa7
       [<c053a990>] ? blkiocg_weight_write+0x82/0x9e
       [<c06929fd>] _raw_spin_lock+0x1e/0x4e
       [<c053a990>] ? blkiocg_weight_write+0x82/0x9e
       [<c053a990>] blkiocg_weight_write+0x82/0x9e
       [<c0467f1e>] cgroup_file_write+0xc6/0x1c0
       [<c0454df5>] ? trace_hardirqs_off+0xb/0xd
       [<c044d93a>] ? cpu_clock+0x2e/0x44
       [<c050e6ec>] ? security_file_permission+0xf/0x11
       [<c04bcdda>] ? rw_verify_area+0x8a/0xad
       [<c0467e58>] ? cgroup_file_write+0x0/0x1c0
       [<c04bd2f3>] vfs_write+0x8c/0x116
       [<c04bd7c6>] sys_write+0x3b/0x60
       [<c04027cc>] sysenter_do_call+0x12/0x32
      
      To prevent deadlock, we should take the locks in the following
      order:

      blkio_list_lock -> queue_lock -> blkcg_lock

      The following patch fixes the bug; a sketch of the corrected
      ordering appears after this entry.
      Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
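      A sketch of the corrected ordering; the lock and field names match
      blk-cgroup.c of this era, but the body is illustrative rather than
      the exact patch:

          static int blkiocg_weight_write(struct blkio_cgroup *blkcg, u64 val)
          {
                  struct blkio_group *blkg;
                  struct hlist_node *n;

                  spin_lock(&blkio_list_lock);    /* outermost */
                  spin_lock_irq(&blkcg->lock);    /* innermost */

                  blkcg->weight = (unsigned int)val;
                  hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node) {
                          /* update per-group weights under the agreed
                           * blkio_list_lock -> blkcg->lock order */
                  }

                  spin_unlock_irq(&blkcg->lock);
                  spin_unlock(&blkio_list_lock);
                  return 0;
          }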
  17. 29 Jan, 2010 (1 commit)
    • block: Added in stricter no merge semantics for block I/O · 488991e2
      Committed by Alan D. Brunelle
      Updated the 'nomerges' tunable to accept a value of '2', indicating
      that no merges at all are to be attempted (not even the simple
      one-hit cache); a sketch of the resulting checks appears after this
      entry.

      The following table illustrates the additional benefit: 5-minute
      runs of a random I/O load were applied to a dozen devices on a
      16-way x86_64 system.
      
      nomerges        Throughput      %System         Improvement (tput / %sys)
      --------        ------------    -----------     -------------------------
      0               12.45 MB/sec    0.669365609
      1               12.50 MB/sec    0.641519199     0.40% / 2.71%
      2               12.52 MB/sec    0.639849750     0.56% / 2.96%
      Signed-off-by: Alan D. Brunelle <alan.brunelle@hp.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
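      To select the strictest mode at runtime:

          echo 2 > /sys/block/<dev>/queue/nomerges

      And a sketch of how the three levels gate merge attempts, using the
      queue-flag helpers from blkdev.h (blk_queue_noxmerges() is the one
      this patch introduces; the call site below is illustrative):

          static bool merge_allowed(struct request_queue *q,
                                    bool via_one_hit_cache)
          {
                  if (blk_queue_nomerges(q))      /* nomerges == 2 */
                          return false;
                  if (blk_queue_noxmerges(q) && !via_one_hit_cache)
                          return false;           /* nomerges == 1 */
                  return true;
          }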
  18. 11 Jan, 2010 (6 commits)
  19. 29 Dec, 2009 (1 commit)