1. 30 Apr 2007: 4 commits
    • cfq-iosched: rework the whole round-robin list concept · d9e7620e
      Committed by Jens Axboe
      Drawing on some inspiration from the CFS CPU scheduler design, overhaul
      the management of the list of pending cfq_queues. Currently CFQ uses a
      doubly linked list per priority level for sorting and servicing queues.
      Kill those lists and maintain an rbtree of cfq_queues, sorted by when
      to service them.
      
      This unfortunately means that the ionice levels aren't as strong
      anymore; I will work on improving that later. We now scale only the
      slice time, not the number of times we service a queue. This means that
      latency is better (for all priority levels), but the distinction between
      the highest and lower levels isn't as big.
      
      The diffstat speaks for itself.
      
       cfq-iosched.c |  363 +++++++++++++++++---------------------------------
       1 file changed, 125 insertions(+), 238 deletions(-)
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
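The idea in this commit is to keep pending queues in a tree ordered by when each should next be serviced, and always take the leftmost one. This is a minimal sketch of that idea, not CFQ's code. It uses a plain unbalanced binary search tree as a stand-in for the kernel rbtree. The names `demo_queue`, `service_key`, `demo_add` and `demo_pop_first` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for a cfq_queue: only the fields needed here. */
struct demo_queue {
	unsigned long service_key;        /* when this queue should next be served */
	int id;
	struct demo_queue *left, *right;  /* BST links (the kernel uses an rbtree) */
};

/* Insert q into the tree rooted at *root, ordered by service_key.
 * Equal keys go to the right, so earlier-inserted queues are served first. */
static void demo_add(struct demo_queue **root, struct demo_queue *q)
{
	q->left = q->right = NULL;
	while (*root) {
		if (q->service_key < (*root)->service_key)
			root = &(*root)->left;
		else
			root = &(*root)->right;
	}
	*root = q;
}

/* Remove and return the queue with the smallest key (the leftmost node). */
static struct demo_queue *demo_pop_first(struct demo_queue **root)
{
	if (!*root)
		return NULL;
	while ((*root)->left)
		root = &(*root)->left;
	struct demo_queue *first = *root;
	*root = first->right;   /* leftmost node has no left child; splice in right */
	return first;
}
```

Suppose you insert queues with keys 30, 10 and 20 and then pop repeatedly. They come out in key order. A real rbtree also rebalances, so insertion and removal stay O(log n) whatever the insertion order.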
    • cfq-iosched: minor updates · 1afba045
      Committed by Jens Axboe
      - Move clearing of the queue_new flag to the point where the queue is selected
      - Only select the non-first queue in cfq_get_best_queue() if there's
        a substantial difference between the best and the first.
      - Get rid of ->busy_rr
      - Only select a close cooperator if the current queue is known to take
        a while to "think".
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
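The cfq_get_best_queue() change in the second bullet is essentially a hysteresis rule: stay with the first queue unless another one is clearly better. This is a minimal sketch under two assumptions. "Better" is taken to mean a shorter seek distance from the current head position. The threshold `DEMO_MIN_GAIN` and all names here are hypothetical, and the real selection criteria may differ.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MIN_GAIN 1024UL   /* hypothetical: sectors a candidate must win by */

/* Absolute distance between two sector numbers. */
static unsigned long demo_dist(unsigned long a, unsigned long b)
{
	return a > b ? a - b : b - a;
}

/* Return the index of the queue to serve. next_sector[i] is the sector of
 * queue i's next request; index 0 is the first (default) queue. Another
 * queue is chosen only if it beats the first by at least DEMO_MIN_GAIN. */
static size_t demo_best_queue(const unsigned long *next_sector, size_t n,
			      unsigned long head)
{
	size_t best = 0;
	unsigned long first_d, best_d;

	assert(n > 0);  /* caller must pass at least one queue */
	first_d = best_d = demo_dist(next_sector[0], head);
	for (size_t i = 1; i < n; i++) {
		unsigned long d = demo_dist(next_sector[i], head);

		if (d < best_d) {
			best = i;
			best_d = d;
		}
	}
	/* Hysteresis: keep the first queue unless the winner is clearly better. */
	if (best != 0 && first_d - best_d < DEMO_MIN_GAIN)
		return 0;
	return best;
}
```

A small gain, such as 1000 versus 900 sectors, keeps the first queue. A large one, such as 5000 versus 100, switches to the closer queue. This avoids flip-flopping between queues for marginal wins.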
    • cfq-iosched: development update · 6d048f53
      Committed by Jens Axboe
      - Implement logic for detecting cooperating processes, so we
        choose the best available queue whenever possible.
      
      - Improve residual slice time accounting.
      
      - Remove dead code: we no longer see async requests coming in on
        sync queues. That part was removed a long time ago. That means
        that we can also remove the difference between cfq_cfqq_sync()
        and cfq_cfqq_class_sync(); they are now identical. And we can
        kill the on_dispatch array and just make it a counter.
      
      - Allow a process to go into the current list, if it hasn't been
        serviced in this scheduler tick yet.
      
      Possible future improvements include caching the cfqq lookup
      in cfq_close_cooperator(), so we don't have to look it up twice.
      cfq_get_best_queue() should then just reuse that decision instead
      of making it again.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
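Detecting a cooperating process can be approximated as finding another queue whose next request starts close to where the disk head was last positioned. The sketch below assumes a fixed sector window (`DEMO_CLOSE_WINDOW`, hypothetical) and returns the nearest qualifying queue. It illustrates the idea only and is not the actual cfq_close_cooperator() logic.

```c
#include <stddef.h>

#define DEMO_CLOSE_WINDOW 256UL  /* hypothetical: max sector gap counted as "close" */

/* Return the index of a queue other than cur whose next request starts
 * within DEMO_CLOSE_WINDOW sectors of last_pos, preferring the nearest.
 * Returns -1 if no queue is close enough. */
static long demo_close_cooperator(const unsigned long *next_sector, size_t n,
				  size_t cur, unsigned long last_pos)
{
	long found = -1;
	unsigned long found_d = DEMO_CLOSE_WINDOW + 1;

	for (size_t i = 0; i < n; i++) {
		unsigned long d;

		if (i == cur)
			continue;   /* a queue cannot cooperate with itself */
		d = next_sector[i] > last_pos ? next_sector[i] - last_pos
					      : last_pos - next_sector[i];
		if (d <= DEMO_CLOSE_WINDOW && d < found_d) {
			found = (long)i;
			found_d = d;
		}
	}
	return found;
}
```

If the scheduler switches to such a queue instead of idling, it can keep the disk streaming when several processes read interleaved parts of the same region.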
    • cfq-iosched: improve preemption for cooperating tasks · 1e3335de
      Committed by Jens Axboe
      When testing the syslet async io approach, I discovered that CFQ
      sometimes didn't perform as well as expected. cfq_should_preempt()
      needs to better check for cooperating tasks, so fix that by allowing
      preemption of an equal priority queue if the recently queued request
      is as good a candidate for IO as the one we are currently waiting for.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
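The relaxed preemption rule can be expressed as a small decision function. This is a minimal sketch under several assumptions. Lower priority values mean higher priority, as with ionice levels. "As good a candidate" is modelled as a seek distance no larger than that of the request being waited for. The types and names are hypothetical, and cfq_should_preempt() checks other conditions too.

```c
#include <stdbool.h>

/* Hypothetical summary of a queue for the preemption decision.
 * Lower prio values mean higher priority, as with ionice levels. */
struct demo_prio_q {
	int prio;
};

/* Decide whether a newly queued request on new_q should preempt cur_q.
 * new_dist:  seek distance of the new request from the current head.
 * wait_dist: distance of the request cur_q is expected to issue next. */
static bool demo_should_preempt(const struct demo_prio_q *new_q,
				const struct demo_prio_q *cur_q,
				unsigned long new_dist, unsigned long wait_dist)
{
	if (new_q->prio < cur_q->prio)
		return true;    /* strictly higher priority always wins */
	if (new_q->prio > cur_q->prio)
		return false;   /* lower priority never preempts */
	/* Equal priority: preempt only if the new request is at least as
	 * good a candidate as the one we are idling for. */
	return new_dist <= wait_dist;
}
```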
  2. 25 Apr 2007: 1 commit
    • cfq-iosched: fix alias + front merge bug · 5044eed4
      Committed by Jens Axboe
      There's a really rare and obscure bug in CFQ that causes a crash in
      cfq_dispatch_insert() due to rq == NULL.  One example of the resulting
      oops is seen here:
      
      	http://lkml.org/lkml/2007/4/15/41
      
      Neil correctly diagnosed how this can happen: if two concurrent
      requests have the exact same sector number (due to direct IO or
      aliasing between MD and raw device access), the alias handling
      will add the request to the sortlist, but next_rq remains NULL.
      
      Read the more complete analysis at:
      
      	http://lkml.org/lkml/2007/4/25/57
      
      This looks like it requires md to trigger, even though it should
      potentially be possible to do with O_DIRECT (at least if you edit the
      kernel and doctor some of the unplug calls).
      
      The fix is to move the ->next_rq update to the point where we add a
      request to the rbtree. This removes the possibility of a request existing
      in the rbtree without ->next_rq being correctly updated.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
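The shape of the fix is to put the ->next_rq update inside the single routine that inserts a request. Every insertion path then keeps it valid, including the alias (duplicate sector) case. This is a minimal sketch, not the CFQ code. A sorted singly linked list stands in for the per-queue rbtree, and the "lowest sector wins" rule for choosing next_rq is a hypothetical simplification. All names are illustrative.

```c
#include <stddef.h>

/* Hypothetical request and queue; a sorted list stands in for the rbtree. */
struct demo_rq {
	unsigned long sector;
	struct demo_rq *next;
};

struct demo_cfqq {
	struct demo_rq *sorted;   /* requests ordered by sector */
	struct demo_rq *next_rq;  /* request to dispatch next */
};

/* Hypothetical choice: prefer the lower sector; keep the current pick on ties. */
static struct demo_rq *demo_choose(struct demo_rq *cur, struct demo_rq *rq)
{
	if (!cur)
		return rq;
	return rq->sector < cur->sector ? rq : cur;
}

/* Add rq to the sorted structure AND update next_rq in the same place,
 * so no path can leave a queued request with next_rq still NULL. */
static void demo_add_rq(struct demo_cfqq *q, struct demo_rq *rq)
{
	struct demo_rq **pos = &q->sorted;

	while (*pos && (*pos)->sector <= rq->sector)
		pos = &(*pos)->next;   /* aliases land after existing equal sectors */
	rq->next = *pos;
	*pos = rq;

	q->next_rq = demo_choose(q->next_rq, rq);
}
```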
  3. 21 Apr 2007: 1 commit
    • cfq-iosched: fix sequential write regression · a9938006
      Committed by Jens Axboe
      We have a 10-15% performance regression for sequential writes on TCQ/NCQ
      enabled drives in 2.6.21-rcX after the CFQ update went in.  It has been
      reported by Valerie Clement <valerie.clement@bull.net> and the Intel
      testing folks.  The regression is caused by CFQ's now more aggressive
      queue control, which limits the depth available to the device.
      
      This patch fixes that regression by allowing a greater depth when only
      one queue is busy.  It has been tested not to impact sync-vs-async
      workloads too much; we still do a lot better than 2.6.20.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
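The fix boils down to choosing the dispatch depth cap based on how many queues are busy. This is a minimal sketch. The constants `DEMO_QUANTUM` and `DEMO_SOLO_DEPTH` and the function name are hypothetical, not CFQ's actual tunables.

```c
/* Hypothetical limits: the cap normally applied per queue, and the larger
 * cap used when only one queue is busy. */
#define DEMO_QUANTUM      4U
#define DEMO_SOLO_DEPTH  32U

/* How many requests a queue may have in flight at the device. With several
 * busy queues, a small cap keeps sync I/O latency low; with only one busy
 * queue there is nobody to be fair to, so let a TCQ/NCQ drive see more. */
static unsigned int demo_max_dispatch(unsigned int busy_queues,
				      unsigned int device_depth)
{
	unsigned int cap = busy_queues <= 1 ? DEMO_SOLO_DEPTH : DEMO_QUANTUM;

	return cap < device_depth ? cap : device_depth;  /* never exceed the device */
}
```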
  4. 12 Feb 2007: 11 commits
  5. 03 Jan 2007: 1 commit
    • [PATCH] cfq-iosched: merging problem · ec8acb69
      Committed by Jens Axboe
      Two issues:
      
      - The final return 1 should be a return 0; otherwise the cfqq
        comparison is a no-op.
      
      - bio_sync() only checks the sync flag, while rq_is_sync() checks for
        both READ and sync. The latter is what we want. Expand the bio check
        to include reads, and relax the restriction to allow merging of async
        io into sync requests.
      
      In the future we want to clean up the SYNC logic; right now it means
      both sync requests (such as READ and O_DIRECT WRITE) and unplug-on-issue.
      Leave that for later.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
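Both fixes can be seen in a simplified merge check. The bio counts as sync if it is a READ or carries the sync flag, and the final fallthrough refuses the merge. This is a minimal sketch with hypothetical flattened types (`demo_bio`, `demo_req`). The owning queue is reduced to an integer id rather than a real cfqq lookup.

```c
#include <stdbool.h>

/* Hypothetical flattened views of a bio and a request. */
struct demo_bio { bool is_read; bool sync_flag; int owner_queue; };
struct demo_req { bool is_read; bool sync_flag; int owner_queue; };

/* A bio counts as sync if it is a READ or carries the sync flag,
 * matching what rq_is_sync() checks for requests. */
static bool demo_bio_is_sync(const struct demo_bio *bio)
{
	return bio->is_read || bio->sync_flag;
}

static bool demo_rq_is_sync(const struct demo_req *rq)
{
	return rq->is_read || rq->sync_flag;
}

/* Returns 1 if bio may be merged into rq, 0 otherwise. */
static int demo_allow_merge(const struct demo_req *rq, const struct demo_bio *bio)
{
	/* Never merge sync io into an async request; async into sync is fine. */
	if (demo_bio_is_sync(bio) && !demo_rq_is_sync(rq))
		return 0;
	/* Only merge if the bio would be queued on the same queue as rq. */
	if (bio->owner_queue == rq->owner_queue)
		return 1;
	return 0;   /* returning 1 here would make the queue comparison a no-op */
}
```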
  6. 23 Dec 2006: 1 commit
  7. 20 Dec 2006: 1 commit
    • [PATCH] cfq-iosched: don't allow sync merges across queues · da775265
      Committed by Jens Axboe
      Currently we allow any merge, even if the io originates from different
      processes. This can cause really bad starvation and unfairness, if those
      ios happen to be synchronous (reads or direct writes).
      
      So add an allow_merge hook to the io scheduler ops, so that an io
      scheduler can help decide whether a bio/process combination may be
      merged with an existing request.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
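An optional hook in an ops table lets the generic layer ask the scheduler before merging. A scheduler that leaves the hook unset keeps the old "any merge" behaviour. This is a minimal sketch. The ops struct, the request and bio types, and the CFQ-like policy are all hypothetical simplifications rather than the kernel's elevator API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down request/bio types carrying the submitter's pid. */
struct demo_req2 { int pid; bool sync; };
struct demo_bio2 { int pid; bool sync; };

/* Hypothetical scheduler ops table with an optional allow_merge hook. */
struct demo_sched_ops {
	bool (*allow_merge)(const struct demo_req2 *rq, const struct demo_bio2 *bio);
};

/* Generic layer: a merge is allowed unless the scheduler's hook vetoes it. */
static bool demo_try_merge(const struct demo_sched_ops *ops,
			   const struct demo_req2 *rq, const struct demo_bio2 *bio)
{
	if (ops->allow_merge)
		return ops->allow_merge(rq, bio);
	return true;   /* no hook: keep the old "allow any merge" policy */
}

/* CFQ-like policy: refuse to merge sync io issued by a different process. */
static bool demo_cfq_allow_merge(const struct demo_req2 *rq,
				 const struct demo_bio2 *bio)
{
	if ((rq->sync || bio->sync) && rq->pid != bio->pid)
		return false;
	return true;
}
```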
  8. 13 Dec 2006: 1 commit
  9. 08 Dec 2006: 1 commit
  10. 01 Dec 2006: 1 commit
  11. 22 Nov 2006: 1 commit
    • WorkStruct: Pass the work_struct pointer instead of context data · 65f27f38
      Committed by David Howells
      Pass the work_struct pointer to the work function rather than context data.
      The work function can use container_of() to work out the data.
      
      For the cases where the container of the work_struct may go away the moment the
      pending bit is cleared, it is made possible to defer the release of the
      structure by deferring the clearing of the pending bit.
      
      To make this work, an extra flag is introduced into the management side of the
      work_struct.  This governs auto-release of the structure upon execution.
      
      Ordinarily, the work queue executor would release the work_struct for further
      scheduling or deallocation by clearing the pending bit prior to jumping to the
      work function.  This means that, unless the driver makes some guarantee itself
      that the work_struct won't go away, the work function may not access anything
      else in the work_struct or its container lest they be deallocated.  This is a
      problem if the auxiliary data is taken away (as done by the last patch).
      
      However, if the pending bit is *not* cleared before jumping to the work
      function, then the work function *may* access the work_struct and its container
      with no problems.  But then the work function must itself release the
      work_struct by calling work_release().
      
      In most cases, automatic release is fine, so this is the default.  Special
      initializers exist for the non-auto-release case (ending in _NAR).
      Signed-Off-By: David Howells <dhowells@redhat.com>
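This commit describes two mechanisms, and both can be modelled in user space:
- recovering the container with container_of();
- letting a handler defer release of the pending bit.

This is a minimal sketch with hypothetical types (`demo_work`, `demo_device`) and a simplified `demo_container_of` macro. `demo_work_release()` stands in for the work_release() named in the commit. No locking or atomic bit operations are modelled, so this is not the real workqueue code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified container_of (no type checking): step back from a member
 * pointer to the start of the enclosing structure. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_work;
typedef void (*demo_work_func_t)(struct demo_work *work);

/* Hypothetical, cut-down work_struct: a pending bit, an auto-release flag,
 * and a handler that now receives the work pointer itself. */
struct demo_work {
	bool pending;
	bool auto_release;
	demo_work_func_t func;
};

/* Clears the pending bit; after this the work (and its container) may be
 * requeued or freed by someone else. */
static void demo_work_release(struct demo_work *work)
{
	work->pending = false;
}

/* Executor: with auto-release, clear pending before calling the handler;
 * otherwise leave it set and let the handler release the work itself. */
static void demo_run_work(struct demo_work *work)
{
	if (work->auto_release)
		demo_work_release(work);
	work->func(work);
}

/* Example container: the handler recovers it via container_of(). */
struct demo_device {
	int events;
	struct demo_work work;
};

static void demo_device_handler(struct demo_work *work)
{
	struct demo_device *dev = demo_container_of(work, struct demo_device, work);

	dev->events++;
	if (!work->auto_release)
		demo_work_release(work);  /* done with dev: now safe to release */
}
```

Under auto-release, the handler must not rely on `dev` staying valid once pending is cleared. In the non-auto-release case it may use `dev` freely until it calls the release function itself.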
  12. 01 Nov 2006: 1 commit
  13. 31 Oct 2006: 2 commits
  14. 01 Oct 2006: 13 commits