1. 11 Sep 2009, 5 commits
  2. 11 Jul 2009, 1 commit
    • cfq-iosched: reset oom_cfqq in cfq_set_request() · 32f2e807
      Committed by Vivek Goyal
      In case memory is scarce, we now default to oom_cfqq. Once memory is
      available again, we should allocate a new cfqq and stop using oom_cfqq for
      a particular io context.
      
      Once a new request comes in, check if we are using oom_cfqq, and if yes,
      try to allocate a new cfqq.
      
      Tested the patch by forcing the use of oom_cfqq; on the next request the
      thread noticed it was using oom_cfqq and allocated a new cfqq.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
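      A minimal sketch of the check described above, as it might look inside
      cfq_set_request(); the helper names (cic_to_cfqq(), cfq_get_queue(),
      cic_set_cfqq()) follow cfq-iosched conventions but are illustrative here,
      not a quote of the actual patch:

          struct cfq_queue *cfqq = cic_to_cfqq(cic, is_sync);

          /*
           * If this io context is still on the shared oom_cfqq fallback,
           * memory may be available again: retry the real allocation.
           */
          if (!cfqq || cfqq == &cfqd->oom_cfqq) {
              cfqq = cfq_get_queue(cfqd, is_sync, cic->ioc, gfp_mask);
              cic_set_cfqq(cic, cfqq, is_sync);
          }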
  3. 01 Jul 2009, 3 commits
  4. 16 Jun 2009, 2 commits
  5. 11 Jun 2009, 1 commit
  6. 11 May 2009, 3 commits
    • block: drop request->hard_* and *nr_sectors · 2e46e8b2
      Committed by Tejun Heo
      struct request has had a few different ways to represent some
      properties of a request.  The ->hard_* fields represent the block
      layer's view of request progress (the completion cursor), while the
      ones without the prefix represent the issue cursor and may be updated
      as necessary by the low level drivers.  The thing is that, as the
      block layer supports partial completion, the two cursors really
      aren't necessary and only cause confusion.  In addition, manual
      management of request details from low level drivers is cumbersome
      and error-prone at the very least.
      
      Another interesting set of duplicates is rq->[hard_]nr_sectors and
      rq->{hard_cur|current}_nr_sectors against rq->data_len and
      rq->bio->bi_size.  This is more convoluted than the hard_ case.
      
      rq->[hard_]nr_sectors are initialized for requests with a bio, but
      blk_rq_bytes() uses them only for !pc requests.  rq->data_len is
      initialized for all requests, but blk_rq_bytes() uses it only for pc
      requests.  This causes a good amount of confusion throughout the
      block layer and its drivers, and determining the request length has
      been a bit of black magic which may or may not work depending on
      circumstances and what the specific LLD is actually doing.
      
      rq->{hard_cur|current}_nr_sectors represent the number of sectors in
      the contiguous data area at the front.  This is mainly used by
      drivers which transfer data by walking the request
      segment-by-segment.  This value always equals rq->bio->bi_size >> 9.
      However, the data length for pc requests may not be a multiple of
      512 bytes, and using this field becomes a bit confusing.
      
      In general, having multiple fields to represent the same property
      leads only to confusion and subtle bugs.  With recent block low level
      driver cleanups, no driver is accessing or manipulating these
      duplicate fields directly.  Drop all the duplicates.  Now rq->sector
      means the current sector, rq->data_len the current total length and
      rq->bio->bi_size the current segment length.  Everything else is
      defined in terms of these three and available only through accessors.
      
      * blk_recalc_rq_sectors() is collapsed into blk_update_request() and
        now handles pc and fs requests equally, other than the rq->sector
        update.  This means that pc requests can now use partial completion
        too (no in-kernel user yet, though).
      
      * bio_cur_sectors() is replaced with bio_cur_bytes() as block layer
        now uses byte count as the primary data length.
      
      * blk_rq_pos() is now guaranteed to be always correct.  In-block
        users converted.
      
      * blk_rq_bytes() is now guaranteed to be always valid as is
        blk_rq_sectors().  In-block users converted.
      
      * blk_rq_sectors() is now guaranteed to equal blk_rq_bytes() >> 9.
        Whichever is more convenient is used.
      
      * blk_rq_bytes() and blk_rq_cur_bytes() are now inlined and take const
        pointer to request.
      
      [ Impact: API cleanup, single way to represent one property of a request ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
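      A hedged sketch of the accessor model described above; the actual
      definitions live in include/linux/blkdev.h and may differ in detail
      from this illustration:

          static inline sector_t blk_rq_pos(const struct request *rq)
          {
              return rq->sector;              /* current sector */
          }

          static inline unsigned int blk_rq_bytes(const struct request *rq)
          {
              return rq->data_len;            /* current total length */
          }

          static inline unsigned int blk_rq_cur_bytes(const struct request *rq)
          {
              return rq->bio ? rq->bio->bi_size : 0;  /* current segment length */
          }

          static inline unsigned int blk_rq_sectors(const struct request *rq)
          {
              return blk_rq_bytes(rq) >> 9;   /* always equals bytes >> 9 */
          }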
    • block: convert to pos and nr_sectors accessors · 83096ebf
      Committed by Tejun Heo
      With recent cleanups, there is no place where a low level driver
      directly manipulates request fields.  This means that the 'hard'
      request fields always equal the !hard fields.  Convert all
      rq->sectors, nr_sectors and current_nr_sectors references to
      accessors.
      
      While at it, drop the superfluous blk_rq_pos() < 0 test in swim.c.
      
      [ Impact: use pos and nr_sectors accessors ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
      Tested-by: Grant Likely <grant.likely@secretlab.ca>
      Acked-by: Grant Likely <grant.likely@secretlab.ca>
      Tested-by: Adrian McMenamin <adrian@mcmen.demon.co.uk>
      Acked-by: Adrian McMenamin <adrian@mcmen.demon.co.uk>
      Acked-by: Mike Miller <mike.miller@hp.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Cc: Borislav Petkov <petkovbb@googlemail.com>
      Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
      Cc: Eric Moore <Eric.Moore@lsi.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Pete Zaitcev <zaitcev@redhat.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Paul Clements <paul.clements@steeleye.com>
      Cc: Tim Waugh <tim@cyberelk.net>
      Cc: Jeff Garzik <jgarzik@pobox.com>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Alex Dubov <oakad@yahoo.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Dario Ballabio <ballabio_dario@emc.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: unsik Kim <donari75@gmail.com>
      Cc: Laurent Vivier <Laurent@lvivier.info>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
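      An illustrative before/after of the kind of per-driver conversion this
      commit performs; the variable names are hypothetical and the real hunks
      differ per driver:

          /* before: direct field access */
          block = rq->sector;
          nsect = rq->current_nr_sectors;

          /* after: accessors */
          block = blk_rq_pos(rq);
          nsect = blk_rq_cur_sectors(rq);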
    • block: implement blk_rq_pos/[cur_]sectors() and convert obvious ones · 5b93629b
      Committed by Tejun Heo
      Implement accessors - blk_rq_pos(), blk_rq_sectors() and
      blk_rq_cur_sectors() - which return rq->hard_sector,
      rq->hard_nr_sectors and rq->hard_cur_sectors respectively, and
      convert direct references to the said fields to the accessors.
      
      This is in preparation for the request data length handling cleanup.
      
      Geert	: suggested adding const to struct request * parameter to accessors
      Sergei	: spotted error in patch description
      
      [ Impact: cleanup ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
      Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Tested-by: Grant Likely <grant.likely@secretlab.ca>
      Acked-by: Grant Likely <grant.likely@secretlab.ca>
      Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
      Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Cc: Borislav Petkov <petkovbb@googlemail.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
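      A hedged sketch of the initial accessors as this commit describes them
      (before the later data-length cleanup redefined them); not a quote of
      the patch:

          static inline sector_t blk_rq_pos(const struct request *rq)
          {
              return rq->hard_sector;
          }

          static inline unsigned int blk_rq_sectors(const struct request *rq)
          {
              return rq->hard_nr_sectors;
          }

          static inline unsigned int blk_rq_cur_sectors(const struct request *rq)
          {
              return rq->hard_cur_sectors;
          }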
  7. 28 Apr 2009, 1 commit
    • block: kill blk_start_queueing() · a7f55792
      Committed by Tejun Heo
      blk_start_queueing() is identical to __blk_run_queue() except that it
      doesn't check for recursion.  None of the current users depends on
      blk_start_queueing() running request_fn directly.  Replace usages of
      blk_start_queueing() with [__]blk_run_queue() and kill it.
      
      [ Impact: removal of mostly duplicate interface function ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
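      Illustratively, call sites are converted roughly like this (with
      blk_run_queue() as the variant that takes the queue lock itself, for
      callers not already holding it); the exact per-caller hunks differ:

          -   blk_start_queueing(q);
          +   __blk_run_queue(q);     /* queue lock held by the caller */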
  8. 24 Apr 2009, 3 commits
    • cfq-iosched: cache prio_tree root in cfqq->p_root · f2d1f0ae
      Committed by Jens Axboe
      Currently we look it up from ->ioprio, but ->ioprio can change if
      either the process gets its IO priority changed explicitly, or if
      cfq decides to temporarily boost it. So if we are unlucky, we can
      end up attempting to remove a node from a different rbtree root than
      the one it was added to.
      
      Fix this by using ->org_ioprio as the prio_tree index, since that
      only changes for explicit IO priority settings (not for a boost).
      Additionally, cache the rbtree root inside the cfqq, so we don't have
      to add code to reinsert the cfqq in the prio_tree if the IO priority changes.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
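      A hedged sketch of the caching idea: remember which prio_tree root a
      cfqq was inserted into, so removal never has to guess from the
      (possibly changed) priority. Names are illustrative, not the exact
      kernel code:

          /* insert: index by org_ioprio; remember which root we used */
          struct rb_root *root = &cfqd->prio_trees[cfqq->org_ioprio];

          /* (parent and p come from the usual rbtree walk, omitted here) */
          rb_link_node(&cfqq->p_node, parent, p);
          rb_insert_color(&cfqq->p_node, root);
          cfqq->p_root = root;

          /* removal: erase from the cached root, whatever ->ioprio is now */
          if (cfqq->p_root) {
              rb_erase(&cfqq->p_node, cfqq->p_root);
              cfqq->p_root = NULL;
          }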
    • cfq-iosched: fix bug with aliased request and cooperation detection · 3ac6c9f8
      Committed by Jens Axboe
      cfq_prio_tree_lookup() should return the direct match, yet it always
      returns zero. Fix that.
      
      cfq_prio_tree_add() assumes that we don't get a direct match, while
      it is very possible that we do. Using O_DIRECT, you can have different
      cfqqs with matching requests, since you don't have the page cache
      to serialize things for you. Fix this bug by only adding the cfqq if
      there isn't an existing match.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
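      A hedged sketch of the add-side guard described above; the surrounding
      rbtree walk and the lookup signature are simplified, not copied from
      the patch:

          __cfqq = cfq_prio_tree_lookup(cfqd, cfqq->p_root, sector,
                                        &parent, &p);
          if (__cfqq)
              return;     /* direct match already in the tree: don't add an alias */

          rb_link_node(&cfqq->p_node, parent, p);
          rb_insert_color(&cfqq->p_node, cfqq->p_root);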
    • cfq-iosched: clear ->prio_trees[] on cfqd alloc · 26a2ac00
      Committed by Jens Axboe
      Not strictly needed, but we should make it clear that we init the
      rbtree roots here.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
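      A minimal sketch of the explicit initialization, assuming the
      CFQ_PRIO_LISTS bound used elsewhere in cfq-iosched:

          for (i = 0; i < CFQ_PRIO_LISTS; i++)
              cfqd->prio_trees[i] = RB_ROOT;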
  9. 22 Apr 2009, 2 commits
  10. 15 Apr 2009, 7 commits
  11. 07 Apr 2009, 3 commits
    • cfq-iosched: don't let idling interfere with plugging · b029195d
      Committed by Jens Axboe
      When CFQ is waiting for a new request from a process, it currently
      restarts queuing immediately when it sees such a request. This doesn't
      work very well with streamed IO, since we then end up splitting IO
      that would otherwise have been merged nicely. For a simple dd test,
      this causes 10x as many requests to be issued as we should have.
      Normally this goes unnoticed due to the low overhead of requests
      at the device side, but some hardware is very sensitive to request
      sizes and there it can cause big slowdowns.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • cfq-iosched: kill two unused cfqq flags · 75e50984
      Committed by Jens Axboe
      We only manipulate the must_dispatch and queue_new flags; they are not
      tested anymore. So get rid of them.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • cfq-iosched: change dispatch logic to deal with single requests at the time · 2f5cb738
      Committed by Jens Axboe
      The IO scheduler core calls into the IO scheduler dispatch_request hook
      to move requests from the IO scheduler and into the driver dispatch
      list. It only does so when the dispatch list is empty. CFQ moves several
      requests to the dispatch list, which can cause higher latencies if we
      suddenly have to switch to some important sync IO. Change the logic to
      move one request at a time instead.
      
      This should almost be functionally equivalent to what we did before,
      except that we now honor 'quantum' as the maximum queue depth at the
      device side from any single cfqq. If there's just a single active
      cfqq, we allow up to 4 times the normal quantum.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
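      A hedged sketch of the dispatch bound described above; the real
      cfq_dispatch_requests() logic has more cases, and the names here are
      simplified:

          unsigned int max_dispatch = cfqd->cfq_quantum;

          /* a lone active cfqq may drive the device deeper than 'quantum' */
          if (cfqd->busy_queues == 1)
              max_dispatch *= 4;

          if (cfqq->dispatched >= max_dispatch)
              return 0;       /* honor the per-cfqq depth limit */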
  12. 06 Apr 2009, 1 commit
  13. 30 Jan 2009, 1 commit
    • cfq-iosched: Allow RT requests to pre-empt ongoing BE timeslice · 3a9a3f6c
      Committed by Divyesh Shah
      This patch adds the ability to pre-empt an ongoing BE timeslice when an RT
      request is waiting for the current timeslice to complete. This reduces the
      wait time to disk for RT requests from an upper bound of 4 (the current value
      of cfq_quantum) to 1 disk request.
      
      Applied Jens' suggested changes to avoid the rb lookup and use !cfq_class_rt(),
      and retested.
      
      Latency (secs) for the RT task when doing sequential reads from a 10G file.
                             | only RT | RT + BE | RT + BE + this patch
      small (512 byte) reads | 143     | 163     | 145
      large (1MB) reads      | 142     | 158     | 146
      Signed-off-by: Divyesh Shah <dpshah@google.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
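      A hedged sketch of the pre-emption test described above, using the
      !cfq_class_rt() check mentioned in the log; this is illustrative, not
      the complete cfq_should_preempt() logic:

          /* an RT queue with work pending may pre-empt a non-RT (BE) slice */
          if (cfq_class_rt(new_cfqq) && !cfq_class_rt(active_cfqq))
              return 1;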
  14. 29 Dec 2008, 4 commits
  15. 09 Oct 2008, 3 commits