1. 10 9月, 2010 2 次提交
    • T
      block: drop barrier ordering by queue draining · 28e7d184
      Tejun Heo 提交于
      Filesystems will take all the responsibilities for ordering requests
      around commit writes and will only indicate how the commit writes
      themselves should be handled by block layers.  This patch drops
      barrier ordering by queue draining from block layer.  Ordering by
      draining implementation was somewhat invasive to request handling.
      List of notable changes follow.
      
      * Each queue has 1 bit color which is flipped on each barrier issue.
        This is used to track whether a given request is issued before the
        current barrier or not.  REQ_ORDERED_COLOR flag and coloring
        implementation in __elv_add_request() are removed.
      
      * Requests which shouldn't be processed yet for draining were stalled
        by returning -EAGAIN from blk_do_ordered() according to the test
        result between blk_ordered_req_seq() and blk_blk_ordered_cur_seq().
        This logic is removed.
      
      * Draining completion logic in elv_completed_request() removed.
      
      * All barrier sequence requests were queued to request queue and then
        trckled to lower layer according to progress and thus maintaining
        request orders during requeue was necessary.  This is replaced by
        queueing the next request in the barrier sequence only after the
        current one is complete from blk_ordered_complete_seq(), which
        removes the need for multiple proxy requests in struct request_queue
        and the request sorting logic in the ELEVATOR_INSERT_REQUEUE path of
        elv_insert().
      
      * As barriers no longer have ordering constraints, there's no need to
        dump the whole elevator onto the dispatch queue on each barrier.
        Insert barriers at the front instead.
      
      * If other barrier requests come to the front of the dispatch queue
        while one is already in progress, they are stored in
        q->pending_barriers and restored to dispatch queue one-by-one after
        each barrier completion from blk_ordered_complete_seq().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      28e7d184
    • T
      block: misc cleanups in barrier code · dd831006
      Tejun Heo 提交于
      Make the following cleanups in preparation of barrier/flush update.
      
      * blk_do_ordered() declaration is moved from include/linux/blkdev.h to
        block/blk.h.
      
      * blk_do_ordered() now returns pointer to struct request, with %NULL
        meaning "try the next request" and ERR_PTR(-EAGAIN) "try again
        later".  The third case will be dropped with further changes.
      
      * In the initialization of proxy barrier request, data direction is
        already set by init_request_from_bio().  Drop unnecessary explicit
        REQ_WRITE setting and move init_request_from_bio() above REQ_FUA
        flag setting.
      
      * add_request() is collapsed into __make_request().
      
      These changes don't make any functional difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      dd831006
  2. 08 8月, 2010 1 次提交
  3. 11 9月, 2009 1 次提交
    • T
      block: implement mixed merge of different failfast requests · 80a761fd
      Tejun Heo 提交于
      Failfast has characteristics from other attributes.  When issuing,
      executing and successuflly completing requests, failfast doesn't make
      any difference.  It only affects how a request is handled on failure.
      Allowing requests with different failfast settings to be merged cause
      normal IOs to fail prematurely while not allowing has performance
      penalties as failfast is used for read aheads which are likely to be
      located near in-flight or to-be-issued normal IOs.
      
      This patch introduces the concept of 'mixed merge'.  A request is a
      mixed merge if it is merge of segments which require different
      handling on failure.  Currently the only mixable attributes are
      failfast ones (or lack thereof).
      
      When a bio with different failfast settings is added to an existing
      request or requests of different failfast settings are merged, the
      merged request is marked mixed.  Each bio carries failfast settings
      and the request always tracks failfast state of the first bio.  When
      the request fails, blk_rq_err_bytes() can be used to determine how
      many bytes can be safely failed without crossing into an area which
      requires further retrials.
      
      This allows request merging regardless of failfast settings while
      keeping the failure handling correct.
      
      This patch only implements mixed merge but doesn't enable it.  The
      next one will update SCSI to make use of mixed merge.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Niel Lambrechts <niel.lambrechts@gmail.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      80a761fd
  4. 27 5月, 2009 1 次提交
    • K
      block: fix no diskstat problem · 3c4198e8
      Kiyoshi Ueda 提交于
      The commit below in 2.6-block/for-2.6.31 causes no diskstat problem
      because the blk_discard_rq() check was added with '&&'.
      It should be 'blk_fs_request() || blk_discard_rq()'.
      This patch does it and fixes the no diskstat problem.
      Please review and apply.
      
      ------ /proc/diskstat without this patch -------------------------------------
         8       0 sda 0 0 0 0 0 0 0 0 0 0 0
      ------------------------------------------------------------------------------
      
      ----- /proc/diskstat with this patch applied ---------------------------------
         8       0 sda 4186 303 373621 61600 9578 3859 107468 169479 2 89755 231059
      ------------------------------------------------------------------------------
      
      --------------------------------------------------------------------------
      commit c69d4854
      Author: Jens Axboe <jens.axboe@oracle.com>
      Date:   Fri Apr 24 08:12:19 2009 +0200
      
          block: include discard requests in IO accounting
      
          We currently don't do merging on discard requests, but we potentially
          could. If we do, then we need to include discard requests in the IO
          accounting, or merging would end up decrementing in_flight IO counters
          for an IO which never incremented them.
      
          So enable accounting for discard requests.
      
      <snip>
      
       static inline int blk_do_io_stat(struct request *rq)
       {
      -       return rq->rq_disk && blk_rq_io_stat(rq) && blk_fs_request(rq);
      +       return rq->rq_disk && blk_rq_io_stat(rq) && blk_fs_request(rq) &&
      +               blk_discard_rq(rq);
       }
      --------------------------------------------------------------------------
      Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      3c4198e8
  5. 19 5月, 2009 1 次提交
  6. 11 5月, 2009 2 次提交
    • T
      block: implement and enforce request peek/start/fetch · 9934c8c0
      Tejun Heo 提交于
      Till now block layer allowed two separate modes of request execution.
      A request is always acquired from the request queue via
      elv_next_request().  After that, drivers are free to either dequeue it
      or process it without dequeueing.  Dequeue allows elv_next_request()
      to return the next request so that multiple requests can be in flight.
      
      Executing requests without dequeueing has its merits mostly in
      allowing drivers for simpler devices which can't do sg to deal with
      segments only without considering request boundary.  However, the
      benefit this brings is dubious and declining while the cost of the API
      ambiguity is increasing.  Segment based drivers are usually for very
      old or limited devices and as converting to dequeueing model isn't
      difficult, it doesn't justify the API overhead it puts on block layer
      and its more modern users.
      
      Previous patches converted all block low level drivers to dequeueing
      model.  This patch completes the API transition by...
      
      * renaming elv_next_request() to blk_peek_request()
      
      * renaming blkdev_dequeue_request() to blk_start_request()
      
      * adding blk_fetch_request() which is combination of peek and start
      
      * disallowing completion of queued (not started) requests
      
      * applying new API to all LLDs
      
      Renamings are for consistency and to break out of tree code so that
      it's apparent that out of tree drivers need updating.
      
      [ Impact: block request issue API cleanup, no functional change ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Mike Miller <mike.miller@hp.com>
      Cc: unsik Kim <donari75@gmail.com>
      Cc: Paul Clements <paul.clements@steeleye.com>
      Cc: Tim Waugh <tim@cyberelk.net>
      Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Laurent Vivier <Laurent@lvivier.info>
      Cc: Jeff Garzik <jgarzik@pobox.com>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Grant Likely <grant.likely@secretlab.ca>
      Cc: Adrian McMenamin <adrian@mcmen.demon.co.uk>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Cc: Borislav Petkov <petkovbb@googlemail.com>
      Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
      Cc: Alex Dubov <oakad@yahoo.com>
      Cc: Pierre Ossman <drzeus@drzeus.cx>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Markus Lidel <Markus.Lidel@shadowconnect.com>
      Cc: Stefan Weinhuber <wein@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Pete Zaitcev <zaitcev@redhat.com>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      9934c8c0
    • T
      block: drop request->hard_* and *nr_sectors · 2e46e8b2
      Tejun Heo 提交于
      struct request has had a few different ways to represent some
      properties of a request.  ->hard_* represent block layer's view of the
      request progress (completion cursor) and the ones without the prefix
      are supposed to represent the issue cursor and allowed to be updated
      as necessary by the low level drivers.  The thing is that as block
      layer supports partial completion, the two cursors really aren't
      necessary and only cause confusion.  In addition, manual management of
      request detail from low level drivers is cumbersome and error-prone at
      the very least.
      
      Another interesting duplicate fields are rq->[hard_]nr_sectors and
      rq->{hard_cur|current}_nr_sectors against rq->data_len and
      rq->bio->bi_size.  This is more convoluted than the hard_ case.
      
      rq->[hard_]nr_sectors are initialized for requests with bio but
      blk_rq_bytes() uses it only for !pc requests.  rq->data_len is
      initialized for all request but blk_rq_bytes() uses it only for pc
      requests.  This causes good amount of confusion throughout block layer
      and its drivers and determining the request length has been a bit of
      black magic which may or may not work depending on circumstances and
      what the specific LLD is actually doing.
      
      rq->{hard_cur|current}_nr_sectors represent the number of sectors in
      the contiguous data area at the front.  This is mainly used by drivers
      which transfers data by walking request segment-by-segment.  This
      value always equals rq->bio->bi_size >> 9.  However, data length for
      pc requests may not be multiple of 512 bytes and using this field
      becomes a bit confusing.
      
      In general, having multiple fields to represent the same property
      leads only to confusion and subtle bugs.  With recent block low level
      driver cleanups, no driver is accessing or manipulating these
      duplicate fields directly.  Drop all the duplicates.  Now rq->sector
      means the current sector, rq->data_len the current total length and
      rq->bio->bi_size the current segment length.  Everything else is
      defined in terms of these three and available only through accessors.
      
      * blk_recalc_rq_sectors() is collapsed into blk_update_request() and
        now handles pc and fs requests equally other than rq->sector update.
        This means that now pc requests can use partial completion too (no
        in-kernel user yet tho).
      
      * bio_cur_sectors() is replaced with bio_cur_bytes() as block layer
        now uses byte count as the primary data length.
      
      * blk_rq_pos() is now guranteed to be always correct.  In-block users
        converted.
      
      * blk_rq_bytes() is now guaranteed to be always valid as is
        blk_rq_sectors().  In-block users converted.
      
      * blk_rq_sectors() is now guaranteed to equal blk_rq_bytes() >> 9.
        More convenient one is used.
      
      * blk_rq_bytes() and blk_rq_cur_bytes() are now inlined and take const
        pointer to request.
      
      [ Impact: API cleanup, single way to represent one property of a request ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      2e46e8b2
  7. 28 4月, 2009 3 次提交
  8. 24 4月, 2009 1 次提交
  9. 15 4月, 2009 1 次提交
  10. 07 4月, 2009 2 次提交
  11. 13 3月, 2009 1 次提交
  12. 02 2月, 2009 1 次提交
  13. 26 12月, 2008 1 次提交
  14. 17 10月, 2008 1 次提交
  15. 09 10月, 2008 3 次提交
    • J
      block: add fault injection mechanism for faking request timeouts · 581d4e28
      Jens Axboe 提交于
      Only works for the generic request timer handling. Allows one to
      sporadically ignore request completions, thus exercising the timeout
      handling.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      581d4e28
    • J
      block: unify request timeout handling · 242f9dcb
      Jens Axboe 提交于
      Right now SCSI and others do their own command timeout handling.
      Move those bits to the block layer.
      
      Instead of having a timer per command, we try to be a bit more clever
      and simply have one per-queue. This avoids the overhead of having to
      tear down and setup a timer for each command, so it will result in a lot
      less timer fiddling.
      Signed-off-by: NMike Anderson <andmike@linux.vnet.ibm.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      242f9dcb
    • J
      block: add support for IO CPU affinity · c7c22e4d
      Jens Axboe 提交于
      This patch adds support for controlling the IO completion CPU of
      either all requests on a queue, or on a per-request basis. We export
      a sysfs variable (rq_affinity) which, if set, migrates completions
      of requests to the CPU that originally submitted it. A bio helper
      (bio_set_completion_cpu()) is also added, so that queuers can ask
      for completion on that specific CPU.
      
      In testing, this has been show to cut the system time by as much
      as 20-40% on synthetic workloads where CPU affinity is desired.
      
      This requires a little help from the architecture, so it'll only
      work as designed for archs that are using the new generic smp
      helper infrastructure.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      c7c22e4d
  16. 03 7月, 2008 1 次提交
  17. 29 4月, 2008 1 次提交
  18. 04 3月, 2008 1 次提交
  19. 30 1月, 2008 3 次提交