1. 10 9月, 2010 6 次提交
    • T
      block: rename barrier/ordered to flush · dd4c133f
      Tejun Heo 提交于
      With ordering requirements dropped, barrier and ordered are misnomers.
      Now all block layer does is sequencing FLUSH and FUA.  Rename them to
      flush.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      dd4c133f
    • T
      block: rename blk-barrier.c to blk-flush.c · 8839a0e0
      Tejun Heo 提交于
      Without ordering requirements, barrier and ordering are minomers.
      Rename block/blk-barrier.c to block/blk-flush.c.  Rename of symbols
      will follow.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      8839a0e0
    • T
      block: drop barrier ordering by queue draining · 28e7d184
      Tejun Heo 提交于
      Filesystems will take all the responsibilities for ordering requests
      around commit writes and will only indicate how the commit writes
      themselves should be handled by block layers.  This patch drops
      barrier ordering by queue draining from block layer.  Ordering by
      draining implementation was somewhat invasive to request handling.
      List of notable changes follow.
      
      * Each queue has 1 bit color which is flipped on each barrier issue.
        This is used to track whether a given request is issued before the
        current barrier or not.  REQ_ORDERED_COLOR flag and coloring
        implementation in __elv_add_request() are removed.
      
      * Requests which shouldn't be processed yet for draining were stalled
        by returning -EAGAIN from blk_do_ordered() according to the test
        result between blk_ordered_req_seq() and blk_blk_ordered_cur_seq().
        This logic is removed.
      
      * Draining completion logic in elv_completed_request() removed.
      
      * All barrier sequence requests were queued to request queue and then
        trckled to lower layer according to progress and thus maintaining
        request orders during requeue was necessary.  This is replaced by
        queueing the next request in the barrier sequence only after the
        current one is complete from blk_ordered_complete_seq(), which
        removes the need for multiple proxy requests in struct request_queue
        and the request sorting logic in the ELEVATOR_INSERT_REQUEUE path of
        elv_insert().
      
      * As barriers no longer have ordering constraints, there's no need to
        dump the whole elevator onto the dispatch queue on each barrier.
        Insert barriers at the front instead.
      
      * If other barrier requests come to the front of the dispatch queue
        while one is already in progress, they are stored in
        q->pending_barriers and restored to dispatch queue one-by-one after
        each barrier completion from blk_ordered_complete_seq().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      28e7d184
    • T
      block: misc cleanups in barrier code · dd831006
      Tejun Heo 提交于
      Make the following cleanups in preparation of barrier/flush update.
      
      * blk_do_ordered() declaration is moved from include/linux/blkdev.h to
        block/blk.h.
      
      * blk_do_ordered() now returns pointer to struct request, with %NULL
        meaning "try the next request" and ERR_PTR(-EAGAIN) "try again
        later".  The third case will be dropped with further changes.
      
      * In the initialization of proxy barrier request, data direction is
        already set by init_request_from_bio().  Drop unnecessary explicit
        REQ_WRITE setting and move init_request_from_bio() above REQ_FUA
        flag setting.
      
      * add_request() is collapsed into __make_request().
      
      These changes don't make any functional difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      dd831006
    • T
      block: deprecate barrier and replace blk_queue_ordered() with blk_queue_flush() · 4913efe4
      Tejun Heo 提交于
      Barrier is deemed too heavy and will soon be replaced by FLUSH/FUA
      requests.  Deprecate barrier.  All REQ_HARDBARRIERs are failed with
      -EOPNOTSUPP and blk_queue_ordered() is replaced with simpler
      blk_queue_flush().
      
      blk_queue_flush() takes combinations of REQ_FLUSH and FUA.  If a
      device has write cache and can flush it, it should set REQ_FLUSH.  If
      the device can handle FUA writes, it should also set REQ_FUA.
      
      All blk_queue_ordered() users are converted.
      
      * ORDERED_DRAIN is mapped to 0 which is the default value.
      * ORDERED_DRAIN_FLUSH is mapped to REQ_FLUSH.
      * ORDERED_DRAIN_FLUSH_FUA is mapped to REQ_FLUSH | REQ_FUA.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NBoaz Harrosh <bharrosh@panasas.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Alasdair G Kergon <agk@redhat.com>
      Cc: Pierre Ossman <drzeus@drzeus.cx>
      Cc: Stefan Weinhuber <wein@de.ibm.com>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      4913efe4
    • T
      block: kill QUEUE_ORDERED_BY_TAG · 6958f145
      Tejun Heo 提交于
      Nobody is making meaningful use of ORDERED_BY_TAG now and queue
      draining for barrier requests will be removed soon which will render
      the advantage of tag ordering moot.  Kill ORDERED_BY_TAG.  The
      following users are affected.
      
      * brd: converted to ORDERED_DRAIN.
      * virtio_blk: ORDERED_TAG path was already marked deprecated.  Removed.
      * xen-blkfront: ORDERED_TAG case dropped.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      6958f145
  2. 08 8月, 2010 8 次提交
  3. 29 4月, 2010 3 次提交
  4. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  5. 29 12月, 2009 1 次提交
  6. 02 10月, 2009 4 次提交
    • C
      block: allow large discard requests · 67efc925
      Christoph Hellwig 提交于
      Currently we set the bio size to the byte equivalent of the blocks to
      be trimmed when submitting the initial DISCARD ioctl.  That means it
      is subject to the max_hw_sectors limitation of the HBA which is
      much lower than the size of a DISCARD request we can support.
      Add a separate max_discard_sectors tunable to limit the size for discard
      requests.
      
      We limit the max discard request size in bytes to 32bit as that is the
      limit for bio->bi_size.  This could be much larger if we had a way to pass
      that information through the block layer.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      67efc925
    • C
      block: use normal I/O path for discard requests · c15227de
      Christoph Hellwig 提交于
      prepare_discard_fn() was being called in a place where memory allocation
      was effectively impossible.  This makes it inappropriate for all but
      the most trivial translations of Linux's DISCARD operation to the block
      command set.  Additionally adding a payload there makes the ownership
      of the bio backing unclear as it's now allocated by the device driver
      and not the submitter as usual.
      
      It is replaced with QUEUE_FLAG_DISCARD which is used to indicate whether
      the queue supports discard operations or not.  blkdev_issue_discard now
      allocates a one-page, sector-length payload which is the right thing
      for the common ATA and SCSI implementations.
      
      The mtd implementation of prepare_discard_fn() is replaced with simply
      checking for the request being a discard.
      
      Largely based on a previous patch from Matthew Wilcox <matthew@wil.cx>
      which did the prepare_discard_fn but not the different payload allocation
      yet.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      c15227de
    • C
      block: allow large discard requests · ca80650c
      Christoph Hellwig 提交于
      Currently we set the bio size to the byte equivalent of the blocks to
      be trimmed when submitting the initial DISCARD ioctl.  That means it
      is subject to the max_hw_sectors limitation of the HBA which is
      much lower than the size of a DISCARD request we can support.
      Add a separate max_discard_sectors tunable to limit the size for discard
      requests.
      
      We limit the max discard request size in bytes to 32bit as that is the
      limit for bio->bi_size.  This could be much larger if we had a way to pass
      that information through the block layer.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      ca80650c
    • C
      block: use normal I/O path for discard requests · 1122a26f
      Christoph Hellwig 提交于
      prepare_discard_fn() was being called in a place where memory allocation
      was effectively impossible.  This makes it inappropriate for all but
      the most trivial translations of Linux's DISCARD operation to the block
      command set.  Additionally adding a payload there makes the ownership
      of the bio backing unclear as it's now allocated by the device driver
      and not the submitter as usual.
      
      It is replaced with QUEUE_FLAG_DISCARD which is used to indicate whether
      the queue supports discard operations or not.  blkdev_issue_discard now
      allocates a one-page, sector-length payload which is the right thing
      for the common ATA and SCSI implementations.
      
      The mtd implementation of prepare_discard_fn() is replaced with simply
      checking for the request being a discard.
      
      Largely based on a previous patch from Matthew Wilcox <matthew@wil.cx>
      which did the prepare_discard_fn but not the different payload allocation
      yet.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      1122a26f
  7. 14 9月, 2009 1 次提交
    • C
      block: use blkdev_issue_discard in blk_ioctl_discard · 746cd1e7
      Christoph Hellwig 提交于
      blk_ioctl_discard duplicates large amounts of code from blkdev_issue_discard,
      the only difference between the two is that blkdev_issue_discard needs to
      send a barrier discard request and blk_ioctl_discard a non-barrier one,
      and blk_ioctl_discard needs to wait on the request.  To facilitates this
      add a flags argument to blkdev_issue_discard to control both aspects of the
      behaviour.  This will be very useful later on for using the waiting
      funcitonality for other callers.
      
      Based on an earlier patch from Matthew Wilcox <matthew@wil.cx>.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      746cd1e7
  8. 23 5月, 2009 1 次提交
  9. 20 5月, 2009 1 次提交
  10. 11 5月, 2009 3 次提交
    • T
      block: implement and enforce request peek/start/fetch · 9934c8c0
      Tejun Heo 提交于
      Till now block layer allowed two separate modes of request execution.
      A request is always acquired from the request queue via
      elv_next_request().  After that, drivers are free to either dequeue it
      or process it without dequeueing.  Dequeue allows elv_next_request()
      to return the next request so that multiple requests can be in flight.
      
      Executing requests without dequeueing has its merits mostly in
      allowing drivers for simpler devices which can't do sg to deal with
      segments only without considering request boundary.  However, the
      benefit this brings is dubious and declining while the cost of the API
      ambiguity is increasing.  Segment based drivers are usually for very
      old or limited devices and as converting to dequeueing model isn't
      difficult, it doesn't justify the API overhead it puts on block layer
      and its more modern users.
      
      Previous patches converted all block low level drivers to dequeueing
      model.  This patch completes the API transition by...
      
      * renaming elv_next_request() to blk_peek_request()
      
      * renaming blkdev_dequeue_request() to blk_start_request()
      
      * adding blk_fetch_request() which is combination of peek and start
      
      * disallowing completion of queued (not started) requests
      
      * applying new API to all LLDs
      
      Renamings are for consistency and to break out of tree code so that
      it's apparent that out of tree drivers need updating.
      
      [ Impact: block request issue API cleanup, no functional change ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Mike Miller <mike.miller@hp.com>
      Cc: unsik Kim <donari75@gmail.com>
      Cc: Paul Clements <paul.clements@steeleye.com>
      Cc: Tim Waugh <tim@cyberelk.net>
      Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Laurent Vivier <Laurent@lvivier.info>
      Cc: Jeff Garzik <jgarzik@pobox.com>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Grant Likely <grant.likely@secretlab.ca>
      Cc: Adrian McMenamin <adrian@mcmen.demon.co.uk>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Cc: Borislav Petkov <petkovbb@googlemail.com>
      Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
      Cc: Alex Dubov <oakad@yahoo.com>
      Cc: Pierre Ossman <drzeus@drzeus.cx>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Markus Lidel <Markus.Lidel@shadowconnect.com>
      Cc: Stefan Weinhuber <wein@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Pete Zaitcev <zaitcev@redhat.com>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      9934c8c0
    • T
      block: convert to pos and nr_sectors accessors · 83096ebf
      Tejun Heo 提交于
      With recent cleanups, there is no place where low level driver
      directly manipulates request fields.  This means that the 'hard'
      request fields always equal the !hard fields.  Convert all
      rq->sectors, nr_sectors and current_nr_sectors references to
      accessors.
      
      While at it, drop superflous blk_rq_pos() < 0 test in swim.c.
      
      [ Impact: use pos and nr_sectors accessors ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NGeert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
      Tested-by: NGrant Likely <grant.likely@secretlab.ca>
      Acked-by: NGrant Likely <grant.likely@secretlab.ca>
      Tested-by: NAdrian McMenamin <adrian@mcmen.demon.co.uk>
      Acked-by: NAdrian McMenamin <adrian@mcmen.demon.co.uk>
      Acked-by: NMike Miller <mike.miller@hp.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Cc: Borislav Petkov <petkovbb@googlemail.com>
      Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
      Cc: Eric Moore <Eric.Moore@lsi.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Pete Zaitcev <zaitcev@redhat.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Paul Clements <paul.clements@steeleye.com>
      Cc: Tim Waugh <tim@cyberelk.net>
      Cc: Jeff Garzik <jgarzik@pobox.com>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Alex Dubov <oakad@yahoo.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Dario Ballabio <ballabio_dario@emc.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: unsik Kim <donari75@gmail.com>
      Cc: Laurent Vivier <Laurent@lvivier.info>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      83096ebf
    • T
      block: implement blk_rq_pos/[cur_]sectors() and convert obvious ones · 5b93629b
      Tejun Heo 提交于
      Implement accessors - blk_rq_pos(), blk_rq_sectors() and
      blk_rq_cur_sectors() which return rq->hard_sector, rq->hard_nr_sectors
      and rq->hard_cur_sectors respectively and convert direct references of
      the said fields to the accessors.
      
      This is in preparation of request data length handling cleanup.
      
      Geert	: suggested adding const to struct request * parameter to accessors
      Sergei	: spotted error in patch description
      
      [ Impact: cleanup ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NGeert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
      Acked-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Tested-by: NGrant Likely <grant.likely@secretlab.ca>
      Acked-by: NGrant Likely <grant.likely@secretlab.ca>
      Ackec-by: NSergei Shtylyov <sshtylyov@ru.mvista.com>
      Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Cc: Borislav Petkov <petkovbb@googlemail.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      5b93629b
  11. 28 4月, 2009 1 次提交
    • T
      block: implement and use [__]blk_end_request_all() · 40cbbb78
      Tejun Heo 提交于
      There are many [__]blk_end_request() call sites which call it with
      full request length and expect full completion.  Many of them ensure
      that the request actually completes by doing BUG_ON() the return
      value, which is awkward and error-prone.
      
      This patch adds [__]blk_end_request_all() which takes @rq and @error
      and fully completes the request.  BUG_ON() is added to to ensure that
      this actually happens.
      
      Most conversions are simple but there are a few noteworthy ones.
      
      * cdrom/viocd: viocd_end_request() replaced with direct calls to
        __blk_end_request_all().
      
      * s390/block/dasd: dasd_end_request() replaced with direct calls to
        __blk_end_request_all().
      
      * s390/char/tape_block: tapeblock_end_request() replaced with direct
        calls to blk_end_request_all().
      
      [ Impact: cleanup ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Mike Miller <mike.miller@hp.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Jeff Garzik <jgarzik@pobox.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Alex Dubov <oakad@yahoo.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      40cbbb78
  12. 15 4月, 2009 1 次提交
  13. 30 1月, 2009 1 次提交
  14. 29 12月, 2008 6 次提交
    • T
      block: fix empty barrier on write-through w/ ordered tag · a185eb4b
      Tejun Heo 提交于
      Empty barrier on write-through (or no cache) w/ ordered tag has no
      command to execute and without any command to execute ordered tag is
      never issued to the device and the ordering is never achieved.  Force
      draining for such cases.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      a185eb4b
    • T
      block: simplify empty barrier implementation · 58eea927
      Tejun Heo 提交于
      Empty barrier required special handling in __elv_next_request() to
      complete it without letting the low level driver see it.
      
      With previous changes, barrier code is now flexible enough to skip the
      BAR step using the same barrier sequence selection mechanism.  Drop
      the special handling and mask off q->ordered from start_ordered().
      
      Remove blk_empty_barrier() test which now has no user.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      58eea927
    • T
      block: make barrier completion more robust · 8f11b3e9
      Tejun Heo 提交于
      Barrier completion had the following assumptions.
      
      * start_ordered() couldn't finish the whole sequence properly.  If all
        actions are to be skipped, q->ordseq is set correctly but the actual
        completion was never triggered thus hanging the barrier request.
      
      * Drain completion in elv_complete_request() assumed that there's
        always at least one request in the queue when drain completes.
      
      Both assumptions are true but these assumptions need to be removed to
      improve empty barrier implementation.  This patch makes the following
      changes.
      
      * Make start_ordered() use blk_ordered_complete_seq() to mark skipped
        steps complete and notify __elv_next_request() that it should fetch
        the next request if the whole barrier has completed inside
        start_ordered().
      
      * Make drain completion path in elv_complete_request() check whether
        the queue is empty.  Empty queue also indicates drain completion.
      
      * While at it, convert 0/1 return from blk_do_ordered() to false/true.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      8f11b3e9
    • T
      block: make every barrier action optional · f671620e
      Tejun Heo 提交于
      In all barrier sequences, the barrier write itself was always assumed
      to be issued and thus didn't have corresponding control flag.  This
      patch adds QUEUE_ORDERED_DO_BAR and unify action mask handling in
      start_ordered() such that any barrier action can be skipped.
      
      This patch doesn't introduce any visible behavior changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      f671620e
    • T
      block: remove duplicate or unused barrier/discard error paths · a7384677
      Tejun Heo 提交于
      * Because barrier mode can be changed dynamically, whether barrier is
        supported or not can be determined only when actually issuing the
        barrier and there is no point in checking it earlier.  Drop barrier
        support check in generic_make_request() and __make_request(), and
        update comment around the support check in blk_do_ordered().
      
      * There is no reason to check discard support in both
        generic_make_request() and __make_request().  Drop the check in
        __make_request().  While at it, move error action block to the end
        of the function and add unlikely() to q existence test.
      
      * Barrier request, be it empty or not, is never passed to low level
        driver and thus it's meaningless to try to copy back req->sector to
        bio->bi_sector on error.  In addition, the notion of failed sector
        doesn't make any sense for empty barrier to begin with.  Drop the
        code block from __end_that_request_first().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      a7384677
    • T
      block: reorganize QUEUE_ORDERED_* constants · 313e4299
      Tejun Heo 提交于
      Separate out ordering type (drain,) and action masks (preflush,
      postflush, fua) from visible ordering mode selectors
      (QUEUE_ORDERED_*).  Ordering types are now named QUEUE_ORDERED_BY_*
      while action masks are named QUEUE_ORDERED_DO_*.
      
      This change is necessary to add QUEUE_ORDERED_DO_BAR and make it
      optional to improve empty barrier implementation.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      313e4299
  15. 03 12月, 2008 1 次提交
    • T
      block: internal dequeue shouldn't start timer · 53a08807
      Tejun Heo 提交于
      blkdev_dequeue_request() and elv_dequeue_request() are equivalent and
      both start the timeout timer.  Barrier code dequeues the original
      barrier request but doesn't passes the request itself to lower level
      driver, only broken down proxy requests; however, as the original
      barrier code goes through the same dequeue path and timeout timer is
      started on it.  If barrier sequence takes long enough, this timer
      expires but the low level driver has no idea about this request and
      oops follows.
      
      Timeout timer shouldn't have been started on the original barrier
      request as it never goes through actual IO.  This patch unexports
      elv_dequeue_request(), which has no external user anyway, and makes it
      operate on elevator proper w/o adding the timer and make
      blkdev_dequeue_request() call elv_dequeue_request() and add timer.
      Internal users which don't pass the request to driver - barrier code
      and end_that_request_last() - are converted to use
      elv_dequeue_request().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Mike Anderson <andmike@linux.vnet.ibm.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      53a08807
  16. 09 10月, 2008 1 次提交
    • H
      block: adjust blkdev_issue_discard for swap · 3e6053d7
      Hugh Dickins 提交于
      Two mods to blkdev_issue_discard(), thinking ahead to its use on swap:
      
      1. Add gfp_mask argument, so swap allocation can use it where GFP_KERNEL
         might deadlock but GFP_NOIO is safe.
      
      2. Enlarge nr_sects argument from unsigned to sector_t: unsigned long is
         enough to cover a whole swap area, but sector_t suits any partition.
      
      Change sb_issue_discard()'s nr_blocks to sector_t too; but no need seen
      for a gfp_mask there, just pass GFP_KERNEL down to blkdev_issue_discard().
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      3e6053d7