1. 28 Apr, 2009 · 11 commits
    • block: implement and use [__]blk_end_request_all() · 40cbbb78
      Tejun Heo committed
      There are many [__]blk_end_request() call sites which call it with
      full request length and expect full completion.  Many of them ensure
      that the request actually completes by doing BUG_ON() on the return
      value, which is awkward and error-prone.
      
      This patch adds [__]blk_end_request_all() which takes @rq and @error
      and fully completes the request.  BUG_ON() is added to ensure that
      this actually happens.
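      A minimal C sketch of the contract described above, assuming a simplified
      request that only tracks outstanding bytes. The names model_request,
      model_end_request() and model_end_request_all() are hypothetical stand-ins,
      not kernel symbols, and assert() plays the role of BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified request: only outstanding bytes are tracked.
 * This is not the kernel's struct request. */
struct model_request {
	unsigned int remaining_bytes;
	int error;
};

/* Models [__]blk_end_request(): completes up to @nr_bytes and returns
 * true while the request still has bytes pending. */
bool model_end_request(struct model_request *rq, int error,
		       unsigned int nr_bytes)
{
	if (nr_bytes > rq->remaining_bytes)
		nr_bytes = rq->remaining_bytes;
	rq->remaining_bytes -= nr_bytes;
	if (error)
		rq->error = error;
	return rq->remaining_bytes != 0;
}

/* Models [__]blk_end_request_all(): always passes the full length and
 * checks that nothing is left, so callers no longer test the result. */
void model_end_request_all(struct model_request *rq, int error)
{
	bool pending = model_end_request(rq, error, rq->remaining_bytes);

	assert(!pending); /* stands in for the kernel's BUG_ON(pending) */
	(void)pending;    /* avoid an unused warning when NDEBUG is set */
}
```

      The point of the helper is that the "complete everything" intent lives in
      one place instead of being repeated as a BUG_ON() at every call site.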
      
      Most conversions are simple but there are a few noteworthy ones.
      
      * cdrom/viocd: viocd_end_request() replaced with direct calls to
        __blk_end_request_all().
      
      * s390/block/dasd: dasd_end_request() replaced with direct calls to
        __blk_end_request_all().
      
      * s390/char/tape_block: tapeblock_end_request() replaced with direct
        calls to blk_end_request_all().
      
      [ Impact: cleanup ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Mike Miller <mike.miller@hp.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Jeff Garzik <jgarzik@pobox.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Alex Dubov <oakad@yahoo.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
    • block: move rq->start_time initialization to blk_rq_init() · b243ddcb
      Tejun Heo committed
      rq->start_time was initialized in init_request_from_bio() so special
      requests didn't have start_time set.  This has been okay as start_time
      has been used only for fs requests; however, nothing indicates
      whether this is actually the case.  Set rq->start_time in blk_rq_init()
      and guarantee that all initialized rqs have their start_time set.  This
      improves consistency at virtually no cost, and future changes will make
      use of the timestamp for !bio requests.
      
      [ Impact: rq->start_time is valid for all requests ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • block: clean up request completion API · 2e60e022
      Tejun Heo committed
      Request completion has gone through several changes and became a bit
      messy over time.  Clean it up.
      
      1. end_that_request_data() is a thin wrapper around
         end_that_request_data_first() which checks whether bio is NULL
         before doing anything and handles bidi completion.
         blk_update_request() is a thin wrapper around
         end_that_request_data() which clears nr_sectors on the last
         iteration but doesn't use the bidi completion.
      
         Clean it up by moving the initial bio NULL check and nr_sectors
         clearing on the last iteration into end_that_request_data() and
         renaming it to blk_update_request(), which makes blk_end_io() the
         only user of end_that_request_data().  Collapse
         end_that_request_data() into blk_end_io().
      
      2. There are four visible completion variants - blk_end_request(),
         __blk_end_request(), blk_end_bidi_request() and end_request().
         blk_end_request() and blk_end_bidi_request() use blk_end_io()
         as the backend, but __blk_end_request() and end_request() use a
         separate implementation in __blk_end_request() due to different
         locking rules.
      
         blk_end_bidi_request() is identical to blk_end_io().  Collapse
         blk_end_io() into blk_end_bidi_request(), separate out request
         update into internal helper blk_update_bidi_request() and add
         __blk_end_bidi_request().  Redefine [__]blk_end_request() as thin
         inline wrappers around [__]blk_end_bidi_request().
      
      3. As the whole request issue/completion usages are about to be
         modified and audited, it's a good chance to convert completion
         functions to return bool, which better indicates the intended
         meaning of the return values.
      
      4. The function name end_that_request_last() dates from the days when
         it was a public interface and is slightly confusing.  Give it a
         proper internal name: blk_finish_request().
      
      5. Add a description explaining that blk_end_bidi_request() can be
         safely used for uni requests, as suggested by Boaz Harrosh.
      
      The only visible behavior change is from #1.  nr_sectors counts are
      cleared after the final iteration no matter which function is used to
      complete the request.  I couldn't find any place where the code
      assumes those nr_sectors counters contain the values for the last
      segment, and this change is good as it makes the API much more
      consistent: the end result is now the same whether a request is
      completed using [__]blk_end_request() alone or in combination with
      blk_update_request().
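      A rough C model of the resulting layering, assuming requests are reduced
      to byte and sector counters (no bios, error codes or locking). The
      model_* names are hypothetical: model_update_request() stands in for
      blk_update_request(), model_end_bidi_request() for blk_end_bidi_request(),
      and model_end_request() for the thin inline blk_end_request() wrapper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: byte counts only, no bios or locking. */
struct model_request {
	unsigned int data_len;         /* bytes still to complete */
	unsigned int nr_sectors;       /* cleared once fully completed */
	struct model_request *next_rq; /* bidi partner, NULL for uni */
};

/* Like blk_update_request(): returns true while bytes remain. */
bool model_update_request(struct model_request *rq, unsigned int nr_bytes)
{
	if (nr_bytes >= rq->data_len) {
		rq->data_len = 0;
		rq->nr_sectors = 0; /* cleared on the final iteration (#1) */
		return false;
	}
	rq->data_len -= nr_bytes;
	rq->nr_sectors = rq->data_len >> 9; /* 512-byte sectors left */
	return true;
}

/* Like blk_end_bidi_request(): also safe for uni requests (#5). */
bool model_end_bidi_request(struct model_request *rq, unsigned int nr_bytes,
			    unsigned int bidi_bytes)
{
	if (model_update_request(rq, nr_bytes))
		return true;
	if (rq->next_rq != NULL &&
	    model_update_request(rq->next_rq, bidi_bytes))
		return true;
	/* the kernel would call blk_finish_request() at this point */
	return false;
}

/* Like blk_end_request(): a thin inline wrapper (#2), bool result (#3). */
static inline bool model_end_request(struct model_request *rq,
				     unsigned int nr_bytes)
{
	return model_end_bidi_request(rq, nr_bytes, 0);
}
```

      Because every path funnels through the same update helper, nr_sectors
      ends up cleared no matter which entry point completes the request.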
      
      API further cleaned up per Christoph's suggestion.
      
      [ Impact: cleanup, rq->*nr_sectors always updated after req completion ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Christoph Hellwig <hch@infradead.org>
    • block: kill blk_end_request_callback() · 0b302d5a
      Tejun Heo committed
      With recent IDE updates, blk_end_request_callback() doesn't have any
      user now.  Kill it.
      
      [ Impact: removal of unused convoluted interface ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • block: reorganize request fetching functions · 158dbda0
      Tejun Heo committed
      Impact: code reorganization
      
      elv_next_request() and elv_dequeue_request() are public block layer
      interfaces rather than actual elevator implementation.  They mostly
      deal with how requests interact with the block layer and low level
      drivers at the beginning of request processing, whereas
      __elv_next_request() is the actual elevator request fetching interface.
      
      Move the two functions to blk-core.c.  This prepares for further
      interface cleanup.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • block: reorder request completion functions · 5efccd17
      Tejun Heo committed
      Reorder request completion functions such that
      
      * All request completion functions are located together.
      
      * Functions which are used by only one caller are put right above the
        caller.
      
      * end_request() is put after other completion functions but before
        blk_update_request().
      
      This change is for completion function cleanup which will follow.
      
      [ Impact: cleanup, code reorganization ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • block: cleanup REQ_SOFTBARRIER usages · 10732f56
      Tejun Heo committed
      blk_insert_request() doesn't need to worry about REQ_SOFTBARRIER.
      Don't set it.  Combined with recent ide updates, REQ_SOFTBARRIER is
      now only used in elevator proper and for discard requests.
      
      [ Impact: cleanup ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • block: don't set REQ_NOMERGE unnecessarily · e4025f6c
      Tejun Heo committed
      RQ_NOMERGE_FLAGS already clearly defines which REQ flags aren't
      mergeable.  There is no reason to specify it superfluously; it only
      adds to confusion.  Don't set REQ_NOMERGE for barriers and requests
      with a specific queueing directive.  REQ_NOMERGE is now used
      exclusively by the merging code.
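      A small C illustration of why the extra flag is redundant, assuming a
      mask of "never merge" bits that already includes the barrier bits. The
      bit values and the exact mask contents here are hypothetical; only the
      idea of a single mask test comes from the message above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the kernel's flag layout differs. */
enum {
	MODEL_REQ_NOMERGE     = 1u << 0,
	MODEL_REQ_STARTED     = 1u << 1,
	MODEL_REQ_SOFTBARRIER = 1u << 2,
	MODEL_REQ_HARDBARRIER = 1u << 3,
};

/* Any one of these bits already makes a request unmergeable. */
#define MODEL_RQ_NOMERGE_FLAGS                      \
	(MODEL_REQ_NOMERGE | MODEL_REQ_STARTED |    \
	 MODEL_REQ_SOFTBARRIER | MODEL_REQ_HARDBARRIER)

/* Single mask test, in the spirit of the merging code's check. */
bool model_rq_mergeable(unsigned int cmd_flags)
{
	return (cmd_flags & MODEL_RQ_NOMERGE_FLAGS) == 0;
}
```

      Adding MODEL_REQ_NOMERGE to a barrier changes nothing: the barrier bit
      alone already fails the mask test.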
      
      [ Impact: cleanup ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • block: kill blk_start_queueing() · a7f55792
      Tejun Heo committed
      blk_start_queueing() is identical to __blk_run_queue() except that it
      doesn't check for recursion.  None of the current users depends on
      blk_start_queueing() running request_fn directly.  Replace usages of
      blk_start_queueing() with [__]blk_run_queue() and kill it.
      
      [ Impact: removal of mostly duplicate interface function ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • block: merge blk_invoke_request_fn() into __blk_run_queue() · a538cd03
      Tejun Heo committed
      __blk_run_queue() wraps blk_invoke_request_fn() such that it
      additionally removes the plug and bails out early if the queue is empty.
      Both extra operations have their own pending mechanisms and don't
      cause any harm correctness-wise when they are done superfluously.
      
      The only user of blk_invoke_request_fn() being blk_start_queue(),
      there isn't much reason to keep both functions around.  Merge
      blk_invoke_request_fn() into __blk_run_queue() and make
      blk_start_queue() use __blk_run_queue() instead.
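      A user-space C sketch of the merged function's shape, assuming a queue
      reduced to a few fields. The reenter flag echoes the idea of guarding
      request_fn against recursion (the difference from blk_start_queueing()
      noted in the previous commit); the deferred counter stands in for
      punting the run to a worker. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct model_queue {
	bool plugged;
	bool reenter;           /* true while request_fn is running */
	unsigned int nr_queued; /* requests waiting in the elevator */
	unsigned int deferred;  /* runs punted instead of recursing */
	void (*request_fn)(struct model_queue *q);
};

void model_run_queue(struct model_queue *q)
{
	q->plugged = false;      /* remove the plug */
	if (q->nr_queued == 0)   /* bail out early if the queue is empty */
		return;
	if (!q->reenter) {
		q->reenter = true;   /* recurse at most one level */
		q->request_fn(q);
		q->reenter = false;
	} else {
		/* already inside request_fn: re-plug and defer the run */
		q->plugged = true;
		q->deferred++;
	}
}

/* Hypothetical driver: takes one request, then kicks the queue again. */
void model_request_fn(struct model_queue *q)
{
	if (q->nr_queued > 0)
		q->nr_queued--;
	model_run_queue(q); /* nested call is deferred, not executed */
}
```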
      
      [ Impact: merge two subtly different internal functions ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • block: clear req->errors on bio completion only for fs requests · 924cec77
      Tejun Heo committed
      Impact: subtle behavior change
      
      For fs requests, rq is only a carrier of bios and rq error status as a
      whole doesn't mean much.  This is the reason why rq->errors is being
      cleared on each partial completion of a request as on each partial
      completion the error status is transferred to the respective bios.
      
      For pc requests, rq->errors is used to carry error status to the
      issuer and thus __end_that_request_first() doesn't clear it on such
      cases.
      
      The condition was fine till now as only fs and pc requests have used
      bio and thus the bio completion path.  However, future changes will
      unify data accesses to bio and all non fs users care about rq error
      status.  Clear rq->errors on bio completion only for fs requests.
      
      In general, the implicit clearing is a bit too subtle especially as
      the meaning of rq->errors is completely dependent on low level
      drivers.  Unifying / cleaning up rq->errors usage and letting llds
      manage it would be better.  TODO comment added.
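      A C sketch of the new rule, assuming just two request types and a
      completion hook that hands the status to the bio. The model_* names are
      hypothetical; in the kernel the check sits in the bio completion path
      (__end_that_request_first()).

```c
#include <assert.h>

enum model_cmd_type { MODEL_TYPE_FS, MODEL_TYPE_BLOCK_PC };

struct model_request {
	enum model_cmd_type cmd_type;
	int errors; /* meaning is up to the low level driver */
};

/* Called on each (partial) bio completion in this model. */
void model_bio_completed(struct model_request *rq, int error, int *bio_error)
{
	*bio_error = error; /* the status travels with the bio */
	if (rq->cmd_type == MODEL_TYPE_FS)
		rq->errors = 0; /* fs: rq-level status carries no meaning */
	/* other types keep rq->errors for the issuer */
}
```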
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Jens Axboe <axboe@kernel.dk>
  2. 24 Apr, 2009 · 1 commit
  3. 07 Apr, 2009 · 2 commits
  4. 06 Apr, 2009 · 3 commits
  5. 03 Apr, 2009 · 1 commit
    • blktrace: fix pdu_len when tracing packet command requests · e2494e1b
      Li Zefan committed
      Impact: output all of packet commands - not just the first 4 / 8 bytes
      
      Since commit d7e3c324 ("block: add
      large command support"), struct request->cmd has been changed from
      unsigned char cmd[BLK_MAX_CDB] to unsigned char *cmd.
      
      v1 -> v2: by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      
      - make sure rq->cmd_len is always initialized, so that we can use
        rq->cmd_len instead of BLK_MAX_CDB.
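      A C illustration of why only 4 or 8 bytes were traced, assuming the
      trace length had been taken with sizeof() on the cmd field: once cmd is
      a pointer, sizeof() yields the pointer size rather than the CDB length.
      The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct model_request {
	unsigned char *cmd;     /* formerly unsigned char cmd[BLK_MAX_CDB] */
	unsigned short cmd_len; /* must always be initialized */
};

/* Array-era habit: sizeof(rq->cmd) is now the size of a pointer. */
size_t model_pdu_len_buggy(const struct model_request *rq)
{
	return sizeof(rq->cmd);
}

/* Fixed: trust the explicit command length. */
size_t model_pdu_len(const struct model_request *rq)
{
	return rq->cmd_len;
}
```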
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      LKML-Reference: <49D4507E.2060602@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 26 Mar, 2009 · 1 commit
  7. 24 Mar, 2009 · 2 commits
  8. 02 Feb, 2009 · 1 commit
  9. 30 Jan, 2009 · 3 commits
  10. 29 Dec, 2008 · 5 commits
    • block: don't use plugging on SSD devices · a31a9738
      Jens Axboe committed
      We just want to hand the first bits of IO to the device as fast
      as possible. Gains a few percent on the IOPS rate.
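      A very small C model of the decision, assuming a queue that carries a
      "non-rotational" flag and that plugging simply means holding I/O back
      briefly so it can merge. The struct and function names are hypothetical
      and ignore every other condition the real submission path checks.

```c
#include <assert.h>
#include <stdbool.h>

struct model_queue {
	bool nonrot;             /* true for SSD-like devices */
	bool plugged;            /* I/O held back to allow merging */
	unsigned int dispatched; /* I/O handed to the device */
};

void model_submit(struct model_queue *q)
{
	if (!q->nonrot)
		q->plugged = true;   /* rotational: wait for more I/O to merge */
	else
		q->dispatched++;     /* SSD: hand the I/O over immediately */
}
```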
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: remove duplicate or unused barrier/discard error paths · a7384677
      Tejun Heo committed
      * Because barrier mode can be changed dynamically, whether barrier is
        supported or not can be determined only when actually issuing the
        barrier and there is no point in checking it earlier.  Drop barrier
        support check in generic_make_request() and __make_request(), and
        update comment around the support check in blk_do_ordered().
      
      * There is no reason to check discard support in both
        generic_make_request() and __make_request().  Drop the check in
        __make_request().  While at it, move error action block to the end
        of the function and add unlikely() to q existence test.
      
      * Barrier request, be it empty or not, is never passed to low level
        driver and thus it's meaningless to try to copy back req->sector to
        bio->bi_sector on error.  In addition, the notion of failed sector
        doesn't make any sense for empty barrier to begin with.  Drop the
        code block from __end_that_request_first().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: use cancel_work_sync() instead of kblockd_flush_work() · 64d01dc9
      Cheng Renquan committed
      After many improvements to kblockd_flush_work, it is now identical to
      cancel_work_sync, so a direct call to cancel_work_sync is suggested.
      
      The only difference is that cancel_work_sync is a GPL symbol,
      so non-GPL modules can no longer use it.
      Signed-off-by: Cheng Renquan <crquan@gmail.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: Suppress Buffer I/O errors when SCSI REQ_QUIET flag set · 08bafc03
      Keith Mannthey committed
      Allow the SCSI request REQ_QUIET flag to be propagated to the buffer
      file system layer. The basic idea is to pass the flag from the SCSI
      request to the bio (block IO) and then to the buffer layer.  The buffer
      layer can then suppress needless printks.
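      A C sketch of the propagation chain, assuming each layer keeps a plain
      boolean "quiet" marker. The model_* structs are hypothetical stand-ins
      for the request, the bio and the buffer head, and the message counter
      stands in for the "Buffer I/O error" printk.

```c
#include <assert.h>
#include <stdbool.h>

struct model_request { bool quiet; int error; };
struct model_bio { bool quiet; int error; };
struct model_buffer_head { bool quiet; unsigned int messages; };

/* Request completion: copy the status and the quiet marker to the bio. */
void model_request_to_bio(const struct model_request *rq,
			  struct model_bio *bio)
{
	bio->error = rq->error;
	if (rq->quiet)
		bio->quiet = true;
}

/* Bio completion: copy the quiet marker to the buffer head. */
void model_bio_to_bh(const struct model_bio *bio, struct model_buffer_head *bh)
{
	if (bio->quiet)
		bh->quiet = true;
}

/* Buffer layer: report the error only when not marked quiet. */
void model_buffer_io_error(struct model_buffer_head *bh)
{
	if (!bh->quiet)
		bh->messages++; /* stands in for the printk */
}
```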
      
      This patch declutters the kernel log by removing the 40-50 (per LUN)
      buffer I/O error messages seen during boot in my multipath setup.
      Without this patch, there is a good chance that real errors will be
      missed in the "noise" in the logs.
      
      During boot I see blocks of messages like
      "
      __ratelimit: 211 callbacks suppressed
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242847
      Buffer I/O error on device sdm, logical block 1
      Buffer I/O error on device sdm, logical block 5242878
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242872
      "
      in my logs.
      
      My disk environment is multipath fiber channel using the SCSI_DH_RDAC
      code and multipathd.  This topology includes an "active" and "ghost"
      path for each LUN. IOs to the "ghost" path will never complete, and the
      SCSI layer, via the SCSI device handler RDAC code, quickly returns the
      IOs to these paths and sets the REQ_QUIET SCSI flag to suppress the
      SCSI layer messages.
      
      I want to extend the QUIET behavior to include the buffer file
      system layer to deal with these errors as well. I have been running this
      patch for a while now on several boxes without issue.  A few runs of
      bonnie++ show no noticeable difference in performance in my setup.
      
      Thanks to John Stultz for the quiet_error finalization.
      Submitted-by: Keith Mannthey <kmannth@us.ibm.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: leave the request timeout timer running even on an empty list · 70ed28b9
      Jens Axboe committed
      Sync IO is often issued serially. This means we'll be touching the
      queue timer for every IO, as opposed to only occasionally like we
      do for queued IO. Instead of deleting the timer when the last request
      is removed, just let it continue running. If a new request comes in
      soon, we then don't have to re-add the timer. If no new requests
      arrive, the timer will later expire without side effects.
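      A C model of the saving, assuming the timer is armed only when it is not
      already running and that an expiry with nothing queued is harmless. The
      names are hypothetical; arm_calls counts how often the timer had to be
      (re)armed.

```c
#include <assert.h>
#include <stdbool.h>

struct model_timer {
	bool armed;
	unsigned int arm_calls;
};

/* Request added: arm the timer only if it is not already running. */
void model_add_request(struct model_timer *t)
{
	if (!t->armed) {
		t->armed = true;
		t->arm_calls++;
	}
}

/* Request removed: the timer is deliberately left running,
 * even when the list becomes empty. */
void model_remove_request(struct model_timer *t)
{
	(void)t;
}

/* Expiry with an empty list: nothing to time out, timer stops. */
void model_timer_expired(struct model_timer *t)
{
	t->armed = false;
}
```

      With one request in flight at a time, a run of back-to-back sync IOs arms
      the timer once instead of once per IO.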
      
      This improves high iops sync IO by ~1%.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  11. 03 Dec, 2008 · 2 commits
    • block: fix setting of max_segment_size and seg_boundary mask · 0e435ac2
      Milan Broz committed
      Fix setting of max_segment_size and seg_boundary mask for stacked md/dm
      devices.
      
      When stacking devices (LVM over MD over SCSI), some of the request queue
      parameters are not set up correctly by default in some cases, namely
      max_segment_size and seg_boundary mask.
      
      If you create an MD device over SCSI, these attributes are zeroed.
      
      The problem appears when another device-mapper mapping is stacked on
      top of this one. Queue attributes are then set in DM this way:
      
      request_queue   max_segment_size  seg_boundary_mask
      SCSI                65536             0xffffffff
      MD RAID1                0                      0
      LVM                 65536                 -1 (64bit)
      
      Unfortunately, bio_add_page() (respectively bio_phys_segments())
      calculates the number of physical segments according to these
      parameters.
      
      During generic_make_request(), the segment count is recalculated and
      can increase bio->bi_phys_segments over the allowed limit (after
      bio_clone() in the stacking operation).
      
      This is especially a problem in the CCISS driver, where it produces an
      oops here:
      
          BUG_ON(creq->nr_phys_segments > MAXSGENTRIES);
      
      (MAXSGENTRIES is 31 by default.)
      
      Sometimes even this command is enough to cause an oops:
      
        dd iflag=direct if=/dev/<vg>/<lv> of=/dev/null bs=128000 count=10
      
      This command generates bios with 250 sectors, allocated in 32 4k-pages
      (last page uses only 1024 bytes).
      
      At the LVM layer, the bio is allocated with 31 segments (still OK for
      CCISS); unfortunately, on the lower layer it is recalculated to 32
      segments, which violates the CCISS restriction and triggers BUG_ON().
      
      The patch tries to fix it by:
      
       * initializing attributes above in queue request constructor
         blk_queue_make_request()
      
       * make sure that blk_queue_stack_limits() inherits setting
      
       (DM uses its own function to set the limits because
       blk_queue_stack_limits() was introduced later.  It should probably
       switch to the generic stack limit function too.)
      
       * sets the default seg_boundary value in one place (blkdev.h)
      
       * use this mask as default in DM (instead of -1, which differs in 64bit)
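      A C sketch of the intended stacking behavior, assuming a stacked queue
      inherits the smaller non-zero value of each limit (the kernel's
      min_not_zero() behaves this way, but the exact body of
      blk_queue_stack_limits() is not reproduced here). The defaults 65536 and
      0xFFFFFFFF come from the SCSI row of the table above; all model_* names
      are hypothetical.

```c
#include <assert.h>

#define MODEL_MAX_SEGMENT_SIZE  65536u       /* default max_segment_size */
#define MODEL_SEG_BOUNDARY_MASK 0xFFFFFFFFUL /* default seg_boundary_mask */

struct model_limits {
	unsigned int max_segment_size;
	unsigned long seg_boundary_mask;
};

/* Smaller of two limits, treating 0 as "not set". */
static unsigned long min_not_zero_ul(unsigned long a, unsigned long b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/* Constructor defaults, so a new queue never reports zeroed limits. */
void model_init_limits(struct model_limits *l)
{
	l->max_segment_size = MODEL_MAX_SEGMENT_SIZE;
	l->seg_boundary_mask = MODEL_SEG_BOUNDARY_MASK;
}

/* The top device inherits the stricter of its own and the bottom limits. */
void model_stack_limits(struct model_limits *top,
			const struct model_limits *bottom)
{
	top->max_segment_size = (unsigned int)min_not_zero_ul(
		top->max_segment_size, bottom->max_segment_size);
	top->seg_boundary_mask = min_not_zero_ul(
		top->seg_boundary_mask, bottom->seg_boundary_mask);
}
```

      Stacking LVM (mask -1) over an initialized MD over SCSI then yields the
      32-bit mask at every level instead of 0 or -1.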
      
      Bugs related to this:
      https://bugzilla.redhat.com/show_bug.cgi?id=471639
      http://bugzilla.kernel.org/show_bug.cgi?id=8672
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Reviewed-by: Alasdair G Kergon <agk@redhat.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Tejun Heo <htejun@gmail.com>
      Cc: Mike Miller <mike.miller@hp.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: internal dequeue shouldn't start timer · 53a08807
      Tejun Heo committed
      blkdev_dequeue_request() and elv_dequeue_request() are equivalent and
      both start the timeout timer.  Barrier code dequeues the original
      barrier request but doesn't pass the request itself to the lower level
      driver, only broken-down proxy requests; however, the original barrier
      request goes through the same dequeue path and the timeout timer is
      started on it.  If the barrier sequence takes long enough, this timer
      expires but the low level driver has no idea about this request, and
      an oops follows.
      
      The timeout timer shouldn't have been started on the original barrier
      request as it never goes through actual IO.  This patch unexports
      elv_dequeue_request(), which has no external user anyway, makes it
      operate on the elevator proper without adding the timer, and makes
      blkdev_dequeue_request() call elv_dequeue_request() and add the timer.
      Internal users which don't pass the request to the driver (barrier
      code and end_that_request_last()) are converted to use
      elv_dequeue_request().
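      A C sketch of the split, assuming a request reduced to two flags. The
      model_* functions mirror the roles of elv_dequeue_request() and
      blkdev_dequeue_request() described above, and setting timer_started
      stands in for adding the timeout timer.

```c
#include <assert.h>
#include <stdbool.h>

struct model_request {
	bool queued;
	bool timer_started;
};

/* Elevator bookkeeping only: used for requests never sent to a driver. */
void model_elv_dequeue_request(struct model_request *rq)
{
	rq->queued = false;
}

/* Driver-facing dequeue: the same bookkeeping plus the timeout timer. */
void model_blkdev_dequeue_request(struct model_request *rq)
{
	model_elv_dequeue_request(rq);
	rq->timer_started = true;
}
```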
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Mike Anderson <andmike@linux.vnet.ibm.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  12. 26 Nov, 2008 · 2 commits
  13. 06 Nov, 2008 · 1 commit
  14. 17 Oct, 2008 · 4 commits
  15. 13 Oct, 2008 · 1 commit
    • [SCSI] block: separate failfast into multiple bits. · 6000a368
      Mike Christie committed
      Multipath is best at handling transport errors. If it gets a device
      error then there is not much the multipath layer can do. It will just
      access the same device but from a different path.
      
      This patch breaks up failfast into device, transport and driver errors.
      The multipath layers (md and dm multipath) only ask the lower levels to
      fast fail transport errors. Readahead, the other user of failfast, will
      ask to fast fail on all errors.
      
      Note that blk_noretry_request will return true if any failfast bit
      is set. This allows drivers that do not support the multipath failfast
      bits to continue to fail on any failfast error like before. Drivers
      like scsi that are able to fail fast specific errors can check
      for the specific fail fast type. In the next patch I will convert
      scsi.
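      A C sketch of the flag split, assuming three independent bits and a
      combined mask. The bit values and model_* names are hypothetical;
      model_noretry() mirrors the described blk_noretry_request() behavior of
      returning true when any failfast bit is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values for the three failfast classes. */
enum {
	MODEL_FAILFAST_DEV       = 1u << 0,
	MODEL_FAILFAST_TRANSPORT = 1u << 1,
	MODEL_FAILFAST_DRIVER    = 1u << 2,
};

#define MODEL_FAILFAST_MASK \
	(MODEL_FAILFAST_DEV | MODEL_FAILFAST_TRANSPORT | MODEL_FAILFAST_DRIVER)

/* Drivers unaware of the split: any failfast bit means "don't retry". */
bool model_noretry(unsigned int flags)
{
	return (flags & MODEL_FAILFAST_MASK) != 0;
}

/* Drivers aware of the split can test a single class. */
bool model_failfast_transport(unsigned int flags)
{
	return (flags & MODEL_FAILFAST_TRANSPORT) != 0;
}
```

      Multipath would set only the transport bit, while readahead would set all
      three; an old-style driver treats both the same way.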
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>