1. 06 Apr, 2009 2 commits
  2. 26 Mar, 2009 1 commit
  3. 24 Mar, 2009 2 commits
  4. 02 Feb, 2009 1 commit
  5. 30 Jan, 2009 3 commits
  6. 29 Dec, 2008 5 commits
    • J
      block: don't use plugging on SSD devices · a31a9738
      Jens Axboe committed
      We just want to hand the first bits of IO to the device as fast
      as possible. This gains a few percent on the IOPS rate.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      a31a9738
    • T
      block: remove duplicate or unused barrier/discard error paths · a7384677
      Tejun Heo committed
      * Because barrier mode can be changed dynamically, whether barrier is
        supported or not can be determined only when actually issuing the
        barrier and there is no point in checking it earlier.  Drop barrier
        support check in generic_make_request() and __make_request(), and
        update comment around the support check in blk_do_ordered().
      
      * There is no reason to check discard support in both
        generic_make_request() and __make_request().  Drop the check in
        __make_request().  While at it, move error action block to the end
        of the function and add unlikely() to q existence test.
      
      * Barrier request, be it empty or not, is never passed to low level
        driver and thus it's meaningless to try to copy back req->sector to
        bio->bi_sector on error.  In addition, the notion of failed sector
        doesn't make any sense for empty barrier to begin with.  Drop the
        code block from __end_that_request_first().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      a7384677
    • C
      block: use cancel_work_sync() instead of kblockd_flush_work() · 64d01dc9
      Cheng Renquan committed
      After many improvements on kblockd_flush_work, it is now identical to
      cancel_work_sync, so a direct call to cancel_work_sync is suggested.
      
      The only difference is that cancel_work_sync is a GPL-only symbol,
      so non-GPL modules can no longer use this path.
      Signed-off-by: Cheng Renquan <crquan@gmail.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      64d01dc9
    • K
      block: Supress Buffer I/O errors when SCSI REQ_QUIET flag set · 08bafc03
      Keith Mannthey committed
      Allow the scsi request REQ_QUIET flag to be propagated to the buffer
      file system layer. The basic idea is to pass the flag from the scsi
      request to the bio (block IO) and then to the buffer layer.  The buffer
      layer can then suppress needless printks.
      
      This patch declutters the kernel log by removing the 40-50 (per lun)
      buffer io error messages seen during a boot in my multipath setup. There
      is a good chance any real errors will be missed in the "noise" in the
      logs without this patch.
      
      During boot I see blocks of messages like
      "
      __ratelimit: 211 callbacks suppressed
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242847
      Buffer I/O error on device sdm, logical block 1
      Buffer I/O error on device sdm, logical block 5242878
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242872
      "
      in my logs.
      
      My disk environment is multipath fiber channel using the SCSI_DH_RDAC
      code and multipathd.  This topology includes an "active" and a "ghost"
      path for each lun. IOs to the "ghost" path will never complete, and the
      SCSI layer, via the scsi device handler rdac code, quickly returns the IOs
      to these paths and sets the REQ_QUIET scsi flag to suppress the scsi
      layer messages.
      
      I want to extend the QUIET behavior to include the buffer file
      system layer to deal with these errors as well. I have been running this
      patch for a while now on several boxes without issue.  A few runs of
      bonnie++ show no noticeable difference in performance in my setup.
      
      Thanks to John Stultz for the quiet_error finalization.
      Submitted-by: Keith Mannthey <kmannth@us.ibm.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      08bafc03
    • J
      block: leave the request timeout timer running even on an empty list · 70ed28b9
      Jens Axboe committed
      For sync IO, we'll often do them serialized. This means we'll be touching
      the queue timer for every IO, as opposed to only occasionally like we
      do for queued IO. Instead of deleting the timer when the last request
      is removed, just let it continue running. If a new request comes up soon,
      we then don't have to re-add the timer. If no new requests arrive,
      the timer will expire without side effect later.
      
      This improves high iops sync IO by ~1%.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      70ed28b9
  7. 03 Dec, 2008 2 commits
    • M
      block: fix setting of max_segment_size and seg_boundary mask · 0e435ac2
      Milan Broz committed
      Fix setting of max_segment_size and seg_boundary mask for stacked md/dm
      devices.
      
      When stacking devices (LVM over MD over SCSI) some of the request queue
      parameters are not set up correctly in some cases by default, namely
      max_segment_size and seg_boundary mask.
      
      If you create an MD device over SCSI, these attributes are zeroed.
      
      The problem appears when another device-mapper mapping is stacked on top
      of this one; the queue attributes then end up set this way:
      
      request_queue   max_segment_size  seg_boundary_mask
      SCSI                65536             0xffffffff
      MD RAID1                0                      0
      LVM                 65536                 -1 (64bit)
      
      Unfortunately bio_add_page (or bio_phys_segments) calculates the number of
      physical segments according to these parameters.
      
      During generic_make_request() the segment count is recalculated and can
      increase bio->bi_phys_segments over the allowed limit.  (After
      bio_clone() in the stacking operation.)
      
      This is especially a problem in the CCISS driver, where it produces an oops here:
      
          BUG_ON(creq->nr_phys_segments > MAXSGENTRIES);
      
      (MAXSGENTRIES is 31 by default.)
      
      Sometimes even this command is enough to cause oops:
      
        dd iflag=direct if=/dev/<vg>/<lv> of=/dev/null bs=128000 count=10
      
      This command generates bios with 250 sectors, allocated in 32 4k-pages
      (last page uses only 1024 bytes).
      
      For the LVM layer, it allocates a bio with 31 segments (still OK for CCISS);
      unfortunately, on the lower layer it is recalculated to 32 segments, which
      violates the CCISS restriction and triggers BUG_ON().
      
      The patch tries to fix it by:
      
       * initializing the attributes above in the request queue constructor
         blk_queue_make_request()
      
       * making sure that blk_queue_stack_limits() inherits the settings
      
       (DM uses its own function to set the limits because
       blk_queue_stack_limits() was introduced later.  It should probably switch
       to using the generic stack limit function too.)
      
       * setting the default seg_boundary value in one place (blkdev.h)
      
       * using this mask as the default in DM (instead of -1, which differs on 64bit)
      
      Bugs related to this:
      https://bugzilla.redhat.com/show_bug.cgi?id=471639
      http://bugzilla.kernel.org/show_bug.cgi?id=8672
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Reviewed-by: Alasdair G Kergon <agk@redhat.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Tejun Heo <htejun@gmail.com>
      Cc: Mike Miller <mike.miller@hp.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      0e435ac2
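The arithmetic behind the dd example in the commit above can be checked with a few lines of user-space C. This is only an illustrative sketch, not kernel code: the helper names are hypothetical, the buffer is assumed to start on a page boundary, and the constants are the ones quoted in the commit message.

```c
#include <stdbool.h>

#define SECTOR_BYTES 512u   /* sector size assumed by the "250 sectors" figure */
#define PAGE_BYTES   4096u  /* the "4k-pages" from the commit message */
#define MAXSGENTRIES 31u    /* CCISS default quoted in the commit message */

/* Number of sectors needed to cover `bytes` bytes (rounded up). */
unsigned int sectors_for(unsigned int bytes)
{
    return (bytes + SECTOR_BYTES - 1) / SECTOR_BYTES;
}

/* Number of pages needed for a page-aligned buffer of `bytes` bytes. */
unsigned int pages_for(unsigned int bytes)
{
    return (bytes + PAGE_BYTES - 1) / PAGE_BYTES;
}

/* Bytes occupied in the final page; assumes bytes > 0. */
unsigned int last_page_bytes(unsigned int bytes)
{
    unsigned int rem = bytes % PAGE_BYTES;
    return rem ? rem : PAGE_BYTES;
}

/* Mirrors the condition tested by the BUG_ON() quoted above. */
bool exceeds_sg_limit(unsigned int nr_phys_segments)
{
    return nr_phys_segments > MAXSGENTRIES;
}
```

With bs=128000 this gives 250 sectors spread over 32 pages, the last of which holds 1024 bytes. If every page ends up as its own segment, the count of 32 is one over the limit of 31.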
    • T
      block: internal dequeue shouldn't start timer · 53a08807
      Tejun Heo committed
      blkdev_dequeue_request() and elv_dequeue_request() are equivalent and
      both start the timeout timer.  Barrier code dequeues the original
      barrier request but doesn't pass the request itself to the lower level
      driver, only broken down proxy requests; however, the original
      barrier request goes through the same dequeue path, so the timeout timer
      is started on it.  If the barrier sequence takes long enough, this timer
      expires but the low level driver has no idea about this request and
      an oops follows.
      
      The timeout timer shouldn't have been started on the original barrier
      request as it never goes through actual IO.  This patch unexports
      elv_dequeue_request(), which has no external user anyway, makes it
      operate on the elevator proper w/o adding the timer, and makes
      blkdev_dequeue_request() call elv_dequeue_request() and add the timer.
      Internal users which don't pass the request to the driver - barrier code
      and end_that_request_last() - are converted to use
      elv_dequeue_request().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Mike Anderson <andmike@linux.vnet.ibm.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      53a08807
  8. 26 Nov, 2008 2 commits
  9. 06 Nov, 2008 1 commit
  10. 17 Oct, 2008 4 commits
  11. 13 Oct, 2008 1 commit
    • M
      [SCSI] block: separate failfast into multiple bits. · 6000a368
      Mike Christie committed
      Multipath is best at handling transport errors. If it gets a device
      error then there is not much the multipath layer can do. It will just
      access the same device but from a different path.
      
      This patch breaks up failfast into device, transport and driver errors.
      The multipath layers (md and dm multipath) only ask the lower levels to
      fast fail transport errors. The user of failfast, read ahead, will ask
      to fast fail on all errors.
      
      Note that blk_noretry_request will return true if any failfast bit
      is set. This allows drivers that do not support the multipath failfast
      bits to continue to fail on any failfast error like before. Drivers
      like scsi that are able to fail fast specific errors can check
      for the specific fail fast type. In the next patch I will convert
      scsi.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
      6000a368
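The split described in the commit above can be modelled as a small bitmask. The sketch below is a user-space illustration under stated assumptions: the MODEL_ names and bit positions are hypothetical and are not the kernel's actual flag definitions.

```c
#include <stdbool.h>

/* Hypothetical flag bits, one per error class named in the commit. */
#define MODEL_FAILFAST_DEV       (1u << 0)
#define MODEL_FAILFAST_TRANSPORT (1u << 1)
#define MODEL_FAILFAST_DRIVER    (1u << 2)
#define MODEL_FAILFAST_MASK \
    (MODEL_FAILFAST_DEV | MODEL_FAILFAST_TRANSPORT | MODEL_FAILFAST_DRIVER)

/*
 * Coarse check for drivers that do not distinguish error classes:
 * true if any failfast bit is set, as the commit describes for
 * blk_noretry_request.
 */
bool model_noretry_request(unsigned int flags)
{
    return (flags & MODEL_FAILFAST_MASK) != 0;
}

/*
 * Fine-grained check for drivers that can classify the error:
 * fail fast only if the request asked for it for this error class.
 */
bool model_fail_fast_on(unsigned int flags, unsigned int error_class)
{
    return (flags & error_class) != 0;
}
```

In this model a multipath layer would submit with only MODEL_FAILFAST_TRANSPORT set, while read ahead would set MODEL_FAILFAST_MASK to fail fast on every class.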
  12. 09 Oct, 2008 16 commits
    • K
      block: remove end_{queued|dequeued}_request() · d00e29fd
      Kiyoshi Ueda committed
      This patch removes end_queued_request() and end_dequeued_request(),
      which are no longer used.
      
      As a result, end_request() is the only remaining user of __end_request().
      So the actual code in __end_request() is moved to end_request()
      and __end_request() is removed.
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      d00e29fd
    • K
      block: add lld busy state exporting interface · ef9e3fac
      Kiyoshi Ueda committed
      This patch adds a new interface, blk_lld_busy(), to check the lld's
      busy state from the block layer.
      blk_lld_busy() calls down into low-level drivers for the check
      if the drivers set q->lld_busy_fn() using blk_queue_lld_busy().
      
      This resolves the performance problem on request stacking devices
      described below.
      
      Some drivers like the scsi mid layer stop dispatching requests when
      they detect a busy state on their low-level device (host/target/device).
      This allows other requests to stay in the I/O scheduler's queue
      for a chance of merging.
      
      Request stacking drivers like request-based dm should follow
      the same logic.
      However, there is no generic interface for the stacked device
      to check if the underlying device(s) are busy.
      If the request stacking driver dispatches and submits requests to
      the busy underlying device, the requests will stay in
      the underlying device's queue without a chance of merging.
      This causes a performance problem under bursty I/O load.
      
      With this patch, busy state of the underlying device is exported
      via q->lld_busy_fn().  So the request stacking driver can check it
      and stop dispatching requests if busy.
      
      The underlying device driver must return the busy state appropriately:
          1: when the device driver can't process requests immediately.
          0: when the device driver can process requests immediately,
             including abnormal situations where the device driver needs
             to kill all requests.
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      ef9e3fac
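The callback contract from the commit above (return 1 when busy, 0 otherwise) can be sketched in user-space C. Everything prefixed model_ is hypothetical, including the queue-depth rule used as the example busy test. Treating a queue with no registered callback as "not busy" is an assumption of this sketch, not something the commit message states.

```c
#include <stddef.h>

struct model_queue;
typedef int (*model_lld_busy_fn)(struct model_queue *q);

/* Toy stand-in for a request queue; only what the sketch needs. */
struct model_queue {
    model_lld_busy_fn lld_busy_fn; /* NULL if the driver registered nothing */
    unsigned int in_flight;        /* requests currently at the device */
    unsigned int depth;            /* how many the device accepts at once */
};

/* Example driver callback: busy once the device is at its depth. */
int model_depth_busy(struct model_queue *q)
{
    return q->in_flight >= q->depth ? 1 : 0;
}

/* Block-layer side: ask the driver if it registered a callback. */
int model_lld_busy(struct model_queue *q)
{
    if (q->lld_busy_fn)
        return q->lld_busy_fn(q);
    return 0; /* assumption: no callback means "not busy" */
}

/* Stacking-driver side: hold requests back while the lower queue is busy. */
int model_should_dispatch(struct model_queue *lower)
{
    return !model_lld_busy(lower);
}
```

A stacking driver built this way keeps requests in its own scheduler queue, where they can still merge, until the lower device reports that it can take more work.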
    • E
      block: Fix blk_start_queueing() to not kick a stopped queue · 336c3d8c
      Elias Oltmanns committed
      blk_start_queueing() should act like the generic queue unplugging
      and kicking and ignore a stopped queue. Such a queue may not be
      run until after a call to blk_start_queue().
      Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      336c3d8c
    • K
      block: add a queue flag for request stacking support · 4ee5eaf4
      Kiyoshi Ueda committed
      This patch adds a queue flag to indicate that the block device can be
      used for request stacking.
      
      Request stacking drivers need to stack their devices only on top of
      devices whose q->request_fn is functional.
      Since bio stacking drivers (e.g. md, loop) basically initialize
      their queue using blk_alloc_queue() and don't set q->request_fn,
      the check of (q->request_fn == NULL) looks sufficient for that purpose.
      
      However, dm will become both types of stacking driver (bio-based and
      request-based).  And dm will always set q->request_fn even if the dm
      device is bio-based, in which case q->request_fn is not actually functional.
      So we need something else to distinguish the type of the device.
      Adding a queue flag is a solution for that.
      
      The reason why dm always sets q->request_fn is to keep
      the compatibility of dm user-space tools.
      Currently, all dm user-space tools are using bio-based dm without
      specifying the type of the dm device they use.
      To use request-based dm without changing such tools, the kernel
      must decide the type of the dm device automatically.
      The automatic type decision can't be done at the device creation time
      and needs to be deferred until such tools load a mapping table,
      since the actual type is decided by dm target type included in
      the mapping table.
      
      So a dm device has to be initialized using blk_init_queue()
      so that we can load either type of table.
      Then, all queue fields are set (e.g. q->request_fn) and nothing
      is left to distinguish whether the device is bio-based or request-based,
      even after a table is loaded and the type of the device is decided.
      
      By the way, some parts of the queue (e.g. request_list, elevator)
      are unnecessary when the dm device is used as bio-based.
      But the memory size is not so large (about 20[KB] per queue on ia64),
      so I hope the memory loss is acceptable for bio-based dm users.
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      4ee5eaf4
    • K
      block: add request submission interface · 82124d60
      Kiyoshi Ueda committed
      This patch adds blk_insert_cloned_request(), a generic request
      submission interface for request stacking drivers.
      Request-based dm will use it to submit its clones to underlying
      devices.
      
      blk_rq_check_limits() is also added because it is possible that
      the lower queue has stricter limits than the upper queue
      if multiple drivers are stacking at request-level.
      Besides blk_insert_cloned_request()'s internal use, the function
      will be used by request-based dm when the queue limits are
      modified (e.g. by replacing dm's table).
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      82124d60
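The kind of check the commit above motivates can be sketched as follows. This is a user-space illustration only: the struct layouts, the two limits chosen, and the 0 / -1 return convention are assumptions made for the example and are not taken from the kernel's blk_rq_check_limits().

```c
/* Hypothetical subset of a lower queue's limits. */
struct model_limits {
    unsigned int max_sectors;
    unsigned int max_phys_segments;
};

/* Hypothetical subset of a cloned request's size information. */
struct model_clone_rq {
    unsigned int nr_sectors;
    unsigned int nr_phys_segments;
};

/*
 * Return 0 if the clone fits within the lower queue's limits,
 * -1 if it is too large in either dimension.
 */
int model_rq_check_limits(const struct model_limits *lower,
                          const struct model_clone_rq *rq)
{
    if (rq->nr_sectors > lower->max_sectors)
        return -1;
    if (rq->nr_phys_segments > lower->max_phys_segments)
        return -1;
    return 0;
}
```

A request built against a more permissive upper queue can fail this test once it is cloned for a stricter lower queue, which is why the check is repeated at submission time.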
    • K
      block: add request update interface · 32fab448
      Kiyoshi Ueda committed
      This patch adds blk_update_request(), which updates struct request
      by completing its data part, but doesn't complete the struct
      request itself.
      Though it looks like end_that_request_first() of older kernels,
      blk_update_request() should be used only by request stacking drivers.
      
      Request-based dm will use it in the bio->bi_end_io callback to update
      the original request when a data part of a cloned request completes.
      The following is additional background on why request-based
      dm needs this interface.
      
        - Request stacking drivers can't use blk_end_request() directly from
          the lower driver's completion context (bio->bi_end_io or rq->end_io),
          because some device drivers (e.g. ide) may try to complete
          their request with queue lock held, and it may cause deadlock.
          See below for detailed description of possible deadlock:
          <http://marc.info/?l=linux-kernel&m=120311479108569&w=2>
      
        - To solve that, request-based dm offloads the completion of
          cloned struct request to softirq context (i.e. using
          blk_complete_request() from rq->end_io).
      
        - Though it is possible to use the same solution from bio->bi_end_io,
          it will delay the notification of bio completion to the original
          submitter.  Also, it will cause inefficient partial completion,
          because the lower driver can't perform the cloned request anymore
          and request-based dm needs to requeue and redispatch it to
          the lower driver again later.  That's not good.
      
        - So request-based dm needs blk_update_request() to perform the bio
          completion in the lower driver's completion context, which is more
          efficient.
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      32fab448
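The distinction the commit above draws, completing a request's data without completing the request itself, can be illustrated with a toy model. The struct, the function name, and the return convention (1 while data remains, 0 once all data is accounted for) are assumptions for this sketch and do not describe the real blk_update_request() signature.

```c
/* Toy stand-in for the original (upper) request. */
struct model_orig_rq {
    unsigned int bytes_left; /* data not yet reported complete */
};

/*
 * Account for nr_bytes of completed data on the original request.
 * The request object itself is left alone; finishing it is a
 * separate, later step in this model.
 */
int model_update_request(struct model_orig_rq *rq, unsigned int nr_bytes)
{
    if (nr_bytes >= rq->bytes_left) {
        rq->bytes_left = 0;
        return 0; /* all data done; request still awaits final completion */
    }
    rq->bytes_left -= nr_bytes;
    return 1;     /* partial completion; more data outstanding */
}
```

In this model the data accounting can run in the lower driver's completion context, while the final completion of the request is deferred to a safer context such as softirq.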
    • J
      block: blk_cleanup_queue() should call blk_sync_queue() · e3335de9
      Jens Axboe committed
      When a driver calls blk_cleanup_queue(), the device should be fully idle.
      However, the block layer may have pending plugging timers and the IO
      schedulers may have pending work in the work queues. So quiesce the device
      by waiting for the timer and flushing the work queues.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      e3335de9
    • J
      block: unify request timeout handling · 242f9dcb
      Jens Axboe committed
      Right now SCSI and others do their own command timeout handling.
      Move those bits to the block layer.
      
      Instead of having a timer per command, we try to be a bit more clever
      and simply have one per queue. This avoids the overhead of having to
      tear down and set up a timer for each command, so it will result in a lot
      less timer fiddling.
      Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      242f9dcb
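The commit above does not spell out how the single per-queue timer is armed. One plausible scheme, assumed here purely for illustration, is to arm it for the earliest deadline among the in-flight requests. The function name is hypothetical, deadlines are plain non-zero tick counts, and counter wraparound is ignored.

```c
#include <stddef.h>

/*
 * Given the deadlines of all in-flight requests on a queue, return the
 * expiry the single per-queue timer should be armed for, or 0 if there
 * is nothing in flight (0 is reserved to mean "no timer needed").
 */
unsigned long model_queue_timer_expiry(const unsigned long *deadlines, size_t n)
{
    unsigned long earliest;
    size_t i;

    if (n == 0)
        return 0;

    earliest = deadlines[0];
    for (i = 1; i < n; i++) {
        if (deadlines[i] < earliest)
            earliest = deadlines[i];
    }
    return earliest;
}
```

With this arrangement, adding or removing a request only touches the timer when the earliest deadline changes, rather than once per command.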
    • J
      block: update comment on end_request() · 839e96af
      Jens Axboe committed
      It refers to functions that no longer exist after the IO completion
      changes.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      839e96af
    • J
      block: don't use bio_has_data() in the completion path · 60540161
      Jens Axboe committed
      We should just check for rq->bio, as that is really the information
      we are looking for. Even if the bio attached doesn't carry data,
      we still need to do IO post processing on it.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      60540161
    • J
      block: inherit CPU completion on bio->rq and rq->rq merges · ab780f1e
      Jens Axboe committed
      Somewhat incomplete, as we do allow merges of requests and bios
      that have different completion CPUs given. This is done on the
      assumption that a larger IO is still more beneficial than CPU
      locality.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      ab780f1e
    • J
      block: add support for IO CPU affinity · c7c22e4d
      Jens Axboe committed
      This patch adds support for controlling the IO completion CPU of
      either all requests on a queue, or on a per-request basis. We export
      a sysfs variable (rq_affinity) which, if set, migrates completions
      of requests to the CPU that originally submitted them. A bio helper
      (bio_set_completion_cpu()) is also added, so that queuers can ask
      for completion on a specific CPU.
      
      In testing, this has been shown to cut the system time by as much
      as 20-40% on synthetic workloads where CPU affinity is desired.
      
      This requires a little help from the architecture, so it'll only
      work as designed for archs that are using the new generic smp
      helper infrastructure.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      c7c22e4d
    • J
      block: make kblockd_schedule_work() take the queue as parameter · 18887ad9
      Jens Axboe committed
      Preparatory patch for checking queuing affinity.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      18887ad9
    • J
      block: split softirq handling into blk-softirq.c · b646fc59
      Jens Axboe committed
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      b646fc59
    • T
      block: move stats from disk to part0 · 074a7aca
      Tejun Heo committed
      Move stats related fields - stamp, in_flight, dkstats - from disk to
      part0 and unify stat handling such that...
      
      * part_stat_*() now updates part0 together if the specified partition
        is not part0.  ie. part_stat_*() are now essentially all_stat_*().
      
      * {disk|all}_stat_*() are gone.
      
      * part_round_stats() is updated similarly.  It handles part0 stats
        automatically and disk_round_stats() is killed.
      
      * part_{inc|dec}_in_flight() is implemented which automatically updates
        part0 stats for parts other than part0.
      
      * disk_map_sector_rcu() is updated to return part0 if no part matches.
        Combined with the above changes, this makes NULL special case
        handling in callers unnecessary.
      
      * Separate stats show code paths for disk are collapsed into part
        stats show code paths.
      
      * Rename disk_stat_lock/unlock() to part_stat_lock/unlock()
      
      While at it, reposition stat handling macros a bit and add missing
      parentheses around macro parameters.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      074a7aca
    • T
      block: kill GENHD_FL_FAIL and use part0->make_it_fail · eddb2e26
      Tejun Heo committed
      GENHD_FL_FAIL for disk is what make_it_fail is for parts.  Kill it and
      use part0->make_it_fail.  Sysfs node handling is unified too.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      eddb2e26