1. 18 Feb, 2009 (4 commits)
    • block: fix deadlock in blk_abort_queue() for drivers that readd to timeout list · be987fdb
      Hannes Reinecke committed
      blk_abort_queue() iterates the timeout list and aborts each request on the
      list, but if the driver's error handling re-adds a request to the timeout list
      during this processing, we could be looping forever. Fix this by splicing the
      current entries to a local list and running over that list instead.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
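      A minimal userspace sketch of the splice-then-walk fix. It assumes a hand-rolled singly linked list in place of the kernel's struct list_head, and splice_to_local() and the re-queueing driver_abort() handler are hypothetical names. Walking the live list would never reach its end, because every abort appends the request again. Walking a private copy visits each original entry exactly once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request: a node on a singly linked timeout list. */
struct request {
	struct request *next;
	int aborted;             /* how many times the handler saw this request */
};

struct list {
	struct request *head;
	struct request **tail;   /* address of the last next pointer */
};

static void list_init(struct list *l)
{
	l->head = NULL;
	l->tail = &l->head;
}

static void timeout_list_add(struct list *l, struct request *rq)
{
	rq->next = NULL;
	*l->tail = rq;
	l->tail = &rq->next;
}

/* Move every entry of src onto the empty dst and leave src empty. */
static void splice_to_local(struct list *src, struct list *dst)
{
	dst->head = src->head;
	dst->tail = src->head ? src->tail : &dst->head;
	list_init(src);
}

/* Hypothetical driver error handler that re-adds the request for a retry. */
static void driver_abort(struct list *timeout_list, struct request *rq)
{
	rq->aborted++;
	timeout_list_add(timeout_list, rq);
}

/* Abort every request that was queued when we started; returns the count. */
static int abort_queue(struct list *timeout_list)
{
	struct list local;
	struct request *rq, *next;
	int n = 0;

	splice_to_local(timeout_list, &local);
	for (rq = local.head; rq; rq = next) {
		next = rq->next;   /* save it: the handler relinks rq */
		driver_abort(timeout_list, rq);
		n++;
	}
	return n;
}
```

      The loop saves rq->next before calling the handler, because the handler moves rq back onto the live list.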
    • block: fix booting from partitioned md array · 41b8c853
      Neil Brown committed
      Hi Tejun,
      Hi Tejun,
      
       it looks like your commit:
      
         block: don't depend on consecutive minor space
         f331c029
      
       broke a particular case for booting from partitioned md/raid devices.
       That is the second time this has been broken recently.  The previous
       time was fixed by
      
         block: do_mounts - accept root=<non-existant partition>
         30f2f0eb
      
       Because the data isn't available when an md device is first created
       (we add disks and set it up after creation), the initial partition
       scan finds nothing.  It is not until the device is opened that
       another partition scan happens and finds something.
      
       So at the point where the kernel parameter "root=/dev/md_d0p1" is
       being parsed, md_d0 exists, but md_d0p1 does not.
       However if we let blk_lookup_devt return the correct device number
       even though the device doesn't exist, then the attempt to mount it
       will successfully find the partition.
      
       I have tried in the past to find a way to get the partition table to
       be read as soon as the array is assembled but that proved impossible
       (at the time).  I don't remember the details, and could possibly
       revisit it.  However it would be really nice if blk_lookup_devt
       could be adjusted to again accept non-existent partitions.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
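      A hedged sketch of the lookup behaviour the mail asks for. It is not the actual blk_lookup_devt() code. struct disk, lookup_devt() and the minor-range check are assumptions made for illustration. MKDEV() follows the kernel's internal 20-bit-minor encoding. The sketch shows that the device number for a partition such as md_d0p1 can be computed from the whole disk alone, before any partition scan has found the partition.

```c
#include <assert.h>

/* Userspace stand-ins for the kernel's internal dev_t encoding. */
typedef unsigned int devt_t;
#define MINORBITS 20
#define MKDEV(ma, mi) (((devt_t)(ma) << MINORBITS) | (mi))

/* Hypothetical whole-disk descriptor: a major and a range of reserved minors. */
struct disk {
	unsigned int major;
	unsigned int first_minor;
	int minors;       /* minors reserved for the disk, partitions included */
	int nr_scanned;   /* partitions found so far; deliberately not consulted */
};

/*
 * Return the device number for partition partno (0 = whole disk) even if the
 * partition table has not been read yet, as long as partno fits in the disk's
 * reserved minor range. Returns 0 when it cannot be encoded.
 */
static devt_t lookup_devt(const struct disk *d, int partno)
{
	if (partno < 0 || partno >= d->minors)
		return 0;
	return MKDEV(d->major, d->first_minor + partno);
}
```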
    • block: fix bad definition of BIO_RW_SYNC · 93dbb393
      Jens Axboe committed
      We can't OR shift values (bit numbers), so get rid of BIO_RW_SYNC and use
      BIO_RW_SYNCIO and BIO_RW_UNPLUG explicitly. This brings back the behaviour
      from before 213d9417.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
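      BIO_RW_* flags are bit numbers. OR-ing two of them yields a third bit number, not a mask with both bits set. The sketch below uses the hypothetical values 4 and 5, which are not the kernel's actual ones. It shows the failure and the explicit two-bit form that the fix switches to.

```c
#include <assert.h>

/* Hypothetical bit numbers: each flag is a bit position, tested via (1UL << bit). */
enum {
	RW_SYNCIO = 4,
	RW_UNPLUG = 5,
};

/* Buggy: ORs two bit numbers and gets another bit number (4 | 5 == 5). */
#define RW_SYNC_BAD  (RW_SYNCIO | RW_UNPLUG)

/* Correct: turn each bit number into a mask before combining them. */
#define RW_SYNC_MASK ((1UL << RW_SYNCIO) | (1UL << RW_UNPLUG))

static int test_bit_nr(unsigned long flags, int bit)
{
	return (flags >> bit) & 1UL;
}
```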
    • bsg: Fix sense buffer bug in SG_IO · c1c20120
      Boaz Harrosh committed
      When submitting requests via SG_IO, which does sync I/O, a
      bsg_command is not allocated, so an in-kernel sense_buffer was not
      set. However, when blk_execute_rq() is called with no sense buffer,
      one is provided from blk_execute_rq()'s own stack. Later, in
      blk_complete_sgv4_hdr_rq(), bsg would check rq->sense_len and, if a
      sense was requested by sg_io_v4, copy rq->sense back to user space;
      but by then that buffer is already-mangled stack memory.

      I have fixed that by forcing a sense_buffer when calling bsg_map_hdr().
      The bsg_command->sense is provided in the write/read path like before,
      and an on-the-stack buffer is provided when doing SG_IO.

      I have also fixed a dprintk message to print rq->errors in hex because
      of the SCSI bit-field use of this member. For other block devices it
      does not matter anyway.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  2. 02 Feb, 2009 (1 commit)
  3. 30 Jan, 2009 (8 commits)
  4. 07 Jan, 2009 (2 commits)
  5. 03 Jan, 2009 (2 commits)
  6. 29 Dec, 2008 (22 commits)
    • cfq-iosched: fix race between exiting queue and exiting task · 62c1fe9d
      Jens Axboe committed
      Original patch from Nikanth Karthikesan <knikanth@suse.de>

      When a queue exits, the queue lock is taken and cfq_exit_queue() would free
      all the cics associated with the queue.

      But when a task exits, cfq_exit_io_context() gets the cics one by one and then
      locks the associated queue to call __cfq_exit_single_io_context(). It looks
      like, between getting a cic from the ioc and locking the queue, the queue
      might have exited on another CPU.

      Fix this by rechecking the cfq_io_context queue key inside the queue lock,
      and not calling into __cfq_exit_single_io_context() if somebody
      beat us to it.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
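      A minimal pthread-based sketch of the re-check-under-lock pattern. The hypothetical struct queue and struct io_ctx types stand in for the request queue and the cfq_io_context. The sketch splits the task-exit path into an unlocked sample and a locked finish, so that the race window can be replayed deterministically. It keeps the queue object alive throughout, so it illustrates only the re-check, not the object-lifetime rules the kernel also needs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for a request queue and a per-task io context. */
struct queue {
	pthread_mutex_t lock;
};

struct io_ctx {
	struct queue *key;   /* owning queue; cleared once the context is torn down */
	int exit_calls;      /* how many times the teardown work ran */
};

/* Teardown work: must run exactly once, with the owning queue locked. */
static void detach(struct io_ctx *cic)
{
	cic->key = NULL;
	cic->exit_calls++;
}

/* Queue-exit path: tears down a context under the queue lock. */
static void queue_exit(struct queue *q, struct io_ctx *cic)
{
	pthread_mutex_lock(&q->lock);
	if (cic->key == q)
		detach(cic);
	pthread_mutex_unlock(&q->lock);
}

/* Task-exit, step 1: sample the owning queue without holding its lock. */
static struct queue *task_exit_sample(const struct io_ctx *cic)
{
	return cic->key;
}

/*
 * Task-exit, step 2: lock the sampled queue, then re-check the key. If the
 * queue-exit path ran in the window since step 1, the key is already gone
 * and the teardown is skipped instead of being done twice.
 */
static void task_exit_finish(struct io_ctx *cic, struct queue *q)
{
	if (!q)
		return;
	pthread_mutex_lock(&q->lock);
	if (cic->key == q)
		detach(cic);
	pthread_mutex_unlock(&q->lock);
}
```

      Without the cic->key == q test in task_exit_finish(), replaying the interleaving (sample, queue_exit(), finish) would run the teardown twice.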
    • Get rid of CONFIG_LSF · b3a6ffe1
      Jens Axboe committed
      We have two separate config entries for large devices/files. One
      is CONFIG_LBD, which guards just the devices; the other is CONFIG_LSF,
      which handles large files. This doesn't make a lot of sense; you typically
      want both or none. So get rid of CONFIG_LSF and change the CONFIG_LBD wording
      to indicate that it covers both.
      Acked-by: Jean Delvare <khali@linux-fr.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: make blk_softirq_init() static · 3c18ce71
      Roel Kluin committed
      Sparse asked whether these could be static.
      Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: use min_not_zero in blk_queue_stack_limits · 18af8b2c
      FUJITA Tomonori committed
      Zero is invalid for max_phys_segments, max_hw_segments, and
      max_segment_size. It's better to use min_not_zero instead of
      min. min() works, though, because commit 0e435ac2 makes sure that
      these values are set to their default, non-zero values if a queue is
      initialized properly.

      With this patch, blk_queue_stack_limits does almost the same thing
      that dm's combine_restrictions_low() does. I think that it's easy to
      remove dm's combine_restrictions_low.
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
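      A sketch of the difference between min() and min_not_zero() when stacking limits. The macro has the same semantics as the kernel's min_not_zero(). struct limits and stack_limits() are simplified, hypothetical stand-ins for the request-queue fields and blk_queue_stack_limits().

```c
#include <assert.h>

/* Same semantics as the kernel's min_not_zero(): 0 means "no limit set". */
#define min_not_zero(x, y) \
	((x) == 0 ? (y) : ((y) == 0 ? (x) : ((x) < (y) ? (x) : (y))))

/* Hypothetical subset of queue limits; zero is never a valid value. */
struct limits {
	unsigned int max_segment_size;
	unsigned short max_phys_segments;
	unsigned short max_hw_segments;
};

/* Fold the bottom device's limits into the top (stacking) device's limits. */
static void stack_limits(struct limits *t, const struct limits *b)
{
	t->max_segment_size  = min_not_zero(t->max_segment_size,  b->max_segment_size);
	t->max_phys_segments = min_not_zero(t->max_phys_segments, b->max_phys_segments);
	t->max_hw_segments   = min_not_zero(t->max_hw_segments,   b->max_hw_segments);
}
```

      If a limit were left at 0, plain min() would keep that invalid zero. For example, the top device's unset max_segment_size would stay 0 instead of adopting the bottom device's 65536.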
    • block: add one-hit cache for disk partition lookup · a6f23657
      Jens Axboe committed
      disk_map_sector_rcu() returns a partition from a sector offset,
      which we use for IO statistics on a per-partition basis. The
      lookup itself is an O(N) list lookup, where N is the number of
      partitions. This actually hurts performance quite a bit, even
      on the lower-end partitions. On higher-numbered partitions,
      it can get pretty bad.

      Solve this by adding a one-hit cache for partition lookup.
      This makes the lookup O(1) for the case where we do most IO to
      one partition. Even for mixed partition workloads, amortized cost
      is pretty close to O(1), since the natural IO batching makes the
      one-hit cache last for lots of IOs.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
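      A minimal sketch of a one-hit lookup cache. It assumes that partitions are plain sector ranges held in an array. The real disk_map_sector_rcu() works on RCU-protected partition tables, which this sketch omits. Only a cache miss pays for the O(N) walk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical partition: a contiguous range of sectors. */
struct part {
	unsigned long long start;
	unsigned long long nr_sects;
};

struct disk {
	struct part *parts;
	size_t nr_parts;
	struct part *last_lookup;   /* the one-hit cache */
	unsigned long scans;        /* counts slow-path list walks */
};

static int sector_in_part(const struct part *p, unsigned long long sector)
{
	return sector >= p->start && sector - p->start < p->nr_sects;
}

/* Map a sector to its partition: O(1) on a cache hit, O(N) walk on a miss. */
static struct part *map_sector(struct disk *d, unsigned long long sector)
{
	size_t i;

	if (d->last_lookup && sector_in_part(d->last_lookup, sector))
		return d->last_lookup;

	d->scans++;
	for (i = 0; i < d->nr_parts; i++) {
		if (sector_in_part(&d->parts[i], sector)) {
			d->last_lookup = &d->parts[i];
			return d->last_lookup;
		}
	}
	return NULL;   /* sector lies outside every partition */
}
```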
    • cfq-iosched: remove limit of dispatch depth of max 4 times quantum · 30e0dc28
      Jens Axboe committed
      This basically limits the hardware queue depth to 4*quantum at any
      point in time, which is 16 with the default settings. As CFQ uses
      other means to shrink the hardware queue when necessary in the first
      place, there's really no need for this extra heuristic. Additionally,
      it ends up hurting performance in some cases.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: get rid of elevator_t typedef · b374d18a
      Jens Axboe committed
      Just use struct elevator_queue everywhere instead.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: don't use plugging on SSD devices · a31a9738
      Jens Axboe committed
      We just want to hand the first bits of IO to the device as fast
      as possible. Gains a few percent on the IOPS rate.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: fix empty barrier on write-through w/ ordered tag · a185eb4b
      Tejun Heo committed
      An empty barrier on write-through (or no cache) w/ ordered tag has no
      command to execute, and without any command to execute, the ordered tag
      is never issued to the device and the ordering is never achieved. Force
      draining for such cases.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
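      A sketch of the decision described above. It uses hypothetical flag values modelled on the QUEUE_ORDERED_BY_* and QUEUE_ORDERED_DO_* names. An empty barrier drops its BAR step, which is the masking that 58eea927 below describes. Tag ordering with no command left to carry the tag then falls back to draining. barrier_actions() is an illustrative helper, not the kernel's start_ordered().

```c
#include <assert.h>

/* Hypothetical flag values modelled on QUEUE_ORDERED_BY_* / QUEUE_ORDERED_DO_*. */
enum {
	ORD_BY_DRAIN     = 0x01,
	ORD_BY_TAG       = 0x02,
	ORD_DO_PREFLUSH  = 0x10,
	ORD_DO_BAR       = 0x20,
	ORD_DO_POSTFLUSH = 0x40,
};

/*
 * Choose the actions for one barrier. An empty barrier has no data to write,
 * so its BAR step is masked off. If ordering relies on a tag and no command
 * remains to carry that tag, fall back to draining the queue instead.
 */
static unsigned int barrier_actions(unsigned int ordered, int empty)
{
	if (empty)
		ordered &= ~ORD_DO_BAR;

	if ((ordered & ORD_BY_TAG) &&
	    !(ordered & (ORD_DO_PREFLUSH | ORD_DO_BAR | ORD_DO_POSTFLUSH))) {
		ordered &= ~ORD_BY_TAG;
		ordered |= ORD_BY_DRAIN;
	}
	return ordered;
}
```

      For a write-through device using tag ordering, the only action is BAR. An empty barrier therefore has nothing to tag, and the sketch switches it to draining.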
    • block: simplify empty barrier implementation · 58eea927
      Tejun Heo committed
      An empty barrier required special handling in __elv_next_request() to
      complete it without letting the low-level driver see it.

      With previous changes, the barrier code is now flexible enough to skip the
      BAR step using the same barrier sequence selection mechanism. Drop
      the special handling and mask off q->ordered from start_ordered().

      Remove the blk_empty_barrier() test, which now has no users.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: make barrier completion more robust · 8f11b3e9
      Tejun Heo committed
      Barrier completion had the following assumptions.

      * start_ordered() couldn't finish the whole sequence properly. If all
        actions are to be skipped, q->ordseq is set correctly, but the actual
        completion was never triggered, thus hanging the barrier request.

      * Drain completion in elv_complete_request() assumed that there's
        always at least one request in the queue when drain completes.

      Both assumptions are true, but they need to be removed to
      improve the empty barrier implementation. This patch makes the following
      changes.

      * Make start_ordered() use blk_ordered_complete_seq() to mark skipped
        steps complete and notify __elv_next_request() that it should fetch
        the next request if the whole barrier has completed inside
        start_ordered().

      * Make the drain completion path in elv_complete_request() check whether
        the queue is empty. An empty queue also indicates drain completion.

      * While at it, convert the 0/1 return from blk_do_ordered() to false/true.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: make every barrier action optional · f671620e
      Tejun Heo committed
      In all barrier sequences, the barrier write itself was always assumed
      to be issued and thus didn't have a corresponding control flag. This
      patch adds QUEUE_ORDERED_DO_BAR and unifies action mask handling in
      start_ordered() such that any barrier action can be skipped.

      This patch doesn't introduce any visible behavior changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: remove duplicate or unused barrier/discard error paths · a7384677
      Tejun Heo committed
      * Because barrier mode can be changed dynamically, whether barrier is
        supported or not can be determined only when actually issuing the
        barrier, and there is no point in checking it earlier. Drop the barrier
        support check in generic_make_request() and __make_request(), and
        update the comment around the support check in blk_do_ordered().

      * There is no reason to check discard support in both
        generic_make_request() and __make_request(). Drop the check in
        __make_request(). While at it, move the error action block to the end
        of the function and add unlikely() to the q existence test.

      * A barrier request, be it empty or not, is never passed to the low-level
        driver, and thus it's meaningless to try to copy back req->sector to
        bio->bi_sector on error. In addition, the notion of a failed sector
        doesn't make any sense for an empty barrier to begin with. Drop the
        code block from __end_that_request_first().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: reorganize QUEUE_ORDERED_* constants · 313e4299
      Tejun Heo committed
      Separate out ordering type (drain,) and action masks (preflush,
      postflush, fua) from visible ordering mode selectors
      (QUEUE_ORDERED_*). Ordering types are now named QUEUE_ORDERED_BY_*,
      while action masks are named QUEUE_ORDERED_DO_*.

      This change is necessary to add QUEUE_ORDERED_DO_BAR and make it
      optional to improve the empty barrier implementation.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
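      An illustration of how separating ordering types from action masks lets each visible mode be decomposed with two masks. The ORD_* macros are hypothetical names and values modelled on QUEUE_ORDERED_BY_* and QUEUE_ORDERED_DO_*. ORD_DO_BAR corresponds to QUEUE_ORDERED_DO_BAR, which the message says this reorganization prepares for.

```c
#include <assert.h>

/* Ordering types: how ordering is achieved (illustrative values). */
#define ORD_BY_DRAIN      0x01u
#define ORD_BY_TAG        0x02u
#define ORD_TYPE_MASK     (ORD_BY_DRAIN | ORD_BY_TAG)

/* Action masks: which steps a barrier sequence performs. */
#define ORD_DO_PREFLUSH   0x10u
#define ORD_DO_BAR        0x20u
#define ORD_DO_POSTFLUSH  0x40u
#define ORD_DO_FUA        0x80u
#define ORD_ACTION_MASK   (ORD_DO_PREFLUSH | ORD_DO_BAR | ORD_DO_POSTFLUSH | ORD_DO_FUA)

/* Visible mode selectors become combinations of one type and some actions. */
#define ORD_DRAIN         (ORD_BY_DRAIN | ORD_DO_BAR)
#define ORD_DRAIN_FLUSH   (ORD_DRAIN | ORD_DO_PREFLUSH | ORD_DO_POSTFLUSH)
#define ORD_TAG           (ORD_BY_TAG | ORD_DO_BAR)
#define ORD_TAG_FUA       (ORD_TAG | ORD_DO_PREFLUSH | ORD_DO_FUA)

static unsigned int ordering_type(unsigned int mode)
{
	return mode & ORD_TYPE_MASK;
}

static unsigned int ordering_actions(unsigned int mode)
{
	return mode & ORD_ACTION_MASK;
}
```

      Because BAR is just another action bit, code can skip it by masking ORD_DO_BAR off, the same way as any other action.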
    • block: use cancel_work_sync() instead of kblockd_flush_work() · 64d01dc9
      Cheng Renquan committed
      After many improvements to kblockd_flush_work, it is now identical to
      cancel_work_sync, so a direct call to cancel_work_sync is suggested.

      The only difference is that cancel_work_sync is a GPL symbol,
      so non-GPL modules can no longer use it.
      Signed-off-by: Cheng Renquan <crquan@gmail.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: Supress Buffer I/O errors when SCSI REQ_QUIET flag set · 08bafc03
      Keith Mannthey committed
      Allow the SCSI request REQ_QUIET flag to be propagated to the buffer
      file system layer. The basic idea is to pass the flag from the SCSI
      request to the bio (block IO) and then to the buffer layer. The buffer
      layer can then suppress needless printks.

      This patch declutters the kernel log by removing the 40-50 (per LUN)
      buffer I/O error messages seen during boot in my multipath setup.
      Without this patch, there is a good chance that real errors will be
      missed in the "noise" in the logs.
      
      During boot I see blocks of messages like
      "
      __ratelimit: 211 callbacks suppressed
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242847
      Buffer I/O error on device sdm, logical block 1
      Buffer I/O error on device sdm, logical block 5242878
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242872
      "
      in my logs.
      
      My disk environment is multipath Fibre Channel using the SCSI_DH_RDAC
      code and multipathd. This topology includes an "active" and a "ghost"
      path for each LUN. IOs to the "ghost" path will never complete, and the
      SCSI layer, via the SCSI device handler rdac code, quickly returns the IOs
      to these paths and sets the REQ_QUIET SCSI flag to suppress the SCSI
      layer messages.

      I want to extend the QUIET behavior to include the buffer file
      system layer to deal with these errors as well. I have been running this
      patch for a while now on several boxes without issue. A few runs of
      bonnie++ show no noticeable difference in performance in my setup.

      Thanks to John Stultz for the quiet_error finalization.
      Submitted-by: Keith Mannthey <kmannth@us.ibm.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
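      A minimal sketch of the propagation chain from the request to the bio to the buffer layer. The flag names are hypothetical stand-ins for REQ_QUIET and the corresponding bio and buffer-head bits. report_buffer_error() only approximates the buffer layer's check, and the rate limiting visible in the log above is left out.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flag bits standing in for the request, bio and buffer-head flags. */
#define RQ_QUIET  (1u << 0)
#define BIO_QUIET (1u << 1)
#define BH_QUIET  (1u << 2)

struct request { unsigned int cmd_flags; };
struct bio     { unsigned int flags; };
struct bh      { unsigned int state; unsigned long long blocknr; };

/* Request completion copies the quiet flag onto the bio... */
static void complete_bio(const struct request *rq, struct bio *bio)
{
	if (rq->cmd_flags & RQ_QUIET)
		bio->flags |= BIO_QUIET;
}

/* ...and the bio end-I/O handler copies it onto the buffer head. */
static void end_bio_bh(const struct bio *bio, struct bh *bh)
{
	if (bio->flags & BIO_QUIET)
		bh->state |= BH_QUIET;
}

/* Buffer layer: log the error unless quiet was requested. Returns 1 if logged. */
static int report_buffer_error(const struct bh *bh, const char *dev)
{
	if (bh->state & BH_QUIET)
		return 0;
	fprintf(stderr, "Buffer I/O error on device %s, logical block %llu\n",
		dev, bh->blocknr);
	return 1;
}
```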
    • block: don't take lock on changing ra_pages · 7c239517
      Wu Fengguang committed
      There's no need to take queue_lock or kernel_lock when modifying
      bdi->ra_pages, so remove them. Also remove an out-of-date comment for
      queue_max_sectors_store().
      Signed-off-by: Wu Fengguang <wfg@linux.intel.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block/blk-tag.c: cleanup kernel-doc · c6a06f70
      Qinghuang Feng committed
      There is no argument named @tags in blk_init_tags, so
      remove its comment.
      Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • scsi-ioctl: use clock_t <> jiffies · 2b91bafc
      Milton Miller committed
      Convert the timeout ioctl scaling to use the clock_t functions,
      which are much more accurate with some USER_HZ vs HZ combinations.
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
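      A worked comparison. It assumes that the older code scaled timeouts by the integer ratio HZ / USER_HZ, and it takes USER_HZ = 100 and HZ = 250 as example values. The ratio truncates to 2 instead of 2.5, so 10 user ticks (100 ms) become 20 jiffies (80 ms). Scaling by the full fraction gives 25 jiffies. The kernel's clock_t_to_jiffies() and jiffies_to_clock_t() are designed to avoid that truncation when HZ is not a multiple of USER_HZ. The functions below are hypothetical userspace stand-ins.

```c
#include <assert.h>

/* Example tick rates (assumptions): USER_HZ is commonly 100, HZ is configurable. */
#define USER_HZ 100UL
#define HZ      250UL

/* Integer-ratio scaling: HZ / USER_HZ truncates to 2 when HZ = 250. */
static unsigned long ratio_to_jiffies(unsigned long ticks)
{
	return ticks * (HZ / USER_HZ);
}

static unsigned long ratio_to_clock_t(unsigned long j)
{
	return j / (HZ / USER_HZ);
}

/* Scale by the full fraction; the 64-bit intermediate avoids overflow. */
static unsigned long exact_to_jiffies(unsigned long ticks)
{
	return (unsigned long)((unsigned long long)ticks * HZ / USER_HZ);
}

static unsigned long exact_to_clock_t(unsigned long j)
{
	return (unsigned long)((unsigned long long)j * USER_HZ / HZ);
}
```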
    • block: leave the request timeout timer running even on an empty list · 70ed28b9
      Jens Axboe committed
      Sync IO is often issued serially. This means we'll be touching
      the queue timer for every IO, as opposed to only occasionally like we
      do for queued IO. Instead of deleting the timer when the last request
      is removed, just let it continue running. If a new request comes in soon,
      we then don't have to re-add the timer. If no new requests arrive,
      the timer will expire later without side effects.

      This improves high-IOPS sync IO by ~1%.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: optimizations in blk_rq_timed_out_timer() · 565e411d
      malahal@us.ibm.com committed
      rq->deadline can no longer be zero if the request is on the
      timeout_list, so there is no need to have next_set. There is also no
      need to access a request's deadline field if blk_rq_timed_out is called on it.
      Signed-off-by: Malahal Naineni <malahal@us.ibm.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
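      The optimization relies on 0 never being a valid deadline for a queued request. The running minimum can therefore start at 0 and treat that value as "unset", with no separate next_set flag. Below is a sketch of that loop over a hypothetical array of requests. Its time_after() macro has the same form as the kernel's, so jiffies wraparound is handled.

```c
#include <assert.h>
#include <stddef.h>

/* Wrap-safe comparison with the same form as the kernel's time_after(). */
#define time_after(a, b) ((long)((b) - (a)) < 0)

/* Hypothetical request on the timeout list; its deadline is never 0. */
struct request {
	unsigned long deadline;
};

/*
 * Find the earliest pending deadline. Because a queued request's deadline is
 * never 0, next == 0 doubles as "not set yet". Returns 0 for an empty list.
 */
static unsigned long earliest_deadline(const struct request *rqs, size_t n)
{
	unsigned long next = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!next || time_after(next, rqs[i].deadline))
			next = rqs[i].deadline;
	}
	return next;
}
```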
  7. 26 Dec, 2008 (1 commit)