1. 11 9月, 2009 1 次提交
    • T
      block: use the same failfast bits for bio and request · a82afdfc
      Tejun Heo 提交于
      bio and request use the same set of failfast bits.  This patch makes
      the following changes to simplify things.
      
      * enumify BIO_RW* bits and reorder bits such that BIOS_RW_FAILFAST_*
        bits coincide with __REQ_FAILFAST_* bits.
      
      * The above pushes BIO_RW_AHEAD out of sync with __REQ_FAILFAST_DEV
        but the matching is useless anyway.  init_request_from_bio() is
        responsible for setting FAILFAST bits on FS requests and non-FS
        requests never use BIO_RW_AHEAD.  Drop the code and comment from
        blk_rq_bio_prep().
      
      * Define REQ_FAILFAST_MASK which is OR of all FAILFAST bits and
        simplify FAILFAST flags handling in init_request_from_bio().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      a82afdfc
  2. 01 7月, 2009 1 次提交
  3. 15 6月, 2009 1 次提交
  4. 23 5月, 2009 1 次提交
  5. 11 5月, 2009 1 次提交
    • T
      block: drop request->hard_* and *nr_sectors · 2e46e8b2
      Tejun Heo 提交于
      struct request has had a few different ways to represent some
      properties of a request.  ->hard_* represent block layer's view of the
      request progress (completion cursor) and the ones without the prefix
      are supposed to represent the issue cursor and allowed to be updated
      as necessary by the low level drivers.  The thing is that as block
      layer supports partial completion, the two cursors really aren't
      necessary and only cause confusion.  In addition, manual management of
      request detail from low level drivers is cumbersome and error-prone at
      the very least.
      
      Another interesting duplicate fields are rq->[hard_]nr_sectors and
      rq->{hard_cur|current}_nr_sectors against rq->data_len and
      rq->bio->bi_size.  This is more convoluted than the hard_ case.
      
      rq->[hard_]nr_sectors are initialized for requests with bio but
      blk_rq_bytes() uses it only for !pc requests.  rq->data_len is
      initialized for all request but blk_rq_bytes() uses it only for pc
      requests.  This causes good amount of confusion throughout block layer
      and its drivers and determining the request length has been a bit of
      black magic which may or may not work depending on circumstances and
      what the specific LLD is actually doing.
      
      rq->{hard_cur|current}_nr_sectors represent the number of sectors in
      the contiguous data area at the front.  This is mainly used by drivers
      which transfers data by walking request segment-by-segment.  This
      value always equals rq->bio->bi_size >> 9.  However, data length for
      pc requests may not be multiple of 512 bytes and using this field
      becomes a bit confusing.
      
      In general, having multiple fields to represent the same property
      leads only to confusion and subtle bugs.  With recent block low level
      driver cleanups, no driver is accessing or manipulating these
      duplicate fields directly.  Drop all the duplicates.  Now rq->sector
      means the current sector, rq->data_len the current total length and
      rq->bio->bi_size the current segment length.  Everything else is
      defined in terms of these three and available only through accessors.
      
      * blk_recalc_rq_sectors() is collapsed into blk_update_request() and
        now handles pc and fs requests equally other than rq->sector update.
        This means that now pc requests can use partial completion too (no
        in-kernel user yet tho).
      
      * bio_cur_sectors() is replaced with bio_cur_bytes() as block layer
        now uses byte count as the primary data length.
      
      * blk_rq_pos() is now guranteed to be always correct.  In-block users
        converted.
      
      * blk_rq_bytes() is now guaranteed to be always valid as is
        blk_rq_sectors().  In-block users converted.
      
      * blk_rq_sectors() is now guaranteed to equal blk_rq_bytes() >> 9.
        More convenient one is used.
      
      * blk_rq_bytes() and blk_rq_cur_bytes() are now inlined and take const
        pointer to request.
      
      [ Impact: API cleanup, single way to represent one property of a request ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      2e46e8b2
  6. 28 4月, 2009 1 次提交
  7. 22 4月, 2009 1 次提交
    • T
      bio: fix bio_kmalloc() · 451a9ebf
      Tejun Heo 提交于
      Impact: fix bio_kmalloc() and its destruction path
      
      bio_kmalloc() was broken in two ways.
      
      * bvec_alloc_bs() first allocates bvec using kmalloc() and then
        ignores it and allocates again like non-kmalloc bvecs.
      
      * bio_kmalloc_destructor() didn't check for and free bio integrity
        data.
      
      This patch fixes the above problems.  kmalloc patch is separated out
      from bio_alloc_bioset() and allocates the requested number of bvecs as
      inline bvecs.
      
      * bio_alloc_bioset() no longer takes NULL @bs.  None other than
        bio_kmalloc() used it and outside users can't know how it was
        allocated anyway.
      
      * Define and use BIO_POOL_NONE so that pool index check in
        bvec_free_bs() triggers if inline or kmalloc allocated bvec gets
        there.
      
      * Relocate destructors on top of each allocation function so that how
        they're used is more clear.
      
      Jens Axboe suggested allocating bvecs inline.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      451a9ebf
  8. 15 4月, 2009 1 次提交
  9. 06 4月, 2009 1 次提交
  10. 24 3月, 2009 1 次提交
  11. 15 3月, 2009 1 次提交
  12. 18 2月, 2009 1 次提交
  13. 02 2月, 2009 2 次提交
  14. 30 1月, 2009 4 次提交
  15. 29 12月, 2008 6 次提交
    • J
      bio: add support for inlining a number of bio_vecs inside the bio · 392ddc32
      Jens Axboe 提交于
      When we go and allocate a bio for IO, we actually do two allocations.
      One for the bio itself, and one for the bi_io_vec that holds the
      actual pages we are interested in.
      
      This feature inlines a definable amount of io vecs inside the bio
      itself, so we eliminate the bio_vec array allocation for IO's up
      to a certain size. It defaults to 4 vecs, which is typically 16k
      of IO.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      392ddc32
    • J
      bio: allow individual slabs in the bio_set · bb799ca0
      Jens Axboe 提交于
      Instead of having a global bio slab cache, add a reference to one
      in each bio_set that is created. This allows for personalized slabs
      in each bio_set, so that they can have bios of different sizes.
      
      This means we can personalize the bios we return. File systems may
      want to embed the bio inside another structure, to avoid allocation
      more items (and stuffing them in ->bi_private) after the get a bio.
      Or we may want to embed a number of bio_vecs directly at the end
      of a bio, to avoid doing two allocations to return a bio. This is now
      possible.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      bb799ca0
    • J
      bio: move the slab pointer inside the bio_set · 1b434498
      Jens Axboe 提交于
      In preparation for adding differently sized bios.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      1b434498
    • J
      bio: only mempool back the largest bio_vec slab cache · 7ff9345f
      Jens Axboe 提交于
      We only very rarely need the mempool backing, so it makes sense to
      get rid of all but one of the mempool in a bio_set. So keep the
      largest bio_vec count mempool so we can always honor the largest
      allocation, and "upgrade" callers that fail.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      7ff9345f
    • R
      block: reorder struct bio to remove padding on 64bit · ba744d5e
      Richard Kennedy 提交于
      Remove 8 bytes of padding from struct bio which also removes 16 bytes from
      struct bio_pair to make it 248 bytes.  bio_pair then fits into one fewer
      cache lines & into a smaller slab.
      Signed-off-by: NRichard Kennedy <richard@rsk.demon.co.uk>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      ba744d5e
    • K
      block: Supress Buffer I/O errors when SCSI REQ_QUIET flag set · 08bafc03
      Keith Mannthey 提交于
      Allow the scsi request REQ_QUIET flag to be propagated to the buffer
      file system layer. The basic ideas is to pass the flag from the scsi
      request to the bio (block IO) and then to the buffer layer.  The buffer
      layer can then suppress needless printks.
      
      This patch declutters the kernel log by removed the 40-50 (per lun)
      buffer io error messages seen during a boot in my multipath setup . It
      is a good chance any real errors will be missed in the "noise" it the
      logs without this patch.
      
      During boot I see blocks of messages like
      "
      __ratelimit: 211 callbacks suppressed
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242847
      Buffer I/O error on device sdm, logical block 1
      Buffer I/O error on device sdm, logical block 5242878
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242879
      Buffer I/O error on device sdm, logical block 5242872
      "
      in my logs.
      
      My disk environment is multipath fiber channel using the SCSI_DH_RDAC
      code and multipathd.  This topology includes an "active" and "ghost"
      path for each lun. IO's to the "ghost" path will never complete and the
      SCSI layer, via the scsi device handler rdac code, quick returns the IOs
      to theses paths and sets the REQ_QUIET scsi flag to suppress the scsi
      layer messages.
      
       I am wanting to extend the QUIET behavior to include the buffer file
      system layer to deal with these errors as well. I have been running this
      patch for a while now on several boxes without issue.  A few runs of
      bonnie++ show no noticeable difference in performance in my setup.
      
      Thanks for John Stultz for the quiet_error finalization.
      Submitted-by: NKeith Mannthey <kmannth@us.ibm.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      08bafc03
  16. 06 11月, 2008 1 次提交
  17. 17 10月, 2008 1 次提交
    • F
      block: fix nr_phys_segments miscalculation bug · 86771427
      FUJITA Tomonori 提交于
      This fixes the bug reported by Nikanth Karthikesan <knikanth@suse.de>:
      
      http://lkml.org/lkml/2008/10/2/203
      
      The root cause of the bug is that blk_phys_contig_segment
      miscalculates q->max_segment_size.
      
      blk_phys_contig_segment checks:
      
      req->biotail->bi_size + next_req->bio->bi_size > q->max_segment_size
      
      But blk_recalc_rq_segments might expect that req->biotail and the
      previous bio in the req are supposed be merged into one
      segment. blk_recalc_rq_segments might also expect that next_req->bio
      and the next bio in the next_req are supposed be merged into one
      segment. In such case, we merge two requests that can't be merged
      here. Later, blk_rq_map_sg gives more segments than it should.
      
      We need to keep track of segment size in blk_recalc_rq_segments and
      use it to see if two requests can be merged. This patch implements it
      in the similar way that we used to do for hw merging (virtual
      merging).
      Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      86771427
  18. 13 10月, 2008 1 次提交
    • M
      [SCSI] block: separate failfast into multiple bits. · 6000a368
      Mike Christie 提交于
      Multipath is best at handling transport errors. If it gets a device
      error then there is not much the multipath layer can do. It will just
      access the same device but from a different path.
      
      This patch breaks up failfast into device, transport and driver errors.
      The multipath layers (md and dm mutlipath) only ask the lower levels to
      fast fail transport errors. The user of failfast, read ahead, will ask
      to fast fail on all errors.
      
      Note that blk_noretry_request will return true if any failfast bit
      is set. This allows drivers that do not support the multipath failfast
      bits to continue to fail on any failfast error like before. Drivers
      like scsi that are able to fail fast specific errors can check
      for the specific fail fast type. In the next patch I will convert
      scsi.
      Signed-off-by: NMike Christie <michaelc@cs.wisc.edu>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: NJames Bottomley <James.Bottomley@HansenPartnership.com>
      6000a368
  19. 09 10月, 2008 13 次提交