1. 02 11月, 2009 1 次提交
  2. 02 10月, 2009 1 次提交
  3. 11 7月, 2009 1 次提交
    • F
      block: fix sg SG_DXFER_TO_FROM_DEV regression · ecb554a8
      FUJITA Tomonori 提交于
      I overlooked SG_DXFER_TO_FROM_DEV support when I converted sg to use
      the block layer mapping API (2.6.28).
      
      Douglas Gilbert explained SG_DXFER_TO_FROM_DEV:
      
      http://www.spinics.net/lists/linux-scsi/msg37135.html
      
      =
      The semantics of SG_DXFER_TO_FROM_DEV were:
         - copy user space buffer to kernel (LLD) buffer
         - do SCSI command which is assumed to be of the DATA_IN
           (data from device) variety. This would overwrite
           some or all of the kernel buffer
         - copy kernel (LLD) buffer back to the user space.
      
      The idea was to detect short reads by filling the original
      user space buffer with some marker bytes ("0xec" it would
      seem in this report). The "resid" value is a better way
      of detecting short reads but that was only added this century
      and requires co-operation from the LLD.
      =
      
      This patch changes the block layer mapping API to support this
      semantics. This simply adds another field to struct rq_map_data and
      enables __bio_copy_iov() to copy data from user space even with READ
      requests.
      
      It's better to add the flags field and kills null_mapped and the new
      from_user fields in struct rq_map_data but that approach makes it
      difficult to send this patch to stable trees because st and osst
      drivers use struct rq_map_data (they were converted to use the block
      layer in 2.6.29 and 2.6.30). Well, I should clean up the block layer
      mapping API.
      
      zhou sf reported this regiression and tested this patch:
      
      http://www.spinics.net/lists/linux-scsi/msg37128.html
      http://www.spinics.net/lists/linux-scsi/msg37168.htmlReported-by: Nzhou sf <sxzzsf@gmail.com>
      Tested-by: Nzhou sf <sxzzsf@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      ecb554a8
  4. 01 7月, 2009 1 次提交
  5. 16 6月, 2009 1 次提交
  6. 13 6月, 2009 1 次提交
  7. 11 6月, 2009 1 次提交
  8. 10 6月, 2009 1 次提交
    • L
      tracing/events: convert block trace points to TRACE_EVENT() · 55782138
      Li Zefan 提交于
      TRACE_EVENT is a more generic way to define tracepoints. Doing so adds
      these new capabilities to this tracepoint:
      
        - zero-copy and per-cpu splice() tracing
        - binary tracing without printf overhead
        - structured logging records exposed under /debug/tracing/events
        - trace events embedded in function tracer output and other plugins
        - user-defined, per tracepoint filter expressions
        ...
      
      Cons:
      
        - no dev_t info for the output of plug, unplug_timer and unplug_io events.
          no dev_t info for getrq and sleeprq events if bio == NULL.
          no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL.
      
          This is mainly because we can't get the deivce from a request queue.
          But this may change in the future.
      
        - A packet command is converted to a string in TP_assign, not TP_print.
          While blktrace do the convertion just before output.
      
          Since pc requests should be rather rare, this is not a big issue.
      
        - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT
          has a unique format, which means we have some unused data in a trace entry.
      
          The overhead is minimized by using __dynamic_array() instead of __array().
      
      I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing:
      
            dd                   dd + ioctl blktrace       dd + TRACE_EVENT (splice)
      1     7.36s, 42.7 MB/s     7.50s, 42.0 MB/s          7.41s, 42.5 MB/s
      2     7.43s, 42.3 MB/s     7.48s, 42.1 MB/s          7.43s, 42.4 MB/s
      3     7.38s, 42.6 MB/s     7.45s, 42.2 MB/s          7.41s, 42.5 MB/s
      
      So the overhead of tracing is very small, and no regression when using
      those trace events vs blktrace.
      
      And the binary output of TRACE_EVENT is much smaller than blktrace:
      
       # ls -l -h
       -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0
       -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1
       -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out
      
      Following are some comparisons between TRACE_EVENT and blktrace:
      
      plug:
        kjournald-480   [000]   303.084981: block_plug: [kjournald]
        kjournald-480   [000]   303.084981:   8,0    P   N [kjournald]
      
      unplug_io:
        kblockd/0-118   [000]   300.052973: block_unplug_io: [kblockd/0] 1
        kblockd/0-118   [000]   300.052974:   8,0    U   N [kblockd/0] 1
      
      remap:
        kjournald-480   [000]   303.085042: block_remap: 8,0 W 102736992 + 8 <- (8,8) 33384
        kjournald-480   [000]   303.085043:   8,0    A   W 102736992 + 8 <- (8,8) 33384
      
      bio_backmerge:
        kjournald-480   [000]   303.085086: block_bio_backmerge: 8,0 W 102737032 + 8 [kjournald]
        kjournald-480   [000]   303.085086:   8,0    M   W 102737032 + 8 [kjournald]
      
      getrq:
        kjournald-480   [000]   303.084974: block_getrq: 8,0 W 102736984 + 8 [kjournald]
        kjournald-480   [000]   303.084975:   8,0    G   W 102736984 + 8 [kjournald]
      
        bash-2066  [001]  1072.953770:   8,0    G   N [bash]
        bash-2066  [001]  1072.953773: block_getrq: 0,0 N 0 + 0 [bash]
      
      rq_complete:
        konsole-2065  [001]   300.053184: block_rq_complete: 8,0 W () 103669040 + 16 [0]
        konsole-2065  [001]   300.053191:   8,0    C   W 103669040 + 16 [0]
      
        ksoftirqd/1-7   [001]  1072.953811:   8,0    C   N (5a 00 08 00 00 00 00 00 24 00) [0]
        ksoftirqd/1-7   [001]  1072.953813: block_rq_complete: 0,0 N (5a 00 08 00 00 00 00 00 24 00) 0 + 0 [0]
      
      rq_insert:
        kjournald-480   [000]   303.084985: block_rq_insert: 8,0 W 0 () 102736984 + 8 [kjournald]
        kjournald-480   [000]   303.084986:   8,0    I   W 102736984 + 8 [kjournald]
      
      Changelog from v2 -> v3:
      
      - use the newly introduced __dynamic_array().
      
      Changelog from v1 -> v2:
      
      - use __string() instead of __array() to minimize the memory required
        to store hex dump of rq->cmd().
      
      - support large pc requests.
      
      - add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT.
      
      - some cleanups.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4A2DF669.5070905@cn.fujitsu.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      55782138
  9. 23 5月, 2009 2 次提交
  10. 19 5月, 2009 1 次提交
    • T
      bio: always copy back data for copied kernel requests · 4fc981ef
      Tejun Heo 提交于
      When a read bio_copy_kern() request fails, the content of the bounce
      buffer is not copied back.  However, as request failure doesn't
      necessarily mean complete failure, the buffer state can be useful.
      This behavior is also inconsistent with the user map counterpart and
      causes the subtle difference between bounced and unbounced IO causes
      confusion.
      
      This patch makes bio_copy_kern_endio() ignore @err and always copy
      back data on request completion.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      4fc981ef
  11. 29 4月, 2009 1 次提交
  12. 22 4月, 2009 2 次提交
    • T
      bio: use bio_kmalloc() in copy/map functions · a9e9dc24
      Tejun Heo 提交于
      Impact: remove possible deadlock condition
      
      There is no reason to use mempool backed allocation for map functions.
      Also, because kern mapping is used inside LLDs (e.g. for EH), using
      mempool backed allocation can lead to deadlock under extreme
      conditions (mempool already consumed by the time a request reached EH
      and requests are blocked on EH).
      
      Switch copy/map functions to bio_kmalloc().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      a9e9dc24
    • T
      bio: fix bio_kmalloc() · 451a9ebf
      Tejun Heo 提交于
      Impact: fix bio_kmalloc() and its destruction path
      
      bio_kmalloc() was broken in two ways.
      
      * bvec_alloc_bs() first allocates bvec using kmalloc() and then
        ignores it and allocates again like non-kmalloc bvecs.
      
      * bio_kmalloc_destructor() didn't check for and free bio integrity
        data.
      
      This patch fixes the above problems.  kmalloc patch is separated out
      from bio_alloc_bioset() and allocates the requested number of bvecs as
      inline bvecs.
      
      * bio_alloc_bioset() no longer takes NULL @bs.  None other than
        bio_kmalloc() used it and outside users can't know how it was
        allocated anyway.
      
      * Define and use BIO_POOL_NONE so that pool index check in
        bvec_free_bs() triggers if inline or kmalloc allocated bvec gets
        there.
      
      * Relocate destructors on top of each allocation function so that how
        they're used is more clear.
      
      Jens Axboe suggested allocating bvecs inline.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      451a9ebf
  13. 15 4月, 2009 1 次提交
  14. 30 3月, 2009 1 次提交
  15. 24 3月, 2009 3 次提交
  16. 15 3月, 2009 2 次提交
  17. 26 2月, 2009 1 次提交
  18. 18 2月, 2009 1 次提交
  19. 03 1月, 2009 3 次提交
  20. 29 12月, 2008 5 次提交
    • J
      bio: get rid of bio_vec clearing · d3f76110
      Jens Axboe 提交于
      We don't need to clear the memory used for adding bio_vec entries,
      since nobody should be looking at members unitialized. Any valid
      use should be below bio->bi_vcnt, and that members up until that count
      must be valid since they were added through bio_add_page().
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      d3f76110
    • J
      bio: add support for inlining a number of bio_vecs inside the bio · 392ddc32
      Jens Axboe 提交于
      When we go and allocate a bio for IO, we actually do two allocations.
      One for the bio itself, and one for the bi_io_vec that holds the
      actual pages we are interested in.
      
      This feature inlines a definable amount of io vecs inside the bio
      itself, so we eliminate the bio_vec array allocation for IO's up
      to a certain size. It defaults to 4 vecs, which is typically 16k
      of IO.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      392ddc32
    • J
      bio: allow individual slabs in the bio_set · bb799ca0
      Jens Axboe 提交于
      Instead of having a global bio slab cache, add a reference to one
      in each bio_set that is created. This allows for personalized slabs
      in each bio_set, so that they can have bios of different sizes.
      
      This means we can personalize the bios we return. File systems may
      want to embed the bio inside another structure, to avoid allocation
      more items (and stuffing them in ->bi_private) after the get a bio.
      Or we may want to embed a number of bio_vecs directly at the end
      of a bio, to avoid doing two allocations to return a bio. This is now
      possible.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      bb799ca0
    • J
      bio: move the slab pointer inside the bio_set · 1b434498
      Jens Axboe 提交于
      In preparation for adding differently sized bios.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      1b434498
    • J
      bio: only mempool back the largest bio_vec slab cache · 7ff9345f
      Jens Axboe 提交于
      We only very rarely need the mempool backing, so it makes sense to
      get rid of all but one of the mempool in a bio_set. So keep the
      largest bio_vec count mempool so we can always honor the largest
      allocation, and "upgrade" callers that fail.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      7ff9345f
  21. 26 11月, 2008 2 次提交
  22. 09 10月, 2008 7 次提交