1. 01 Jul, 2009 6 commits
  2. 21 Jun, 2009 1 commit
  3. 19 Jun, 2009 2 commits
  4. 18 Jun, 2009 1 commit
  5. 16 Jun, 2009 7 commits
  6. 12 Jun, 2009 1 commit
  7. 11 Jun, 2009 2 commits
    • block: add request clone interface (v2) · b0fd271d
      Committed by Kiyoshi Ueda
      This patch adds the following 2 interfaces for request-stacking drivers:
      
        - blk_rq_prep_clone(struct request *clone, struct request *orig,
      		      struct bio_set *bs, gfp_t gfp_mask,
      		      int (*bio_ctr)(struct bio *, struct bio*, void *),
      		      void *data)
            * Clones bios in the original request to the clone request
              (bio_ctr is called for each cloned bio.)
            * Copies attributes of the original request to the clone request.
              The actual data parts (e.g. ->cmd, ->buffer, ->sense) are not
              copied.
      
        - blk_rq_unprep_clone(struct request *clone)
            * Frees cloned bios from the clone request.
      
      Request stacking drivers (e.g. request-based dm) need to make a clone
      request for a submitted request and dispatch it to other devices.
      
      To allocate a request for the clone, request stacking drivers may not
      be able to use blk_get_request() because the allocation may be done
      in an irq-disabled context.
      So blk_rq_prep_clone() takes a request allocated by the caller
      as an argument.
      
      For each clone bio in the clone request, request stacking drivers
      should be able to set up their own completion handler.
      So blk_rq_prep_clone() takes a callback function which is called
      for each clone bio, and a pointer for private data which is passed
      to the callback.
      
      NOTE:
      blk_rq_prep_clone() doesn't copy any actual data of the original
      request.  Pages are shared between original bios and cloned bios.
      So the caller must not complete the original request before the clone
      request.
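
The shape of the two interfaces can be modelled in userspace. This is a minimal sketch, not kernel code: `model_bio`, `model_request`, `model_rq_prep_clone()`, `model_rq_unprep_clone()` and `model_tag_ctr()` are hypothetical stand-ins, and `malloc()` stands in for allocation from a `bio_set`. It shows the points described above: the caller supplies the clone request, each cloned bio shares the original's page, the callback runs once per cloned bio, and a failure unwinds whatever was cloned so far.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins (assumptions), NOT the kernel's struct bio / struct request. */
struct model_bio {
    void *page;                 /* data page, shared between original and clone */
    void *private_data;         /* filled in by the caller's constructor callback */
    struct model_bio *next;
};

struct model_request {
    struct model_bio *bio;      /* head of the bio chain */
    struct model_bio *biotail;  /* tail of the bio chain */
};

typedef int (*model_bio_ctr)(struct model_bio *clone, struct model_bio *orig,
                             void *data);

/* Frees the cloned bios; the shared pages are left untouched. */
void model_rq_unprep_clone(struct model_request *clone)
{
    struct model_bio *bio = clone->bio;

    while (bio) {
        struct model_bio *next = bio->next;
        free(bio);
        bio = next;
    }
    clone->bio = clone->biotail = NULL;
}

/* Clones every bio of orig into the caller-allocated clone request.
 * Returns 0 on success, -1 on failure (already-cloned bios are freed). */
int model_rq_prep_clone(struct model_request *clone, struct model_request *orig,
                        model_bio_ctr bio_ctr, void *data)
{
    assert(clone != NULL && orig != NULL);
    clone->bio = clone->biotail = NULL;

    for (struct model_bio *b = orig->bio; b; b = b->next) {
        struct model_bio *c = malloc(sizeof(*c));

        if (!c)
            goto fail;
        c->page = b->page;      /* share the page; no data is copied */
        c->private_data = NULL;
        c->next = NULL;

        /* Let the stacking driver set up its own per-bio state. */
        if (bio_ctr && bio_ctr(c, b, data)) {
            free(c);
            goto fail;
        }

        if (clone->biotail)
            clone->biotail->next = c;
        else
            clone->bio = c;
        clone->biotail = c;
    }
    return 0;

fail:
    model_rq_unprep_clone(clone);
    return -1;
}

/* Example constructor: tag each cloned bio with the caller's private data. */
int model_tag_ctr(struct model_bio *clone, struct model_bio *orig, void *data)
{
    (void)orig;
    clone->private_data = data;
    return 0;
}
```

Because the clone only borrows the pages, the model has the same ordering constraint as the note above: the original must outlive the clone.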
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: prevent possible io_context->refcount overflow · d9c7d394
      Committed by Nikanth Karthikesan
      Currently io_context has an atomic_t (32-bit) as its refcount.  In the
      case of cfq, for each device against which a task does I/O, a reference
      to the io_context is taken.  In addition, every process sharing an
      io_context (CLONE_IO) holds a reference to the same io_context.
      
      Theoretically the possible maximum number of processes sharing the same
      io_context + the number of disks/cfq_data referring to the same io_context
      can overflow the 32-bit counter on a very high-end machine.
      
      Even though it is an improbable case, let us make it atomic_long_t.
      Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  8. 10 Jun, 2009 1 commit
    • tracing/events: convert block trace points to TRACE_EVENT() · 55782138
      Committed by Li Zefan
      TRACE_EVENT is a more generic way to define tracepoints. Doing so adds
      these new capabilities to this tracepoint:
      
        - zero-copy and per-cpu splice() tracing
        - binary tracing without printf overhead
        - structured logging records exposed under /debug/tracing/events
        - trace events embedded in function tracer output and other plugins
        - user-defined, per tracepoint filter expressions
        ...
      
      Cons:
      
        - no dev_t info for the output of plug, unplug_timer and unplug_io events.
          no dev_t info for getrq and sleeprq events if bio == NULL.
          no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL.
      
          This is mainly because we can't get the device from a request queue.
          But this may change in the future.
      
        - A packet command is converted to a string in TP_assign, not TP_print,
          whereas blktrace does the conversion just before output.
      
          Since pc requests should be rather rare, this is not a big issue.
      
        - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT
          has a unique format, which means we have some unused data in a trace entry.
      
          The overhead is minimized by using __dynamic_array() instead of __array().
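
Two of the points above can be sketched together in userspace: the packet command is rendered as a hex string once, when the record is created, and the buffer is sized for the actual command length rather than a fixed maximum. This is only an illustration of the idea; `format_pc_cmd()` is a hypothetical helper, and `malloc()` stands in for the space that `__dynamic_array()` reserves inside a trace entry.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Renders cmd[0..len-1] as space-separated hex bytes ("5a 00 08 ...") in a
 * heap buffer sized for exactly len bytes of command.
 * Returns NULL on allocation failure; the caller frees the result. */
char *format_pc_cmd(const unsigned char *cmd, size_t len)
{
    /* Each byte takes two hex digits plus a separator or the final NUL. */
    size_t size = len ? len * 3 : 1;
    char *out = malloc(size);

    if (!out)
        return NULL;
    out[0] = '\0';

    for (size_t i = 0; i < len; i++) {
        assert(cmd != NULL);
        snprintf(out + i * 3, size - i * 3,
                 i + 1 < len ? "%02x " : "%02x", cmd[i]);
    }
    return out;
}
```

For the 10-byte command shown in the rq_complete comparison further down, this needs a 30-byte buffer, where a fixed-size field would have to be large enough for the longest possible command in every record.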
      
      I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing:
      
            dd                   dd + ioctl blktrace       dd + TRACE_EVENT (splice)
      1     7.36s, 42.7 MB/s     7.50s, 42.0 MB/s          7.41s, 42.5 MB/s
      2     7.43s, 42.3 MB/s     7.48s, 42.1 MB/s          7.43s, 42.4 MB/s
      3     7.38s, 42.6 MB/s     7.45s, 42.2 MB/s          7.41s, 42.5 MB/s
      
      So the overhead of tracing is very small, and there is no regression
      when using those trace events vs blktrace.
      
      And the binary output of TRACE_EVENT is much smaller than blktrace:
      
       # ls -l -h
       -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0
       -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1
       -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out
      
      Following are some comparisons between TRACE_EVENT and blktrace:
      
      plug:
        kjournald-480   [000]   303.084981: block_plug: [kjournald]
        kjournald-480   [000]   303.084981:   8,0    P   N [kjournald]
      
      unplug_io:
        kblockd/0-118   [000]   300.052973: block_unplug_io: [kblockd/0] 1
        kblockd/0-118   [000]   300.052974:   8,0    U   N [kblockd/0] 1
      
      remap:
        kjournald-480   [000]   303.085042: block_remap: 8,0 W 102736992 + 8 <- (8,8) 33384
        kjournald-480   [000]   303.085043:   8,0    A   W 102736992 + 8 <- (8,8) 33384
      
      bio_backmerge:
        kjournald-480   [000]   303.085086: block_bio_backmerge: 8,0 W 102737032 + 8 [kjournald]
        kjournald-480   [000]   303.085086:   8,0    M   W 102737032 + 8 [kjournald]
      
      getrq:
        kjournald-480   [000]   303.084974: block_getrq: 8,0 W 102736984 + 8 [kjournald]
        kjournald-480   [000]   303.084975:   8,0    G   W 102736984 + 8 [kjournald]
      
        bash-2066  [001]  1072.953770:   8,0    G   N [bash]
        bash-2066  [001]  1072.953773: block_getrq: 0,0 N 0 + 0 [bash]
      
      rq_complete:
        konsole-2065  [001]   300.053184: block_rq_complete: 8,0 W () 103669040 + 16 [0]
        konsole-2065  [001]   300.053191:   8,0    C   W 103669040 + 16 [0]
      
        ksoftirqd/1-7   [001]  1072.953811:   8,0    C   N (5a 00 08 00 00 00 00 00 24 00) [0]
        ksoftirqd/1-7   [001]  1072.953813: block_rq_complete: 0,0 N (5a 00 08 00 00 00 00 00 24 00) 0 + 0 [0]
      
      rq_insert:
        kjournald-480   [000]   303.084985: block_rq_insert: 8,0 W 0 () 102736984 + 8 [kjournald]
        kjournald-480   [000]   303.084986:   8,0    I   W 102736984 + 8 [kjournald]
      
      Changelog from v2 -> v3:
      
      - use the newly introduced __dynamic_array().
      
      Changelog from v1 -> v2:
      
      - use __string() instead of __array() to minimize the memory required
        to store hex dump of rq->cmd().
      
      - support large pc requests.
      
      - add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT.
      
      - some cleanups.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4A2DF669.5070905@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  9. 09 Jun, 2009 4 commits
  10. 03 Jun, 2009 1 commit
  11. 02 Jun, 2009 1 commit
  12. 30 May, 2009 1 commit
  13. 28 May, 2009 1 commit
  14. 27 May, 2009 2 commits
    • block: fix no diskstat problem · 3c4198e8
      Committed by Kiyoshi Ueda
      The commit below in 2.6-block/for-2.6.31 breaks disk statistics: no I/O
      is accounted, because the blk_discard_rq() check was added with '&&'.
      The condition should be 'blk_fs_request() || blk_discard_rq()'.
      This patch makes that change and fixes the no diskstat problem.
      Please review and apply.
      
      ------ /proc/diskstat without this patch -------------------------------------
         8       0 sda 0 0 0 0 0 0 0 0 0 0 0
      ------------------------------------------------------------------------------
      
      ----- /proc/diskstat with this patch applied ---------------------------------
         8       0 sda 4186 303 373621 61600 9578 3859 107468 169479 2 89755 231059
      ------------------------------------------------------------------------------
      
      --------------------------------------------------------------------------
      commit c69d4854
      Author: Jens Axboe <jens.axboe@oracle.com>
      Date:   Fri Apr 24 08:12:19 2009 +0200
      
          block: include discard requests in IO accounting
      
          We currently don't do merging on discard requests, but we potentially
          could. If we do, then we need to include discard requests in the IO
          accounting, or merging would end up decrementing in_flight IO counters
          for an IO which never incremented them.
      
          So enable accounting for discard requests.
      
      <snip>
      
       static inline int blk_do_io_stat(struct request *rq)
       {
      -       return rq->rq_disk && blk_rq_io_stat(rq) && blk_fs_request(rq);
      +       return rq->rq_disk && blk_rq_io_stat(rq) && blk_fs_request(rq) &&
      +               blk_discard_rq(rq);
       }
      --------------------------------------------------------------------------
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: fix oops with block tag queueing · ba396a6c
      Committed by James Bottomley
      commit e8939a50466fd963eb1ba9118c34b9ffb7ff6aa6
      Author: Tejun Heo <tj@kernel.org>
      Date:   Fri May 8 11:54:16 2009 +0900
      
          block: implement and enforce request peek/start/fetch
      
      Added a BUG_ON(blk_queued_rq(req)) to the top of blk_finish_req().
      Unfortunately, this checks whether req->queuelist is empty.  This list
      is doing double duty both as the queue list and the tag list, so tagged
      requests come in here with this not empty and boom (the tag list is
      emptied by blk_queue_end_tag() lower down).
      
      Fix this by moving the BUG_ON to below the end tag.  We also seem
      vulnerable to this in blk_requeue_request().  I think all uses
      of blk_queued_rq() need auditing because the check is clearly wrong in
      the tagged case.
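
Why the check misfires can be shown with a userspace model of a list node doing double duty. This is a sketch under assumptions: `list_node`, `tagged_rq` and the `model_*` helpers are hypothetical, simplified versions of the kernel's list primitives, and `model_queued_rq()` models a check that only asks whether `queuelist` is non-empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list node (a simplified stand-in). */
struct list_node {
    struct list_node *next, *prev;
};

void model_list_init(struct list_node *n)
{
    n->next = n->prev = n;
}

/* Hypothetical request with one node used for both the queue and tag lists. */
struct tagged_rq {
    struct list_node queuelist;
};

void model_rq_init(struct tagged_rq *rq)
{
    assert(rq != NULL);
    model_list_init(&rq->queuelist);
}

/* Models the check: "queued" just means the node is linked somewhere. */
bool model_queued_rq(const struct tagged_rq *rq)
{
    return rq->queuelist.next != &rq->queuelist;
}

/* Models starting a tagged request: the node is linked into the tag list. */
void model_start_tag(struct tagged_rq *rq, struct list_node *tag_list)
{
    struct list_node *n = &rq->queuelist;

    n->next = tag_list->next;
    n->prev = tag_list;
    tag_list->next->prev = n;
    tag_list->next = n;
}

/* Models ending the tag: the node is unlinked and reinitialized. */
void model_end_tag(struct tagged_rq *rq)
{
    struct list_node *n = &rq->queuelist;

    n->prev->next = n->next;
    n->next->prev = n->prev;
    model_list_init(n);
}
```

In this model a tagged request looks queued until `model_end_tag()` has run, so an assertion on `model_queued_rq()` is only safe after the tag has been ended, which is the reordering the commit describes.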
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  15. 23 May, 2009 5 commits
  16. 20 May, 2009 2 commits
  17. 19 May, 2009 2 commits