1. 18 6月, 2009 6 次提交
  2. 16 6月, 2009 21 次提交
  3. 10 6月, 2009 1 次提交
    • L
      tracing/events: convert block trace points to TRACE_EVENT() · 55782138
      Li Zefan 提交于
      TRACE_EVENT is a more generic way to define tracepoints. Doing so adds
      these new capabilities to this tracepoint:
      
        - zero-copy and per-cpu splice() tracing
        - binary tracing without printf overhead
        - structured logging records exposed under /debug/tracing/events
        - trace events embedded in function tracer output and other plugins
        - user-defined, per tracepoint filter expressions
        ...
      
      Cons:
      
        - no dev_t info for the output of plug, unplug_timer and unplug_io events.
          no dev_t info for getrq and sleeprq events if bio == NULL.
          no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL.
      
          This is mainly because we can't get the deivce from a request queue.
          But this may change in the future.
      
        - A packet command is converted to a string in TP_assign, not TP_print.
          While blktrace do the convertion just before output.
      
          Since pc requests should be rather rare, this is not a big issue.
      
        - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT
          has a unique format, which means we have some unused data in a trace entry.
      
          The overhead is minimized by using __dynamic_array() instead of __array().
      
      I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing:
      
            dd                   dd + ioctl blktrace       dd + TRACE_EVENT (splice)
      1     7.36s, 42.7 MB/s     7.50s, 42.0 MB/s          7.41s, 42.5 MB/s
      2     7.43s, 42.3 MB/s     7.48s, 42.1 MB/s          7.43s, 42.4 MB/s
      3     7.38s, 42.6 MB/s     7.45s, 42.2 MB/s          7.41s, 42.5 MB/s
      
      So the overhead of tracing is very small, and no regression when using
      those trace events vs blktrace.
      
      And the binary output of TRACE_EVENT is much smaller than blktrace:
      
       # ls -l -h
       -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0
       -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1
       -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out
      
      Following are some comparisons between TRACE_EVENT and blktrace:
      
      plug:
        kjournald-480   [000]   303.084981: block_plug: [kjournald]
        kjournald-480   [000]   303.084981:   8,0    P   N [kjournald]
      
      unplug_io:
        kblockd/0-118   [000]   300.052973: block_unplug_io: [kblockd/0] 1
        kblockd/0-118   [000]   300.052974:   8,0    U   N [kblockd/0] 1
      
      remap:
        kjournald-480   [000]   303.085042: block_remap: 8,0 W 102736992 + 8 <- (8,8) 33384
        kjournald-480   [000]   303.085043:   8,0    A   W 102736992 + 8 <- (8,8) 33384
      
      bio_backmerge:
        kjournald-480   [000]   303.085086: block_bio_backmerge: 8,0 W 102737032 + 8 [kjournald]
        kjournald-480   [000]   303.085086:   8,0    M   W 102737032 + 8 [kjournald]
      
      getrq:
        kjournald-480   [000]   303.084974: block_getrq: 8,0 W 102736984 + 8 [kjournald]
        kjournald-480   [000]   303.084975:   8,0    G   W 102736984 + 8 [kjournald]
      
        bash-2066  [001]  1072.953770:   8,0    G   N [bash]
        bash-2066  [001]  1072.953773: block_getrq: 0,0 N 0 + 0 [bash]
      
      rq_complete:
        konsole-2065  [001]   300.053184: block_rq_complete: 8,0 W () 103669040 + 16 [0]
        konsole-2065  [001]   300.053191:   8,0    C   W 103669040 + 16 [0]
      
        ksoftirqd/1-7   [001]  1072.953811:   8,0    C   N (5a 00 08 00 00 00 00 00 24 00) [0]
        ksoftirqd/1-7   [001]  1072.953813: block_rq_complete: 0,0 N (5a 00 08 00 00 00 00 00 24 00) 0 + 0 [0]
      
      rq_insert:
        kjournald-480   [000]   303.084985: block_rq_insert: 8,0 W 0 () 102736984 + 8 [kjournald]
        kjournald-480   [000]   303.084986:   8,0    I   W 102736984 + 8 [kjournald]
      
      Changelog from v2 -> v3:
      
      - use the newly introduced __dynamic_array().
      
      Changelog from v1 -> v2:
      
      - use __string() instead of __array() to minimize the memory required
        to store hex dump of rq->cmd().
      
      - support large pc requests.
      
      - add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT.
      
      - some cleanups.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4A2DF669.5070905@cn.fujitsu.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      55782138
  4. 09 6月, 2009 4 次提交
    • N
      md/raid5: fix bug in reshape code when chunk_size decreases. · 0e6e0271
      NeilBrown 提交于
      Now that we support changing the chunksize, we calculate
      "reshape_sectors" to be the max of number of sectors in old
      and new chunk size.
      However there is one please where we still use 'chunksize'
      rather than 'reshape_sectors'.
      This causes a reshape that reduces the size of chunks to freeze.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      0e6e0271
    • N
      md/raid5 - avoid deadlocks in get_active_stripe during reshape · a8c906ca
      NeilBrown 提交于
      md has functionality to 'quiesce' and array so that all pending
      IO completed and no new IO starts.  This is used to achieve a
      stable state before making internal changes.
      
      Currently this quiescing applies equally to normal IO, resync
      IO, and reshape IO.
      However there is a problem with applying it to reshape IO.
      Reshape can have multiple 'stripe_heads' that must be active together.
      If the quiesce come between allocating the first and the last of
      such a collection, then we deadlock, as the last will not be allocated
      until the quiesce is lifted, the quiesce will not be lifted until the
      first (which has been allocated) gets used, and that first cannot be
      used until the last is allocated.
      
      It is not necessary to inhibit reshape IO when a quiesce is
      requested.  Those places in the code that require a full quiesce will
      ensure the reshape thread is not running at all.
      
      So allow reshape requests to get access to new stripe_heads without
      being blocked by a 'quiesce'.
      
      This only affects in-place reshapes (i.e. where the array does not
      grow or shrink) and these are only newly supported.  So this patch is
      not needed in earlier kernels.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      a8c906ca
    • N
      md/raid5: use conf->raid_disks in preference to mddev->raid_disk · f001a70c
      NeilBrown 提交于
      mddev->raid_disks can be changed and any time by a request from
      user-space.  It is a suggestion as to what number of raid_disks is
      desired.
      
      conf->raid_disks can only be changed by the raid5 module with suitable
      locks in place.  It is a statement as to the current number of
      raid_disks.
      
      There are two places where the latter should be used, but the former
      is used.  This can lead to a crash when reshaping an array.
      
      This patch changes to mddev-> to conf->
      Signed-off-by: NNeilBrown <neilb@suse.de>
      f001a70c
    • J
      Revert "block: Fix bounce limit setting in DM" · 9df1bb9b
      Jens Axboe 提交于
      This reverts commit a05c0205.
      
      DM doesn't need to access the bounce_pfn directly.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      9df1bb9b
  5. 03 6月, 2009 1 次提交
  6. 27 5月, 2009 1 次提交
  7. 26 5月, 2009 6 次提交