1. 02 Oct, 2009: 2 commits
    • Add a tracepoint for block request remapping · b0da3f0d
      Jun'ichi Nomura committed
      Since 2.6.31 now has request-based device-mapper, it's useful to have
      a tracepoint for request-remapping as well as bio-remapping.
      This patch adds a tracepoint for request-remapping, trace_block_rq_remap().
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Cc: Alasdair G Kergon <agk@redhat.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs · 48c0d4d4
      Zdenek Kabelac committed
      Add the missing blk_trace_remove_sysfs() call to pair with
      blk_trace_init_sysfs(), which was introduced in commit 1d54ad6d.
      Also release the kobject when request_fn is NULL.
      
      The problem was noticed via a kmemleak backtrace when some sysfs entries
      were not properly destroyed during device removal:
      
      unreferenced object 0xffff88001aa76640 (size 80):
        comm "lvcreate", pid 2120, jiffies 4294885144
        hex dump (first 32 bytes):
          01 00 00 00 00 00 00 00 f0 65 a7 1a 00 88 ff ff  .........e......
          90 66 a7 1a 00 88 ff ff 86 1d 53 81 ff ff ff ff  .f........S.....
        backtrace:
          [<ffffffff813f9cc6>] kmemleak_alloc+0x26/0x60
          [<ffffffff8111d693>] kmem_cache_alloc+0x133/0x1c0
          [<ffffffff81195891>] sysfs_new_dirent+0x41/0x120
          [<ffffffff81194b0c>] sysfs_add_file_mode+0x3c/0xb0
          [<ffffffff81197c81>] internal_create_group+0xc1/0x1a0
          [<ffffffff81197d93>] sysfs_create_group+0x13/0x20
          [<ffffffff810d8004>] blk_trace_init_sysfs+0x14/0x20
          [<ffffffff8123f45c>] blk_register_queue+0x3c/0xf0
          [<ffffffff812447e4>] add_disk+0x94/0x160
          [<ffffffffa00d8b08>] dm_create+0x598/0x6e0 [dm_mod]
          [<ffffffffa00de951>] dev_create+0x51/0x350 [dm_mod]
          [<ffffffffa00de823>] ctl_ioctl+0x1a3/0x240 [dm_mod]
          [<ffffffffa00de8f2>] dm_compat_ctl_ioctl+0x12/0x20 [dm_mod]
          [<ffffffff81177bfd>] compat_sys_ioctl+0xcd/0x4f0
          [<ffffffff81036ed8>] sysenter_dispatch+0x7/0x2c
          [<ffffffffffffffff>] 0xffffffffffffffff
      Signed-off-by: Zdenek Kabelac <zkabelac@redhat.com>
      Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  2. 05 Sep, 2009: 1 commit
    • tracing: pass around ring buffer instead of tracer · e77405ad
      Steven Rostedt committed
      The latency tracers (irqsoff and wakeup) can swap trace buffers
      on the fly. If an event is happening and has reserved data on one of
      the buffers, and the latency tracer swaps the global buffer with the
      max buffer, the result is that the event may commit the data to the
      wrong buffer.
      
      This patch changes the trace recording API to receive the buffer
      that was used to reserve a commit. That buffer can then be passed
      in to the commit.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
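      A minimal userspace sketch of the idea, assuming hypothetical names
      (struct buf, reserve(), commit(), swap_buffers()) that are not the
      kernel ring-buffer API: reserve() returns the buffer it reserved on and
      commit() writes to that same buffer, so a swap of the global buffer
      between the two cannot send the event to the wrong place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a trace buffer: counts reserved/committed events. */
struct buf {
	int reserved;
	int committed;
};

static struct buf buf_a, buf_b;

/* "Global" buffer that a latency tracer may swap with its max buffer. */
static struct buf *global_buf = &buf_a;
static struct buf *max_buf = &buf_b;

/* Swap the live buffer with the max buffer, as a latency tracer might. */
static void swap_buffers(void)
{
	struct buf *tmp = global_buf;

	global_buf = max_buf;
	max_buf = tmp;
}

/* Reserve space and hand back the buffer the reservation was made on. */
static struct buf *reserve(void)
{
	struct buf *b = global_buf;

	b->reserved++;
	return b;
}

/* Commit to the buffer returned by reserve(), never to global_buf. */
static void commit(struct buf *b)
{
	assert(b->reserved > b->committed);
	b->committed++;
}

/* One event whose reserve and commit are separated by a buffer swap. */
static struct buf *record_event_with_swap(void)
{
	struct buf *b = reserve();

	swap_buffers();		/* swap happens while the event is in flight */
	commit(b);
	return b;
}
```

      If commit() instead looked up global_buf again, this event would be
      committed to buf_b, which never reserved it.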
  3. 13 Aug, 2009: 1 commit
    • Remove double removal of blktrace directory · 39cbb602
      Alan D. Brunelle committed
      commit fd51d251
      Author: Stefan Raspl <raspl@linux.vnet.ibm.com>
      Date:   Tue May 19 09:59:08 2009 +0200
      
          blktrace: remove debugfs entries on bad path
      
      added an explicit invocation of debugfs_remove for bt->dir, but
      blk_remove_buf_file_callback also removes the directory. On
      occasion I am seeing memory corruption that I have bisected down to
      this commit. [The testing involves a (long) series of I/O benchmarks
      with blktrace invoked around the actual runs.] I believe that this
      committed patch is correct, but the problem actually lies in the code
      in blk_remove_buf_file_callback.
      
      With this patch I am able to consistently get complete runs whereas
      previously I could not get a single run to complete.
      
      The first part of the patch simply moves the debugfs_remove below the
      relay_close: the relay_close call will remove files under bt->dir, and
      so we should not remove the directory until all the files we created
      have been removed. (Note: this is not sufficient to fix the problem:
      the file system code holds reference counts on the directory, so our
      invocation does not cause the directory to actually be removed.
      Nonetheless, we should not rely upon that behavior.)
      Signed-off-by: Alan D. Brunelle <alan.brunelle@hp.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  4. 13 Jul, 2009: 1 commit
  5. 10 Jun, 2009: 1 commit
    • tracing/events: convert block trace points to TRACE_EVENT() · 55782138
      Li Zefan committed
      TRACE_EVENT is a more generic way to define tracepoints. Doing so adds
      these new capabilities to this tracepoint:
      
        - zero-copy and per-cpu splice() tracing
        - binary tracing without printf overhead
        - structured logging records exposed under /debug/tracing/events
        - trace events embedded in function tracer output and other plugins
        - user-defined, per tracepoint filter expressions
        ...
      
      Cons:
      
        - no dev_t info for the output of plug, unplug_timer and unplug_io events.
          no dev_t info for getrq and sleeprq events if bio == NULL.
          no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL.
      
          This is mainly because we can't get the device from a request queue.
          But this may change in the future.
      
        - A packet command is converted to a string in TP_assign, not TP_print,
          whereas blktrace does the conversion just before output.
      
          Since pc requests should be rather rare, this is not a big issue.
      
        - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT
          has a unique format, which means we have some unused data in a trace entry.
      
          The overhead is minimized by using __dynamic_array() instead of __array().
      
      I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing:
      
            dd                   dd + ioctl blktrace       dd + TRACE_EVENT (splice)
      1     7.36s, 42.7 MB/s     7.50s, 42.0 MB/s          7.41s, 42.5 MB/s
      2     7.43s, 42.3 MB/s     7.48s, 42.1 MB/s          7.43s, 42.4 MB/s
      3     7.38s, 42.6 MB/s     7.45s, 42.2 MB/s          7.41s, 42.5 MB/s
      
      So the overhead of tracing is very small, and there is no regression
      when using those trace events instead of blktrace.
      
      And the binary output of TRACE_EVENT is much smaller than blktrace:
      
       # ls -l -h
       -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0
       -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1
       -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out
      
      Following are some comparisons between TRACE_EVENT and blktrace:
      
      plug:
        kjournald-480   [000]   303.084981: block_plug: [kjournald]
        kjournald-480   [000]   303.084981:   8,0    P   N [kjournald]
      
      unplug_io:
        kblockd/0-118   [000]   300.052973: block_unplug_io: [kblockd/0] 1
        kblockd/0-118   [000]   300.052974:   8,0    U   N [kblockd/0] 1
      
      remap:
        kjournald-480   [000]   303.085042: block_remap: 8,0 W 102736992 + 8 <- (8,8) 33384
        kjournald-480   [000]   303.085043:   8,0    A   W 102736992 + 8 <- (8,8) 33384
      
      bio_backmerge:
        kjournald-480   [000]   303.085086: block_bio_backmerge: 8,0 W 102737032 + 8 [kjournald]
        kjournald-480   [000]   303.085086:   8,0    M   W 102737032 + 8 [kjournald]
      
      getrq:
        kjournald-480   [000]   303.084974: block_getrq: 8,0 W 102736984 + 8 [kjournald]
        kjournald-480   [000]   303.084975:   8,0    G   W 102736984 + 8 [kjournald]
      
        bash-2066  [001]  1072.953770:   8,0    G   N [bash]
        bash-2066  [001]  1072.953773: block_getrq: 0,0 N 0 + 0 [bash]
      
      rq_complete:
        konsole-2065  [001]   300.053184: block_rq_complete: 8,0 W () 103669040 + 16 [0]
        konsole-2065  [001]   300.053191:   8,0    C   W 103669040 + 16 [0]
      
        ksoftirqd/1-7   [001]  1072.953811:   8,0    C   N (5a 00 08 00 00 00 00 00 24 00) [0]
        ksoftirqd/1-7   [001]  1072.953813: block_rq_complete: 0,0 N (5a 00 08 00 00 00 00 00 24 00) 0 + 0 [0]
      
      rq_insert:
        kjournald-480   [000]   303.084985: block_rq_insert: 8,0 W 0 () 102736984 + 8 [kjournald]
        kjournald-480   [000]   303.084986:   8,0    I   W 102736984 + 8 [kjournald]
      
      Changelog from v2 -> v3:
      
      - use the newly introduced __dynamic_array().
      
      Changelog from v1 -> v2:
      
      - use __string() instead of __array() to minimize the memory required
        to store hex dump of rq->cmd().
      
      - support large pc requests.
      
      - add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT.
      
      - some cleanups.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4A2DF669.5070905@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
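      To illustrate why __dynamic_array() keeps the unused data small, here
      is a plain C sketch (not the TRACE_EVENT macros) contrasting a record
      with a fixed-size cmd array against one whose cmd bytes are sized per
      event through a C99 flexible array member. The struct names and
      MAX_CMD_LEN are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CMD_LEN 16	/* hypothetical upper bound for a fixed array */

/* Fixed layout: always reserves MAX_CMD_LEN bytes, used or not. */
struct entry_fixed {
	unsigned long long sector;
	unsigned int nr_sector;
	unsigned char cmd[MAX_CMD_LEN];
};

/* Dynamic layout: stores only as many cmd bytes as the event carries. */
struct entry_dyn {
	unsigned long long sector;
	unsigned int nr_sector;
	unsigned int cmd_len;
	unsigned char cmd[];	/* C99 flexible array member */
};

/* Bytes a dynamic entry needs for a cmd payload of cmd_len bytes. */
static size_t entry_dyn_size(unsigned int cmd_len)
{
	return sizeof(struct entry_dyn) + cmd_len;
}

/* Allocate and fill a dynamic entry; returns NULL on allocation failure. */
static struct entry_dyn *entry_dyn_new(unsigned long long sector,
				       unsigned int nr_sector,
				       const unsigned char *cmd,
				       unsigned int cmd_len)
{
	struct entry_dyn *e = malloc(entry_dyn_size(cmd_len));

	if (!e)
		return NULL;
	e->sector = sector;
	e->nr_sector = nr_sector;
	e->cmd_len = cmd_len;
	if (cmd_len)
		memcpy(e->cmd, cmd, cmd_len);
	return e;
}
```

      An fs request carries no cmd bytes, so its dynamic record costs only
      the header, while the fixed layout always pays for the whole array.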
  6. 19 May, 2009: 1 commit
    • blktrace: remove debugfs entries on bad path · fd51d251
      Stefan Raspl committed
      debugfs directory entries for devices are not removed on some
      of the failure paths in do_blk_trace_setup().
      One way to reproduce is to start blktrace on multiple devices
      with insufficient Vmalloc space: Devices will fail with
      a message like this:
      
      	BLKTRACESETUP(2) /dev/sdu failed: 5/Input/output error
      
      If so, the respective entries in debugfs
      (e.g. /sys/kernel/debug/block/sdu) will remain and subsequent
      attempts to start blktrace on the respective devices will not
      succeed due to existing directories.
      
      [ Impact: fix /debug/tracing file cleanup corner case ]
      Signed-off-by: Stefan Raspl <stefan.raspl@linux.vnet.ibm.com>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      LKML-Reference: <4A1266CC.5040801@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
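      The bug class here is an error path that forgets to undo an earlier
      step. A hypothetical sketch of the usual goto-unwind shape, with
      made-up create_dir()/alloc_buf() helpers standing in for the debugfs
      and relay setup in do_blk_trace_setup():

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state standing in for a debugfs dir and a trace buffer. */
struct setup_state {
	bool dir_created;
	bool buf_allocated;
};

static bool create_dir(struct setup_state *s) { s->dir_created = true; return true; }
static void remove_dir(struct setup_state *s) { s->dir_created = false; }

/* Fails when fail_alloc is set, simulating e.g. a vmalloc failure. */
static bool alloc_buf(struct setup_state *s, bool fail_alloc)
{
	if (fail_alloc)
		return false;
	s->buf_allocated = true;
	return true;
}

/*
 * Set up tracing; on any failure, undo everything done so far so that no
 * stale directory is left behind to block the next attempt.
 */
static int do_setup(struct setup_state *s, bool fail_alloc)
{
	if (!create_dir(s))
		goto err;
	if (!alloc_buf(s, fail_alloc))
		goto err_remove_dir;
	return 0;

err_remove_dir:
	remove_dir(s);	/* the cleanup a bad path must not skip */
err:
	return -1;
}
```

      Each label undoes exactly the steps that succeeded before the jump, in
      reverse order, so a later retry starts from a clean state.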
  7. 11 May, 2009: 3 commits
    • blktrace: pdu_buf of pc events should be unsigned · 04986257
      Li Zefan committed
      I got this:
        8,0    1   305.417782332  2037  I   R 32 (ffffff9e 10 00 ...) [bash]
      
      It should be:
        8,0    1   305.417782332  2037  I   R 32 (9e 10 00 ...) [bash]
      
      [ Impact: fix output of pc events ]
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4A07C6B3.9080802@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
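      Why the output showed ffffff9e: a hedged sketch, assuming a 32-bit
      unsigned int, of what happens when a negative signed char is converted
      to unsigned int for %02x, versus reading the same byte as unsigned
      char. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Buggy shape: 0x9e stored in a signed char is -98, and converting it to
 * unsigned int yields UINT_MAX - 97, i.e. 0xffffff9e with 32-bit int.
 */
static void toy_format_pdu_byte_signed(char *out, size_t n, signed char b)
{
	snprintf(out, n, "%02x", (unsigned int)b);
}

/* Fixed shape: an unsigned char always stays in 0..255. */
static void toy_format_pdu_byte_unsigned(char *out, size_t n, unsigned char b)
{
	snprintf(out, n, "%02x", (unsigned int)b);
}
```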
    • block: drop request->hard_* and *nr_sectors · 2e46e8b2
      Tejun Heo committed
      struct request has had a few different ways to represent some
      properties of a request.  ->hard_* represent block layer's view of the
      request progress (completion cursor) and the ones without the prefix
      are supposed to represent the issue cursor and allowed to be updated
      as necessary by the low level drivers.  The thing is that as block
      layer supports partial completion, the two cursors really aren't
      necessary and only cause confusion.  In addition, manual management of
      request detail from low level drivers is cumbersome and error-prone at
      the very least.
      
      Other interesting duplicate fields are rq->[hard_]nr_sectors and
      rq->{hard_cur|current}_nr_sectors against rq->data_len and
      rq->bio->bi_size.  This is more convoluted than the hard_ case.
      
      rq->[hard_]nr_sectors are initialized for requests with bio but
      blk_rq_bytes() uses it only for !pc requests.  rq->data_len is
      initialized for all requests but blk_rq_bytes() uses it only for pc
      requests.  This causes a good amount of confusion throughout the block layer
      and its drivers and determining the request length has been a bit of
      black magic which may or may not work depending on circumstances and
      what the specific LLD is actually doing.
      
      rq->{hard_cur|current}_nr_sectors represent the number of sectors in
      the contiguous data area at the front.  This is mainly used by drivers
      which transfer data by walking the request segment-by-segment.  This
      value always equals rq->bio->bi_size >> 9.  However, data length for
      pc requests may not be a multiple of 512 bytes and using this field
      becomes a bit confusing.
      
      In general, having multiple fields to represent the same property
      leads only to confusion and subtle bugs.  With recent block low level
      driver cleanups, no driver is accessing or manipulating these
      duplicate fields directly.  Drop all the duplicates.  Now rq->sector
      means the current sector, rq->data_len the current total length and
      rq->bio->bi_size the current segment length.  Everything else is
      defined in terms of these three and available only through accessors.
      
      * blk_recalc_rq_sectors() is collapsed into blk_update_request() and
        now handles pc and fs requests equally other than rq->sector update.
        This means that now pc requests can use partial completion too (no
        in-kernel user yet, though).
      
      * bio_cur_sectors() is replaced with bio_cur_bytes() as block layer
        now uses byte count as the primary data length.
      
      * blk_rq_pos() is now guaranteed to be always correct.  In-block users
        converted.
      
      * blk_rq_bytes() is now guaranteed to be always valid as is
        blk_rq_sectors().  In-block users converted.
      
      * blk_rq_sectors() is now guaranteed to equal blk_rq_bytes() >> 9.
        Whichever is more convenient is used.
      
      * blk_rq_bytes() and blk_rq_cur_bytes() are now inlined and take const
        pointer to request.
      
      [ Impact: API cleanup, single way to represent one property of a request ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
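      A toy model of the resulting rule, assuming a stripped-down
      hypothetical struct toy_request rather than the kernel's struct
      request: only position, total bytes and current segment bytes are
      stored, sectors are derived as bytes >> 9, and the accessors take a
      const pointer.

```c
#include <assert.h>

/* Hypothetical request keeping only the three fields the commit retains. */
struct toy_request {
	unsigned long long sector;	/* current sector (rq->sector) */
	unsigned int data_len;		/* current total length in bytes */
	unsigned int cur_seg_bytes;	/* stands in for rq->bio->bi_size */
};

/* Accessors take a const pointer, as suggested during review. */
static inline unsigned long long toy_rq_pos(const struct toy_request *rq)
{
	return rq->sector;
}

static inline unsigned int toy_rq_bytes(const struct toy_request *rq)
{
	return rq->data_len;
}

/* Derived, never stored: sectors are always bytes >> 9 (512-byte sectors). */
static inline unsigned int toy_rq_sectors(const struct toy_request *rq)
{
	return toy_rq_bytes(rq) >> 9;
}

static inline unsigned int toy_rq_cur_bytes(const struct toy_request *rq)
{
	return rq->cur_seg_bytes;
}
```

      With data_len = 36, a pc request reports 36 bytes but 0 sectors,
      exactly the case where a separately stored sector count was confusing.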
    • block: implement blk_rq_pos/[cur_]sectors() and convert obvious ones · 5b93629b
      Tejun Heo committed
      Implement accessors - blk_rq_pos(), blk_rq_sectors() and
      blk_rq_cur_sectors() which return rq->hard_sector, rq->hard_nr_sectors
      and rq->hard_cur_sectors respectively and convert direct references of
      the said fields to the accessors.
      
      This is in preparation of request data length handling cleanup.
      
      Geert	: suggested adding const to struct request * parameter to accessors
      Sergei	: spotted error in patch description
      
      [ Impact: cleanup ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
      Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Tested-by: Grant Likely <grant.likely@secretlab.ca>
      Acked-by: Grant Likely <grant.likely@secretlab.ca>
      Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
      Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Cc: Borislav Petkov <petkovbb@googlemail.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  8. 06 May, 2009: 2 commits
  9. 16 Apr, 2009: 4 commits
    • blktrace: fix context-info when mixed-using blk tracer and trace events · f3948f88
      Li Zefan committed
      When current tracer is set to blk tracer, TRACE_ITER_CONTEXT_INFO is
      unset, but actually context-info is printed:
      
          pdflush-431   [000]   821.181576:   8,0    P   N [pdflush]
      
      And then if we enable TRACE_ITER_CONTEXT_INFO:
      
          # echo context-info > trace_options
      
      We'll see context-info printed twice. What's worse, when we use blk
      tracer and trace events at the same time, we'll see no context-info
      for trace events at all:
      
          jbd2_commit_logging: dev dm-0:8 transaction 333227
          jbd2_end_commit: dev dm-0:8 transaction 333227 head 332814
            rm-25433 [001]  9578.307485:   8,18   m   N cfq25433 slice expired t=0
            rm-25433 [001]  9578.307486:   8,18   m   N cfq25433 put_queue
      
      This patch adds blk_tracer->set_flags(), and context-info flag is unset
      only when we set the output to classic mode.
      
      Note that after this patch, one must explicitly unset context-info
      to get binary output that can be parsed by blkparse:
      
          # echo nocontext-info > trace_options
          # echo bin > trace_options
          # echo blk > current_tracer
          # cat trace_pipe | blkparse -i -
      Reported-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <49E54E60.50408@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • blktrace: add trace/ to /sys/block/sda · 1d54ad6d
      Li Zefan committed
      Impact: allow ftrace-plugin blktrace to trace device-mapper devices
      
      To trace a single partition:
        # echo 1 > /sys/block/sda/sda1/enable
      
      To trace the whole sda instead:
        # echo 1 > /sys/block/sda/enable
      
      Thus we also fix an issue reported by Ted, that ftrace-plugin blktrace
      can't be used to trace device-mapper devices.
      
      Now:
      
        # echo 1 > /sys/block/dm-0/trace/enable
        echo: write error: No such device or address
        # mount -t ext4 /dev/dm-0 /mnt
        # echo 1 > /sys/block/dm-0/trace/enable
        # echo blk > /debug/tracing/current_tracer
      Reported-by: Theodore Tso <tytso@mit.edu>
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Shawn Du <duyuyang@gmail.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      LKML-Reference: <49E42665.6020506@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • blktrace: support per-partition tracing for ftrace plugin · 9908c309
      Li Zefan committed
      The previous patch adds support to trace a single partition for
      relay+ioctl blktrace, and this patch is for ftrace plugin blktrace:
      
        # echo 1 > /sys/block/sda/sda7/enable
        # cat start_lba
        102398373
        # cat end_lba
        102703545
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Shawn Du <duyuyang@gmail.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      LKML-Reference: <49E42646.4060608@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • blktrace: support per-partition tracing · d0deef5b
      Shawn Du committed
      Though one can specify '-d /dev/sda1' when using blktrace, it still
      traces the whole sda.
      
      To support per-partition tracing, when we start tracing, we initialize
      bt->start_lba and bt->end_lba to the start and end sector of that
      partition.
      
      Note some actions are per device, thus we don't filter 0-sector events.
      
      The original patch and discussion can be found here:
      	http://marc.info/?l=linux-btrace&m=122949374214540&w=2
      Signed-off-by: Shawn Du <duyuyang@gmail.com>
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      LKML-Reference: <49E42620.4050701@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
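      A sketch of the filtering rule described in this message, assuming
      inclusive bounds and a hypothetical struct toy_trace in place of
      struct blk_trace: events at sector 0 are device-wide and always pass,
      other events pass only inside [start_lba, end_lba].

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-trace partition bounds, inclusive, in 512-byte sectors. */
struct toy_trace {
	unsigned long long start_lba;
	unsigned long long end_lba;
};

/*
 * Decide whether an event should be recorded.  Sector-0 events are treated
 * as per-device actions and are never filtered; any other event is kept
 * only if it falls inside the partition.
 */
static bool toy_should_trace(const struct toy_trace *bt,
			     unsigned long long sector)
{
	if (sector == 0)
		return true;
	return sector >= bt->start_lba && sector <= bt->end_lba;
}
```

      For example, with the sda7 bounds shown in the 9908c309 message
      (102398373 to 102703545), sector 102398372 is filtered out while a
      sector-0 event is kept.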
  10. 12 Apr, 2009: 2 commits
    • blktrace: fix output of BLK_TC_PC events · 66de7792
      Li Zefan committed
      BLK_TC_PC events should be treated differently from BLK_TC_FS events.
      
      Before this patch:
      
       # echo 1 > /sys/block/sda/sda1/trace/enable
       # echo pc > /sys/block/sda/sda1/trace/act_mask
       # echo blk > /debugfs/tracing/current_tracer
       # (generate some BLK_TC_PC events)
       # cat trace
              bash-2184  [000]  1774.275413:   8,7    I   N [bash]
              bash-2184  [000]  1774.275435:   8,7    D   N [bash]
              bash-2184  [000]  1774.275540:   8,7    I   R [bash]
              bash-2184  [000]  1774.275547:   8,7    D   R [bash]
       ksoftirqd/0-4     [000]  1774.275580:   8,7    C   N 0 [0]
              bash-2184  [000]  1774.275648:   8,7    I   R [bash]
              bash-2184  [000]  1774.275653:   8,7    D   R [bash]
       ksoftirqd/0-4     [000]  1774.275682:   8,7    C   N 0 [0]
              bash-2184  [000]  1774.275739:   8,7    I   R [bash]
              bash-2184  [000]  1774.275744:   8,7    D   R [bash]
       ksoftirqd/0-4     [000]  1774.275771:   8,7    C   N 0 [0]
              bash-2184  [000]  1774.275804:   8,7    I   R [bash]
              bash-2184  [000]  1774.275808:   8,7    D   R [bash]
       ksoftirqd/0-4     [000]  1774.275836:   8,7    C   N 0 [0]
      
      After this patch:
      
       # cat trace
              bash-2263  [000]   366.782149:   8,7    I   N 0 (00 ..) [bash]
              bash-2263  [000]   366.782323:   8,7    D   N 0 (00 ..) [bash]
              bash-2263  [000]   366.782557:   8,7    I   R 8 (25 00 ..) [bash]
              bash-2263  [000]   366.782560:   8,7    D   R 8 (25 00 ..) [bash]
       ksoftirqd/0-4     [000]   366.782582:   8,7    C   N (25 00 ..) [0]
              bash-2263  [000]   366.782648:   8,7    I   R 8 (5a 00 3f 00) [bash]
              bash-2263  [000]   366.782650:   8,7    D   R 8 (5a 00 3f 00) [bash]
       ksoftirqd/0-4     [000]   366.782669:   8,7    C   N (5a 00 3f 00) [0]
              bash-2263  [000]   366.782710:   8,7    I   R 8 (5a 00 08 00) [bash]
              bash-2263  [000]   366.782713:   8,7    D   R 8 (5a 00 08 00) [bash]
       ksoftirqd/0-4     [000]   366.782730:   8,7    C   N (5a 00 08 00) [0]
              bash-2263  [000]   366.783375:   8,7    I   R 36 (5a 00 08 00) [bash]
              bash-2263  [000]   366.783379:   8,7    D   R 36 (5a 00 08 00) [bash]
       ksoftirqd/0-4     [000]   366.783404:   8,7    C   N (5a 00 08 00) [0]
      
      This is what we do with PC events in user-space blktrace.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <49D32387.9040106@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • blktrace: fix output of unknown events · b78825d6
      Li Zefan committed
      Not all events are pc (packet command) events. An event is a pc
      event only if it has BLK_TC_PC bit set.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <49D3236D.3090705@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
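      A sketch of the check this message calls for: an event counts as pc
      only when its BLK_TC_PC bit is set. The TOY_* values mirror the layout
      of enum blktrace_cat and BLK_TC_ACT() as I recall it (BLK_TC_FS =
      1 << 8, BLK_TC_PC = 1 << 9, shifted left by 16 into the action word);
      treat them as assumptions and check include/linux/blktrace_api.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, mirroring (not quoting) the kernel's category layout. */
#define TOY_TC_SHIFT	16
#define TOY_TC_FS	(1u << 8)
#define TOY_TC_PC	(1u << 9)
#define TOY_TC_ACT(cat)	((cat) << TOY_TC_SHIFT)

/* An event is a packet-command event only if its PC category bit is set. */
static bool toy_is_pc_event(unsigned int action)
{
	return (action & TOY_TC_ACT(TOY_TC_PC)) != 0;
}
```

      An event that carries neither the fs nor the pc bit is therefore not
      treated as pc.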
  11. 09 Apr, 2009: 1 commit
  12. 03 Apr, 2009: 3 commits
  13. 31 Mar, 2009: 9 commits
    • blktrace: print out BLK_TN_MESSAGE properly · 18cea459
      Li Zefan committed
      Impact: improve ftrace plugin output
      
      Before this patch:
      
       # cat trace
               make-5383  [001]   741.240059:   8,7    P   N [make]
       __trace_note_message: cfq1074
      
       # echo 1 > options/blk_classic
       # cat trace
         8,7    1     0.692221252     0  C   W 130411392 + 1024 [0]
       Bad pc action 6361
       Bad pc action 283d
      
       # echo 0 > options/blk_classic
       # echo bin > trace_options
       # cat trace_pipe | blkparse -i -
       (can't parse messages generated by blk_add_trace_msg())
      
      After this patch:
       # cat trace
            <idle>-0     [001]   187.600933:   8,7    C   W 145220224 + 8 [0]
            <idle>-0     [001]   187.600946:   8,7    m   N cfq1076 complete
      
       # echo 1 > options/blk_classic
       # cat trace
         8,7    1     0.256378996   238  I   W 113190728 + 8 [pdflush]
         8,7    1     0.256378998   238  m   N cfq1076 insert_request
      
       # echo 0 > options/blk_classic
       # echo bin > trace_options
       # cat trace_pipe | blkparse -i -
        8,7    1        0    22.973250293     0  C   W 102770576 + 8 [0]
        8,7    1        0    22.973259213     0  m   N cfq1076 complete
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Jens Axboe <jens.axboe@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • blktrace: extract duplicate code · b6a4b0c3
      Li Zefan committed
      Impact: cleanup
      
      blk_trace_event_print() and blk_tracer_print_line() share most of the code.
      
         text    data     bss     dec     hex filename
         8605     393      12    9010    2332 kernel/trace/blktrace.o.orig
         text    data     bss     dec     hex filename
         8555     393      12    8960    2300 kernel/trace/blktrace.o
      
      This patch also prepares for the next patch, that prints out BLK_TN_MESSAGE.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Jens Axboe <jens.axboe@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • blktrace: fix memory leak when freeing struct blk_io_trace · ad5dd549
      Li Zefan committed
      Impact: fix mixed ioctl and ftrace-plugin blktrace use memory leak
      
      When mixing the use of ioctl-based blktrace and ftrace-based blktrace,
      we can leak memory in this way:
      
        # btrace /dev/sda > /dev/null &
        # echo 0 > /sys/block/sda/sda1/trace/enable
      
      now we leak bt->dropped_file, bt->msg_file, bt->rchan...
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Jens Axboe <jens.axboe@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • blktrace: fix blk_probes_ref chaos · 17ba97e3
      Li Zefan committed
      Impact: fix mixed ioctl and ftrace-plugin blktrace use refcount bugs
      
      ioctl-based blktrace allocates bt and registers tracepoints on
      ioctl(BLKTRACESETUP), and does all cleanup on ioctl(BLKTRACETEARDOWN).
      
      while ftrace-based blktrace allocates/frees bt when:
        # echo 1/0 > /sys/block/sda/sda1/trace/enable
      
      and registers/unregisters tracepoints when:
        # echo blk/nop > /debugfs/tracing/current_tracer
      or
        # echo 1/0 > /debugfs/tracing/tracing_enable
      
      The separation of allocation and registration causes 2 problems:
      
        1. current user-space blktrace still calls ioctl(TEARDOWN) when
           ioctl(SETUP) failed:
             # echo 1 > /sys/block/sda/sda1/trace/enable
             # blktrace /dev/sda
               BLKTRACESETUP: Device or resource busy
               ^C
           and now blk_probes_ref == -1
      
        2. Another way to make blk_probes_ref == -1:
           # plugin sdb && mount sdb1
           # echo 1 > /sys/block/sdb/sdb1/trace/enable
           # remove sdb
      
      This patch does the allocation and registration when writing
      sdaX/trace/enable.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Jens Axboe <jens.axboe@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
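      A hedged sketch of the invariant the fix establishes, with hypothetical
      names (toy_probes_ref, toy_setup(), toy_teardown()): the reference is
      taken in the same step that allocates the per-device state and dropped
      in the same step that frees it, so a teardown after a failed setup, or
      a second teardown, cannot push the count below zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical count of users that have the tracepoint probes registered. */
static int toy_probes_ref;

/* Hypothetical per-device trace state. */
struct toy_bt {
	int unused;
};

/*
 * Allocate the per-device trace and register probes in one step.  Returns
 * NULL if the device is already traced (busy) or allocation fails.
 */
static struct toy_bt *toy_setup(struct toy_bt **slot)
{
	if (*slot)
		return NULL;
	*slot = calloc(1, sizeof(**slot));
	if (*slot)
		toy_probes_ref++;	/* register together with the allocation */
	return *slot;
}

/* Tear down only what exists; tearing down nothing is a no-op. */
static void toy_teardown(struct toy_bt **slot)
{
	if (!*slot)
		return;
	free(*slot);
	*slot = NULL;
	toy_probes_ref--;		/* unregister together with the free */
}
```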
    • blktrace: make classic output more classic · 35ac51bf
      Li Zefan committed
      Impact: fix ftrace plugin timestamp output
      
      In the classic user-space blktrace, the output timestamp is sec.nsec
      not sec.usec.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Jens Axboe <jens.axboe@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
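      The sec.nsec form splits a nanosecond timestamp into whole seconds and
      a nine-digit, zero-padded nanosecond remainder. A sketch with a
      hypothetical helper name (the classic output additionally pads the
      field into a column, which is omitted here):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format ns as "<sec>.<nsec>" with nine nanosecond digits, not six usec. */
static int toy_format_classic_ts(char *out, size_t n, unsigned long long ns)
{
	unsigned long long sec = ns / 1000000000ULL;
	unsigned long nsec = (unsigned long)(ns % 1000000000ULL);

	return snprintf(out, n, "%llu.%09lu", sec, nsec);
}
```

      For 256378996 ns this gives 0.256378996, the form seen in the classic
      output in the 18cea459 message.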
    • blktrace: fix off-by-one bug · eb08f8eb
      Li Zefan committed
      'what' is used as an index into the array what2act, so it must be less than the array size.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Jens Axboe <jens.axboe@oracle.com>
      Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
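      A sketch of the bounds check, using a hypothetical char table in place
      of the kernel's what2act[] (whose entries are not reproduced here).
      Valid indices run from 0 to N-1, so the guard must reject what >= N; a
      guard written as what > N would let what == N read one element past
      the end.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical action-letter table indexed by event type ('what'). */
static const char toy_what2act[] = { 'Q', 'M', 'F', 'G', 'S', 'R', 'D', 'C' };

#define TOY_NR_ACTS (sizeof(toy_what2act) / sizeof(toy_what2act[0]))

/* Look up the action letter for 'what', or '?' if it is out of range. */
static char toy_act_for(size_t what)
{
	if (what >= TOY_NR_ACTS)	/* ">=", not ">": index N is out of bounds */
		return '?';
	return toy_what2act[what];
}
```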
    • blktrace: fix the original blktrace · 55547204
      Li Zefan committed
      Currently the original blktrace, which is using relay and is used via
      ioctl, is broken. You can use ftrace to see the output of blktrace,
      but user-space blktrace is unusable.
      
      It's broken by "blktrace: add ftrace plugin"
      (c71a8961)
      
       -	if (unlikely(bt->trace_state != Blktrace_running))
       +	if (unlikely(bt->trace_state != Blktrace_running || !blk_tracer_enabled))
      		return;
      
      With this patch, both ioctl and ftrace can be used, but of course you
      can't use both of them at the same time.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Jens Axboe <jens.axboe@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • blktrace: fix a race when creating blk_tree_root in debugfs · b5230b56
      Li Zefan committed
      t1                                t2
      ------                            ------
      do_blk_trace_setup()              do_blk_trace_setup()
        if (!blk_tree_root) {
                                          if (!blk_tree_root)
          blk_tree_root = create_dir()
                                            blk_tree_root = create_dir();
                                            (now blk_tree_root == NULL)
        ...
        dir = create_dir(name, blk_tree_root);
      
      Due to this race, t1 will create 'dir' in /debugfs but not /debugfs/block.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Jens Axboe <jens.axboe@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
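      One common remedy for this check-then-create race is to do both steps
      under a single lock. A userspace sketch with pthreads, where the toy_*
      names are hypothetical and a static object stands in for the debugfs
      "block" directory:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in: a "directory" is just a named static object. */
struct toy_dir {
	const char *name;
};

static struct toy_dir toy_block_dir = { "block" };
static struct toy_dir *toy_tree_root;	/* lazily created, shared */
static pthread_mutex_t toy_root_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Check and create under one lock, so a second caller can never overwrite
 * the shared pointer (in the race above, with NULL) while the first caller
 * is still about to use it as a parent directory.
 */
static struct toy_dir *toy_get_tree_root(void)
{
	struct toy_dir *root;

	pthread_mutex_lock(&toy_root_mutex);
	if (!toy_tree_root)
		toy_tree_root = &toy_block_dir;	/* create_dir("block") */
	root = toy_tree_root;
	pthread_mutex_unlock(&toy_root_mutex);
	return root;
}
```

      Every setup path calls toy_get_tree_root() instead of testing the
      pointer itself, so concurrent callers all get the same non-NULL root.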
    • blktrace: fix timestamp in binary output · 6c051ce0
      Li Zefan committed
      I found the timestamp is wrong:
      
       # echo bin > trace_options
       # echo blk > current_tracer
       # cat trace_pipe | blkparse -i -
       8,0    0        0     0.000000000   504  A   W ...
       ...
       8,7    1        0     0.008534097     0  C   R ...
                  (should be 8.534097xxx)
      
      User-space blkparse expects the timestamp to be in nanoseconds.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Jens Axboe <jens.axboe@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
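      The numbers in this message are consistent with a microsecond value
      being read as nanoseconds: 8534097 usec (8.534097 s) read as 8534097 ns
      is 0.008534097 s, and the sub-microsecond digits ("xxx") are missing.
      A small sketch of that arithmetic, with hypothetical helper names:

```c
#include <assert.h>

/* Convert microseconds to the nanoseconds blkparse expects. */
static unsigned long long toy_usec_to_nsec(unsigned long long usec)
{
	return usec * 1000ULL;
}

/* Seconds as blkparse would display them, given a raw nanosecond value. */
static double toy_ns_to_seconds(unsigned long long ns)
{
	return (double)ns / 1e9;
}
```

      Passing the raw usec value makes every timestamp 1000 times too small;
      converting first restores 8.534097 s.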
  14. 24 Mar, 2009: 4 commits
  15. 21 Mar, 2009: 5 commits