1. 29 7月, 2017 2 次提交
    • S
      blktrace: add an option to allow displaying cgroup path · 69fd5c39
      Shaohua Li 提交于
      By default we output cgroup id in blktrace. This adds an option to
      display cgroup path. Since get cgroup path is a relativly heavy
      operation, we don't enable it by default.
      
      with the option enabled, blktrace will output something like this:
      dd-1353  [007] d..2   293.015252:   8,0   /test/level  D   R 24 + 8 [dd]
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      69fd5c39
    • S
      blktrace: export cgroup info in trace · ca1136c9
      Shaohua Li 提交于
      Currently blktrace isn't cgroup aware. blktrace prints out task name of
      current context, but the task of current context isn't always in the
      cgroup where the BIO comes from. We can't use task name to find out IO
      cgroup. For example, Writeback BIOs always comes from flusher thread but
      the BIOs are for different blk cgroups. Request could be requeued and
      dispatched from completely different tasks. MD/DM are another examples.
      
      This patch tries to fix the gap. We print out cgroup fhandle info in
      blktrace. Userspace can use open_by_handle_at() syscall to find the
      cgroup by fhandle. Or userspace can use name_to_handle_at() syscall to
      find fhandle for a cgroup and use a BPF program to filter out blktrace
      for a specific cgroup.
      
      We add a new 'blk_cgroup' trace option for blk tracer. It's default off.
      Application which doesn't know the new option isn't affected.  When it's
      on, we output fhandle info right after blk_io_trace with an extra bit
      set in event action. So from application point of view, blktrace with
      the option will output new actions.
      
      I didn't change blk trace event yet, since I'm not sure if changing the
      trace event output is an ABI issue. If not, I'll do it later.
      Acked-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      ca1136c9
  2. 09 6月, 2017 1 次提交
  3. 19 5月, 2017 1 次提交
  4. 21 4月, 2017 2 次提交
  5. 03 2月, 2017 3 次提交
  6. 01 2月, 2017 1 次提交
  7. 28 1月, 2017 1 次提交
    • C
      block: cleanup tracing · 48b77ad6
      Christoph Hellwig 提交于
      A couple tweaks to the tracing code:
      
       - trace the request size for all requests
       - trace request sector and nr_sectors only for fs requests, enforced by
         helpers
       - drop SCSI CDB tracing - we have SCSI tracing for this and are going
         to me the CDB out of the generic struct request soon.
      
      With this the tracing code stops to know about BLOCK_PC requests entirely,
      it's just FS vs passthrough requests now, where the latter includes any
      driver-private requests.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      48b77ad6
  8. 28 10月, 2016 1 次提交
    • C
      block: better op and flags encoding · ef295ecf
      Christoph Hellwig 提交于
      Now that we don't need the common flags to overflow outside the range
      of a 32-bit type we can encode them the same way for both the bio and
      request fields.  This in addition allows us to place the operation
      first (and make some room for more ops while we're at it) and to
      stop having to shift around the operation values.
      
      In addition this allows passing around only one value in the block layer
      instead of two (and eventuall also in the file systems, but we can do
      that later) and thus clean up a lot of code.
      
      Last but not least this allows decreasing the size of the cmd_flags
      field in struct request to 32-bits.  Various functions passing this
      value could also be updated, but I'd like to avoid the churn for now.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      ef295ecf
  9. 16 8月, 2016 1 次提交
  10. 08 8月, 2016 1 次提交
    • J
      block: rename bio bi_rw to bi_opf · 1eff9d32
      Jens Axboe 提交于
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the higher portions. This means that
      old code that relies on manually setting bi_rw is most likely
      going to be broken. Instead of letting that brokeness linger,
      rename the member, to force old and out-of-tree code to break
      at compile time instead of at runtime.
      
      No intended functional changes in this commit.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      1eff9d32
  11. 18 6月, 2016 1 次提交
    • A
      blktrace: avoid using timespec · 59a37f8b
      Arnd Bergmann 提交于
      The blktrace code stores the current time in a 32-bit word in its
      user interface. This is a bad idea because 32-bit seconds overflow
      at some point.
      
      We probably have until 2106 before this one overflows, as it seems
      to use an 'unsigned' variable, but we should confirm that user
      space treats it the same way.
      
      Aside from this, we want to stop using 'struct timespec' here,
      so I'm adding a comment about the overflow and change the code
      to use timespec64 instead to make the loss of range more obvious.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      59a37f8b
  12. 09 6月, 2016 1 次提交
  13. 08 6月, 2016 3 次提交
  14. 10 5月, 2016 2 次提交
  15. 23 3月, 2016 1 次提交
  16. 04 1月, 2016 1 次提交
  17. 30 10月, 2015 1 次提交
    • D
      blktrace: re-write setting q->blk_trace · cdea01b2
      Davidlohr Bueso 提交于
      This is really about simplifying the double xchg patterns into
      a single cmpxchg, with the same logic. Other than the immediate
      cleanup, there are some subtleties this change deals with:
      
      (i) While the load of the old bt is fully ordered wrt everything,
      ie:
      
              old_bt = xchg(&q->blk_trace, bt);             [barrier]
              if (old_bt)
      	     (void) xchg(&q->blk_trace, old_bt);    [barrier]
      
      blk_trace could still be changed between the xchg and the old_bt
      load. Note that this description is merely theoretical and afaict
      very small, but doing everything in a single context with cmpxchg
      closes this potential race.
      
      (ii) Ordering guarantees are obviously kept with cmpxchg.
      
      (iii) Gets rid of the hacky-by-nature (void)xchg pattern.
      Signed-off-by: NDavidlohr Bueso <dbueso@suse.de>
      eviewed-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      cdea01b2
  18. 01 10月, 2015 1 次提交
    • S
      tracing: Move trace_flags from global to a trace_array field · 983f938a
      Steven Rostedt (Red Hat) 提交于
      In preparation to make trace options per instance, the global trace_flags
      needs to be moved from being a global variable to a field within the trace
      instance trace_array structure.
      
      There's still more work to do, as there's some functions that use
      trace_flags without passing in a way to get to the current_trace array. For
      those, the global_trace is used directly (from trace.c). This includes
      setting and clearing the trace_flags. This means that when a new instance is
      created, it just gets the trace_flags of the global_trace and will not be
      able to modify them. Depending on the functions that have access to the
      trace_array, the flags of an instance may not affect parts of its trace,
      where the global_trace is used. These will be fixed in future changes.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      983f938a
  19. 26 9月, 2015 1 次提交
  20. 29 7月, 2015 1 次提交
    • C
      block: add a bi_error field to struct bio · 4246a0b6
      Christoph Hellwig 提交于
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawback of not beeing persistent
      when bios are queued up, and are not passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NHannes Reinecke <hare@suse.de>
      Reviewed-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      4246a0b6
  21. 26 6月, 2015 1 次提交
    • R
      kernel/trace/blktrace.c: use strreplace() in do_blk_trace_setup() · ff14417c
      Rasmus Villemoes 提交于
      Part of the disassembly of do_blk_trace_setup:
      
          231b:       e8 00 00 00 00          callq  2320 <do_blk_trace_setup+0x50>
                              231c: R_X86_64_PC32     strlen+0xfffffffffffffffc
          2320:       eb 0a                   jmp    232c <do_blk_trace_setup+0x5c>
          2322:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
          2328:       48 83 c3 01             add    $0x1,%rbx
          232c:       48 39 d8                cmp    %rbx,%rax
          232f:       76 47                   jbe    2378 <do_blk_trace_setup+0xa8>
          2331:       41 80 3c 1c 2f          cmpb   $0x2f,(%r12,%rbx,1)
          2336:       75 f0                   jne    2328 <do_blk_trace_setup+0x58>
          2338:       41 c6 04 1c 5f          movb   $0x5f,(%r12,%rbx,1)
          233d:       4c 89 e7                mov    %r12,%rdi
          2340:       e8 00 00 00 00          callq  2345 <do_blk_trace_setup+0x75>
                              2341: R_X86_64_PC32     strlen+0xfffffffffffffffc
          2345:       eb e1                   jmp    2328 <do_blk_trace_setup+0x58>
      
      Yep, that's right: gcc isn't smart enough to realize that replacing '/' by
      '_' cannot change the strlen(), so we call it again and again (at least
      when a '/' is found).  Even if gcc were that smart, this construction
      would still loop over the string twice, once for the initial strlen() call
      and then the open-coded loop.
      
      Let's simply use strreplace() instead.
      Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Liked-by: NJens Axboe <axboe@fb.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ff14417c
  22. 14 5月, 2015 1 次提交
  23. 10 12月, 2014 1 次提交
    • A
      blktrace: don't let the sysfs interface remove trace from running list · 7eca2103
      Arianna Avanzini 提交于
      Currently, blktrace can be started/stopped via its ioctl-based interface
      (used by the userspace blktrace tool) or via its ftrace interface. The
      function blk_trace_remove_queue(), called each time an "enable" tunable
      of the ftrace interface transitions to zero, removes the trace from the
      running list, even if no function from the sysfs interface adds it to
      such a list. This leads to a null pointer dereference.  This commit
      changes the blk_trace_remove_queue() function so that it does not remove
      the blk_trace from the running list.
      
      v2:
          - Now the patch removes the invocation of list_del() instead of
            adding an useless if branch, as suggested by Namhyung Kim.
      Signed-off-by: NArianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      7eca2103
  24. 20 11月, 2014 1 次提交
  25. 06 3月, 2014 1 次提交
    • R
      blktrace: fix accounting of partially completed requests · af5040da
      Roman Pen 提交于
      trace_block_rq_complete does not take into account that request can
      be partially completed, so we can get the following incorrect output
      of blkparser:
      
        C   R 232 + 240 [0]
        C   R 240 + 232 [0]
        C   R 248 + 224 [0]
        C   R 256 + 216 [0]
      
      but should be:
      
        C   R 232 + 8 [0]
        C   R 240 + 8 [0]
        C   R 248 + 8 [0]
        C   R 256 + 8 [0]
      
      Also, the whole output summary statistics of completed requests and
      final throughput will be incorrect.
      
      This patch takes into account real completion size of the request and
      fixes wrong completion accounting.
      Signed-off-by: NRoman Pen <r.peniaev@gmail.com>
      CC: Steven Rostedt <rostedt@goodmis.org>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      CC: Ingo Molnar <mingo@redhat.com>
      CC: linux-kernel@vger.kernel.org
      Cc: stable@kernel.org
      Signed-off-by: NJens Axboe <axboe@fb.com>
      af5040da
  26. 21 2月, 2014 1 次提交
  27. 24 11月, 2013 1 次提交
    • K
      block: Abstract out bvec iterator · 4f024f37
      Kent Overstreet 提交于
      Immutable biovecs are going to require an explicit iterator. To
      implement immutable bvecs, a later patch is going to add a bi_bvec_done
      member to this struct; for now, this patch effectively just renames
      things.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Benny Halevy <bhalevy@tonian.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: xfs@oss.sgi.com
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchand@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peng Tao <tao.peng@emc.com>
      Cc: Andy Adamson <andros@netapp.com>
      Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Pankaj Kumar <pankaj.km@samsung.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>6
      4f024f37
  28. 09 11月, 2013 1 次提交
  29. 08 11月, 2013 1 次提交
    • J
      blktrace: Send BLK_TN_PROCESS events to all running traces · a404d557
      Jan Kara 提交于
      Currently each task sends BLK_TN_PROCESS event to the first traced
      device it interacts with after a new trace is started. When there are
      several traced devices and the task accesses more devices, this logic
      can result in BLK_TN_PROCESS being sent several times to some devices
      while it is never sent to other devices. Thus blkparse doesn't display
      command name when parsing some blktrace files.
      
      Fix the problem by sending BLK_TN_PROCESS event to all traced devices
      when a task interacts with any of them.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Review-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      a404d557
  30. 19 4月, 2013 1 次提交
  31. 24 3月, 2013 1 次提交
  32. 15 3月, 2013 1 次提交
    • S
      tracing: Consolidate max_tr into main trace_array structure · 12883efb
      Steven Rostedt (Red Hat) 提交于
      Currently, the way the latency tracers and snapshot feature works
      is to have a separate trace_array called "max_tr" that holds the
      snapshot buffer. For latency tracers, this snapshot buffer is used
      to swap the running buffer with this buffer to save the current max
      latency.
      
      The only items needed for the max_tr is really just a copy of the buffer
      itself, the per_cpu data pointers, the time_start timestamp that states
      when the max latency was triggered, and the cpu that the max latency
      was triggered on. All other fields in trace_array are unused by the
      max_tr, making the max_tr mostly bloat.
      
      This change removes the max_tr completely, and adds a new structure
      called trace_buffer, that holds the buffer pointer, the per_cpu data
      pointers, the time_start timestamp, and the cpu where the latency occurred.
      
      The trace_array, now has two trace_buffers, one for the normal trace and
      one for the max trace or snapshot. By doing this, not only do we remove
      the bloat from the max_trace but the instances of traces can now use
      their own snapshot feature and not have just the top level global_trace have
      the snapshot feature and latency tracers for itself.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      12883efb
  33. 22 1月, 2013 1 次提交