1. 16 Jun, 2011 2 commits
    • vmscan: implement swap token priority aging · d7911ef3
      Committed by KOSAKI Motohiro
      While testing the memcg-aware swap token, I observed that the swap token
      was often grabbed by an intermittently running process (e.g. init, auditd),
      which then never released it.
      
      Why?
      
      Some processes (e.g. init, auditd, audispd) wake up when a process exits.
      When the exiting process leaves the swap token without an owner, the first
      process to page in can get the token.  Thus such intermittently running
      processes often get the token.
      
      Currently, swap token priority is only decreased in the page fault path.
      So if a process sleeps immediately after grabbing the swap token, its
      token priority is never decreased.  That's obviously undesirable.
      
      This patch implements a very simple (and lightweight) priority aging.  It
      only affects the above corner case and doesn't change the performance of
      swap-heavy workloads (e.g. multi-process qsbench load).
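
As an illustration of the aging idea, here is a minimal user-space sketch. It assumes that aging is driven by a global page-fault counter and that the owner's priority is halved once a fixed number of faults has passed. `struct token_state`, `token_fault()`, and the value of `AGING_INTERVAL` are hypothetical names chosen for this example, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: number of global faults between two aging steps. */
#define AGING_INTERVAL 255u

/* Hypothetical model of the state needed for aging. */
struct token_state {
    unsigned int global_faults; /* page faults seen system-wide */
    unsigned int last_aging;    /* global_faults at the last aging step */
    int owner_priority;         /* priority of the current token owner */
};

/*
 * Called on every page fault, whichever process faults. Because the
 * counter is global, the owner's priority decays even while the owner
 * itself sleeps and never enters the fault path.
 */
static void token_fault(struct token_state *s)
{
    assert(s != NULL);

    s->global_faults++;
    if (s->global_faults - s->last_aging > AGING_INTERVAL) {
        s->owner_priority /= 2;            /* cheap exponential decay */
        s->last_aging = s->global_faults;
    }
}
```

The point of the design is that the decay costs one comparison per fault and needs no timer, which matches the "lightweight" goal stated above.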
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vmscan: implement swap token trace · 83cd81a3
      Committed by KOSAKI Motohiro
      This is useful for observing swap token activity.
      
      example output:
      
                   zsh-1845  [000]   598.962716: update_swap_token_priority: mm=ffff88015eaf7700 old_prio=1 new_prio=0
                memtoy-1830  [001]   602.033900: update_swap_token_priority: mm=ffff880037a45880 old_prio=947 new_prio=949
                memtoy-1830  [000]   602.041509: update_swap_token_priority: mm=ffff880037a45880 old_prio=949 new_prio=951
                memtoy-1830  [000]   602.051959: update_swap_token_priority: mm=ffff880037a45880 old_prio=951 new_prio=953
                memtoy-1830  [000]   602.052188: update_swap_token_priority: mm=ffff880037a45880 old_prio=953 new_prio=955
                memtoy-1830  [001]   602.427184: put_swap_token: token_mm=ffff880037a45880
                   zsh-1789  [000]   602.427281: replace_swap_token: old_token_mm=          (null) old_prio=0 new_token_mm=ffff88015eaf7018 new_prio=2
                   zsh-1789  [001]   602.433456: update_swap_token_priority: mm=ffff88015eaf7018 old_prio=2 new_prio=4
                   zsh-1789  [000]   602.437613: update_swap_token_priority: mm=ffff88015eaf7018 old_prio=4 new_prio=6
                   zsh-1789  [000]   602.443924: update_swap_token_priority: mm=ffff88015eaf7018 old_prio=6 new_prio=8
                   zsh-1789  [000]   602.451873: update_swap_token_priority: mm=ffff88015eaf7018 old_prio=8 new_prio=10
                   zsh-1789  [001]   602.462639: update_swap_token_priority: mm=ffff88015eaf7018 old_prio=10 new_prio=12
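
To post-process output like the example, something along these lines could pull the priorities out of an `update_swap_token_priority` event. This is only a sketch: it assumes each event arrives as one unwrapped line of text, and `parse_prio_update()` is a hypothetical helper name, not part of any tracing tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 and fills *old_prio / *new_prio when `line` is an
 * update_swap_token_priority event; returns 0 for any other line.
 */
static int parse_prio_update(const char *line,
                             unsigned int *old_prio,
                             unsigned int *new_prio)
{
    const char *p;

    assert(line != NULL && old_prio != NULL && new_prio != NULL);

    /* Skip other events such as put_swap_token or replace_swap_token. */
    if (strstr(line, "update_swap_token_priority:") == NULL)
        return 0;

    p = strstr(line, "old_prio=");
    if (p == NULL)
        return 0;

    /* Both fields must be present for the line to count as parsed. */
    return sscanf(p, "old_prio=%u new_prio=%u", old_prio, new_prio) == 2;
}
```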
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 15 Jun, 2011 1 commit
    • rcu: Use softirq to address performance regression · 09223371
      Committed by Shaohua Li
      Commit a26ac245 (rcu: move TREE_RCU from softirq to kthread)
      introduced a performance regression. In an AIM7 test, this commit degraded
      performance by about 40%.
      
      The commit runs RCU callbacks in a kthread instead of softirq. We observed
      a high rate of context switches caused by this. Our test system has
      64 CPUs and HZ is 1000, so we saw more than 64k context switches per second,
      caused by RCU's per-CPU kthreads.  A trace showed that most of
      the time the RCU per-CPU kthread doesn't actually handle any callbacks,
      but instead just does a very small amount of work handling grace periods.
      This means that RCU's per-CPU kthreads are making the scheduler do quite
      a bit of work in order to allow a very small amount of RCU-related
      processing to be done.
      
      Alex Shi's analysis determined that this slowdown is due to lock
      contention within the scheduler.  Unfortunately, as Peter Zijlstra points
      out, the scheduler's real-time semantics require global action, which
      means that this contention is inherent in real-time scheduling.  (Yes,
      perhaps someone will come up with a workaround -- otherwise, -rt is not
      going to do well on large SMP systems -- but this patch will work around
      this issue in the meantime.  And "the meantime" might well be forever.)
      
      This patch therefore re-introduces softirq processing to RCU, but only
      for core RCU work.  RCU callbacks are still executed in kthread context,
      so that only a small amount of RCU work runs in softirq context in the
      common case.  This should minimize ksoftirqd execution, allowing us to
      skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels.
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Tested-by: "Alex,Shi" <alex.shi@intel.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  3. 03 Jun, 2011 1 commit
    • net: tracepoint of net_dev_xmit sees freed skb and causes panic · ec764bf0
      Committed by Koki Sanagi
      Because the skb may be freed with kfree_skb() and zero-cleared after
      ndo_start_xmit, we must not access skb contents such as skb->len and
      skb->dev->name after ndo_start_xmit. But trace_net_dev_xmit does exactly
      that, causing a panic from a NULL pointer dereference.
      This patch fixes trace_net_dev_xmit so that it no longer reads the
      contents of skb directly.
      
      To reproduce this panic:
      
      1. Enable the net_dev_xmit tracepoint
      2. Create 2 guests on KVM
      3. Make both guests use virtio_net
      4. Run netperf from one guest to the other for a long time as network load
      5. The host will panic (it takes about 30 minutes)
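
The general pattern behind this kind of fix can be sketched in user space: copy the values the trace needs before handing the buffer to code that may free it, and only use the copies afterwards. Everything below (`struct packet`, `struct xmit_record`, `consuming_xmit()`, `xmit_and_trace()`) is a hypothetical stand-in for illustration, not the kernel's types or the patch itself.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a network buffer. */
struct packet {
    unsigned int len;
};

/* What the trace keeps: plain values, nothing to dereference later. */
struct xmit_record {
    uintptr_t pkt_addr; /* address, kept for identification only */
    unsigned int len;   /* length captured before transmit */
    int rc;             /* return code of the transmit call */
};

/* Hypothetical driver callback that consumes (frees) the packet. */
static int consuming_xmit(struct packet *p)
{
    free(p);
    return 0;
}

static struct xmit_record xmit_and_trace(struct packet *p,
                                         int (*xmit)(struct packet *))
{
    struct xmit_record rec;

    assert(p != NULL && xmit != NULL);

    /* Capture everything the trace needs while we still own the packet. */
    rec.pkt_addr = (uintptr_t)p;
    rec.len = p->len;

    /* After this call the packet may already be freed: do not touch *p. */
    rec.rc = xmit(p);
    return rec;
}
```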
      Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 26 May, 2011 1 commit
  5. 20 May, 2011 1 commit
  6. 12 May, 2011 1 commit
  7. 06 May, 2011 1 commit
  8. 16 Apr, 2011 1 commit
  9. 12 Apr, 2011 2 commits
  10. 28 Mar, 2011 1 commit
    • Btrfs: add initial tracepoint support for btrfs · 1abe9b8a
      Committed by liubo
      Tracepoints can provide insight into why btrfs hits bugs and can be greatly
      helpful for debugging, e.g.:
                    dd-7822  [000]  2121.641088: btrfs_inode_request: root = 5(FS_TREE), gen = 4, ino = 256, blocks = 8, disk_i_size = 0, last_trans = 8, logged_trans = 0
                    dd-7822  [000]  2121.641100: btrfs_inode_new: root = 5(FS_TREE), gen = 8, ino = 257, blocks = 0, disk_i_size = 0, last_trans = 0, logged_trans = 0
       btrfs-transacti-7804  [001]  2146.935420: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29368320 (orig_level = 0), cow_buf = 29388800 (cow_level = 0)
       btrfs-transacti-7804  [001]  2146.935473: btrfs_cow_block: root = 1(ROOT_TREE), refs = 2, orig_buf = 29364224 (orig_level = 0), cow_buf = 29392896 (cow_level = 0)
       btrfs-transacti-7804  [001]  2146.972221: btrfs_transaction_commit: root = 1(ROOT_TREE), gen = 8
         flush-btrfs-2-7821  [001]  2155.824210: btrfs_chunk_alloc: root = 3(CHUNK_TREE), offset = 1103101952, size = 1073741824, num_stripes = 1, sub_stripes = 0, type = DATA
         flush-btrfs-2-7821  [001]  2155.824241: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29388800 (orig_level = 0), cow_buf = 29396992 (cow_level = 0)
         flush-btrfs-2-7821  [001]  2155.824255: btrfs_cow_block: root = 4(DEV_TREE), refs = 2, orig_buf = 29372416 (orig_level = 0), cow_buf = 29401088 (cow_level = 0)
         flush-btrfs-2-7821  [000]  2155.824329: btrfs_cow_block: root = 3(CHUNK_TREE), refs = 2, orig_buf = 20971520 (orig_level = 0), cow_buf = 20975616 (cow_level = 0)
       btrfs-endio-wri-7800  [001]  2155.898019: btrfs_cow_block: root = 5(FS_TREE), refs = 2, orig_buf = 29384704 (orig_level = 0), cow_buf = 29405184 (cow_level = 0)
       btrfs-endio-wri-7800  [001]  2155.898043: btrfs_cow_block: root = 7(CSUM_TREE), refs = 2, orig_buf = 29376512 (orig_level = 0), cow_buf = 29409280 (cow_level = 0)
      
      Here is what I have added:
      
      1) ordered_extent:
              btrfs_ordered_extent_add
              btrfs_ordered_extent_remove
              btrfs_ordered_extent_start
              btrfs_ordered_extent_put
      
      These provide critical information to understand how ordered_extents are
      updated.
      
      2) extent_map:
              btrfs_get_extent
      
      extent_map is used in both read and write cases, and it is useful for tracking
      how btrfs specific IO is running.
      
      3) writepage:
              __extent_writepage
              btrfs_writepage_end_io_hook
      
      Pages are critical resources and produce a lot of corner cases during writeback,
      so it is valuable to know how a page is written to disk.
      
      4) inode:
              btrfs_inode_new
              btrfs_inode_request
              btrfs_inode_evict
      
      These can show where and when an inode is created and when an inode is evicted.
      
      5) sync:
              btrfs_sync_file
              btrfs_sync_fs
      
      These show sync arguments.
      
      6) transaction:
              btrfs_transaction_commit
      
      In a transaction-based filesystem, it is useful to know the generation and
      who does the commit.
      
      7) back reference and cow:
      	btrfs_delayed_tree_ref
      	btrfs_delayed_data_ref
      	btrfs_delayed_ref_head
      	btrfs_cow_block
      
      Btrfs natively supports back references; these tracepoints help in
      understanding btrfs's COW mechanism.
      
      8) chunk:
      	btrfs_chunk_alloc
      	btrfs_chunk_free
      
      A chunk is a link between physical offset and logical offset and stands for
      space information in btrfs; these tracepoints help in tracing space usage.
      
      9) reserved_extent:
      	btrfs_reserved_extent_alloc
      	btrfs_reserved_extent_free
      
      These can show how btrfs uses its space.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  11. 22 Mar, 2011 1 commit
  12. 15 Mar, 2011 1 commit
  13. 10 Mar, 2011 3 commits
  14. 03 Mar, 2011 1 commit
    • blktrace: Remove blk_fill_rwbs_rq. · 2d3a8497
      Committed by Tao Ma
      If we enable trace events to trace block actions, we use
      blk_fill_rwbs_rq to analyze the corresponding actions
      in the request's cmd_flags, but we only take the lowest 2 bits
      from it, so most of the other flags (e.g. REQ_SYNC) are missing.
      For example, with a sync write we get:
      write_test-2409  [001]   160.013869: block_rq_insert: 3,64 W 0 () 258135 + 8 [write_test]
      
      Since we have now integrated the flags of both bio and request,
      it is safe to pass rq->cmd_flags directly to blk_fill_rwbs, and
      blk_fill_rwbs_rq isn't needed any more.
      
      With this patch, after a sync write we get:
      write_test-2417  [000]   226.603878: block_rq_insert: 3,64 WS 0 () 258135 + 8 [write_test]
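
The effect of masking the flags can be shown with a small sketch. The flag bit values and the helper names `fill_rwbs()` and `fill_rwbs_masked()` below are hypothetical, chosen only so that the sync bit lies outside the lowest two bits; the kernel's real REQ_* values and blk_fill_rwbs implementation differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; the kernel's real REQ_* values differ. */
#define FLAG_WRITE (1u << 0)
#define FLAG_SYNC  (1u << 4)
#define FLAG_META  (1u << 5)

/*
 * Build a short action string such as "WS" from the full flag word.
 * `buf` must hold at least 4 bytes.
 */
static void fill_rwbs(char *buf, unsigned int flags)
{
    size_t i = 0;

    assert(buf != NULL);

    buf[i++] = (flags & FLAG_WRITE) ? 'W' : 'R';
    if (flags & FLAG_SYNC)
        buf[i++] = 'S';
    if (flags & FLAG_META)
        buf[i++] = 'M';
    buf[i] = '\0';
}

/* Models the old behaviour: only the lowest two bits are passed on. */
static void fill_rwbs_masked(char *buf, unsigned int flags)
{
    fill_rwbs(buf, flags & 3u);
}
```

With `FLAG_WRITE | FLAG_SYNC` as input, the masked variant yields "W" while the full variant yields "WS", mirroring the two trace lines above.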
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  15. 02 Mar, 2011 1 commit
  16. 22 Jan, 2011 1 commit
  17. 15 Jan, 2011 1 commit
    • tracing: Only process module tracepoints once · c94fbe1d
      Committed by Steven Rostedt
      The commit:
      
       9f987b3141f086de27832514aad9f50a53f754
       tracing: Include module.h in define_trace.h
      
      only solved half the problem. If the trace/events/module.h header is
      included at the time of define_trace.h (or in ftrace.h within it),
      the module.h TRACE_SYSTEM will override the current TRACE_SYSTEM
      macro.
      
      Since define_trace.h is included when CREATE_TRACE_POINTS is set,
      and the first thing it does is to #undef CREATE_TRACE_POINTS,
      by placing the module.h TRACE_SYSTEM inside a
       #ifdef CREATE_TRACE_POINTS
      we can prevent it from overriding the TRACE_SYSTEM that is
      being processed, and still process the module.h tracepoints
      when the module code defines CREATE_TRACE_POINTS and includes
      the trace/events/module.h header.
      
      As with commit 9f987b3141, this is only an issue if module.h
      is not included before the trace/events/<event>.h file is
      included, which (luckily) has not happened yet.
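
The guard described above can be illustrated with a stand-alone preprocessor sketch. It only models the mechanism: the `sched` system name, the `STRINGIFY` macros, and `trace_system_is()` are assumptions made for this example, and the guarded section is modeled on the description rather than copied from the header.

```c
#include <assert.h>
#include <string.h>

/* Pretend another event header is being processed for system "sched". */
#define TRACE_SYSTEM sched

/* Per the commit message, define_trace.h undefines this first. */
#undef CREATE_TRACE_POINTS

/*
 * Guarded section: because CREATE_TRACE_POINTS is no longer defined
 * here, the redefinition is skipped and TRACE_SYSTEM is left alone.
 */
#ifdef CREATE_TRACE_POINTS
#undef TRACE_SYSTEM
#define TRACE_SYSTEM module
#endif

/* Two-level expansion so the macro's value is stringified, not its name. */
#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x)

/* Returns nonzero when the TRACE_SYSTEM in effect equals `name`. */
static int trace_system_is(const char *name)
{
    assert(name != NULL);
    return strcmp(STRINGIFY(TRACE_SYSTEM), name) == 0;
}
```

Without the `#ifdef` guard, the same translation unit would end up with `TRACE_SYSTEM` set to `module`, which is the override the commit prevents.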
      Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  18. 14 Jan, 2011 4 commits
  19. 12 Jan, 2011 6 commits
  20. 08 Jan, 2011 1 commit
  21. 07 Jan, 2011 1 commit
  22. 04 Jan, 2011 1 commit
    • perf: Clean up power events by introducing new, more generic ones · 25e41933
      Committed by Thomas Renninger
      Add these new power trace events:
      
       power:cpu_idle
       power:cpu_frequency
       power:machine_suspend
      
      The old C-state/idle accounting events:
        power:power_start
        power:power_end
      
      now have a replacement (but we are still keeping the old
      tracepoints for compatibility):
      
        power:cpu_idle
      
      and
        power:power_frequency
      
      is replaced with:
        power:cpu_frequency
      
      power:machine_suspend is newly introduced.
      
      Jean Pihet has a patch integrated into the generic layer
      (kernel/power/suspend.c) which will make use of it.
      
      The type= field was removed from both; it was never
      used, and the type is already distinguished by the event itself.
      
      perf timechart userspace tool gets adjusted in a separate patch.
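
For tools that still refer to the old event names, the renames listed above could be captured in a small lookup table. This is a sketch only; `struct event_rename` and `replacement_event()` are hypothetical names, and the table contains just the mappings stated in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One old-name to new-name mapping. */
struct event_rename {
    const char *old_name;
    const char *new_name;
};

/* The renames listed in the commit message above. */
static const struct event_rename power_renames[] = {
    { "power:power_start",     "power:cpu_idle" },
    { "power:power_end",       "power:cpu_idle" },
    { "power:power_frequency", "power:cpu_frequency" },
};

/* Returns the replacement name, or NULL if `name` is not an old event. */
static const char *replacement_event(const char *name)
{
    size_t i;

    assert(name != NULL);

    for (i = 0; i < sizeof(power_renames) / sizeof(power_renames[0]); i++) {
        if (strcmp(power_renames[i].old_name, name) == 0)
            return power_renames[i].new_name;
    }
    return NULL;
}
```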
      Signed-off-by: Thomas Renninger <trenn@suse.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Arjan van de Ven <arjan@linux.intel.com>
      Acked-by: Jean Pihet <jean.pihet@newoldbits.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: rjw@sisk.pl
      LKML-Reference: <1294073445-14812-3-git-send-email-trenn@suse.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
  23. 06 Dec, 2010 1 commit
  24. 18 Nov, 2010 1 commit
    • tracing: Allow raw syscall trace events for non privileged users · fe554203
      Committed by Frederic Weisbecker
      This allows non-privileged users to use the raw syscall trace events
      for task-bound tracing in perf.
      
      It is safe because raw syscall trace events don't leak system-wide
      information.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Jason Baron <jbaron@redhat.com>
  25. 16 Nov, 2010 1 commit
  26. 11 Nov, 2010 2 commits
    • ASoC: Add DAPM trace events · 84e90930
      Committed by Mark Brown
      Trace events for DAPM allow us to monitor the performance and behaviour
      of DAPM with logging which can be built into the kernel permanently, is
      more suited to automated analysis and display and less likely to suffer
      interference from other logging activity.
      
      Currently trace events are generated for:
      
      - Start and stop of DAPM processing
      - Start and stop of bias level changes
      - Power decisions for widgets
      - Widget event execution start and stop
      
      giving some view as to what is happening and where latencies occur.
      
      Actual changes in widget power can be seen via the register write trace in
      soc-core.
      Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
      Acked-by: Liam Girdwood <lrg@slimlogic.co.uk>
    • ASoC: Add trace events for ASoC register read/write · a8b1d34f
      Committed by Mark Brown
      The trace subsystem provides a convenient way of instrumenting the kernel
      which can be left on all the time with extremely low impact on the system
      unlike prints to the kernel log which can be very spammy. Begin adding
      support for instrumenting ASoC via this interface by adding trace for the
      register access primitives.
      Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
      Acked-by: Liam Girdwood <lrg@slimlogic.co.uk>
  27. 09 Nov, 2010 1 commit