1. 19 7月, 2011 8 次提交
  2. 16 6月, 2011 2 次提交
    • K
      vmscan: implement swap token priority aging · d7911ef3
      KOSAKI Motohiro 提交于
      While testing for memcg aware swap token, I observed a swap token was
      often grabbed an intermittent running process (eg init, auditd) and they
      never release a token.
      
      Why?
      
      Some processes (eg init, auditd, audispd) wake up when a process exiting.
      And swap token can be get first page-in process when a process exiting
      makes no swap token owner.  Thus such above intermittent running process
      often get a token.
      
      And currently, swap token priority is only decreased at page fault path.
      Then, if the process sleep immediately after to grab swap token, the swap
      token priority never be decreased.  That's obviously undesirable.
      
      This patch implement very poor (and lightweight) priority aging.  It only
      be affect to the above corner case and doesn't change swap tendency
      workload performance (eg multi process qsbench load)
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d7911ef3
    • K
      vmscan: implement swap token trace · 83cd81a3
      KOSAKI Motohiro 提交于
      This is useful for observing swap token activity.
      
      example output:
      
                   zsh-1845  [000]   598.962716: update_swap_token_priority:
      mm=ffff88015eaf7700 old_prio=1 new_prio=0
                memtoy-1830  [001]   602.033900: update_swap_token_priority:
      mm=ffff880037a45880 old_prio=947 new_prio=949
                memtoy-1830  [000]   602.041509: update_swap_token_priority:
      mm=ffff880037a45880 old_prio=949 new_prio=951
                memtoy-1830  [000]   602.051959: update_swap_token_priority:
      mm=ffff880037a45880 old_prio=951 new_prio=953
                memtoy-1830  [000]   602.052188: update_swap_token_priority:
      mm=ffff880037a45880 old_prio=953 new_prio=955
                memtoy-1830  [001]   602.427184: put_swap_token:
      token_mm=ffff880037a45880
                   zsh-1789  [000]   602.427281: replace_swap_token:
      old_token_mm=          (null) old_prio=0 new_token_mm=ffff88015eaf7018
      new_prio=2
                   zsh-1789  [001]   602.433456: update_swap_token_priority:
      mm=ffff88015eaf7018 old_prio=2 new_prio=4
                   zsh-1789  [000]   602.437613: update_swap_token_priority:
      mm=ffff88015eaf7018 old_prio=4 new_prio=6
                   zsh-1789  [000]   602.443924: update_swap_token_priority:
      mm=ffff88015eaf7018 old_prio=6 new_prio=8
                   zsh-1789  [000]   602.451873: update_swap_token_priority:
      mm=ffff88015eaf7018 old_prio=8 new_prio=10
                   zsh-1789  [001]   602.462639: update_swap_token_priority:
      mm=ffff88015eaf7018 old_prio=10 new_prio=12
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: Rik van Riel<riel@redhat.com>
      Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      83cd81a3
  3. 15 6月, 2011 1 次提交
    • S
      rcu: Use softirq to address performance regression · 09223371
      Shaohua Li 提交于
      Commit a26ac245(rcu: move TREE_RCU from softirq to kthread)
      introduced performance regression. In an AIM7 test, this commit degraded
      performance by about 40%.
      
      The commit runs rcu callbacks in a kthread instead of softirq. We observed
      high rate of context switch which is caused by this. Out test system has
      64 CPUs and HZ is 1000, so we saw more than 64k context switch per second
      which is caused by RCU's per-CPU kthread.  A trace showed that most of
      the time the RCU per-CPU kthread doesn't actually handle any callbacks,
      but instead just does a very small amount of work handling grace periods.
      This means that RCU's per-CPU kthreads are making the scheduler do quite
      a bit of work in order to allow a very small amount of RCU-related
      processing to be done.
      
      Alex Shi's analysis determined that this slowdown is due to lock
      contention within the scheduler.  Unfortunately, as Peter Zijlstra points
      out, the scheduler's real-time semantics require global action, which
      means that this contention is inherent in real-time scheduling.  (Yes,
      perhaps someone will come up with a workaround -- otherwise, -rt is not
      going to do well on large SMP systems -- but this patch will work around
      this issue in the meantime.  And "the meantime" might well be forever.)
      
      This patch therefore re-introduces softirq processing to RCU, but only
      for core RCU work.  RCU callbacks are still executed in kthread context,
      so that only a small amount of RCU work runs in softirq context in the
      common case.  This should minimize ksoftirqd execution, allowing us to
      skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels.
      Signed-off-by: NShaohua Li <shaohua.li@intel.com>
      Tested-by: N"Alex,Shi" <alex.shi@intel.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      09223371
  4. 06 6月, 2011 1 次提交
  5. 03 6月, 2011 1 次提交
    • K
      net: tracepoint of net_dev_xmit sees freed skb and causes panic · ec764bf0
      Koki Sanagi 提交于
      Because there is a possibility that skb is kfree_skb()ed and zero cleared
      after ndo_start_xmit, we should not see the contents of skb like skb->len and
      skb->dev->name after ndo_start_xmit. But trace_net_dev_xmit does that
      and causes panic by NULL pointer dereference.
      This patch fixes trace_net_dev_xmit not to see the contents of skb directly.
      
      If you want to reproduce this panic,
      
      1. Get tracepoint of net_dev_xmit on
      2. Create 2 guests on KVM
      2. Make 2 guests use virtio_net
      4. Execute netperf from one to another for a long time as a network burden
      5. host will panic(It takes about 30 minutes)
      Signed-off-by: NKoki Sanagi <sanagi.koki@jp.fujitsu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec764bf0
  6. 26 5月, 2011 2 次提交
  7. 20 5月, 2011 1 次提交
  8. 12 5月, 2011 1 次提交
  9. 06 5月, 2011 1 次提交
  10. 16 4月, 2011 1 次提交
  11. 12 4月, 2011 2 次提交
  12. 28 3月, 2011 1 次提交
    • L
      Btrfs: add initial tracepoint support for btrfs · 1abe9b8a
      liubo 提交于
      Tracepoints can provide insight into why btrfs hits bugs and be greatly
      helpful for debugging, e.g
                    dd-7822  [000]  2121.641088: btrfs_inode_request: root = 5(FS_TREE), gen = 4, ino = 256, blocks = 8, disk_i_size = 0, last_trans = 8, logged_trans = 0
                    dd-7822  [000]  2121.641100: btrfs_inode_new: root = 5(FS_TREE), gen = 8, ino = 257, blocks = 0, disk_i_size = 0, last_trans = 0, logged_trans = 0
       btrfs-transacti-7804  [001]  2146.935420: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29368320 (orig_level = 0), cow_buf = 29388800 (cow_level = 0)
       btrfs-transacti-7804  [001]  2146.935473: btrfs_cow_block: root = 1(ROOT_TREE), refs = 2, orig_buf = 29364224 (orig_level = 0), cow_buf = 29392896 (cow_level = 0)
       btrfs-transacti-7804  [001]  2146.972221: btrfs_transaction_commit: root = 1(ROOT_TREE), gen = 8
         flush-btrfs-2-7821  [001]  2155.824210: btrfs_chunk_alloc: root = 3(CHUNK_TREE), offset = 1103101952, size = 1073741824, num_stripes = 1, sub_stripes = 0, type = DATA
         flush-btrfs-2-7821  [001]  2155.824241: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29388800 (orig_level = 0), cow_buf = 29396992 (cow_level = 0)
         flush-btrfs-2-7821  [001]  2155.824255: btrfs_cow_block: root = 4(DEV_TREE), refs = 2, orig_buf = 29372416 (orig_level = 0), cow_buf = 29401088 (cow_level = 0)
         flush-btrfs-2-7821  [000]  2155.824329: btrfs_cow_block: root = 3(CHUNK_TREE), refs = 2, orig_buf = 20971520 (orig_level = 0), cow_buf = 20975616 (cow_level = 0)
       btrfs-endio-wri-7800  [001]  2155.898019: btrfs_cow_block: root = 5(FS_TREE), refs = 2, orig_buf = 29384704 (orig_level = 0), cow_buf = 29405184 (cow_level = 0)
       btrfs-endio-wri-7800  [001]  2155.898043: btrfs_cow_block: root = 7(CSUM_TREE), refs = 2, orig_buf = 29376512 (orig_level = 0), cow_buf = 29409280 (cow_level = 0)
      
      Here is what I have added:
      
      1) ordere_extent:
              btrfs_ordered_extent_add
              btrfs_ordered_extent_remove
              btrfs_ordered_extent_start
              btrfs_ordered_extent_put
      
      These provide critical information to understand how ordered_extents are
      updated.
      
      2) extent_map:
              btrfs_get_extent
      
      extent_map is used in both read and write cases, and it is useful for tracking
      how btrfs specific IO is running.
      
      3) writepage:
              __extent_writepage
              btrfs_writepage_end_io_hook
      
      Pages are cirtical resourses and produce a lot of corner cases during writeback,
      so it is valuable to know how page is written to disk.
      
      4) inode:
              btrfs_inode_new
              btrfs_inode_request
              btrfs_inode_evict
      
      These can show where and when a inode is created, when a inode is evicted.
      
      5) sync:
              btrfs_sync_file
              btrfs_sync_fs
      
      These show sync arguments.
      
      6) transaction:
              btrfs_transaction_commit
      
      In transaction based filesystem, it will be useful to know the generation and
      who does commit.
      
      7) back reference and cow:
      	btrfs_delayed_tree_ref
      	btrfs_delayed_data_ref
      	btrfs_delayed_ref_head
      	btrfs_cow_block
      
      Btrfs natively supports back references, these tracepoints are helpful on
      understanding btrfs's COW mechanism.
      
      8) chunk:
      	btrfs_chunk_alloc
      	btrfs_chunk_free
      
      Chunk is a link between physical offset and logical offset, and stands for space
      infomation in btrfs, and these are helpful on tracing space things.
      
      9) reserved_extent:
      	btrfs_reserved_extent_alloc
      	btrfs_reserved_extent_free
      
      These can show how btrfs uses its space.
      Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      1abe9b8a
  13. 22 3月, 2011 1 次提交
  14. 15 3月, 2011 1 次提交
  15. 10 3月, 2011 3 次提交
  16. 03 3月, 2011 1 次提交
    • T
      blktrace: Remove blk_fill_rwbs_rq. · 2d3a8497
      Tao Ma 提交于
      If we enable trace events to trace block actions, We use
      blk_fill_rwbs_rq to analyze the corresponding actions
      in request's cmd_flags, but we only choose the minor 2 bits
      from it, so most of other flags(e.g, REQ_SYNC) are missing.
      For example, with a sync write we get:
      write_test-2409  [001]   160.013869: block_rq_insert: 3,64 W 0 () 258135 + =
      8 [write_test]
      
      Since now we have integrated the flags of both bio and request,
      it is safe to pass rq->cmd_flags directly to blk_fill_rwbs and
      blk_fill_rwbs_rq isn't needed any more.
      
      With this patch, after a sync write we get:
      write_test-2417  [000]   226.603878: block_rq_insert: 3,64 WS 0 () 258135 +=
       8 [write_test]
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      2d3a8497
  17. 02 3月, 2011 1 次提交
  18. 03 2月, 2011 1 次提交
    • S
      tracing: Replace trace_event struct array with pointer array · e4a9ea5e
      Steven Rostedt 提交于
      Currently the trace_event structures are placed in the _ftrace_events
      section, and at link time, the linker makes one large array of all
      the trace_event structures. On boot up, this array is read (much like
      the initcall sections) and the events are processed.
      
      The problem is that there is no guarantee that gcc will place complex
      structures nicely together in an array format. Two structures in the
      same file may be placed awkwardly, because gcc has no clue that they
      are suppose to be in an array.
      
      A hack was used previous to force the alignment to 4, to pack the
      structures together. But this caused alignment issues with other
      architectures (sparc).
      
      Instead of packing the structures into an array, the structures' addresses
      are now put into the _ftrace_event section. As pointers are always the
      natural alignment, gcc should always pack them tightly together
      (otherwise initcall, extable, etc would also fail).
      
      By having the pointers to the structures in the section, we can still
      iterate the trace_events without causing unnecessary alignment problems
      with other architectures, or depending on the current behaviour of
      gcc that will likely change in the future just to tick us kernel developers
      off a little more.
      
      The _ftrace_event section is also moved into the .init.data section
      as it is now only needed at boot up.
      Suggested-by: NDavid Miller <davem@davemloft.net>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      e4a9ea5e
  19. 22 1月, 2011 1 次提交
  20. 15 1月, 2011 1 次提交
    • S
      tracing: Only process module tracepoints once · c94fbe1d
      Steven Rostedt 提交于
      The commit:
      
       9f987b3141f086de27832514aad9f50a53f754
       tracing: Include module.h in define_trace.h
      
      only solved half the problem. If the trace/events/module.h header is
      included at the time of define_trace.h (or in ftrace.h within it),
      the module.h TRACE_SYSTEM will override the current TRACE_SYSTEM
      macro.
      
      Since define_trace.h is included when CREATE_TRACE_POINTS is set,
      and the first thing it does is to #undef CREATE_TRACE_POINTS,
      by placing the module.h TRACE_SYSTEM inside a
       #ifdef CREATE_TRACE_POINTS
      we can prevent it from overriding the TRACE_SYSTEM that is
      being processed, and still process the module.h tracepoints
      when the module code defines CREATE_TRACE_POINTS and includes
      the trace/events/module.h header.
      
      As with commit 9f987b3141, this is only an issue if module.h
      is not included before the trace/events/<event>.h file is
      included, which (luckily) has not happened yet.
      Reported-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      c94fbe1d
  21. 14 1月, 2011 4 次提交
  22. 12 1月, 2011 4 次提交