- 01 11月, 2011 1 次提交
-
-
由 Minchan Kim 提交于
Change ISOLATE_XXX macro with bitwise isolate_mode_t type. Normally, macro isn't recommended as it's type-unsafe and making debugging harder as symbol cannot be passed throught to the debugger. Quote from Johannes " Hmm, it would probably be cleaner to fully convert the isolation mode into independent flags. INACTIVE, ACTIVE, BOTH is currently a tri-state among flags, which is a bit ugly." This patch moves isolate mode from swap.h to mmzone.h by memcontrol.h Signed-off-by: NMinchan Kim <minchan.kim@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 31 10月, 2011 4 次提交
-
-
由 Curt Wohlgemuth 提交于
This creates a new 'reason' field in a wb_writeback_work structure, which unambiguously identifies who initiates writeback activity. A 'wb_reason' enumeration has been added to writeback.h, to enumerate the possible reasons. The 'writeback_work_class' and tracepoint event class and 'writeback_queue_io' tracepoints are updated to include the symbolic 'reason' in all trace events. And the 'writeback_inodes_sbXXX' family of routines has had a wb_stats parameter added to them, so callers can specify why writeback is being started. Acked-by: NJan Kara <jack@suse.cz> Signed-off-by: NCurt Wohlgemuth <curtw@google.com> Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
-
由 Curt Wohlgemuth 提交于
Instead of sending ->older_than_this to queue_io() and move_expired_inodes(), send the entire wb_writeback_work structure. There are other fields of a work item that are useful in these routines and in tracepoints. Acked-by: NJan Kara <jack@suse.cz> Signed-off-by: NCurt Wohlgemuth <curtw@google.com> Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
-
由 Wu Fengguang 提交于
Useful for analyzing the dynamics of the throttling algorithms and debugging user reported problems. Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
-
由 Wu Fengguang 提交于
It helps understand how various throttle bandwidths are updated. Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
-
- 27 10月, 2011 1 次提交
-
-
由 Eric Gouriou 提交于
This patch introduces a fast path in ext4_ext_convert_to_initialized() for the case when the conversion can be performed by transferring the newly initialized blocks from the uninitialized extent into an adjacent initialized extent. Doing so removes the expensive invocations of memmove() which occur during extent insertion and the subsequent merge. In practice this should be the common case for clients performing append writes into files pre-allocated via fallocate(FALLOC_FL_KEEP_SIZE). In such a workload performed via direct IO and when using a suboptimal implementation of memmove() (x86_64 prior to the 2.6.39 rewrite), this patch reduces kernel CPU consumption by 32%. Two new trace points are added to ext4_ext_convert_to_initialized() to offer visibility into its operations. No exit trace point has been added due to the multiplicity of return points. This can be revisited once the upstream cleanup is backported. Signed-off-by: NEric Gouriou <egouriou@google.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 25 10月, 2011 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
This helps in more control over debugging. root@qemu-img-64:~# ls /pass/123 ls: cannot access /pass/123: No such file or directory root@qemu-img-64:~# cat /sys/kernel/debug/tracing/trace # tracer: nop # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | ls-1536 [001] 70.928584: 9p_protocol_dump: clnt 18446612132784021504 P9_TWALK(tag = 1) 000: 16 00 00 00 6e 01 00 01 00 00 00 02 00 00 00 01 010: 00 03 00 31 32 33 00 00 00 ff ff ff ff 00 00 00 ls-1536 [001] 70.928587: <stack trace> => trace_9p_protocol_dump => p9pdu_finalize => p9_client_rpc => p9_client_walk => v9fs_vfs_lookup => d_alloc_and_lookup => walk_component => path_lookupat ls-1536 [000] 70.929696: 9p_protocol_dump: clnt 18446612132784021504 P9_RLERROR(tag = 1) 000: 0b 00 00 00 07 01 00 02 00 00 00 4e 03 00 02 00 010: 00 00 00 00 03 00 02 00 00 00 00 00 ff 43 00 00 ls-1536 [000] 70.929697: <stack trace> => trace_9p_protocol_dump => p9_client_rpc => p9_client_walk => v9fs_vfs_lookup => d_alloc_and_lookup => walk_component => path_lookupat => do_path_lookup Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>
-
- 04 10月, 2011 1 次提交
-
-
由 Andrew Vagin 提交于
Each event adds some points to its counters. By default it adds 1, and a number of points may be transmited in event's parameters. E.g. sched:sched_stat_runtime adds how long process has been running. But this functionality was broken by v2.6.31-rc5-392-gf413cdb8 and now the event's parameters doesn't affect on a number of points. TP_perf_assign isn't defined, so __perf_count(c) isn't executed and __count is always equal to 1. Signed-off-by: NAndrew Vagin <avagin@openvz.org> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1317052535-1765247-2-git-send-email-avagin@openvz.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 03 10月, 2011 1 次提交
-
-
由 Wu Fengguang 提交于
As proposed by Chris, Dave and Jan, don't start foreground writeback IO inside balance_dirty_pages(). Instead, simply let it idle sleep for some time to throttle the dirtying task. In the mean while, kick off the per-bdi flusher thread to do background writeback IO. RATIONALS ========= - disk seeks on concurrent writeback of multiple inodes (Dave Chinner) If every thread doing writes and being throttled start foreground writeback, it leads to N IO submitters from at least N different inodes at the same time, end up with N different sets of IO being issued with potentially zero locality to each other, resulting in much lower elevator sort/merge efficiency and hence we seek the disk all over the place to service the different sets of IO. OTOH, if there is only one submission thread, it doesn't jump between inodes in the same way when congestion clears - it keeps writing to the same inode, resulting in large related chunks of sequential IOs being issued to the disk. This is more efficient than the above foreground writeback because the elevator works better and the disk seeks less. - lock contention and cache bouncing on concurrent IO submitters (Dave Chinner) With this patchset, the fs_mark benchmark on a 12-drive software RAID0 goes from CPU bound to IO bound, freeing "3-4 CPUs worth of spinlock contention". * "CPU usage has dropped by ~55%", "it certainly appears that most of the CPU time saving comes from the removal of contention on the inode_wb_list_lock" (IMHO at least 10% comes from the reduction of cacheline bouncing, because the new code is able to call much less frequently into balance_dirty_pages() and hence access the global page states) * the user space "App overhead" is reduced by 20%, by avoiding the cacheline pollution by the complex writeback code path * "for a ~5% throughput reduction", "the number of write IOs have dropped by ~25%", and the elapsed time reduced from 41:42.17 to 40:53.23. * On a simple test of 100 dd, it reduces the CPU %system time from 30% to 3%, and improves IO throughput from 38MB/s to 42MB/s. - IO size too small for fast arrays and too large for slow USB sticks The write_chunk used by current balance_dirty_pages() cannot be directly set to some large value (eg. 128MB) for better IO efficiency. Because it could lead to more than 1 second user perceivable stalls. Even the current 4MB write size may be too large for slow USB sticks. The fact that balance_dirty_pages() starts IO on itself couples the IO size to wait time, which makes it hard to do suitable IO size while keeping the wait time under control. Now it's possible to increase writeback chunk size proportional to the disk bandwidth. In a simple test of 50 dd's on XFS, 1-HDD, 3GB ram, the larger writeback size dramatically reduces the seek count to 1/10 (far beyond my expectation) and improves the write throughput by 24%. - long block time in balance_dirty_pages() hurts desktop responsiveness Many of us may have the experience: it often takes a couple of seconds or even long time to stop a heavy writing dd/cp/tar command with Ctrl-C or "kill -9". - IO pipeline broken by bumpy write() progress There are a broad class of "loop {read(buf); write(buf);}" applications whose read() pipeline will be under-utilized or even come to a stop if the write()s have long latencies _or_ don't progress in a constant rate. The current threshold based throttling inherently transfers the large low level IO completion fluctuations to bumpy application write()s, and further deteriorates with increasing number of dirtiers and/or bdi's. For example, when doing 50 dd's + 1 remote rsync to an XFS partition, the rsync progresses very bumpy in legacy kernel, and throughput is improved by 67% by this patchset. (plus the larger write chunk size, it will be 93% speedup). The new rate based throttling can support 1000+ dd's with excellent smoothness, low latency and low overheads. For the above reasons, it's much better to do IO-less and low latency pauses in balance_dirty_pages(). Jan Kara, Dave Chinner and me explored the scheme to let balance_dirty_pages() wait for enough writeback IO completions to safeguard the dirty limit. However it's found to have two problems: - in large NUMA systems, the per-cpu counters may have big accounting errors, leading to big throttle wait time and jitters. - NFS may kill large amount of unstable pages with one single COMMIT. Because NFS server serves COMMIT with expensive fsync() IOs, it is desirable to delay and reduce the number of COMMITs. So it's not likely to optimize away such kind of bursty IO completions, and the resulted large (and tiny) stall times in IO completion based throttling. So here is a pause time oriented approach, which tries to control the pause time in each balance_dirty_pages() invocations, by controlling the number of pages dirtied before calling balance_dirty_pages(), for smooth and efficient dirty throttling: - avoid useless (eg. zero pause time) balance_dirty_pages() calls - avoid too small pause time (less than 4ms, which burns CPU power) - avoid too large pause time (more than 200ms, which hurts responsiveness) - avoid big fluctuations of pause times It can control pause times at will. The default policy (in a followup patch) will be to do ~10ms pauses in 1-dd case, and increase to ~100ms in 1000-dd case. BEHAVIOR CHANGE =============== (1) dirty threshold Users will notice that the applications will get throttled once crossing the global (background + dirty)/2=15% threshold, and then balanced around 17.5%. Before patch, the behavior is to just throttle it at 20% dirtyable memory in 1-dd case. Since the task will be soft throttled earlier than before, it may be perceived by end users as performance "slow down" if his application happens to dirty more than 15% dirtyable memory. (2) smoothness/responsiveness Users will notice a more responsive system during heavy writeback. "killall dd" will take effect instantly. Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
-
- 29 9月, 2011 5 次提交
-
-
由 Paul E. McKenney 提交于
Add trace events to record grace-period start and end, quiescent states, CPUs noticing grace-period start and end, grace-period initialization, call_rcu() invocation, tasks blocking in RCU read-side critical sections, tasks exiting those same critical sections, force_quiescent_state() detection of dyntick-idle and offline CPUs, CPUs entering and leaving dyntick-idle mode (except from NMIs), CPUs coming online and going offline, and CPUs being kicked for staying in dyntick-idle mode for too long (as in many weeks, even on 32-bit systems). Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> rcu: Add the rcu flavor to callback trace events The earlier trace events for registering RCU callbacks and for invoking them did not include the RCU flavor (rcu_bh, rcu_preempt, or rcu_sched). This commit adds the RCU flavor to those trace events. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
Add event-trace markers to TREE_RCU kthreads to allow including these kthread's CPU time in the utilization calculations. Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
Add a string to the rcu_batch_start() and rcu_batch_end() trace messages that indicates the RCU type ("rcu_sched", "rcu_bh", or "rcu_preempt"). The trace messages for the actual invocations themselves are not marked, as it should be clear from the rcu_batch_start() and rcu_batch_end() events before and after. Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
This commit adds the trace_rcu_utilization() marker that is to be used to allow postprocessing scripts compute RCU's CPU utilization, give or take event-trace overhead. Note that we do not include RCU's dyntick-idle interface because event tracing requires RCU protection, which is not available in dyntick-idle mode. Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Paul E. McKenney 提交于
There was recently some controversy about the overhead of invoking RCU callbacks. Add TRACE_EVENT()s to obtain fine-grained timings for the start and stop of a batch of callbacks and also for each callback invoked. Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
- 28 9月, 2011 1 次提交
-
-
由 Ming Lei 提交于
This patch introduces 3 trace points to prepare for tracing rpm_idle/rpm_suspend/rpm_resume functions, so we can use these trace points to replace the current dev_dbg(). Signed-off-by: NMing Lei <ming.lei@canonical.com> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
-
- 26 9月, 2011 1 次提交
-
-
由 Peter Zijlstra 提交于
We had need to see the difference between scheduling a runnable task and a runnable task being involuntarily preempted. No app should rely on the old string output (the binary trace event record format is not changed). Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1316164603.10174.11.camel@twinsSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 23 9月, 2011 1 次提交
-
-
由 Mark Brown 提交于
The number of times we look at a potentially connected neighbour is just as important as the number of times we actually recurse into looking at that neighbour so also collect that statistic. Signed-off-by: NMark Brown <broonie@opensource.wolfsonmicro.com>
-
- 21 9月, 2011 1 次提交
-
-
由 Mark Brown 提交于
One of the longest standing areas for improvement in ASoC has been the DAPM algorithm - it repeats the same checks many times whenever it is run and makes no effort to limit the areas of the graph it checks meaning we do an awful lot of walks over the full graph. This has never mattered too much as the size of the graph has generally been small in relation to the size of the devices supported and the speed of CPUs but it is annoying. In preparation for work on improving this insert a trace point after the graph walk has been done. This gives us specific timing information for the walk, and in order to give quantifiable (non-benchmark) numbers also count every time we check a link or check the power for a widget and report those numbers. Substantial changes in the algorithm may require tweaks to the stats but they should be useful for simpler things. Signed-off-by: NMark Brown <broonie@opensource.wolfsonmicro.com>
-
- 20 9月, 2011 1 次提交
-
-
由 Dimitris Papastamos 提交于
Signed-off-by: NDimitris Papastamos <dp@opensource.wolfsonmicro.com> Signed-off-by: NMark Brown <broonie@opensource.wolfsonmicro.com>
-
- 10 9月, 2011 1 次提交
-
-
由 Aditya Kali 提交于
This patch adds some tracepoints in ext4/extents.c and updates a tracepoint in ext4/inode.c. Tested: Built and ran the kernel and verified that these tracepoints work. Also ran xfstests. Signed-off-by: NAditya Kali <adityakali@google.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 31 8月, 2011 1 次提交
-
-
由 Wu Fengguang 提交于
Save inode->dirtied_when in the raw trace output for reliable scripting, and to also show in formatted output the relative age in seconds for easy human reading. CC: Jan Kara <jack@suse.cz> Acked-by: NChristoph Hellwig <hch@infradead.org> Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
-
- 11 8月, 2011 1 次提交
-
-
由 Namhyung Kim 提交于
Add FLUSH/FUA support to blktrace. As FLUSH precedes WRITE and/or FUA follows WRITE, use the same 'F' flag for both cases and distinguish them by their (relative) position. The end results look like (other flags might be shown also): - WRITE: W - WRITE_FLUSH: FW - WRITE_FUA: WF - WRITE_FLUSH_FUA: FWF Note that we reuse TC_BARRIER due to lack of bit space of act_mask so that the older versions of blktrace tools will report flush requests as barriers from now on. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: NNamhyung Kim <namhyung@gmail.com> Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 09 8月, 2011 1 次提交
-
-
由 Mark Brown 提交于
x86_64 warns as size_t is not an int. Signed-off-by: NMark Brown <broonie@opensource.wolfsonmicro.com> Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
-
- 08 8月, 2011 1 次提交
-
-
由 Mark Brown 提交于
Trace single register reads and writes, plus start/stop tracepoints for the actual I/O to see where we're spending time. This makes it easy to have always on logging without overwhelming the logs and also lets us take advantage of all the context and time information that the trace subsystem collects for us. We don't currently trace register values for bulk operations as this would add complexity and overhead parsing the cooked data that's being worked with. Signed-off-by: NMark Brown <broonie@opensource.wolfsonmicro.com>
-
- 31 7月, 2011 1 次提交
-
-
由 Theodore Ts'o 提交于
As requested by Al Viro, since umode_t may be changing to a u32 for some architectures. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Al Viro <viro@ZenIV.linux.org.uk>
-
- 26 7月, 2011 1 次提交
-
-
由 Jeremy Fitzhardinge 提交于
When CONFIG_FUNCTION_TRACER is disabled, compilation fails as follows: CC arch/x86/xen/setup.o In file included from arch/x86/include/asm/xen/hypercall.h:42, from arch/x86/xen/setup.c:19: include/trace/events/xen.h:31: warning: 'struct multicall_entry' declared inside parameter list include/trace/events/xen.h:31: warning: its scope is only this definition or declaration, which is probably not what you want include/trace/events/xen.h:31: warning: 'struct multicall_entry' declared inside parameter list include/trace/events/xen.h:31: warning: 'struct multicall_entry' declared inside parameter list include/trace/events/xen.h:31: warning: 'struct multicall_entry' declared inside parameter list [...] arch/x86/xen/trace.c:5: error: '__HYPERVISOR_set_trap_table' undeclared here (not in a function) arch/x86/xen/trace.c:5: error: array index in initializer not of integer type arch/x86/xen/trace.c:5: error: (near initialization for 'xen_hypercall_names') arch/x86/xen/trace.c:6: error: '__HYPERVISOR_mmu_update' undeclared here (not in a function) arch/x86/xen/trace.c:6: error: array index in initializer not of integer type arch/x86/xen/trace.c:6: error: (near initialization for 'xen_hypercall_names') Fix this by making sure struct multicall_entry has a declaration in scope at all times, and don't bother compiling xen/trace.c when tracing is disabled. Reported-by: NRandy Dunlap <rdunlap@xenotime.net> Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
- 20 7月, 2011 1 次提交
-
-
由 Dave Chinner 提交于
It is impossible to understand what the shrinkers are actually doing without instrumenting the code, so add a some tracepoints to allow insight to be gained. Signed-off-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 19 7月, 2011 9 次提交
-
-
由 Jeremy Fitzhardinge 提交于
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
由 Jeremy Fitzhardinge 提交于
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
-
- 11 7月, 2011 3 次提交
-
-
由 Tao Ma 提交于
Add ext4_trim_extent and ext4_trim_all_free. Reviewed-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: NTao Ma <boyu.mt@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
This will help debug who is responsible for starting a jbd2 transaction. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Using function calls in TP_printk causes perf heartburn, so print the MAJOR/MINOR device numbers instead. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 10 7月, 2011 1 次提交
-
-
由 Wu Fengguang 提交于
Add trace event balance_dirty_state for showing the global dirty page counts and thresholds at each global_dirty_limits() invocation. This will cover the callers throttle_vm_writeout(), over_bground_thresh() and each balance_dirty_pages() loop. Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
-