- 17 June 2009: 9 commits
-
-
Submitted by Wu Fengguang

We need this in one particular case and two more general ones.

Now we do async readahead for sequential mmap reads, and do it with the
help of PG_readahead. For normal reads, PG_readahead is the sufficient
condition to do a sequential readahead. But unfortunately, for mmap reads,
there is a tiny nuisance:

[11736.998347] readahead-init0(process: sh/23926, file: sda1/w3m, offset=0:4503599627370495, ra=0+4-3) = 4
[11737.014985] readahead-around(process: w3m/23926, file: sda1/w3m, offset=0:0, ra=290+32-0) = 17
[11737.019488] readahead-around(process: w3m/23926, file: sda1/w3m, offset=0:0, ra=118+32-0) = 32
[11737.024921] readahead-interleaved(process: w3m/23926, file: sda1/w3m, offset=0:2, ra=4+6-6) = 6
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                ~~~~~~~~~~~~~

An unfavorably small readahead. The original dumb read-around size could
be more efficient.

That happened because ld-linux.so does a read(832) in L1 before mmap(),
which triggers a 4-page readahead, with the second page tagged
PG_readahead.

L0: open("/lib/libc.so.6", O_RDONLY) = 3
L1: read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\342"..., 832) = 832
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L2: fstat(3, {st_mode=S_IFREG|0755, st_size=1420624, ...}) = 0
L3: mmap(NULL, 3527256, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fac6e51d000
L4: mprotect(0x7fac6e671000, 2097152, PROT_NONE) = 0
L5: mmap(0x7fac6e871000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x154000) = 0x7fac6e871000
L6: mmap(0x7fac6e876000, 16984, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fac6e876000
L7: close(3) = 0

In general, the PG_readahead flag will also be hit in cases
- sequential reads
- clustered random reads
A full readahead size is desirable in both cases.

Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by Wu Fengguang

Auto-detect sequential mmap reads and do readahead for them.

The sequential mmap readahead will be triggered when
- sync readahead: it's a major fault and (prev_offset == offset-1);
- async readahead: minor fault on a PG_readahead page with valid readahead state.

The benefits of doing readahead instead of read-around:
- less I/O wait thanks to async readahead
- double real I/O size and no more cache hits

The single stream case is improved a little. For 100,000 sequential mmap reads:

                                        user       system     cpu        total
(1-1)  plain -mm, 128KB readaround:     3.224      2.554      48.40%     11.838
(1-2)  plain -mm, 256KB readaround:     3.170      2.392      46.20%     11.976
(2)    patched -mm, 128KB readahead:    3.117      2.448      47.33%     11.607

The patched kernel (2) has the smallest total time, since it has no cache
hit overheads and less I/O block time (thanks to async readahead). Here
the I/O size makes little difference, since there's only one single
stream.

Note that (1-1)'s real I/O size is 64KB and (1-2)'s real I/O size is
128KB, since half of the read-around pages will be readahead cache hits.

This is going to make _real_ differences for _concurrent_ I/O streams.

Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by Linus Torvalds

This shouldn't really change behavior all that much, but the single rather
complex function with read-ahead inside a loop etc. is broken up into more
manageable pieces.

The behaviour is also less subtle, with the read-ahead being done up-front
rather than inside some subtle loop, and thus avoiding the now unnecessary
extra state variables (i.e. "did_readaround" is gone).

Fengguang: the code split in fact fixed a bug reported by Pavel Levshin:
the PGMAJFAULT accounting used to be bypassed when MADV_RANDOM is set, in
which case the original code would jump directly to no_cached_page reading.

Cc: Pavel Levshin <lpk@581.spb.su>
Cc: <wli@movementarian.org>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by Wu Fengguang

The readahead call scheme is error-prone in that it expects the call sites
to check for async readahead after doing a sync one. I.e.

	if (!page)
		page_cache_sync_readahead();
	page = find_get_page();
	if (page && PageReadahead(page))
		page_cache_async_readahead();

This is because PG_readahead could be set by a sync readahead for the
_current_ newly faulted-in page, and the readahead code simply expects one
more callback on the same page to start the async readahead. If the caller
fails to do so, it will miss the PG_readahead bits and never be able to
start an async readahead.

Eliminate this insane constraint by piggy-backing the async part into the
current readahead window.

Now if an async readahead should be started immediately after a sync one,
the readahead logic itself will do it. So the following code becomes valid
(the 'else' in particular):

	if (!page)
		page_cache_sync_readahead();
	else if (PageReadahead(page))
		page_cache_async_readahead();

Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by Wu Fengguang

Make sure the interleaved readahead size is larger than the request size.
This also makes the readahead window grow more quickly.

Reported-by: Xu Chenfeng <xcf@ustc.edu.cn>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by Wu Fengguang

(hit_readahead_marker != 0) means the page at @offset is present, so we
can search for a non-present page starting from @offset + 1.

Reported-by: Xu Chenfeng <xcf@ustc.edu.cn>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by Wu Fengguang

Just in case someone aggressively sets a huge readahead size.

Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by Wu Fengguang

Impact: code simplification.

Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by Alexey Dobriyan

* create mm/init-mm.c, move init_mm there
* remove INIT_MM, initialize init_mm with a C99 initializer
* unexport init_mm on all arches:

  init_mm is already unexported on x86.

  One strange place is some OMAP driver (drivers/video/omap/) which won't
  build modular, but it already wants the get_vm_area() export. Somebody
  should look there.

[akpm@linux-foundation.org: add missing #includes]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Mike Frysinger <vapier.adi@gmail.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
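For reference, a C99 designated-initializer definition of init_mm looks roughly like the sketch below; the field list is abridged and illustrative, not the exact contents of mm/init-mm.c for any particular kernel version:

	/* Rough sketch: init_mm defined with C99 designated initializers
	 * instead of the old INIT_MM macro.  Field list abridged. */
	#include <linux/mm_types.h>

	struct mm_struct init_mm = {
		.mm_rb		= RB_ROOT,
		.pgd		= swapper_pg_dir,
		.mm_users	= ATOMIC_INIT(2),
		.mm_count	= ATOMIC_INIT(1),
		.mmap_sem	= __RWSEM_INITIALIZER(init_mm.mmap_sem),
		.page_table_lock = __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock),
		.mmlist		= LIST_HEAD_INIT(init_mm.mmlist),
	};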
-
- 13 June 2009: 1 commit
-
-
Submitted by Rafael J. Wysocki

Remove the shrinking of memory from the suspend-to-RAM code, where it is
not really necessary.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@tuxonice.net>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
-
- 12 June 2009: 18 commits
-
-
Submitted by Pekka Enberg

Fixes the following boot-time warning:

[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: at kernel/smp.c:369 smp_call_function_many+0x56/0x1bc()
[    0.000000] Hardware name:
[    0.000000] Modules linked in:
[    0.000000] Pid: 0, comm: swapper Not tainted 2.6.30 #492
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff8149e021>] ? _spin_unlock+0x4f/0x5c
[    0.000000]  [<ffffffff8108f11b>] ? smp_call_function_many+0x56/0x1bc
[    0.000000]  [<ffffffff81061764>] warn_slowpath_common+0x7c/0xa9
[    0.000000]  [<ffffffff810617a5>] warn_slowpath_null+0x14/0x16
[    0.000000]  [<ffffffff8108f11b>] smp_call_function_many+0x56/0x1bc
[    0.000000]  [<ffffffff810f3e00>] ? do_ccupdate_local+0x0/0x54
[    0.000000]  [<ffffffff810f3e00>] ? do_ccupdate_local+0x0/0x54
[    0.000000]  [<ffffffff8108f2be>] smp_call_function+0x3d/0x68
[    0.000000]  [<ffffffff810f3e00>] ? do_ccupdate_local+0x0/0x54
[    0.000000]  [<ffffffff81066fd8>] on_each_cpu+0x31/0x7c
[    0.000000]  [<ffffffff810f64f5>] do_tune_cpucache+0x119/0x454
[    0.000000]  [<ffffffff81087080>] ? lockdep_init_map+0x94/0x10b
[    0.000000]  [<ffffffff818133b0>] ? kmem_cache_init+0x421/0x593
[    0.000000]  [<ffffffff810f69cf>] enable_cpucache+0x68/0xad
[    0.000000]  [<ffffffff818133c3>] kmem_cache_init+0x434/0x593
[    0.000000]  [<ffffffff8180987c>] ? mem_init+0x156/0x161
[    0.000000]  [<ffffffff817f8aae>] start_kernel+0x1cc/0x3b9
[    0.000000]  [<ffffffff817f829a>] x86_64_start_reservations+0xaa/0xae
[    0.000000]  [<ffffffff817f837f>] x86_64_start_kernel+0xe1/0xe8
[    0.000000] ---[ end trace 4eaa2a86a8e2da22 ]---

Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
-
Submitted by Pekka Enberg

As explained by Benjamin Herrenschmidt:

  Oh and btw, your patch alone doesn't fix powerpc, because it's missing
  a whole bunch of GFP_KERNEL's in the arch code... You would have to
  grep the entire kernel for things that check slab_is_available() and
  even then you'll be missing some.

  For example, slab_is_available() didn't always exist, and so in the
  early days on powerpc, we used a mem_init_done global that is set from
  mem_init() (not perfect but works in practice). And we still have code
  using that to do the test.

Therefore, mask out __GFP_WAIT, __GFP_IO, and __GFP_FS in the slab
allocators in early boot code to avoid enabling interrupts.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
-
Submitted by Pekka Enberg

Fixes the following warning during bootup when compiling with CONFIG_SLAB:

[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: at kernel/lockdep.c:2282 lockdep_trace_alloc+0x91/0xb9()
[    0.000000] Hardware name:
[    0.000000] Modules linked in:
[    0.000000] Pid: 0, comm: swapper Not tainted 2.6.30 #491
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff81087d84>] ? lockdep_trace_alloc+0x91/0xb9
[    0.000000]  [<ffffffff81061764>] warn_slowpath_common+0x7c/0xa9
[    0.000000]  [<ffffffff810617a5>] warn_slowpath_null+0x14/0x16
[    0.000000]  [<ffffffff81087d84>] lockdep_trace_alloc+0x91/0xb9
[    0.000000]  [<ffffffff810f5b03>] kmem_cache_alloc_node_notrace+0x26/0xdf
[    0.000000]  [<ffffffff81487f4e>] ? setup_cpu_cache+0x7e/0x210
[    0.000000]  [<ffffffff81487fe3>] setup_cpu_cache+0x113/0x210
[    0.000000]  [<ffffffff810f73ff>] kmem_cache_create+0x409/0x486
[    0.000000]  [<ffffffff818131c1>] kmem_cache_init+0x232/0x593
[    0.000000]  [<ffffffff8180987c>] ? mem_init+0x156/0x161
[    0.000000]  [<ffffffff817f8aae>] start_kernel+0x1cc/0x3b9
[    0.000000]  [<ffffffff817f829a>] x86_64_start_reservations+0xaa/0xae
[    0.000000]  [<ffffffff817f837f>] x86_64_start_kernel+0xe1/0xe8
[    0.000000] ---[ end trace 4eaa2a86a8e2da22 ]---

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
-
Submitted by Heiko Carstens

probe_kernel_write() gets used to write to the kernel address space, e.g.
to patch the kernel (kgdb, ftrace, kprobes...). Some architectures,
however, enable write protection for the kernel text section, so that
writes to this region would fault.

This patch allows an architecture-specific version of
probe_kernel_write() to be provided, which can handle and bypass write
protection of the text segment. That way it is still possible to catch
random writes to kernel text while explicitly allowing writes via this
interface.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Submitted by KAMEZAWA Hiroyuki

SLAB is now configured at a very early stage and can be used in init
routines. But replacing alloc_bootmem() in FLAT/DISCONTIGMEM's
page_cgroup() initialization breaks the allocation now. (It works well in
the SPARSEMEM case: that supports MEMORY_HOTPLUG, and the size of
page_cgroup is reasonable, < 1 << MAX_ORDER.)

This patch revives FLATMEM + memory cgroup by using alloc_bootmem.

In the future, we will either stop supporting FLATMEM (if there are no
users) or rewrite the flatmem code completely, but that would add more
messy code and overhead.

Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Tested-by: Li Zefan <lizf@cn.fujitsu.com>
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
-
Submitted by Yinghai Lu

The bootmem allocator is no longer available for page_cgroup_init()
because we set up the kernel slab allocator much earlier now.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
-
Submitted by Pekka Enberg

We can call vmalloc_init() after kmem_cache_init() and use kzalloc()
instead of the bootmem allocator when initializing vmalloc data
structures.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Nick Piggin <npiggin@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
-
Submitted by Pekka Enberg

This patch makes kmalloc() available earlier in the boot sequence so we
can get rid of some bootmem allocations. The bulk of the changes are due
to kmem_cache_init() being called with interrupts disabled, which requires
some changes to the allocator bootstrap code.

Note: 32-bit x86 does the WP protect test in mem_init(), so we must set up
traps before we call mem_init() during boot, as reported by Ingo Molnar:

  We have a hard crash in the WP-protect code:

  [    0.000000] Checking if this processor honours the WP bit even in supervisor mode...BUG: Int 14: CR2 ffcff000
  [    0.000000]      EDI 00000188  ESI 00000ac7  EBP c17eaf9c  ESP c17eaf8c
  [    0.000000]      EBX 000014e0  EDX 0000000e  ECX 01856067  EAX 00000001
  [    0.000000]      err 00000003  EIP c10135b1   CS 00000060  flg 00010002
  [    0.000000] Stack: c17eafa8 c17fd410 c16747bc c17eafc4 c17fd7e5 000011fd f8616000 c18237cc
  [    0.000000]        00099800 c17bb000 c17eafec c17f1668 000001c5 c17f1322 c166e039 c1822bf0
  [    0.000000]        c166e033 c153a014 c18237cc 00020800 c17eaff8 c17f106a 00020800 01ba5003
  [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.30-tip-02161-g7a74539-dirty #52203
  [    0.000000] Call Trace:
  [    0.000000]  [<c15357c2>] ? printk+0x14/0x16
  [    0.000000]  [<c10135b1>] ? do_test_wp_bit+0x19/0x23
  [    0.000000]  [<c17fd410>] ? test_wp_bit+0x26/0x64
  [    0.000000]  [<c17fd7e5>] ? mem_init+0x1ba/0x1d8
  [    0.000000]  [<c17f1668>] ? start_kernel+0x164/0x2f7
  [    0.000000]  [<c17f1322>] ? unknown_bootoption+0x0/0x19c
  [    0.000000]  [<c17f106a>] ? __init_begin+0x6a/0x6f

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
-
Submitted by Pekka Enberg

If the user requested bootmem allocation on a specific node, we should use
kzalloc_node() for the fallback allocation.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
-
Submitted by Pekka Enberg

As a preparation for initializing the slab allocator early, make sure the
bootmem allocator does not crash and burn if someone calls it after slab
is up; otherwise we'd need a flag day for switching to early slab.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
-
Submitted by Catalin Marinas

This patch adds a loadable module that deliberately leaks memory. It is
used for testing various memory leaking scenarios.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Submitted by Catalin Marinas

This patch adds the Kconfig.debug and Makefile entries needed for building
kmemleak into the kernel.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Submitted by Catalin Marinas

The alloc_large_system_hash function is called from various places in the
kernel and it contains pointers to other allocated structures. It
therefore needs to be traced by kmemleak.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Submitted by Catalin Marinas

This patch adds the callbacks to the kmemleak_(alloc|free) functions from
vmalloc/vfree.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-
Submitted by Catalin Marinas

This patch adds the callbacks to the kmemleak_(alloc|free) functions from
the slub allocator.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
-
Submitted by Catalin Marinas

This patch adds the callbacks to the kmemleak_(alloc|free) functions from
the slob allocator.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
-
Submitted by Catalin Marinas

This patch adds the callbacks to the kmemleak_(alloc|free) functions from
the slab allocator. The patch also adds the SLAB_NOLEAKTRACE flag to avoid
recursive calls to kmemleak when it allocates its own data structures.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
-
Submitted by Catalin Marinas

This patch adds the base support for the kernel memory leak detector. It
traces memory allocation/freeing in a way similar to Boehm's conservative
garbage collector, the difference being that the unreferenced objects are
not freed but only shown in /sys/kernel/debug/kmemleak. Enabling this
feature introduces an overhead to memory allocations.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
-
- 10 June 2009: 2 commits
-
-
Submitted by Paul Mundt

With the "security: use mmap_min_addr indepedently of security models"
change, mmap_min_addr is used in common areas, which subsequently blows up
the nommu build. This stubs in the definition in the nommu case as well.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>

--

 mm/nommu.c | 3 +++
 1 file changed, 3 insertions(+)

Signed-off-by: James Morris <jmorris@namei.org>
-
Submitted by Li Zefan

TRACE_EVENT is a more generic way to define tracepoints. Doing so adds
these new capabilities to this tracepoint:

  - zero-copy and per-cpu splice() tracing
  - binary tracing without printf overhead
  - structured logging records exposed under /debug/tracing/events
  - trace events embedded in function tracer output and other plugins
  - user-defined, per tracepoint filter expressions
  ...

Cons:

  - no dev_t info for the output of plug, unplug_timer and unplug_io events.
    no dev_t info for getrq and sleeprq events if bio == NULL.
    no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL.

    This is mainly because we can't get the device from a request queue.
    But this may change in the future.

  - A packet command is converted to a string in TP_assign, not TP_print,
    while blktrace does the conversion just before output.

    Since pc requests should be rather rare, this is not a big issue.

  - In blktrace, an event can have 2 different print formats, but a
    TRACE_EVENT has a unique format, which means we have some unused data
    in a trace entry.

    The overhead is minimized by using __dynamic_array() instead of __array().

I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing:

      dd                   dd + ioctl blktrace      dd + TRACE_EVENT (splice)
1     7.36s, 42.7 MB/s     7.50s, 42.0 MB/s         7.41s, 42.5 MB/s
2     7.43s, 42.3 MB/s     7.48s, 42.1 MB/s         7.43s, 42.4 MB/s
3     7.38s, 42.6 MB/s     7.45s, 42.2 MB/s         7.41s, 42.5 MB/s

So the overhead of tracing is very small, and there is no regression when
using those trace events vs blktrace.

And the binary output of TRACE_EVENT is much smaller than blktrace:

 # ls -l -h
 -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0
 -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1
 -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out

Following are some comparisons between TRACE_EVENT and blktrace:

plug:
  kjournald-480   [000]   303.084981: block_plug: [kjournald]
  kjournald-480   [000]   303.084981:   8,0    P   N [kjournald]

unplug_io:
  kblockd/0-118   [000]   300.052973: block_unplug_io: [kblockd/0] 1
  kblockd/0-118   [000]   300.052974:   8,0    U   N [kblockd/0] 1

remap:
  kjournald-480   [000]   303.085042: block_remap: 8,0 W 102736992 + 8 <- (8,8) 33384
  kjournald-480   [000]   303.085043:   8,0    A   W 102736992 + 8 <- (8,8) 33384

bio_backmerge:
  kjournald-480   [000]   303.085086: block_bio_backmerge: 8,0 W 102737032 + 8 [kjournald]
  kjournald-480   [000]   303.085086:   8,0    M   W 102737032 + 8 [kjournald]

getrq:
  kjournald-480   [000]   303.084974: block_getrq: 8,0 W 102736984 + 8 [kjournald]
  kjournald-480   [000]   303.084975:   8,0    G   W 102736984 + 8 [kjournald]

  bash-2066       [001]  1072.953770:   8,0    G   N [bash]
  bash-2066       [001]  1072.953773: block_getrq: 0,0 N 0 + 0 [bash]

rq_complete:
  konsole-2065    [001]   300.053184: block_rq_complete: 8,0 W () 103669040 + 16 [0]
  konsole-2065    [001]   300.053191:   8,0    C   W 103669040 + 16 [0]

  ksoftirqd/1-7   [001]  1072.953811:   8,0    C   N (5a 00 08 00 00 00 00 00 24 00) [0]
  ksoftirqd/1-7   [001]  1072.953813: block_rq_complete: 0,0 N (5a 00 08 00 00 00 00 00 24 00) 0 + 0 [0]

rq_insert:
  kjournald-480   [000]   303.084985: block_rq_insert: 8,0 W 0 () 102736984 + 8 [kjournald]
  kjournald-480   [000]   303.084986:   8,0    I   W 102736984 + 8 [kjournald]

Changelog from v2 -> v3:

- use the newly introduced __dynamic_array().

Changelog from v1 -> v2:

- use __string() instead of __array() to minimize the memory required to
  store the hex dump of rq->cmd().
- support large pc requests.
- add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT.
- some cleanups.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A2DF669.5070905@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
- 09 June 2009: 1 commit
-
-
Submitted by Peter Zijlstra

Some JIT compilers allocate memory for generated code with
posix_memalign() + mprotect(), so we need to hook into mprotect() to make
sure 'perf' is aware that we're executing code in anonymous memory.

[ penberg@cs.helsinki.fi: move the hook to sys_mprotect() ]
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
LKML-Reference: <Pine.LNX.4.64.0906082111030.12407@melkki.cs.Helsinki.FI>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 05 June 2009: 1 commit
-
-
Submitted by Peter Zijlstra

In order to track the vdso, also generate mmap events for
install_special_mapping().

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 04 June 2009: 2 commits
-
-
Submitted by Peter Zijlstra

In the name of keeping it simple, only track mmap events. Userspace will
have to remove old overlapping maps when it encounters them.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Submitted by Christoph Lameter

This patch removes the dependency of mmap_min_addr on CONFIG_SECURITY. It
also sets a default mmap_min_addr of 4096.

mmapping of addresses below 4096 will only be possible for processes with
CAP_SYS_RAWIO.

Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Eric Paris <eparis@redhat.com>
Looks-ok-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: James Morris <jmorris@namei.org>
-
- 29 May 2009: 4 commits
-
-
Submitted by Nikanth Karthikesan

Fix the build warning, "mem_cgroup_is_obsolete defined but not used", when
CONFIG_DEBUG_VM is not set. Also avoid checking for !mem again and again.

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by Mel Gorman

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13302

hugetlbfs reserves huge pages but does not fault them in at mmap() time to
ensure that future faults succeed. The reservation behaviour differs
depending on whether the mapping was mapped MAP_SHARED or MAP_PRIVATE.
For MAP_SHARED mappings, hugepages are reserved when mmap() is first
called and are tracked based on information associated with the inode.
Other processes mapping MAP_SHARED use the same reservation. MAP_PRIVATE
mappings track the reservations based on the VMA created as part of the
mmap() operation. Each process mapping MAP_PRIVATE must make its own
reservation.

hugetlbfs currently checks if a VMA is MAP_SHARED with the VM_SHARED flag
and not VM_MAYSHARE. For file-backed mappings, such as hugetlbfs,
VM_SHARED is set only if the mapping is MAP_SHARED and the file was opened
read-write. If a shared memory mapping was mapped shared-read-write for
populating of data and mapped shared-read-only by other processes, then
hugetlbfs would account for the mapping as if it was MAP_PRIVATE. This
causes processes to fail to map the file MAP_SHARED even though it should
succeed, as the reservation is there.

This patch alters mm/hugetlb.c and replaces VM_SHARED with VM_MAYSHARE
when the intent of the code was to check whether the VMA was mapped
MAP_SHARED or MAP_PRIVATE.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <starlight@binnacle.cx>
Cc: Eric B Munson <ebmunson@us.ibm.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
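The shape of the fix is simply switching which flag the sharedness checks consult; a hedged illustration of the kind of check involved (not an actual hunk from the patch, the helper is hypothetical):

	/* Illustrative only: VM_SHARED is set just for writable MAP_SHARED
	 * mappings, while VM_MAYSHARE covers any MAP_SHARED mapping, which is
	 * what the reservation accounting actually cares about. */
	#include <linux/mm.h>

	static inline bool hugetlb_vma_is_shared(struct vm_area_struct *vma)
	{
		return vma->vm_flags & VM_MAYSHARE;	/* was: VM_SHARED */
	}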
-
Submitted by Daisuke Nishimura

mapping->tree_lock can be acquired from interrupt context. Then, the
following deadlock can occur (assume "A" is a page):

  CPU0: lock_page_cgroup(A)
        interrupted
                -> take mapping->tree_lock.
  CPU1: take mapping->tree_lock
                -> lock_page_cgroup(A)

This patch tries to fix the above deadlock by moving memcg's hook out of
mapping->tree_lock. charge/uncharge of pagecache/swapcache is protected by
the page lock, not tree_lock.

After this patch, lock_page_cgroup() is not called under
mapping->tree_lock.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by David Rientjes

When /proc/sys/vm/oom_dump_tasks is enabled, it is possible to get a NULL
pointer for tasks that have detached mm's, since task_lock() is not held
during the tasklist scan. Add the task_lock().

Acked-by: Nick Piggin <npiggin@suse.de>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 23 May 2009: 1 commit
-
-
Submitted by Martin K. Petersen

Convert all external users of queue limits to using wrapper functions
instead of poking the request queue variables directly.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
- 22 May 2009: 1 commit
-
-
Submitted by Mimi Zohar

Based on discussion on lkml (Andrew Morton and Eric Paris), move
ima_counts_get down a layer into shmem/hugetlb__file_setup(). Resolves the
drm shmem_file_setup() usage case as well.

HD comment: I still think you're doing this at the wrong level, but
recognize that you probably won't be persuaded until a few more users of
alloc_file() emerge, all wanting your ima_counts_get(). Resolving GEM's
shmem_file_setup() is an improvement, so I'll say

Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
-