- December 23, 2008: 4 commits

By Ingo Molnar:
Impact: fix per task clock counter precision
Signed-off-by: Ingo Molnar <mingo@elte.hu>

By Ingo Molnar:
Impact: rename field names
Shorten them.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

By Ingo Molnar:
Impact: schedule in groups atomically
If there are multiple groups in a task, make sure they are scheduled in and out atomically.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

By Ingo Molnar:
Impact: remove debug checks
Signed-off-by: Ingo Molnar <mingo@elte.hu>

- December 17, 2008: 1 commit

By Ingo Molnar:
Impact: implement default-off counters
Make sure that counters created with counter.hw_event.disabled=1 are created in a disabled state. They can be enabled via:

    prctl(PR_TASK_PERF_COUNTERS_ENABLE);

Signed-off-by: Ingo Molnar <mingo@elte.hu>

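For illustration, a minimal userspace sketch of the behavior just described. Only the disabled=1 semantics and the prctl come from the commits in this log; the struct layout, the event value and the syscall number are assumptions, since this early ABI predates any installed headers:

    #define _GNU_SOURCE
    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/prctl.h>
    #include <sys/syscall.h>

    #define PR_TASK_PERF_COUNTERS_ENABLE 32  /* number quoted in the prctl commit below */
    #define __NR_perf_counter_open 333       /* made-up syscall number */

    struct perf_counter_hw_event {           /* assumed minimal layout, not the real ABI */
        int64_t  type;                       /* event type; a raw u64 in the raw case */
        uint64_t irq_period;
        uint32_t record_type;
        uint32_t disabled : 1,               /* the flag this commit honors */
                 nmi      : 1,               /* attribute bits described in the */
                 raw      : 1,               /* API-cleanup commit further down */
                 inherit  : 1;               /* used by the inheritance commit below */
    };

    static int perf_counter_open(struct perf_counter_hw_event *ev,
                                 pid_t pid, int cpu, int group_fd)
    {
        return (int)syscall(__NR_perf_counter_open, ev, pid, cpu, group_fd);
    }

    int main(void)
    {
        struct perf_counter_hw_event ev;
        uint64_t count;

        memset(&ev, 0, sizeof(ev));
        ev.type     = 0;                 /* e.g. a cycles counter; value made up */
        ev.disabled = 1;                 /* counter is created switched off */

        int fd = perf_counter_open(&ev, 0 /* self */, -1 /* any CPU */, -1);

        /* ... reach the region of interest, then flip the counters on: */
        prctl(PR_TASK_PERF_COUNTERS_ENABLE);

        (void)read(fd, &count, sizeof(count));   /* counters read like files */
        return 0;
    }
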
- December 15, 2008: 8 commits

By Ingo Molnar:
If counters exit via do_exit() rather than via filp close, the CPU context needs to be released - otherwise future percpu counter creations might fail.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

By Ingo Molnar:
Make perf_max_counters default to at least 1 - this allows the sw counters to be used.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

By Ingo Molnar:
Impact: add new feature, new sw counter
Add a counter that counts the number of pagefaults a task is experiencing.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

By Ingo Molnar:
Impact: add new feature, new sw counter
Add a counter that counts the number of cross-CPU migrations a task is suffering.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

By Ingo Molnar:
Impact: add new feature, new sw counter
Add a counter that counts the number of context-switches a task is doing.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

By Ingo Molnar:
Impact: bugfix
Update the task clock counter to the new math.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

By Ingo Molnar:
Impact: implement new performance feature

Counter inheritance can be used to run performance counters in a workload, transparently - and pipe back the counter results to the parent counter.

Inheritance for performance counters works the following way: when creating a counter it can be marked with the .inherit=1 flag. Such counters are then 'inherited' by all child tasks (be they fork()-ed or clone()-ed). These counters get inherited through exec() boundaries as well (except through setuid boundaries).

The counter values get added back to the parent counter(s) when the child task(s) exit - much like stime/utime statistics are gathered. So inherited counters are ideal for gathering summary statistics about an application's behavior via shell commands, without having to modify that application.

The timec.c command utilizes counter inheritance: http://redhat.com/~mingo/perfcounters/timec.c

Sample output:

    $ ./timec -e 1 -e 3 -e 5 ls -lR /usr/include/ >/dev/null

    Performance counter stats for 'ls':

        163516953 instructions
             2295 cache-misses
          2855182 branch-misses

Signed-off-by: Ingo Molnar <mingo@elte.hu>

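A hedged sketch of measuring a child workload this way, continuing the hypothetical perf_counter_open() wrapper and struct from the sketch further up; the .inherit bit name is taken from the commit text, everything else is assumed:

    #include <sys/wait.h>

    static uint64_t count_child_workload(void)
    {
        struct perf_counter_hw_event ev = { 0 };
        uint64_t total = 0;

        ev.type    = 1;   /* e.g. instructions; the value is made up */
        ev.inherit = 1;   /* children inherit this counter */

        int fd = perf_counter_open(&ev, 0 /* self */, -1 /* any CPU */, -1);

        pid_t child = fork();
        if (child == 0) {
            execlp("ls", "ls", "-lR", "/usr/include/", (char *)NULL);
            _exit(127);                   /* exec failed */
        }
        waitpid(child, NULL, 0);          /* child's counts fold back into fd */

        (void)read(fd, &total, sizeof(total));  /* parent + child, like stime/utime */
        return total;
    }
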
By Ingo Molnar:
Impact: restructure code
Change counter math from absolute values to clear delta logic. We try to extract elapsed deltas from the raw hw counter - and put that into the generic counter.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

- December 11, 2008: 14 commits

By Ingo Molnar:
Impact: cleanup
Introduce a proper enum for the 3 states of a counter:

    PERF_COUNTER_STATE_OFF      = -1
    PERF_COUNTER_STATE_INACTIVE =  0
    PERF_COUNTER_STATE_ACTIVE   =  1

and rename counter->active to counter->state and propagate the changes everywhere.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

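Spelled out as a declaration plus an illustrative use; only the constants and the counter->state rename come from the commit, while the enum tag and the helper are hypothetical:

    enum perf_counter_state {                /* enum tag is illustrative */
        PERF_COUNTER_STATE_OFF      = -1,
        PERF_COUNTER_STATE_INACTIVE =  0,
        PERF_COUNTER_STATE_ACTIVE   =  1,
    };

    /* hypothetical check after the counter->active -> counter->state rename: */
    static inline int counter_is_running(const struct perf_counter *counter)
    {
        return counter->state == PERF_COUNTER_STATE_ACTIVE;
    }
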
By Ingo Molnar:
Add a way for self-monitoring tasks to disable/enable counters summarily, via a prctl:

    PR_TASK_PERF_COUNTERS_DISABLE  31
    PR_TASK_PERF_COUNTERS_ENABLE   32

Signed-off-by: Ingo Molnar <mingo@elte.hu>

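In a self-monitoring task the pair can bracket a region that should not be measured; a short sketch, where the phase functions are placeholders and the prctl numbers are the ones quoted above:

    #include <sys/prctl.h>

    #define PR_TASK_PERF_COUNTERS_DISABLE 31
    #define PR_TASK_PERF_COUNTERS_ENABLE  32

    void uninteresting_phase(void);            /* placeholder workload */
    void measured_phase(void);

    void run(void)
    {
        prctl(PR_TASK_PERF_COUNTERS_DISABLE);  /* stop all of this task's counters */
        uninteresting_phase();                 /* excluded from measurement */
        prctl(PR_TASK_PERF_COUNTERS_ENABLE);   /* resume counting */
        measured_phase();
    }
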
By Ingo Molnar:
Impact: add new perf-counter type
The 'task clock' counter counts the amount of time a task is executing, in nanoseconds. It stops ticking when a task is scheduled out - whether due to blocking, sleeping, or being preempted.
This counter type is a Linux-kernel-based abstraction: it is available even if the hardware does not support native hardware performance counters.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

By Ingo Molnar:
Impact: cleanup
Rename them to better match the usual IRQ disable/enable APIs:

    hw_perf_disable_all()  => hw_perf_save_disable()
    hw_perf_restore_ctrl() => hw_perf_restore()

Signed-off-by: Ingo Molnar <mingo@elte.hu>

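The intended calling pattern mirrors local_irq_save()/local_irq_restore(); a sketch, assuming the save variant hands back a u64 cookie:

    u64 perf_flags;

    perf_flags = hw_perf_save_disable();   /* stop all counters, return prior state */
    /* ... reprogram or reschedule counters atomically ... */
    hw_perf_restore(perf_flags);           /* re-enable from the saved state */
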
By Ingo Molnar:
Impact: add new perf-counter type
The 'CPU clock' counter counts the amount of CPU clock time that is elapsing, in nanoseconds (regardless of how much of it the task is spending on a CPU executing).
This counter type is a Linux-kernel-based abstraction: it is available even if the hardware does not support native hardware performance counters.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

By Ingo Molnar:
Impact: restructure code, introduce hw_ops driver abstraction
Introduce this abstraction to handle counter details:

    struct hw_perf_counter_ops {
        void (*hw_perf_counter_enable)  (struct perf_counter *counter);
        void (*hw_perf_counter_disable) (struct perf_counter *counter);
        void (*hw_perf_counter_read)    (struct perf_counter *counter);
    };

This will be useful for supporting asymmetric hw details, and it will also be useful for implementing "software counters" - counters that count kernel-managed sw events such as pagefaults, context-switches, wall-clock time or task-local time.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

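A sketch of how a software counter could plug into this abstraction; the ops struct members are quoted from the commit, while the function bodies and names are placeholders:

    static void sw_counter_enable(struct perf_counter *counter)  { /* arm the event source */ }
    static void sw_counter_disable(struct perf_counter *counter) { /* quiesce it */ }
    static void sw_counter_read(struct perf_counter *counter)    { /* fold the current value in */ }

    static const struct hw_perf_counter_ops perf_ops_software = {
        .hw_perf_counter_enable  = sw_counter_enable,
        .hw_perf_counter_disable = sw_counter_disable,
        .hw_perf_counter_read    = sw_counter_read,
    };
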
By Ingo Molnar:
Impact: bugfix
Check that a group does not span outside the context of a CPU or a task. Also, do not allow deep recursive hierarchies.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

By Ingo Molnar:
Impact: add group counters

This patch adds the "counter groups" abstraction. Groups of counters behave much like normal 'single' counters, with a few semantic and behavioral extensions on top of that.

A counter group is created by creating a new counter with the open() syscall's group-leader group_fd file descriptor parameter pointing to another, already existing counter.

Groups of counters are scheduled in and out in one atomic group, and they are also round-robin-scheduled atomically.

Counters that are members of a group can also record events with an (atomic) extended timestamp that extends to all members of the group, if the record type is set to PERF_RECORD_GROUP.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

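A sketch of group creation through group_fd, again using the hypothetical perf_counter_open() wrapper from further up; the event values are made up:

    static void open_counter_group(void)
    {
        struct perf_counter_hw_event leader_ev = { .type = 0 };  /* made-up */
        struct perf_counter_hw_event member_ev = { .type = 2 };  /* event values */

        /* group_fd == -1: create a new group with this counter as leader */
        int leader = perf_counter_open(&leader_ev, 0 /* self */, -1 /* any CPU */, -1);

        /* group_fd == leader: attach this counter to the leader's group */
        int member = perf_counter_open(&member_ev, 0, -1, leader);

        (void)member;
    }
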
By Ingo Molnar:
Impact: clean up new API

Thorough cleanup of the new perf counters API; we now get clean separation of the various concepts:

- introduce perf_counter_hw_event to separate out the event source details
- move special type flags into separate attributes: PERF_COUNT_NMI, PERF_COUNT_RAW
- extend the type to u64 and reserve it fully to the architecture in the raw type case

And make use of all these changes in the core and x86 perfcounters code.

Also change the syscall signature to:

    asmlinkage int sys_perf_counter_open(
        struct perf_counter_hw_event *hw_event_uptr __user,
        pid_t pid,
        int cpu,
        int group_fd);

(Note that group_fd is unused for now - it's reserved for the counter groups abstraction.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>

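A sketch of the raw case, assuming the hw_event struct carries the raw attribute as a bit-field as in the struct sketched further up; the field spelling and the event code are made up:

    static int open_raw_counter(void)
    {
        struct perf_counter_hw_event ev = { 0 };

        ev.type = 0x004064aa;   /* model-specific raw event code (made up) */
        ev.raw  = 1;            /* pass the full u64 type through to the arch */

        return perf_counter_open(&ev, 0 /* self */, -1 /* any CPU */, -1);
    }
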
By Thomas Gleixner:
Impact: change syscall, cleanup
Make use of the new perf_counters event type.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

By Thomas Gleixner:
Impact: cleanup
Introduce a separate hw_event type.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

By Hugh Dickins:

Lee Schermerhorn noticed yesterday that I broke the mapping_writably_mapped test in 2.6.7! Bad bad bug, good good find.

The i_mmap_writable count must be incremented for VM_SHARED (just as i_writecount is for VM_DENYWRITE, but while holding the i_mmap_lock) when dup_mmap() copies the vma for fork: it has its own more optimal version of __vma_link_file(), and I missed this out. So the count was later going down to 0 (dangerous) when one end unmapped, then wrapping negative (inefficient) when the other end unmapped.

The only impact on x86 would have been that setting a mandatory lock on a file which has at some time been opened O_RDWR and mapped MAP_SHARED (but not necessarily PROT_WRITE) across a fork, might fail with -EAGAIN when it should succeed, or succeed when it should fail.

But those architectures which rely on flush_dcache_page() to flush userspace modifications back into the page before the kernel reads it, may in some cases have skipped the flush after such a fork - though any repetitive test will soon wrap the count negative, in which case it will flush_dcache_page() unnecessarily.

Fix would be a two-liner, but mapping variable added, and comment moved.

Reported-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

By Hugh Dickins:

Miles Lane tailing /sys files hit a BUG which Pekka Enberg has tracked to my 966c8c12 "sprint_symbol(): use less stack" exposing a bug in slub's list_locations() - kallsyms_lookup() writes a 0 to namebuf[KSYM_NAME_LEN-1], but that was beyond the end of the page provided.

The 100 slop which list_locations() allows at end of page looks roughly enough for all the other stuff it might print after the symbol before it checks again: break out KSYM_SYMBOL_LEN earlier than before.

Latencytop and ftrace are using KSYM_NAME_LEN buffers where they need KSYM_SYMBOL_LEN buffers, and vmallocinfo a 2*KSYM_NAME_LEN buffer where it wants a KSYM_SYMBOL_LEN buffer: fix those before anyone copies them.

[akpm@linux-foundation.org: ftrace.h needs module.h]

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Miles Lane <miles.lane@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

By Tom Zanussi:

Running kmemtraced, which uses splice() on relayfs, causes a hard lock on x86-64 SMP. As described by Tom Zanussi:

    It looks like you hit the same problem as described here:

        commit 8191ecd1
        splice: fix infinite loop in generic_file_splice_read()

    relay uses the same loop but it never got noticed or fixed.

Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Tested-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

- December 10, 2008: 1 commit

By Brian King:
Impact: fix possible deadlock in CPU hot-remove path

This patch fixes a possible deadlock scenario in the CPU remove path. migration_call grabs rq->lock, then wakes up everything on rq->migration_queue with the lock held. Then one of the tasks on the migration queue ends up calling tg_shares_up, which then also tries to acquire the same rq->lock.

    [c000000058eab2e0] c000000000502078 ._spin_lock_irqsave+0x98/0xf0
    [c000000058eab370] c00000000008011c .tg_shares_up+0x10c/0x20c
    [c000000058eab430] c00000000007867c .walk_tg_tree+0xc4/0xfc
    [c000000058eab4d0] c0000000000840c8 .try_to_wake_up+0xb0/0x3c4
    [c000000058eab590] c0000000000799a0 .__wake_up_common+0x6c/0xe0
    [c000000058eab640] c00000000007ada4 .complete+0x54/0x80
    [c000000058eab6e0] c000000000509fa8 .migration_call+0x5fc/0x6f8
    [c000000058eab7c0] c000000000504074 .notifier_call_chain+0x68/0xe0
    [c000000058eab860] c000000000506568 ._cpu_down+0x2b0/0x3f4
    [c000000058eaba60] c000000000506750 .cpu_down+0xa4/0x108
    [c000000058eabb10] c000000000507e54 .store_online+0x44/0xa8
    [c000000058eabba0] c000000000396260 .sysdev_store+0x3c/0x50
    [c000000058eabc10] c0000000001a39b8 .sysfs_write_file+0x124/0x18c
    [c000000058eabcd0] c00000000013061c .vfs_write+0xd0/0x1bc
    [c000000058eabd70] c0000000001308a4 .sys_write+0x68/0x114
    [c000000058eabe30] c0000000000086b4 syscall_exit+0x0/0x40

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

- December 9, 2008: 4 commits

By Al Viro:
The timestamp in audit_context is valid only if ->in_syscall is set.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

By Randy Dunlap:
Delete excess kernel-doc notation in kernel/auditsc.c:

    Warning(linux-2.6.27-git10//kernel/auditsc.c:1481): Excess function parameter or struct member 'tsk' description in 'audit_syscall_entry'
    Warning(linux-2.6.27-git10//kernel/auditsc.c:1564): Excess function parameter or struct member 'tsk' description in 'audit_syscall_exit'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

By Al Viro:
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

By Eric Paris:
Currently audit=0 on the kernel command line does absolutely nothing. Audit always loads and always uses its resources, such as creating the kernel netlink socket. This patch causes audit=0 to actually disable audit: audit will use no resources, and starting the userspace auditd daemon will not cause the kernel audit system to activate.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

- December 8, 2008: 1 commit

By Thomas Gleixner:
Implement the core kernel bits of the Performance Counters subsystem.

The Linux Performance Counter subsystem provides an abstraction of performance counter hardware capabilities. It provides per-task and per-CPU counters, and it provides event capabilities on top of those.

Performance counters are accessed via special file descriptors. There's one file descriptor per virtual counter used.

The special file descriptor is opened via the perf_counter_open() system call:

    int perf_counter_open(u32 hw_event_type,
                          u32 hw_event_period,
                          u32 record_type,
                          pid_t pid,
                          int cpu);

The syscall returns the new fd. The fd can be used via the normal VFS system calls: read() can be used to read the counter, fcntl() can be used to set the blocking mode, etc. Multiple counters can be kept open at a time, and the counters can be poll()ed.

See more details in Documentation/perf-counters.txt.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

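A usage sketch against this 5-argument form; the syscall number and argument values are made up, and the argument meanings in the comments are inferred from the text above. Note that commits further up in this log replace this form with a pointer-based one:

    #define _GNU_SOURCE
    #include <stdint.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    #define __NR_perf_counter_open 333      /* made-up syscall number */

    int main(void)
    {
        /* argument order exactly as in the prototype above */
        int fd = (int)syscall(__NR_perf_counter_open,
                              0,    /* hw_event_type, e.g. cycles (made up) */
                              0,    /* hw_event_period */
                              0,    /* record_type */
                              0,    /* pid: assumed to mean the calling task */
                              -1);  /* cpu: assumed to mean "any CPU" */

        uint64_t value;
        (void)read(fd, &value, sizeof(value));  /* read() returns the count */
        return 0;
    }
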
- December 4, 2008: 2 commits

By Al Viro:
It had been put there to mark the call of blkdev_put() that needed a proper argument propagated to it; a later patch in the same series did just that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

By John Stultz:
Impact: fix time warp bug

Alex Shi, along with Yanmin Zhang, have been noticing occasional time inconsistencies recently. Through their great diagnosis, they found that the xtime_nsec value used in update_wall_time was occasionally going negative. After looking through the code for a while, I realized we have the possibility for an underflow when three conditions are met in update_wall_time():

1) We have accumulated a second's worth of nanoseconds, so we incremented xtime.tv_sec and appropriately decremented xtime_nsec. (This doesn't cause xtime_nsec to go negative, but it can cause it to be small.)

2) The remaining offset value is large, but just slightly less than cycle_interval.

3) clocksource_adjust() is speeding up the clock, causing a corrective amount (compensating for the increase in the multiplier being multiplied against the unaccumulated offset value) to be subtracted from xtime_nsec. This can cause xtime_nsec to underflow.

Unfortunately, since we notify the NTP subsystem via second_overflow() whenever we accumulate a full second, and this affects the error accumulation that has already occurred, we cannot simply revert the accumulated second from xtime nor move the second accumulation to after the clocksource_adjust call without a change in behavior.

This leaves us with (at least) two options:

1) Simply return from clocksource_adjust() without making a change if we notice the adjustment would cause xtime_nsec to go negative. This would work, but I'm concerned that if a large adjustment was needed (due to the error being large), it may be possible to get stuck with an ever-increasing error that becomes too large to correct (since it may always force xtime_nsec negative). This may just be paranoia on my part.

2) Catch xtime_nsec if it is negative, then add back the amount by which it is negative to both xtime_nsec and the error. This second method is consistent with how we've handled earlier rounding issues, and also has the benefit that the error being added is always in the opposite direction and always equal to or smaller than the correction being applied. So the risk of a corner case where things get out of control is lessened.

This patch fixes bug 11970, as tested by Yanmin Zhang: http://bugzilla.kernel.org/show_bug.cgi?id=11970

Reported-by: alex.shi@intel.com
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Acked-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Tested-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

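A sketch of option 2, which is the approach the changelog settles on; the variable names follow the changelog, while the shift arithmetic that scales the shortfall back into the error term is an assumption:

    /* in update_wall_time(), after accumulation and clocksource_adjust(): */
    if ((s64)clock->xtime_nsec < 0) {
        s64 neg = -(s64)clock->xtime_nsec;

        /* never leave the accumulator negative; push the shortfall
         * back into the error term so it is corrected gradually */
        clock->xtime_nsec = 0;
        clock->error += neg << (NTP_SCALE_SHIFT - clock->shift);
    }
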
- December 3, 2008: 1 commit

By Roel Kluin:
Impact: fix warnings-limit cutoff check for debug feature
The unsigned sysctl_hung_task_warnings cannot be less than 0.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

- December 2, 2008: 2 commits

By Arjan van de Ven:
The description for 'D' was missing in the comment... (causing me a minute of WTF followed by looking at more of the code)
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

By Davide Libenzi:

It had been thought that the per-user file descriptor limit would also limit the resources that a normal user can request via the epoll interface. Vegard Nossum reported a very simple program (a modified version attached) that can make a normal user request a pretty large amount of kernel memory, well within its maximum number of fds. To solve this problem, default limits are now imposed, and /proc based configuration has been introduced. A new directory has been created, named /proc/sys/fs/epoll/, and inside it there are two configuration points:

    max_user_instances = Maximum number of devices - per user
    max_user_watches   = Maximum number of "watched" fds - per user

The current default for "max_user_watches" limits the memory used by epoll to store "watches" to 1/32 of the amount of low RAM. As an example, a 256MB 32-bit machine will have "max_user_watches" set to roughly 90000. That should be enough to not break existing heavy epoll users. The default value for "max_user_instances" is set to 128; that should be enough too.

This also changes userspace, because a new error code can now come out of EPOLL_CTL_ADD (-ENOSPC). The EMFILE from epoll_create() was already listed, so that should be ok.

[akpm@linux-foundation.org: use get_current_user()]

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: <stable@kernel.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Reported-by: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

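The userspace-visible consequence is that EPOLL_CTL_ADD can now fail with ENOSPC once max_user_watches is exhausted; a minimal handling sketch:

    #include <errno.h>
    #include <stdio.h>
    #include <sys/epoll.h>

    static int add_watch(int epfd, int fd)
    {
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };

        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
            if (errno == ENOSPC)    /* per-user watch limit reached */
                fprintf(stderr,
                        "epoll watch limit hit; see /proc/sys/fs/epoll/max_user_watches\n");
            return -1;
        }
        return 0;
    }
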
- December 1, 2008: 2 commits

By Christoph Hellwig:
All architectures now use the generic compat_sys_ptrace, as should every new architecture that needs 32-bit compat (if we'll ever get another). Remove the now-superfluous __ARCH_WANT_COMPAT_SYS_PTRACE define, and also kill a comment about __ARCH_SYS_PTRACE that was added after __ARCH_SYS_PTRACE was already gone.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

By Al Viro:
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>