- 10 11月, 2007 2 次提交
-
-
由 Eric Dumazet 提交于
1) hardcoded 1000000000 value is used five times in places where NSEC_PER_SEC might be more readable. 2) A conversion from nsec to msec uses the hardcoded 1000000 value, which is a candidate for NSEC_PER_MSEC. no code changed: text data bss dec hex filename 44359 3326 36 47721 ba69 sched.o.before 44359 3326 36 47721 ba69 sched.o.after Signed-off-by: NEric Dumazet <dada1@cosmosbay.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
we lost the sched_min_granularity tunable to a clever optimization that uses the sched_latency/min_granularity ratio - but the ratio is quite unintuitive to users and can also crash the kernel if the ratio is set to 0. So reintroduce the min_granularity tunable, while keeping the ratio maintained internally. no functionality changed. [ mingo@elte.hu: some fixlets. ] Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 20 10月, 2007 2 次提交
-
-
由 Pavel Emelyanov 提交于
This is the largest patch in the set. Make all (I hope) the places where the pid is shown to or get from user operate on the virtual pids. The idea is: - all in-kernel data structures must store either struct pid itself or the pid's global nr, obtained with pid_nr() call; - when seeking the task from kernel code with the stored id one should use find_task_by_pid() call that works with global pids; - when showing pid's numerical value to the user the virtual one should be used, but however when one shows task's pid outside this task's namespace the global one is to be used; - when getting the pid from userspace one need to consider this as the virtual one and use appropriate task/pid-searching functions. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: nuther build fix] [akpm@linux-foundation.org: yet nuther build fix] [akpm@linux-foundation.org: remove unneeded casts] Signed-off-by: NPavel Emelyanov <xemul@openvz.org> Signed-off-by: NAlexey Dobriyan <adobriyan@openvz.org> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Serge E. Hallyn 提交于
is_init() is an ambiguous name for the pid==1 check. Split it into is_global_init() and is_container_init(). A cgroup init has it's tsk->pid == 1. A global init also has it's tsk->pid == 1 and it's active pid namespace is the init_pid_ns. But rather than check the active pid namespace, compare the task structure with 'init_pid_ns.child_reaper', which is initialized during boot to the /sbin/init process and never changes. Changelog: 2.6.22-rc4-mm2-pidns1: - Use 'init_pid_ns.child_reaper' to determine if a given task is the global init (/sbin/init) process. This would improve performance and remove dependence on the task_pid(). 2.6.21-mm2-pidns2: - [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc, ppc,avr32}/traps.c for the _exception() call to is_global_init(). This way, we kill only the cgroup if the cgroup's init has a bug rather than force a kernel panic. [akpm@linux-foundation.org: fix comment] [sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c] [bunk@stusta.de: kernel/pid.c: remove unused exports] [sukadev@us.ibm.com: Fix capability.c to work with threaded init] Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com> Signed-off-by: NSukadev Bhattiprolu <sukadev@us.ibm.com> Acked-by: NPavel Emelianov <xemul@openvz.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Herbert Poetzel <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 10月, 2007 9 次提交
-
-
由 Andrew Morgan 提交于
The non-filesystem capability meaning of CAP_SETPCAP is that a process, p1, can change the capabilities of another process, p2. This is not the meaning that was intended for this capability at all, and this implementation came about purely because, without filesystem capabilities, there was no way to use capabilities without one process bestowing them on another. Since we now have a filesystem support for capabilities we can fix the implementation of CAP_SETPCAP. The most significant thing about this change is that, with it in effect, no process can set the capabilities of another process. The capabilities of a program are set via the capability convolution rules: pI(post-exec) = pI(pre-exec) pP(post-exec) = (X(aka cap_bset) & fP) | (pI(post-exec) & fI) pE(post-exec) = fE ? pP(post-exec) : 0 at exec() time. As such, the only influence the pre-exec() program can have on the post-exec() program's capabilities are through the pI capability set. The correct implementation for CAP_SETPCAP (and that enabled by this patch) is that it can be used to add extra pI capabilities to the current process - to be picked up by subsequent exec()s when the above convolution rules are applied. Here is how it works: Let's say we have a process, p. It has capability sets, pE, pP and pI. Generally, p, can change the value of its own pI to pI' where (pI' & ~pI) & ~pP = 0. That is, the only new things in pI' that were not present in pI need to be present in pP. The role of CAP_SETPCAP is basically to permit changes to pI beyond the above: if (pE & CAP_SETPCAP) { pI' = anything; /* ie., even (pI' & ~pI) & ~pP != 0 */ } This capability is useful for things like login, which (say, via pam_cap) might want to raise certain inheritable capabilities for use by the children of the logged-in user's shell, but those capabilities are not useful to or needed by the login program itself. One such use might be to limit who can run ping. You set the capabilities of the 'ping' program to be "= cap_net_raw+i", and then only shells that have (pI & CAP_NET_RAW) will be able to run it. Without CAP_SETPCAP implemented as described above, login(pam_cap) would have to also have (pP & CAP_NET_RAW) in order to raise this capability and pass it on through the inheritable set. Signed-off-by: NAndrew Morgan <morgan@kernel.org> Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric W. Biederman 提交于
After adding checking to register_sysctl_table and finding a whole new set of bugs. Missed by countless code reviews and testers I have finally lost patience with the binary sysctl interface. The binary sysctl interface has been sort of deprecated for years and finding a user space program that uses the syscall is more difficult then finding a needle in a haystack. Problems continue to crop up, with the in kernel implementation. So since supporting something that no one uses is silly, deprecate sys_sysctl with a sufficient grace period and notice that the handful of user space applications that care can be fixed or replaced. The /proc/sys sysctl interface that people use will continue to be supported indefinitely. This patch moves the tested warning about sysctls from the path where sys_sysctl to a separate path called from both implementations of sys_sysctl, and it adds a proper entry into Documentation/feature-removal-schedule. Allowing us to revisit this in a couple years time and actually kill sys_sysctl. [lethal@linux-sh.org: sysctl: Fix syscall disabled build] Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NPaul Mundt <lethal@linux-sh.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric W. Biederman 提交于
After going through the kernels sysctl tables several times it has become clear that code review and testing is just not effective in prevent problematic sysctl tables from being used in the stable kernel. I certainly can't seem to fix the problems as fast as they are introduced. Therefore this patch adds sysctl_check_table which is called when a sysctl table is registered and checks to see if we have a problematic sysctl table. The biggest part of the code is the table of valid binary sysctl entries, but since we have frozen our set of binary sysctls this table should not need to change, and it makes it much easier to detect when someone unintentionally adds a new binary sysctl value. As best as I can determine all of the several hundred errors spewed on boot up now are legitimate. [bunk@kernel.org: kernel/sysctl_check.c must #include <linux/string.h>] Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: NAdrian Bunk <bunk@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric W. Biederman 提交于
It looks like we inadvertently killed the cad_pid binary sysctl support when cap_pid was changed to be a struct pid. Since no one has complained just remove the binary path. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric W. Biederman 提交于
Instead of having a bunch of ifdefs in sysctl.c move all of the pty sysctl logic into drivers/char/pty.c As well as cleaning up the logic this prevents sysctl_check_table from complaining that the root table has a NULL data pointer on something with generic methods. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric W. Biederman 提交于
aio-nr, aio-max-nr, acpi_video_flags are unsigned long values which sysctl does not handle properly with a 64bit kernel and a 32bit user space. Since no one is likely to be using the binary sysctl values and the ascii interface still works, this patch just removes support for the binary sysctl interface from the kernel. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Zach Brown <zach.brown@oracle.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Len Brown <lenb@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric W. Biederman 提交于
These functions are all wrapper functions for the proc interface that are needed for them to work correctly. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: NAndrew Morgan <morgan@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric W. Biederman 提交于
There as been no easy way to wrap the default sysctl strategy routine except for returning 0. Which is not always what we want. The few instances I have seen that want different behaviour have written their own version of sysctl_data. While not too hard it is unnecessary code and has the potential for extra bugs. So to make these situations easier and make that part of sysctl more symetric I have factord sysctl_data out of do_sysctl_strategy and exported as a function everyone can use. Further having sysctl_data be an explicit function makes checking for badly formed sysctl tables much easier. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric W. Biederman 提交于
In sysctl.h the typedef struct ctl_table ctl_table violates coding style isn't needed and is a bit of a nuisance because it makes it harder to recognize ctl_table is a type name. So this patch removes it from the generic sysctl code. Hopefully I will have enough energy to send the rest of my patches will follow and to remove it from the rest of the kernel. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 10月, 2007 4 次提交
-
-
由 Ravikiran G Thirumalai 提交于
Control the trigger limit for softlockup warnings. This is useful for debugging softlockups, by lowering the softlockup_thresh to identify possible softlockups earlier. This patch: 1. Adds a sysctl softlockup_thresh with valid values of 1-60s (Higher value to disable false positives) 2. Changes the softlockup printk to print the cpu softlockup time [akpm@linux-foundation.org: Fix various warnings and add definition of "two"] Signed-off-by: NRavikiran Thirumalai <kiran@scalex86.org> Signed-off-by: NShai Fultheim <shai@scalex86.org> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Rientjes 提交于
Adds a new sysctl, 'oom_kill_allocating_task', which will automatically kill the OOM-triggering task instead of scanning through the tasklist to find a memory-hogging target. This is helpful for systems with an insanely large number of tasks where scanning the tasklist significantly degrades performance. Cc: Andrea Arcangeli <andrea@suse.de> Acked-by: NChristoph Lameter <clameter@sgi.com> Signed-off-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Peter Zijlstra 提交于
Scale writeback cache per backing device, proportional to its writeout speed. By decoupling the BDI dirty thresholds a number of problems we currently have will go away, namely: - mutual interference starvation (for any number of BDIs); - deadlocks with stacked BDIs (loop, FUSE and local NFS mounts). It might be that all dirty pages are for a single BDI while other BDIs are idling. By giving each BDI a 'fair' share of the dirty limit, each one can have dirty pages outstanding and make progress. A global threshold also creates a deadlock for stacked BDIs; when A writes to B, and A generates enough dirty pages to get throttled, B will never start writeback until the dirty pages go away. Again, by giving each BDI its own 'independent' dirty limit, this problem is avoided. So the problem is to determine how to distribute the total dirty limit across the BDIs fairly and efficiently. A DBI that has a large dirty limit but does not have any dirty pages outstanding is a waste. What is done is to keep a floating proportion between the DBIs based on writeback completions. This way faster/more active devices get a larger share than slower/idle devices. [akpm@linux-foundation.org: fix warnings] [hugh@veritas.com: Fix occasional hang when a task couldn't get out of balance_dirty_pages] Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NHugh Dickins <hugh@veritas.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Adam Litke 提交于
The maximum size of the huge page pool can be controlled using the overall size of the hugetlb filesystem (via its 'size' mount option). However in the common case the this will not be set as the pool is traditionally fixed in size at boot time. In order to maintain the expected semantics, we need to prevent the pool expanding by default. This patch introduces a new sysctl controlling dynamic pool resizing. When this is enabled the pool will expand beyond its base size up to the size of the hugetlb filesystem. It is disabled by default. Signed-off-by: NAdam Litke <agl@us.ibm.com> Acked-by: NAndy Whitcroft <apw@shadowen.org> Acked-by: NDave McCracken <dave.mccracken@oracle.com> Cc: William Irwin <bill.irwin@oracle.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Ken Chen <kenchen@google.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 10月, 2007 4 次提交
-
-
由 Ingo Molnar 提交于
reintroduce a simplified version of cache-hot/cold scheduling affinity. This improves performance with certain SMP workloads, such as sysbench. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
speed up and simplify vslice calculations. [ From: Mike Galbraith <efault@gmx.de>: build fix ] Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Ingo Molnar 提交于
remove the wait_runtime-limit fields and the code depending on it, now that the math has been changed over to rely on the vruntime metric. Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NMike Galbraith <efault@gmx.de> Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Ingo Molnar 提交于
remove the stat_gran code - it was disabled by default and it causes unnecessary overhead. Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NMike Galbraith <efault@gmx.de> Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
-
- 14 10月, 2007 1 次提交
-
-
由 Al Viro 提交于
a) include/asm-um/arch can't just point to include/asm-$(SUBARCH) now b) arch/{i386,x86_64}/crypto are merged now c) subarch-obj needed changes d) cpufeature_64.h should pull "cpufeature_32.h", not <asm/cpufeature_32.h> since it can be included from asm-um/cpufeature.h e) in case of uml-i386 we need CONFIG_X86_32 for make and gcc, but not for Kconfig f) sysctl.c shouldn't do vdso_enabled for uml-i386 (actually, that one should be registered from corresponding arch/*/kernel/*, with ifdef going away; that's a separate patch, though). With that and with Stephen's patch ("[PATCH net-2.6] uml: hard_header fix") we have uml allmodconfig building both on i386 and amd64. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 10月, 2007 1 次提交
-
-
由 Olof Johansson 提交于
Implement show_unhandled_signals sysctl + support to print when a process is killed due to unhandled signals just as i386 and x86_64 does. Default to having it off, unlike x86 that defaults on. Signed-off-by: NOlof Johansson <olof@lixom.net> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 20 9月, 2007 1 次提交
-
-
由 Ingo Molnar 提交于
add /proc/sys/kernel/sched_compat_yield to make sys_sched_yield() more agressive, by moving the yielding task to the last position in the rbtree. with sched_compat_yield=0: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2539 mingo 20 0 1576 252 204 R 50 0.0 0:02.03 loop_yield 2541 mingo 20 0 1576 244 196 R 50 0.0 0:02.05 loop with sched_compat_yield=1: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2584 mingo 20 0 1576 248 196 R 99 0.0 0:52.45 loop 2582 mingo 20 0 1576 256 204 R 0 0.0 0:00.00 loop_yield Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
-
- 26 8月, 2007 3 次提交
-
-
由 Ingo Molnar 提交于
due to adaptive granularity scheduling the role of sched_granularity has changed to "minimum granularity", so rename the variable (and the tunable) accordingly. Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
-
由 Peter Zijlstra 提交于
Instead of specifying the preemption granularity, specify the wanted latency. By fixing the granlarity to a constany the wakeup latency it a function of the number of running tasks on the rq. Invert this relation. sysctl_sched_granularity becomes a minimum for the dynamic granularity computed from the new sysctl_sched_latency. Then use this latency to do more intelligent granularity decisions: if there are fewer tasks running then we can schedule coarser. This helps performance while still always keeping the latency target. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Make the lockdep sysctls not depend on CONFIG_SCHED_DEBUG. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 20 8月, 2007 1 次提交
-
-
由 Christian Heim 提交于
Remove the second inclusion of linux/capability.h, which has been introduced with "[PATCH] move capable() to capability.h" (commit c59ede7b) Signed-off-by: NChristian Heim <phreak@gentoo.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 8月, 2007 1 次提交
-
-
由 Lee Schermerhorn 提交于
Misplaced #endif is hiding the numa_zonelist_order sysctl when !SECURITY. Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 30 7月, 2007 1 次提交
-
-
由 Len Brown 提交于
Restore the 2.6.22 CONFIG_ACPI_SLEEP build option, but now shadowing the new CONFIG_PM_SLEEP option. Signed-off-by: NLen Brown <len.brown@intel.com> [ Modified to work with the PM config setup changes. ] Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 7月, 2007 1 次提交
-
-
由 Len Brown 提交于
As it was a synonym for (CONFIG_ACPI && CONFIG_X86), the ifdefs for it were more clutter than they were worth. For ia64, just add a few stubs in anticipation of future S3 or S4 support. Signed-off-by: NLen Brown <len.brown@intel.com>
-
- 23 7月, 2007 1 次提交
-
-
This patch makes the i386 behave the same way that x86_64 does when a segfault happens. A line gets printed to the kernel log so that tools that need to check for failures can behave more uniformly between debug.show_unhandled_signals sysctl variable to 0 (or by doing echo 0 > /proc/sys/debug/exception-trace) Also, all of the lines being printed are now using printk_ratelimit() to deny the ability of DoS from a local user with a program like the following: main() { while (1) if (!fork()) *(int *)0 = 0; } This new revision also includes the fix that Andrew did which got rid of new sysctl that was added to the system in earlier versions of this. Also, 'show-unhandled-signals' sysctl has been renamed back to the old 'exception-trace' to avoid breakage of people's scripts. AK: Enabling by default for i386 will be likely controversal, but let's see what happens AK: Really folks, before complaining just fix your segfaults AK: I bet this will find a lot of silent issues Signed-off-by: NMasoud Sharbiani <masouds@google.com> Signed-off-by: NAndi Kleen <ak@suse.de> [ Personally, I've found the complaints useful on x86-64, so I'm all for this. That said, I wonder if we could do it more prettily.. -Linus ] Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 20 7月, 2007 5 次提交
-
-
由 Andrew Morton 提交于
I've been chasing these comments around this file all week. Hopefully we're straight now. Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Peter Zijlstra 提交于
Introduce the core lock statistics code. Lock statistics provides lock wait-time and hold-time (as well as the count of corresponding contention and acquisitions events). Also, the first few call-sites that encounter contention are tracked. Lock wait-time is the time spent waiting on the lock. This provides insight into the locking scheme, that is, a heavily contended lock is indicative of a too coarse locking scheme. Lock hold-time is the duration the lock was held, this provides a reference for the wait-time numbers, so they can be put into perspective. 1) lock 2) ... do stuff .. unlock 3) The time between 1 and 2 is the wait-time. The time between 2 and 3 is the hold-time. The lockdep held-lock tracking code is reused, because it already collects locks into meaningful groups (classes), and because it is an existing infrastructure for lock instrumentation. Currently lockdep tracks lock acquisition with two hooks: lock() lock_acquire() _lock() ... code protected by lock ... unlock() lock_release() _unlock() We need to extend this with two more hooks, in order to measure contention. lock_contended() - used to measure contention events lock_acquired() - completion of the contention These are then placed the following way: lock() lock_acquire() if (!_try_lock()) lock_contended() _lock() lock_acquired() ... do locked stuff ... unlock() lock_release() _unlock() (Note: the try_lock() 'trick' is used to avoid instrumenting all platform dependent lock primitive implementations.) It is also possible to toggle the two lockdep features at runtime using: /proc/sys/kernel/prove_locking /proc/sys/kernel/lock_stat (esp. turning off the O(n^2) prove_locking functionaliy can help) [akpm@linux-foundation.org: build fixes] [akpm@linux-foundation.org: nuke unneeded ifdefs] Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: NIngo Molnar <mingo@elte.hu> Acked-by: NJason Baron <jbaron@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kawai, Hidehiro 提交于
This patch series is version 5 of the core dump masking feature, which controls which VMAs should be dumped based on their memory types and per-process flags. I adopted most of Andrew's suggestion at the previous version. He also suggested using system call instead of /proc/<pid>/ interface, I decided to use the latter continuously because adding new system call with pid argument will give a big impact on the kernel. You can access the per-process flags via /proc/<pid>/coredump_filter interface. coredump_filter represents a bitmask of memory types, and if a bit is set, VMAs of corresponding memory type are written into a core file when the process is dumped. The bitmask is inherited from the parent process when a process is created. The original purpose is to avoid longtime system slowdown when a number of processes which share a huge shared memory are dumped at the same time. To achieve this purpose, this patch series adds an ability to suppress dumping anonymous shared memory for specified processes. In this version, three other memory types are also supported. Here are the coredump_filter bits: bit 0: anonymous private memory bit 1: anonymous shared memory bit 2: file-backed private memory bit 3: file-backed shared memory The default value of coredump_filter is 0x3. This means the new core dump routine has the same behavior as conventional behavior by default. In this version, coredump_filter bits and mm.dumpable are merged into mm.flags, and it is accessed by atomic bitops. The supported core file formats are ELF and ELF-FDPIC. ELF has been tested, but ELF-FDPIC has not been built and tested because I don't have the test environment. This patch limits a value of suid_dumpable sysctl to the range of 0 to 2. Signed-off-by: NHidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Peter Zijlstra 提交于
The purpose of audit_bprm() is to log the argv array to a userspace daemon at the end of the execve system call. Since user-space hasn't had time to run, this array is still in pristine state on the process' stack; so no need to copy it, we can just grab it from there. In order to minimize the damage to audit_log_*() copy each string into a temporary kernel buffer first. Currently the audit code requires that the full argument vector fits in a single packet. So currently it does clip the argv size to a (sysctl) limit, but only when execve auditing is enabled. If the audit protocol gets extended to allow for multiple packets this check can be removed. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NOllie Wild <aaw@google.com> Cc: <linux-audit@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Pavel Machek 提交于
Move "debug during resume from s2ram" into the variable we already use for real-mode flags to simplify code. It also closes nasty trap for the user in acpi_sleep_setup; order of parameters actually mattered there, acpi_sleep=s3_bios,s3_mode doing something different from acpi_sleep=s3_mode,s3_bios. Signed-off-by: NPavel Machek <pavel@suse.cz> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 7月, 2007 3 次提交
-
-
由 Jeremy Fitzhardinge 提交于
Various pieces of code around the kernel want to be able to trigger an orderly poweroff. This pulls them together into a single implementation. By default the poweroff command is /sbin/poweroff, but it can be set via sysctl: kernel/poweroff_cmd. This is split at whitespace, so it can include command-line arguments. This patch replaces four other instances of invoking either "poweroff" or "shutdown -h now": two sbus drivers, and acpi thermal management. sparc64 has its own "powerd"; still need to determine whether it should be replaced by orderly_poweroff(). Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com> Acked-by: NLen Brown <lenb@kernel.org> Signed-off-by: NChris Wright <chrisw@sous-sol.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Andi Kleen <ak@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David S. Miller <davem@davemloft.net>
-
由 Adrian Bunk 提交于
Add a proper prototype for proc_nr_files() in include/linux/fs.h Signed-off-by: NAdrian Bunk <bunk@stusta.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
Huge pages are not movable so are not allocated from ZONE_MOVABLE. However, as ZONE_MOVABLE will always have pages that can be migrated or reclaimed, it can be used to satisfy hugepage allocations even when the system has been running a long time. This allows an administrator to resize the hugepage pool at runtime depending on the size of ZONE_MOVABLE. This patch adds a new sysctl called hugepages_treat_as_movable. When a non-zero value is written to it, future allocations for the huge page pool will use ZONE_MOVABLE. Despite huge pages being non-movable, we do not introduce additional external fragmentation of note as huge pages are always the largest contiguous block we care about. [akpm@linux-foundation.org: various fixes] Signed-off-by: NMel Gorman <mel@csn.ul.ie> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-