- 02 10月, 2009 2 次提交
-
-
由 Jun'ichi Nomura 提交于
Since 2.6.31 now has request-based device-mapper, it's useful to have a tracepoint for request-remapping as well as bio-remapping. This patch adds a tracepoint for request-remapping, trace_block_rq_remap(). Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Zdenek Kabelac 提交于
Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs introduced in commit 1d54ad6d. Release kobject also in case the request_fn is NULL. Problem was noticed via kmemleak backtrace when some sysfs entries were note properly destroyed during device removal: unreferenced object 0xffff88001aa76640 (size 80): comm "lvcreate", pid 2120, jiffies 4294885144 hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 f0 65 a7 1a 00 88 ff ff .........e...... 90 66 a7 1a 00 88 ff ff 86 1d 53 81 ff ff ff ff .f........S..... backtrace: [<ffffffff813f9cc6>] kmemleak_alloc+0x26/0x60 [<ffffffff8111d693>] kmem_cache_alloc+0x133/0x1c0 [<ffffffff81195891>] sysfs_new_dirent+0x41/0x120 [<ffffffff81194b0c>] sysfs_add_file_mode+0x3c/0xb0 [<ffffffff81197c81>] internal_create_group+0xc1/0x1a0 [<ffffffff81197d93>] sysfs_create_group+0x13/0x20 [<ffffffff810d8004>] blk_trace_init_sysfs+0x14/0x20 [<ffffffff8123f45c>] blk_register_queue+0x3c/0xf0 [<ffffffff812447e4>] add_disk+0x94/0x160 [<ffffffffa00d8b08>] dm_create+0x598/0x6e0 [dm_mod] [<ffffffffa00de951>] dev_create+0x51/0x350 [dm_mod] [<ffffffffa00de823>] ctl_ioctl+0x1a3/0x240 [dm_mod] [<ffffffffa00de8f2>] dm_compat_ctl_ioctl+0x12/0x20 [dm_mod] [<ffffffff81177bfd>] compat_sys_ioctl+0xcd/0x4f0 [<ffffffff81036ed8>] sysenter_dispatch+0x7/0x2c [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: NZdenek Kabelac <zkabelac@redhat.com> Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 28 9月, 2009 1 次提交
-
-
由 Alexey Dobriyan 提交于
* mark struct vm_area_struct::vm_ops as const * mark vm_ops in AGP code But leave TTM code alone, something is fishy there with global vm_ops being used. Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 9月, 2009 2 次提交
-
-
由 Martin Schwidefsky 提交于
git commit 75c5158f converted the clocksource spinlock to a mutex. This causes the following BUG: BUG: sleeping function called from invalid context at kernel/mutex.c:280 in_atomic(): 0, irqs_disabled(): 1, pid: 2473, name: pm-suspend 2 locks held by pm-suspend/2473: #0: (&buffer->mutex){......}, at: [<ffffffff8115ab13>] sysfs_write_file+0x3c/0x137 #1: (pm_mutex){......}, at: [<ffffffff810865b5>] enter_state+0x39/0x130 Pid: 2473, comm: pm-suspend Not tainted 2.6.31 #1 Call Trace: [<ffffffff810792f0>] ? __debug_show_held_locks+0x22/0x24 [<ffffffff8104a2ef>] __might_sleep+0x107/0x10b [<ffffffff8141fca9>] mutex_lock_nested+0x25/0x43 [<ffffffff81073537>] clocksource_resume+0x1c/0x60 [<ffffffff81072902>] timekeeping_resume+0x1e/0x1c8 [<ffffffff812aee62>] __sysdev_resume+0x25/0xcf [<ffffffff812aef79>] sysdev_resume+0x6d/0xae [<ffffffff810864f8>] suspend_devices_and_enter+0x12b/0x1af [<ffffffff8108665b>] enter_state+0xdf/0x130 [<ffffffff81085dc3>] state_store+0xb6/0xd3 [<ffffffff81204c73>] kobj_attr_store+0x17/0x19 [<ffffffff8115abd2>] sysfs_write_file+0xfb/0x137 [<ffffffff811057d2>] vfs_write+0xae/0x10b [<ffffffff81208392>] ? __up_read+0x1a/0x7f [<ffffffff811058ef>] sys_write+0x4a/0x6e [<ffffffff81011b82>] system_call_fastpath+0x16/0x1b clocksource_resume is called early in the resume process, there is only one cpu, no processes are running and the interrupts are disabled. It is therefore possible to resume the clocksources without taking the clocksource mutex. Reported-by: NXiaotian Feng <xtfeng@gmail.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Tested-by: NMichal Schmidt <mschmidt@redhat.com> Cc: Xiaotian Feng <xtfeng@gmail.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <20090924172952.49697825@mschwide.boeblingen.de.ibm.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Darren Hart 提交于
The memory barrier semantics of futex_wait_queue_me() are non-obvious. Add some commentary to try and clarify it. Signed-off-by: NDarren Hart <dvhltc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> LKML-Reference: <20090924185447.694.38948.stgit@Aeon> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 24 9月, 2009 35 次提交
-
-
由 Rusty Russell 提交于
The general one handles NULL, the static obsolescent (CONFIG_HAVE_LEGACY_PER_CPU_AREA) one in module.c doesn't; Eric's commit 720eba31 assumed it did, and various frobbings since then kept that assumption. All other callers in module.c all protect it with an if; this effectively does the same as free_init is only goto if we fail percpu_modalloc(). Reported-by: NKamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Cc: Eric Dumazet <dada1@cosmosbay.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> Cc: Américo Wang <xiyou.wangcong@gmail.com> Tested-by: NKamalesh Babulal <kamalesh@linux.vnet.ibm.com>
-
由 Rusty Russell 提交于
Normally the twisty paths of sysfs will free the attributes, but not if we fail before we hook it into sysfs (which is the last thing we do in load_module). (This sysfs code is a turd, no doubt there are other issues lurking too). Reported-by: NTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Cc: Catalin Marinas <catalin.marinas@arm.com> Tested-by: NTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
-
由 Peter Oberparleiter 提交于
Some boot mechanisms require that kernel parameters are stored in a separate file which is loaded to memory without further processing (e.g. the "Load from FTP" method on s390). When such a file contains newline characters, the kernel parameter preceding the newline might not be correctly parsed (due to the newline being stuck to the end of the actual parameter value) which can lead to boot failures. This patch improves kernel command line usability in such a situation by allowing generic whitespace characters as separators between kernel parameters. Signed-off-by: NPeter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Jan Beulich 提交于
Also remove all parts of the string table (referenced by the symbol table) that are not needed for kallsyms use (i.e. which were only referenced by symbols discarded by the previous patch, or not referenced at all for whatever reason). Signed-off-by: NJan Beulich <jbeulich@novell.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Jan Beulich 提交于
Discard all symbols not interesting for kallsyms use: absolute, section, and in the common case (!KALLSYMS_ALL) data ones. Signed-off-by: NJan Beulich <jbeulich@novell.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Hiroshi Shimamoto 提交于
Because the binfmt is not different between threads in the same process, it can be moved from task_struct to mm_struct. And binfmt moudle is handled per mm_struct instead of task_struct. Signed-off-by: NHiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Acked-by: NOleg Nesterov <oleg@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Acked-by: NRoland McGrath <roland@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
->ioctx_lock and ->ioctx_list are used only under CONFIG_AIO. Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Sukadev Bhattiprolu 提交于
CLONE_PARENT was used to implement an older threading model. For consistency with the CLONE_THREAD check in copy_pid_ns(), disable CLONE_PARENT with CLONE_NEWPID, at least until the required semantics of pid namespaces are clear. Signed-off-by: NSukadev Bhattiprolu <sukadev@us.ibm.com> Acked-by: NRoland McGrath <roland@redhat.com> Acked-by: NSerge Hallyn <serue@us.ibm.com> Cc: Oren Laadan <orenl@cs.columbia.edu> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Sukadev Bhattiprolu 提交于
When global or container-init processes use CLONE_PARENT, they create a multi-rooted process tree. Besides siblings of global init remain as zombies on exit since they are not reaped by their parent (swapper). So prevent global and container-inits from creating siblings. Signed-off-by: NSukadev Bhattiprolu <sukadev@us.ibm.com> Acked-by: NEric W. Biederman <ebiederm@xmission.com> Acked-by: NRoland McGrath <roland@redhat.com> Cc: Oren Laadan <orenl@cs.columbia.edu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Serge Hallyn <serue@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
It's unused. It isn't needed -- read or write flag is already passed and sysctl shouldn't care about the rest. It _was_ used in two places at arch/frv for some reason. Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: James Morris <jmorris@namei.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Roland McGrath 提交于
__fatal_signal_pending inlines to one instruction on x86, probably two instructions on other machines. It takes two longer x86 instructions just to call it and test its return value, not to mention the function itself. On my random x86_64 config, this saved 70 bytes of text (59 of those being __fatal_signal_pending itself). Signed-off-by: NRoland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Introduce do_send_sig_info() and convert group_send_sig_info(), send_sig_info(), do_send_specific() to use this helper. Hopefully it will have more users soon, it allows to specify specific/group behaviour via "bool group" argument. Shaves 80 bytes from .text. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stephane eranian <eranian@googlemail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Neil Horman 提交于
Introduce core pipe limiting sysctl. Since we can dump cores to pipe, rather than directly to the filesystem, we create a condition in which a user can create a very high load on the system simply by running bad applications. If the pipe reader specified in core_pattern is poorly written, we can have lots of ourstandig resources and processes in the system. This sysctl introduces an ability to limit that resource consumption. core_pipe_limit defines how many in-flight dumps may be run in parallel, dumps beyond this value are skipped and a note is made in the kernel log. A special value of 0 in core_pipe_limit denotes unlimited core dumps may be handled (this is the default value). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NNeil Horman <nhorman@tuxdriver.com> Reported-by: NEarl Chew <earl_chew@agilent.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Andi Kleen <andi@firstfloor.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Roland McGrath 提交于
This changes tracehook_notify_jctl() so it's called with the siglock held, and changes its argument and return value definition. These clean-ups make it a better fit for what new tracing hooks need to check. Tracing needs the siglock here, held from the time TASK_STOPPED was set, to avoid potential SIGCONT races if it wants to allow any blocking in its tracing hooks. This also folds the finish_stop() function into its caller do_signal_stop(). The function is short, called only once and only unconditionally. It aids readability to fold it in. [oleg@redhat.com: do not call tracehook_notify_jctl() in TASK_STOPPED state] [oleg@redhat.com: introduce tracehook_finish_jctl() helper] Signed-off-by: NRoland McGrath <roland@redhat.com> Signed-off-by: NOleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vitaly Mayatskikh 提交于
Current behaviour of sys_waitid() looks odd. If user passes infop == NULL, sys_waitid() returns success. When user additionally specifies flag WNOWAIT, sys_waitid() returns -EFAULT on the same conditions. When user combines WNOWAIT with WCONTINUED, sys_waitid() again returns success. This patch adds check for ->wo_info in wait_noreap_copyout(). User-visible change: starting from this commit, sys_waitid() always checks infop != NULL and does not fail if it is NULL. Signed-off-by: NVitaly Mayatskikh <v.mayatskih@gmail.com> Reviewed-by: NOleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vitaly Mayatskikh 提交于
do_wait() checks ->wo_info to figure out who is the caller. If it's not NULL the caller should be sys_waitid(), in that case do_wait() fixes up the retval or zeros ->wo_info, depending on retval from underlying function. This is bug: user can pass ->wo_info == NULL and sys_waitid() will return incorrect value. man 2 waitid says: waitid(): returns 0 on success Test-case: int main(void) { if (fork()) assert(waitid(P_ALL, 0, NULL, WEXITED) == 0); return 0; } Result: Assertion `waitid(P_ALL, 0, ((void *)0), 4) == 0' failed. Move that code to sys_waitid(). User-visible change: sys_waitid() will return 0 on success, either infop is set or not. Note, there's another bug in wait_noreap_copyout() which affects return value of sys_waitid(). It will be fixed in next patch. Signed-off-by: NVitaly Mayatskikh <v.mayatskih@gmail.com> Reviewed-by: NOleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Kill the unused "parent" argument in wait_consider_task(), it was never used. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ratan Nalumasu <rnalumasu@gmail.com> Cc: Vitaly Mayatskikh <vmayatsk@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
task_pid_type() is only used by eligible_pid() which has to check wo_type != PIDTYPE_MAX anyway. Remove this check from task_pid_type() and factor out ->pids[type] access, this shrinks .text a bit and simplifies the code. The matches the behaviour of other similar helpers, say get_task_pid(). The caller must ensure that pid_type is valid, not the callee. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
child_wait_callback()->eligible_child() is not right, we can miss the wakeup if the task was detached before __wake_up_parent() and the caller of do_wait() didn't use __WALL. Move ->wo_pid checks from eligible_child() to the new helper, eligible_pid(), and change child_wait_callback() to use it instead of eligible_child(). Note: actually I think it would be better to fix the __WCLONE check in eligible_child(), it doesn't look exactly right. But it is not clear what is the supposed behaviour, and any change is user-visible. Reported-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Suggested by Roland. do_wait(__WNOTHREAD) can only succeed if the caller is either ptracer, or it is ->real_parent and the child is not traced. IOW, caller == p->parent otherwise we should not wake up. Change child_wait_callback() to check this. Ratan reports the workload with CPU load >99% caused by unnecessary wakeups, should be fixed by this patch. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NRoland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ratan Nalumasu <rnalumasu@gmail.com> Cc: Vitaly Mayatskikh <vmayatsk@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Ratan Nalumasu reported that in a process with many threads doing unnecessary wakeups. Every waiting thread in the process wakes up to loop through the children and see that the only ones it cares about are still not ready. Now that we have struct wait_opts we can change do_wait/__wake_up_parent to use filtered wakeups. We can make child_wait_callback() more clever later, right now it only checks eligible_child(). Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NRoland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ratan Nalumasu <rnalumasu@gmail.com> Cc: Vitaly Mayatskikh <vmayatsk@redhat.com> Acked-by: NJames Morris <jmorris@namei.org> Tested-by: NValdis Kletnieks <valdis.kletnieks@vt.edu> Acked-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
do_wait() wakeup optimization: shift security_task_wait() from eligible_child() to wait_consider_task() Preparation, no functional changes. eligible_child() has a single caller, wait_consider_task(). We can move security_task_wait() out from eligible_child(), this allows us to use it for filtered wake_up(). Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NRoland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ratan Nalumasu <rnalumasu@gmail.com> Cc: Vitaly Mayatskikh <vmayatsk@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
The bug is old, it wasn't cause by recent changes. Test case: static void *tfunc(void *arg) { int pid = (long)arg; assert(ptrace(PTRACE_ATTACH, pid, NULL, NULL) == 0); kill(pid, SIGKILL); sleep(1); return NULL; } int main(void) { pthread_t th; long pid = fork(); if (!pid) pause(); signal(SIGCHLD, SIG_IGN); assert(pthread_create(&th, NULL, tfunc, (void*)pid) == 0); int r = waitpid(-1, NULL, __WNOTHREAD); printf("waitpid: %d %m\n", r); return 0; } Before the patch this program hangs, after this patch waitpid() correctly fails with errno == -ECHILD. The problem is, __ptrace_detach() reaps the EXIT_ZOMBIE tracee if its ->real_parent is our sub-thread and we ignore SIGCHLD. But in this case we should wake up other threads which can sleep in do_wait(). Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Vitaly Mayatskikh <vmayatsk@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Balbir Singh 提交于
Organize cgroups over soft limit in a RB-Tree Introduce an RB-Tree for storing memory cgroups that are over their soft limit. The overall goal is to 1. Add a memory cgroup to the RB-Tree when the soft limit is exceeded. We are careful about updates, updates take place only after a particular time interval has passed 2. We remove the node from the RB-Tree when the usage goes below the soft limit The next set of patches will exploit the RB-Tree to get the group that is over its soft limit by the largest amount and reclaim from it, when we face memory contention. [hugh.dickins@tiscali.co.uk: CONFIG_CGROUP_MEM_RES_CTLR=y CONFIG_PREEMPT=y fails to boot] Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Balbir Singh 提交于
Add an interface to allow get/set of soft limits. Soft limits for memory plus swap controller (memsw) is currently not supported. Resource counters have been enhanced to support soft limits and new type RES_SOFT_LIMIT has been added. Unlike hard limits, soft limits can be directly set and do not need any reclaim or checks before setting them to a newer value. Kamezawa-San raised a question as to whether soft limit should belong to res_counter. Since all resources understand the basic concepts of hard and soft limits, it is justified to add soft limits here. Soft limits are a generic resource usage feature, even file system quotas support soft limits. Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ben Blum 提交于
Alter the ss->can_attach and ss->attach functions to be able to deal with a whole threadgroup at a time, for use in cgroup_attach_proc. (This is a pre-patch to cgroup-procs-writable.patch.) Currently, new mode of the attach function can only tell the subsystem about the old cgroup of the threadgroup leader. No subsystem currently needs that information for each thread that's being moved, but if one were to be added (for example, one that counts tasks within a group) this bit would need to be reworked a bit to tell the subsystem the right information. [hidave.darkstar@gmail.com: fix build] Signed-off-by: NBen Blum <bblum@google.com> Signed-off-by: NPaul Menage <menage@google.com> Acked-by: NLi Zefan <lizf@cn.fujitsu.com> Reviewed-by: NMatt Helsley <matthltc@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ben Blum 提交于
Changes css_set freeing mechanism to be under RCU This is a prepatch for making the procs file writable. In order to free the old css_sets for each task to be moved as they're being moved, the freeing mechanism must be RCU-protected, or else we would have to have a call to synchronize_rcu() for each task before freeing its old css_set. Signed-off-by: NBen Blum <bblum@google.com> Signed-off-by: NPaul Menage <menage@google.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Acked-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ben Blum 提交于
Separates all pidlist allocation requests to a separate function that judges based on the requested size whether or not the array needs to be vmalloced or can be gotten via kmalloc, and similar for kfree/vfree. Signed-off-by: NBen Blum <bblum@google.com> Signed-off-by: NPaul Menage <menage@google.com> Acked-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ben Blum 提交于
Previously there was the problem in which two processes from different pid namespaces reading the tasks or procs file could result in one process seeing results from the other's namespace. Rather than one pidlist for each file in a cgroup, we now keep a list of pidlists keyed by namespace and file type (tasks versus procs) in which entries are placed on demand. Each pidlist has its own lock, and that the pidlists themselves are passed around in the seq_file's private pointer means we don't have to touch the cgroup or its master list except when creating and destroying entries. Signed-off-by: NBen Blum <bblum@google.com> Signed-off-by: NPaul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ben Blum 提交于
struct cgroup used to have a bunch of fields for keeping track of the pidlist for the tasks file. Those are now separated into a new struct cgroup_pidlist, of which two are had, one for procs and one for tasks. The way the seq_file operations are set up is changed so that just the pidlist struct gets passed around as the private data. Interface example: Suppose a multithreaded process has pid 1000 and other threads with ids 1001, 1002, 1003: $ cat tasks 1000 1001 1002 1003 $ cat cgroup.procs 1000 $ Signed-off-by: NBen Blum <bblum@google.com> Signed-off-by: NPaul Menage <menage@google.com> Acked-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Paul Menage 提交于
The following series adds a "cgroup.procs" file to each cgroup that reports unique tgids rather than pids, and allows all threads in a threadgroup to be atomically moved to a new cgroup. The subsystem "attach" interface is modified to support attaching whole threadgroups at a time, which could introduce potential problems if any subsystem were to need to access the old cgroup of every thread being moved. The attach interface may need to be revised if this becomes the case. Also added is functionality for read/write locking all CLONE_THREAD fork()ing within a threadgroup, by means of an rwsem that lives in the sighand_struct, for per-threadgroup-ness and also for sharing a cacheline with the sighand's atomic count. This scheme should introduce no extra overhead in the fork path when there's no contention. The final patch reveals potential for a race when forking before a subsystem's attach function is called - one potential solution in case any subsystem has this problem is to hang on to the group's fork mutex through the attach() calls, though no subsystem yet demonstrates need for an extended critical section. This patch: Revert commit 096b7fe0 Author: Li Zefan <lizf@cn.fujitsu.com> AuthorDate: Wed Jul 29 15:04:04 2009 -0700 Commit: Linus Torvalds <torvalds@linux-foundation.org> CommitDate: Wed Jul 29 19:10:35 2009 -0700 cgroups: fix pid namespace bug This is in preparation for some clashing cgroups changes that subsume the original commit's functionaliy. The original commit fixed a pid namespace bug which Ben Blum fixed independently (in the same way, but with different code) as part of a series of patches. I played around with trying to reconcile Ben's patch series with Li's patch, but concluded that it was simpler to just revert Li's, given that Ben's patch series contained essentially the same fix. Signed-off-by: NPaul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Paul Menage 提交于
This patch removes the restriction that a cgroup hierarchy must have at least one bound subsystem. The mount option "none" is treated as an explicit request for no bound subsystems. A hierarchy with no subsystems can be useful for plain task tracking, and is also a step towards the support for multiply-bindable subsystems. As part of this change, the hierarchy id is no longer calculated from the bitmask of subsystems in the hierarchy (since this is not guaranteed to be unique) but is allocated via an ida. Reference counts on cgroups from css_set objects are now taken explicitly one per hierarchy, rather than one per subsystem. Example usage: mount -t cgroup -o none,name=foo cgroup /mnt/cgroup Based on the "no-op"/"none" subsystem concept proposed by kamezawa.hiroyu@jp.fujitsu.com Signed-off-by: NPaul Menage <menage@google.com> Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Paul Menage 提交于
Currently the cgroups code makes the assumption that the subsystem pointers in a struct css_set uniquely identify the hierarchy->cgroup mappings associated with the css_set; and there's no way to directly identify the associated set of cgroups other than by indirecting through the appropriate subsystem state pointers. This patch removes the need for that assumption by adding a back-pointer from struct cg_cgroup_link object to its associated cgroup; this allows the set of cgroups to be determined by traversing the cg_links list in the struct css_set. Signed-off-by: NPaul Menage <menage@google.com> Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Paul Menage 提交于
While it's architecturally clean to have the cgroup debug subsystem be completely independent of the cgroups framework, it limits its usefulness for debugging the contents of internal data structures. Move the debug subsystem code into the scope of all the cgroups data structures to make more detailed debugging possible. Signed-off-by: NPaul Menage <menage@google.com> Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Paul Menage 提交于
To simplify referring to cgroup hierarchies in mount statements, and to allow disambiguation in the presence of empty hierarchies and multiply-bindable subsystems this patch adds support for naming a new cgroup hierarchy via the "name=" mount option A pre-existing hierarchy may be specified by either name or by subsystems; a hierarchy's name cannot be changed by a remount operation. Example usage: # To create a hierarchy called "foo" containing the "cpu" subsystem mount -t cgroup -oname=foo,cpu cgroup /mnt/cgroup1 # To mount the "foo" hierarchy on a second location mount -t cgroup -oname=foo cgroup /mnt/cgroup2 Signed-off-by: NPaul Menage <menage@google.com> Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-