- 24 9月, 2009 40 次提交
-
-
由 Hiroshi Shimamoto 提交于
Because the binfmt is not different between threads in the same process, it can be moved from task_struct to mm_struct. And binfmt moudle is handled per mm_struct instead of task_struct. Signed-off-by: NHiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Acked-by: NOleg Nesterov <oleg@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Acked-by: NRoland McGrath <roland@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
->ioctx_lock and ->ioctx_list are used only under CONFIG_AIO. Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Sukadev Bhattiprolu 提交于
CLONE_PARENT was used to implement an older threading model. For consistency with the CLONE_THREAD check in copy_pid_ns(), disable CLONE_PARENT with CLONE_NEWPID, at least until the required semantics of pid namespaces are clear. Signed-off-by: NSukadev Bhattiprolu <sukadev@us.ibm.com> Acked-by: NRoland McGrath <roland@redhat.com> Acked-by: NSerge Hallyn <serue@us.ibm.com> Cc: Oren Laadan <orenl@cs.columbia.edu> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Sukadev Bhattiprolu 提交于
When global or container-init processes use CLONE_PARENT, they create a multi-rooted process tree. Besides siblings of global init remain as zombies on exit since they are not reaped by their parent (swapper). So prevent global and container-inits from creating siblings. Signed-off-by: NSukadev Bhattiprolu <sukadev@us.ibm.com> Acked-by: NEric W. Biederman <ebiederm@xmission.com> Acked-by: NRoland McGrath <roland@redhat.com> Cc: Oren Laadan <orenl@cs.columbia.edu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Serge Hallyn <serue@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
It's unused. It isn't needed -- read or write flag is already passed and sysctl shouldn't care about the rest. It _was_ used in two places at arch/frv for some reason. Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: James Morris <jmorris@namei.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Roland McGrath 提交于
__fatal_signal_pending inlines to one instruction on x86, probably two instructions on other machines. It takes two longer x86 instructions just to call it and test its return value, not to mention the function itself. On my random x86_64 config, this saved 70 bytes of text (59 of those being __fatal_signal_pending itself). Signed-off-by: NRoland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Introduce do_send_sig_info() and convert group_send_sig_info(), send_sig_info(), do_send_specific() to use this helper. Hopefully it will have more users soon, it allows to specify specific/group behaviour via "bool group" argument. Shaves 80 bytes from .text. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stephane eranian <eranian@googlemail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Neil Horman 提交于
Introduce core pipe limiting sysctl. Since we can dump cores to pipe, rather than directly to the filesystem, we create a condition in which a user can create a very high load on the system simply by running bad applications. If the pipe reader specified in core_pattern is poorly written, we can have lots of ourstandig resources and processes in the system. This sysctl introduces an ability to limit that resource consumption. core_pipe_limit defines how many in-flight dumps may be run in parallel, dumps beyond this value are skipped and a note is made in the kernel log. A special value of 0 in core_pipe_limit denotes unlimited core dumps may be handled (this is the default value). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NNeil Horman <nhorman@tuxdriver.com> Reported-by: NEarl Chew <earl_chew@agilent.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Andi Kleen <andi@firstfloor.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Roland McGrath 提交于
This changes tracehook_notify_jctl() so it's called with the siglock held, and changes its argument and return value definition. These clean-ups make it a better fit for what new tracing hooks need to check. Tracing needs the siglock here, held from the time TASK_STOPPED was set, to avoid potential SIGCONT races if it wants to allow any blocking in its tracing hooks. This also folds the finish_stop() function into its caller do_signal_stop(). The function is short, called only once and only unconditionally. It aids readability to fold it in. [oleg@redhat.com: do not call tracehook_notify_jctl() in TASK_STOPPED state] [oleg@redhat.com: introduce tracehook_finish_jctl() helper] Signed-off-by: NRoland McGrath <roland@redhat.com> Signed-off-by: NOleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vitaly Mayatskikh 提交于
Current behaviour of sys_waitid() looks odd. If user passes infop == NULL, sys_waitid() returns success. When user additionally specifies flag WNOWAIT, sys_waitid() returns -EFAULT on the same conditions. When user combines WNOWAIT with WCONTINUED, sys_waitid() again returns success. This patch adds check for ->wo_info in wait_noreap_copyout(). User-visible change: starting from this commit, sys_waitid() always checks infop != NULL and does not fail if it is NULL. Signed-off-by: NVitaly Mayatskikh <v.mayatskih@gmail.com> Reviewed-by: NOleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vitaly Mayatskikh 提交于
do_wait() checks ->wo_info to figure out who is the caller. If it's not NULL the caller should be sys_waitid(), in that case do_wait() fixes up the retval or zeros ->wo_info, depending on retval from underlying function. This is bug: user can pass ->wo_info == NULL and sys_waitid() will return incorrect value. man 2 waitid says: waitid(): returns 0 on success Test-case: int main(void) { if (fork()) assert(waitid(P_ALL, 0, NULL, WEXITED) == 0); return 0; } Result: Assertion `waitid(P_ALL, 0, ((void *)0), 4) == 0' failed. Move that code to sys_waitid(). User-visible change: sys_waitid() will return 0 on success, either infop is set or not. Note, there's another bug in wait_noreap_copyout() which affects return value of sys_waitid(). It will be fixed in next patch. Signed-off-by: NVitaly Mayatskikh <v.mayatskih@gmail.com> Reviewed-by: NOleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Kill the unused "parent" argument in wait_consider_task(), it was never used. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ratan Nalumasu <rnalumasu@gmail.com> Cc: Vitaly Mayatskikh <vmayatsk@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
task_pid_type() is only used by eligible_pid() which has to check wo_type != PIDTYPE_MAX anyway. Remove this check from task_pid_type() and factor out ->pids[type] access, this shrinks .text a bit and simplifies the code. The matches the behaviour of other similar helpers, say get_task_pid(). The caller must ensure that pid_type is valid, not the callee. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
child_wait_callback()->eligible_child() is not right, we can miss the wakeup if the task was detached before __wake_up_parent() and the caller of do_wait() didn't use __WALL. Move ->wo_pid checks from eligible_child() to the new helper, eligible_pid(), and change child_wait_callback() to use it instead of eligible_child(). Note: actually I think it would be better to fix the __WCLONE check in eligible_child(), it doesn't look exactly right. But it is not clear what is the supposed behaviour, and any change is user-visible. Reported-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Suggested by Roland. do_wait(__WNOTHREAD) can only succeed if the caller is either ptracer, or it is ->real_parent and the child is not traced. IOW, caller == p->parent otherwise we should not wake up. Change child_wait_callback() to check this. Ratan reports the workload with CPU load >99% caused by unnecessary wakeups, should be fixed by this patch. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NRoland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ratan Nalumasu <rnalumasu@gmail.com> Cc: Vitaly Mayatskikh <vmayatsk@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Ratan Nalumasu reported that in a process with many threads doing unnecessary wakeups. Every waiting thread in the process wakes up to loop through the children and see that the only ones it cares about are still not ready. Now that we have struct wait_opts we can change do_wait/__wake_up_parent to use filtered wakeups. We can make child_wait_callback() more clever later, right now it only checks eligible_child(). Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NRoland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ratan Nalumasu <rnalumasu@gmail.com> Cc: Vitaly Mayatskikh <vmayatsk@redhat.com> Acked-by: NJames Morris <jmorris@namei.org> Tested-by: NValdis Kletnieks <valdis.kletnieks@vt.edu> Acked-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
do_wait() wakeup optimization: shift security_task_wait() from eligible_child() to wait_consider_task() Preparation, no functional changes. eligible_child() has a single caller, wait_consider_task(). We can move security_task_wait() out from eligible_child(), this allows us to use it for filtered wake_up(). Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NRoland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ratan Nalumasu <rnalumasu@gmail.com> Cc: Vitaly Mayatskikh <vmayatsk@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
The bug is old, it wasn't cause by recent changes. Test case: static void *tfunc(void *arg) { int pid = (long)arg; assert(ptrace(PTRACE_ATTACH, pid, NULL, NULL) == 0); kill(pid, SIGKILL); sleep(1); return NULL; } int main(void) { pthread_t th; long pid = fork(); if (!pid) pause(); signal(SIGCHLD, SIG_IGN); assert(pthread_create(&th, NULL, tfunc, (void*)pid) == 0); int r = waitpid(-1, NULL, __WNOTHREAD); printf("waitpid: %d %m\n", r); return 0; } Before the patch this program hangs, after this patch waitpid() correctly fails with errno == -ECHILD. The problem is, __ptrace_detach() reaps the EXIT_ZOMBIE tracee if its ->real_parent is our sub-thread and we ignore SIGCHLD. But in this case we should wake up other threads which can sleep in do_wait(). Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Vitaly Mayatskikh <vmayatsk@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Balbir Singh 提交于
Organize cgroups over soft limit in a RB-Tree Introduce an RB-Tree for storing memory cgroups that are over their soft limit. The overall goal is to 1. Add a memory cgroup to the RB-Tree when the soft limit is exceeded. We are careful about updates, updates take place only after a particular time interval has passed 2. We remove the node from the RB-Tree when the usage goes below the soft limit The next set of patches will exploit the RB-Tree to get the group that is over its soft limit by the largest amount and reclaim from it, when we face memory contention. [hugh.dickins@tiscali.co.uk: CONFIG_CGROUP_MEM_RES_CTLR=y CONFIG_PREEMPT=y fails to boot] Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Balbir Singh 提交于
Add an interface to allow get/set of soft limits. Soft limits for memory plus swap controller (memsw) is currently not supported. Resource counters have been enhanced to support soft limits and new type RES_SOFT_LIMIT has been added. Unlike hard limits, soft limits can be directly set and do not need any reclaim or checks before setting them to a newer value. Kamezawa-San raised a question as to whether soft limit should belong to res_counter. Since all resources understand the basic concepts of hard and soft limits, it is justified to add soft limits here. Soft limits are a generic resource usage feature, even file system quotas support soft limits. Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ben Blum 提交于
Alter the ss->can_attach and ss->attach functions to be able to deal with a whole threadgroup at a time, for use in cgroup_attach_proc. (This is a pre-patch to cgroup-procs-writable.patch.) Currently, new mode of the attach function can only tell the subsystem about the old cgroup of the threadgroup leader. No subsystem currently needs that information for each thread that's being moved, but if one were to be added (for example, one that counts tasks within a group) this bit would need to be reworked a bit to tell the subsystem the right information. [hidave.darkstar@gmail.com: fix build] Signed-off-by: NBen Blum <bblum@google.com> Signed-off-by: NPaul Menage <menage@google.com> Acked-by: NLi Zefan <lizf@cn.fujitsu.com> Reviewed-by: NMatt Helsley <matthltc@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ben Blum 提交于
Changes css_set freeing mechanism to be under RCU This is a prepatch for making the procs file writable. In order to free the old css_sets for each task to be moved as they're being moved, the freeing mechanism must be RCU-protected, or else we would have to have a call to synchronize_rcu() for each task before freeing its old css_set. Signed-off-by: NBen Blum <bblum@google.com> Signed-off-by: NPaul Menage <menage@google.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Acked-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ben Blum 提交于
Separates all pidlist allocation requests to a separate function that judges based on the requested size whether or not the array needs to be vmalloced or can be gotten via kmalloc, and similar for kfree/vfree. Signed-off-by: NBen Blum <bblum@google.com> Signed-off-by: NPaul Menage <menage@google.com> Acked-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ben Blum 提交于
Previously there was the problem in which two processes from different pid namespaces reading the tasks or procs file could result in one process seeing results from the other's namespace. Rather than one pidlist for each file in a cgroup, we now keep a list of pidlists keyed by namespace and file type (tasks versus procs) in which entries are placed on demand. Each pidlist has its own lock, and that the pidlists themselves are passed around in the seq_file's private pointer means we don't have to touch the cgroup or its master list except when creating and destroying entries. Signed-off-by: NBen Blum <bblum@google.com> Signed-off-by: NPaul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ben Blum 提交于
struct cgroup used to have a bunch of fields for keeping track of the pidlist for the tasks file. Those are now separated into a new struct cgroup_pidlist, of which two are had, one for procs and one for tasks. The way the seq_file operations are set up is changed so that just the pidlist struct gets passed around as the private data. Interface example: Suppose a multithreaded process has pid 1000 and other threads with ids 1001, 1002, 1003: $ cat tasks 1000 1001 1002 1003 $ cat cgroup.procs 1000 $ Signed-off-by: NBen Blum <bblum@google.com> Signed-off-by: NPaul Menage <menage@google.com> Acked-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Paul Menage 提交于
The following series adds a "cgroup.procs" file to each cgroup that reports unique tgids rather than pids, and allows all threads in a threadgroup to be atomically moved to a new cgroup. The subsystem "attach" interface is modified to support attaching whole threadgroups at a time, which could introduce potential problems if any subsystem were to need to access the old cgroup of every thread being moved. The attach interface may need to be revised if this becomes the case. Also added is functionality for read/write locking all CLONE_THREAD fork()ing within a threadgroup, by means of an rwsem that lives in the sighand_struct, for per-threadgroup-ness and also for sharing a cacheline with the sighand's atomic count. This scheme should introduce no extra overhead in the fork path when there's no contention. The final patch reveals potential for a race when forking before a subsystem's attach function is called - one potential solution in case any subsystem has this problem is to hang on to the group's fork mutex through the attach() calls, though no subsystem yet demonstrates need for an extended critical section. This patch: Revert commit 096b7fe0 Author: Li Zefan <lizf@cn.fujitsu.com> AuthorDate: Wed Jul 29 15:04:04 2009 -0700 Commit: Linus Torvalds <torvalds@linux-foundation.org> CommitDate: Wed Jul 29 19:10:35 2009 -0700 cgroups: fix pid namespace bug This is in preparation for some clashing cgroups changes that subsume the original commit's functionaliy. The original commit fixed a pid namespace bug which Ben Blum fixed independently (in the same way, but with different code) as part of a series of patches. I played around with trying to reconcile Ben's patch series with Li's patch, but concluded that it was simpler to just revert Li's, given that Ben's patch series contained essentially the same fix. Signed-off-by: NPaul Menage <menage@google.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Paul Menage 提交于
This patch removes the restriction that a cgroup hierarchy must have at least one bound subsystem. The mount option "none" is treated as an explicit request for no bound subsystems. A hierarchy with no subsystems can be useful for plain task tracking, and is also a step towards the support for multiply-bindable subsystems. As part of this change, the hierarchy id is no longer calculated from the bitmask of subsystems in the hierarchy (since this is not guaranteed to be unique) but is allocated via an ida. Reference counts on cgroups from css_set objects are now taken explicitly one per hierarchy, rather than one per subsystem. Example usage: mount -t cgroup -o none,name=foo cgroup /mnt/cgroup Based on the "no-op"/"none" subsystem concept proposed by kamezawa.hiroyu@jp.fujitsu.com Signed-off-by: NPaul Menage <menage@google.com> Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Paul Menage 提交于
Currently the cgroups code makes the assumption that the subsystem pointers in a struct css_set uniquely identify the hierarchy->cgroup mappings associated with the css_set; and there's no way to directly identify the associated set of cgroups other than by indirecting through the appropriate subsystem state pointers. This patch removes the need for that assumption by adding a back-pointer from struct cg_cgroup_link object to its associated cgroup; this allows the set of cgroups to be determined by traversing the cg_links list in the struct css_set. Signed-off-by: NPaul Menage <menage@google.com> Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Paul Menage 提交于
While it's architecturally clean to have the cgroup debug subsystem be completely independent of the cgroups framework, it limits its usefulness for debugging the contents of internal data structures. Move the debug subsystem code into the scope of all the cgroups data structures to make more detailed debugging possible. Signed-off-by: NPaul Menage <menage@google.com> Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Paul Menage 提交于
To simplify referring to cgroup hierarchies in mount statements, and to allow disambiguation in the presence of empty hierarchies and multiply-bindable subsystems this patch adds support for naming a new cgroup hierarchy via the "name=" mount option A pre-existing hierarchy may be specified by either name or by subsystems; a hierarchy's name cannot be changed by a remount operation. Example usage: # To create a hierarchy called "foo" containing the "cpu" subsystem mount -t cgroup -oname=foo,cpu cgroup /mnt/cgroup1 # To mount the "foo" hierarchy on a second location mount -t cgroup -oname=foo cgroup /mnt/cgroup2 Signed-off-by: NPaul Menage <menage@google.com> Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Xiaotian Feng 提交于
Make the last unlock sequence consistent with previous unlock sequeue. Acked-by: NBalbir Singh <balbir@linux.vnet.ibm.com> Acked-by: NPaul Menage <menage@google.com> Signed-off-by: NXiaotian Feng <dfeng@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Zhaolei 提交于
There are many similar code in kernel for one object: convert time between calendar time and broken-down time. Here is some source I found: fs/ncpfs/dir.c fs/smbfs/proc.c fs/fat/misc.c fs/udf/udftime.c fs/cifs/netmisc.c net/netfilter/xt_time.c drivers/scsi/ips.c drivers/input/misc/hp_sdc_rtc.c drivers/rtc/rtc-lib.c arch/ia64/hp/sim/boot/fw-emu.c arch/m68k/mac/misc.c arch/powerpc/kernel/time.c arch/parisc/include/asm/rtc.h ... We can make a common function for this type of conversion, At least we can get following benefit: 1: Make kernel simple and unify 2: Easy to fix bug in converting code 3: Reduce clone of code in future For example, I'm trying to make ftrace display walltime, this patch will make me easy. This code is based on code from glibc-2.6 Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Pavel Machek <pavel@ucw.cz> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric Paris 提交于
Audit will not respond to signal requests if selinux is disabled since it is unable to translate the 0 sid from the sending process to a context. This patch just doesn't send the context info if there isn't any. Signed-off-by: NEric Paris <eparis@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Eric Paris 提交于
pahole pointed out that on x86_64 struct audit_context can be rearrainged to save 16 bytes per struct. Since we have an audit_context per task this can acually be a pretty significant gain. Signed-off-by: NEric Paris <eparis@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Eric Paris 提交于
pahole showed that struct audit_watch had two holes: struct audit_watch { atomic_t count; /* 0 4 */ /* XXX 4 bytes hole, try to pack */ char * path; /* 8 8 */ dev_t dev; /* 16 4 */ /* XXX 4 bytes hole, try to pack */ long unsigned int ino; /* 24 8 */ struct audit_parent * parent; /* 32 8 */ struct list_head wlist; /* 40 16 */ struct list_head rules; /* 56 16 */ /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */ /* size: 72, cachelines: 2, members: 7 */ /* sum members: 64, holes: 2, sum holes: 8 */ /* last cacheline: 8 bytes */ }; /* definitions: 1 */ by moving dev after count we save 8 bytes, actually improving cacheline usage. There are typically very few of these in the kernel so it won't be a large savings, but it's a good thing no matter what. Signed-off-by: NEric Paris <eparis@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Alexey Dobriyan 提交于
* remove asm/atomic.h inclusion from linux/utsname.h -- not needed after kref conversion * remove linux/utsname.h inclusion from files which do not need it NOTE: it looks like fs/binfmt_elf.c do not need utsname.h, however due to some personality stuff it _is_ needed -- cowardly leave ELF-related headers and files alone. Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
This reverts commit c02e3f36 ("kmod: fix race in usermodehelper code") The patch is wrong. UMH_WAIT_EXEC is called with VFORK what ensures that the child finishes prior returing back to the parent. No race. In fact, the patch makes it even worse because it does the thing it claims not do: - It calls ->complete() on UMH_WAIT_EXEC - the complete() callback may de-allocated subinfo as seen in the following call chain: [<c009f904>] (__link_path_walk+0x20/0xeb4) from [<c00a094c>] (path_walk+0x48/0x94) [<c00a094c>] (path_walk+0x48/0x94) from [<c00a0a34>] (do_path_lookup+0x24/0x4c) [<c00a0a34>] (do_path_lookup+0x24/0x4c) from [<c00a158c>] (do_filp_open+0xa4/0x83c) [<c00a158c>] (do_filp_open+0xa4/0x83c) from [<c009ba90>] (open_exec+0x24/0xe0) [<c009ba90>] (open_exec+0x24/0xe0) from [<c009bfa8>] (do_execve+0x7c/0x2e4) [<c009bfa8>] (do_execve+0x7c/0x2e4) from [<c0026a80>] (kernel_execve+0x34/0x80) [<c0026a80>] (kernel_execve+0x34/0x80) from [<c004b514>] (____call_usermodehelper+0x130/0x148) [<c004b514>] (____call_usermodehelper+0x130/0x148) from [<c0024858>] (kernel_thread_exit+0x0/0x8) and the path pointer was NULL. Good that ARM's kernel_execve() doesn't check the pointer for NULL or else I wouldn't notice it. The only race there might be is with UMH_NO_WAIT but it is too late for me to investigate it now. UMH_WAIT_PROC could probably also use VFORK and we could save one exec. So the only race I see is with UMH_NO_WAIT and recent scheduler changes where the child does not always run first might have trigger here something but as I said, it is late.... Signed-off-by: NSebastian Andrzej Siewior <sebastian@breakpoint.cc> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Rusty Russell 提交于
Now everyone is converted to arch_send_call_function_ipi_mask, remove the shim and the #defines. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Li Zefan 提交于
Remove open-coded zalloc_cpumask_var() and zalloc_cpumask_var_node(). Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Randy Dunlap 提交于
ERROR: "creds_are_invalid" [fs/cachefiles/cachefiles.ko] undefined! Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com> Cc: David Howells <dhowells@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Serge Hallyn <serue@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NJames Morris <jmorris@namei.org>
-