- 20 7月, 2017 1 次提交
-
-
由 Eric W. Biederman 提交于
It is pointless and confusing to allow a pid namespace hierarchy and the user namespace hierarchy to get out of sync. The owner of a child pid namespace should be the owner of the parent pid namespace or a descendant of the owner of the parent pid namespace. Otherwise it is possible to construct scenarios where a process has a capability over a parent pid namespace but does not have the capability over a child pid namespace. Which confusingly makes permission checks non-transitive. It requires use of setns into a pid namespace (but not into a user namespace) to create such a scenario. Add the function in_userns to help in making this determination. v2: Optimized in_userns by using level as suggested by: Kirill Tkhai <ktkhai@virtuozzo.com> Ref: 49f4d8b9 ("pidns: Capture the user namespace and filter ns_last_pid") Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 14 5月, 2017 1 次提交
-
-
由 Eric W. Biederman 提交于
The code can potentially sleep for an indefinite amount of time in zap_pid_ns_processes triggering the hung task timeout, and increasing the system average. This is undesirable. Sleep with a task state of TASK_INTERRUPTIBLE instead of TASK_UNINTERRUPTIBLE to remove these undesirable side effects. Apparently under heavy load this has been allowing Chrome to trigger the hung time task timeout error and cause ChromeOS to reboot. Reported-by: NVovo Yang <vovoy@google.com> Reported-by: NGuenter Roeck <linux@roeck-us.net> Tested-by: NGuenter Roeck <linux@roeck-us.net> Fixes: 6347e900 ("pidns: guarantee that the pidns init will be the last pidns process reaped") Cc: stable@vger.kernel.org Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 09 5月, 2017 1 次提交
-
-
由 Kirill Tkhai 提交于
pid_ns_for_children set by a task is known only to the task itself, and it's impossible to identify it from outside. It's a big problem for checkpoint/restore software like CRIU, because it can't correctly handle tasks, that do setns(CLONE_NEWPID) in proccess of their work. This patch solves the problem, and it exposes pid_ns_for_children to ns directory in standard way with the name "pid_for_children": ~# ls /proc/5531/ns -l | grep pid lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid -> pid:[4026531836] lrwxrwxrwx 1 root root 0 Jan 14 16:38 pid_for_children -> pid:[4026532286] Link: http://lkml.kernel.org/r/149201123914.6007.2187327078064239572.stgit@localhost.localdomainSigned-off-by: NKirill Tkhai <ktkhai@virtuozzo.com> Cc: Andrei Vagin <avagin@virtuozzo.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: Serge Hallyn <serge@hallyn.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 3月, 2017 3 次提交
-
-
由 Ingo Molnar 提交于
Instead of including the full <linux/signal.h>, we are going to include the types-only <linux/signal_types.h> header in <linux/sched.h>, to further decouple the scheduler header from the signal headers. This means that various files which relied on the full <linux/signal.h> need to be updated to gain an explicit dependency on it. Update the code that relies on sched.h's inclusion of the <linux/signal.h> header. Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Ingo Molnar 提交于
We are going to split <linux/sched/task.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/task.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Ingo Molnar 提交于
Add #include <linux/cred.h> dependencies to all .c files rely on sched.h doing that for them. Note that even if the count where we need to add extra headers seems high, it's still a net win, because <linux/sched.h> is included in over 2,200 files ... Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 10 1月, 2017 1 次提交
-
-
由 Andrei Vagin 提交于
========================================================= [ INFO: possible irq lock inversion dependency detected ] 4.10.0-rc2-00024-g4aecec9-dirty #118 Tainted: G W --------------------------------------------------------- swapper/1/0 just changed the state of lock: (&(&sighand->siglock)->rlock){-.....}, at: [<ffffffffbd0a1bc6>] __lock_task_sighand+0xb6/0x2c0 but this lock took another, HARDIRQ-unsafe lock in the past: (ucounts_lock){+.+...} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Chain exists of: &(&sighand->siglock)->rlock --> &(&tty->ctrl_lock)->rlock --> ucounts_lock Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(ucounts_lock); local_irq_disable(); lock(&(&sighand->siglock)->rlock); lock(&(&tty->ctrl_lock)->rlock); <Interrupt> lock(&(&sighand->siglock)->rlock); *** DEADLOCK *** This patch removes a dependency between rlock and ucount_lock. Fixes: f333c700 ("pidns: Add a limit on the number of pid namespaces") Cc: stable@vger.kernel.org Signed-off-by: NAndrei Vagin <avagin@openvz.org> Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
- 23 9月, 2016 3 次提交
-
-
由 Andrey Vagin 提交于
Pid and user namepaces are hierarchical. There is no way to discover parent-child relationships. In a future we will use this interface to dump and restore nested namespaces. Acked-by: NSerge Hallyn <serge@hallyn.com> Signed-off-by: NAndrei Vagin <avagin@openvz.org> Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
由 Andrey Vagin 提交于
Return -EPERM if an owning user namespace is outside of a process current user namespace. v2: In a first version ns_get_owner returned ENOENT for init_user_ns. This special cases was removed from this version. There is nothing outside of init_user_ns, so we can return EPERM. v3: rename ns->get_owner() to ns->owner(). get_* usually means that it grabs a reference. Acked-by: NSerge Hallyn <serge@hallyn.com> Signed-off-by: NAndrei Vagin <avagin@openvz.org> Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> -
由 Eric W. Biederman 提交于
The current error codes returned when a the per user per user namespace limit are hit (EINVAL, EUSERS, and ENFILE) are wrong. I asked for advice on linux-api and it we made clear that those were the wrong error code, but a correct effor code was not suggested. The best general error code I have found for hitting a resource limit is ENOSPC. It is not perfect but as it is unambiguous it will serve until someone comes up with a better error code. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 09 8月, 2016 1 次提交
-
-
由 Eric W. Biederman 提交于
Acked-by: NKees Cook <keescook@chromium.org> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 11 12月, 2014 1 次提交
-
-
由 Oleg Nesterov 提交于
The comments in zap_pid_ns_processes() are not clear, we need to explain how this code actually works. 1. "Ignore SIGCHLD" looks like optimization but it is not, we also need this for correctness. 2. The comment above sys_wait4() could tell more. EXIT_ZOMBIE child is only possible if it has exited before we ignored SIGCHLD. Or if it is traced from the parent namespace, but in this case it will be reaped by debugger after detach, sys_wait4() acts as a synchronization point. 3. The comment about TASK_DEAD (EXIT_DEAD in fact) children is outdated. Contrary to what it says we do not need to make sure they all go away after 0a01f2cc "pidns: Make the pidns proc mount/umount logic obvious". At the same time, we do need to wait for nr_hashed==init_pids, but the reasons are quite different and not obvious: setns(). Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Sterling Alexander <stalexan@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 12月, 2014 5 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> -
由 Al Viro 提交于
take struct ns_common *, for now simply wrappers around proc_{alloc,free}_inum() Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> -
由 Al Viro 提交于
We can do that now. And kill ->inum(), while we are at it - all instances are identical. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> -
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> -
由 Al Viro 提交于
for now - just move corresponding ->proc_inum instances over there Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 03 4月, 2014 1 次提交
-
-
由 Oleg Nesterov 提交于
pidns_get()->get_pid_ns() can hit ns == NULL. This task_struct can't go away, but task_active_pid_ns(task) is NULL if release_task(task) was already called. Alternatively we could change get_pid_ns(ns) to check ns != NULL, but it seems that other callers are fine. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Eric W. Biederman ebiederm@xmission.com> Cc: stable@kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 10月, 2013 1 次提交
-
-
由 Al Viro 提交于
makes procfs ->premission() instances safety in RCU mode independent from vfsmount_lock. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 31 8月, 2013 1 次提交
-
-
由 Eric W. Biederman 提交于
nsown_capable is a special case of ns_capable essentially for just CAP_SETUID and CAP_SETGID. For the existing users it doesn't noticably simplify things and from the suggested patches I have seen it encourages people to do the wrong thing. So remove nsown_capable. Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 28 8月, 2013 1 次提交
-
-
由 Andy Lutomirski 提交于
nsproxy.pid_ns is *not* the task's pid namespace. The name should clarify that. This makes it more obvious that setns on a pid namespace is weird -- it won't change the pid namespace shown in procfs. Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Reviewed-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 5月, 2013 1 次提交
-
-
由 David Howells 提交于
Split the proc namespace stuff out into linux/proc_ns.h. Signed-off-by: NDavid Howells <dhowells@redhat.com> cc: netdev@vger.kernel.org cc: Serge E. Hallyn <serge.hallyn@ubuntu.com> cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 01 5月, 2013 1 次提交
-
-
由 Raphael S.Carvalho 提交于
Move BITS_PER_PAGE from pid_namespace.c to pid_namespace.h, since we can simplify the define PID_MAP_ENTRIES by using the BITS_PER_PAGE. [akpm@linux-foundation.org: kernel/pid.c:54:1: warning: "BITS_PER_PAGE" redefined] Signed-off-by: NRaphael S.Carvalho <raphael.scarv@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 3月, 2013 1 次提交
-
-
由 Eric W. Biederman 提交于
When a multi-threaded init exits and the initial thread is not the last thread to exit the initial thread hangs around as a zombie until the last thread exits. In that case zap_pid_ns_processes needs to wait until there are only 2 hashed pids in the pid namespace not one. v2. Replace thread_pid_vnr(me) == 1 with the test thread_group_leader(me) as suggested by Oleg. Cc: stable@vger.kernel.org Cc: Oleg Nesterov <oleg@redhat.com> Reported-by: NCaj Larsson <caj@omnicloud.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 26 12月, 2012 1 次提交
-
-
由 Eric W. Biederman 提交于
Oleg pointed out that in a pid namespace the sequence. - pid 1 becomes a zombie - setns(thepidns), fork,... - reaping pid 1. - The injected processes exiting. Can lead to processes attempting access their child reaper and instead following a stale pointer. That waitpid for init can return before all of the processes in the pid namespace have exited is also unfortunate. Avoid these problems by disabling the allocation of new pids in a pid namespace when init dies, instead of when the last process in a pid namespace is reaped. Pointed-out-by: NOleg Nesterov <oleg@redhat.com> Reviewed-by: NOleg Nesterov <oleg@redhat.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 15 12月, 2012 1 次提交
-
-
由 Eric W. Biederman 提交于
Andy Lutomirski <luto@amacapital.net> found a nasty little bug in the permissions of setns. With unprivileged user namespaces it became possible to create new namespaces without privilege. However the setns calls were relaxed to only require CAP_SYS_ADMIN in the user nameapce of the targed namespace. Which made the following nasty sequence possible. pid = clone(CLONE_NEWUSER | CLONE_NEWNS); if (pid == 0) { /* child */ system("mount --bind /home/me/passwd /etc/passwd"); } else if (pid != 0) { /* parent */ char path[PATH_MAX]; snprintf(path, sizeof(path), "/proc/%u/ns/mnt"); fd = open(path, O_RDONLY); setns(fd, 0); system("su -"); } Prevent this possibility by requiring CAP_SYS_ADMIN in the current user namespace when joing all but the user namespace. Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 20 11月, 2012 1 次提交
-
-
由 Eric W. Biederman 提交于
Assign a unique proc inode to each namespace, and use that inode number to ensure we only allocate at most one proc inode for every namespace in proc. A single proc inode per namespace allows userspace to test to see if two processes are in the same namespace. This has been a long requested feature and only blocked because a naive implementation would put the id in a global space and would ultimately require having a namespace for the names of namespaces, making migration and certain virtualization tricks impossible. We still don't have per superblock inode numbers for proc, which appears necessary for application unaware checkpoint/restart and migrations (if the application is using namespace file descriptors) but that is now allowd by the design if it becomes important. I have preallocated the ipc and uts initial proc inode numbers so their structures can be statically initialized. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
- 19 11月, 2012 6 次提交
-
-
由 Eric W. Biederman 提交于
Unsharing of the pid namespace unlike unsharing of other namespaces does not take affect immediately. Instead it affects the children created with fork and clone. The first of these children becomes the init process of the new pid namespace, the rest become oddball children of pid 0. From the point of view of the new pid namespace the process that created it is pid 0, as it's pid does not map. A couple of different semantics were considered but this one was settled on because it is easy to implement and it is usable from pam modules. The core reasons for the existence of unshare. I took a survey of the callers of pam modules and the following appears to be a representative sample of their logic. { setup stuff include pam child = fork(); if (!child) { setuid() exec /bin/bash } waitpid(child); pam and other cleanup } As you can see there is a fork to create the unprivileged user space process. Which means that the unprivileged user space process will appear as pid 1 in the new pid namespace. Further most login processes do not cope with extraneous children which means shifting the duty of reaping extraneous child process to the creator of those extraneous children makes the system more comprehensible. The practical reason for this set of pid namespace semantics is that it is simple to implement and verify they work correctly. Whereas an implementation that requres changing the struct pid on a process comes with a lot more races and pain. Not the least of which is that glibc caches getpid(). These semantics are implemented by having two notions of the pid namespace of a proces. There is task_active_pid_ns which is the pid namspace the process was created with and the pid namespace that all pids are presented to that process in. The task_active_pid_ns is stored in the struct pid of the task. Then there is the pid namespace that will be used for children that pid namespace is stored in task->nsproxy->pid_ns. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> -
由 Eric W. Biederman 提交于
- Pid namespaces are designed to be inescapable so verify that the passed in pid namespace is a child of the currently active pid namespace or the currently active pid namespace itself. Allowing the currently active pid namespace is important so the effects of an earlier setns can be cancelled. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> -
由 Eric W. Biederman 提交于
task_active_pid_ns(current) != current->ns_proxy->pid_ns will soon be allowed to support unshare and setns. The definition of creating a child pid namespace when task_active_pid_ns(current) != current->ns_proxy->pid_ns could be that we create a child pid namespace of current->ns_proxy->pid_ns. However that leads to strange cases like trying to have a single process be init in multiple pid namespaces, which is racy and hard to think about. The definition of creating a child pid namespace when task_active_pid_ns(current) != current->ns_proxy->pid_ns could be that we create a child pid namespace of task_active_pid_ns(current). While that seems less racy it does not provide any utility. Therefore define the semantics of creating a child pid namespace when task_active_pid_ns(current) != current->ns_proxy->pid_ns to be that the pid namespace creation fails. That is easy to implement and easy to think about. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> -
由 Eric W. Biederman 提交于
Looking at pid_ns->nr_hashed is a bit simpler and it works for disjoint process trees that an unshare or a join of a pid_namespace may create. Acked-by: N"Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
Track the number of pids in the proc hash table. When the number of pids goes to 0 schedule work to unmount the kernel mount of proc. Move the mount of proc into alloc_pid when we allocate the pid for init. Remove the surprising calls of pid_ns_release proc in fork and proc_flush_task. Those code paths really shouldn't know about proc namespace implementation details and people have demonstrated several times that finding and understanding those code paths is difficult and non-obvious. Because of the call path detach pid is alwasy called with the rtnl_lock held free_pid is not allowed to sleep, so the work to unmounting proc is moved to a work queue. This has the side benefit of not blocking the entire world waiting for the unnecessary rcu_barrier in deactivate_locked_super. In the process of making the code clear and obvious this fixes a bug reported by Gao feng <gaofeng@cn.fujitsu.com> where we would leak a mount of proc during clone(CLONE_NEWPID|CLONE_NEWNET) if copy_pid_ns succeeded and copy_net_ns failed. Acked-by: N"Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
- Capture the the user namespace that creates the pid namespace - Use that user namespace to test if it is ok to write to /proc/sys/kernel/ns_last_pid. Zhao Hongjiang <zhaohongjiang@huawei.com> noticed I was missing a put_user_ns in when destroying a pid_ns. I have foloded his patch into this one so that bisects will work properly. Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 26 10月, 2012 1 次提交
-
-
由 Andrew Vagin 提交于
'struct pid' is a "variable sized struct" - a header with an array of upids at the end. The size of the array depends on a level (depth) of pid namespaces. Now a level of pidns is not limited, so 'struct pid' can be more than one page. Looks reasonable, that it should be less than a page. MAX_PIS_NS_LEVEL is not calculated from PAGE_SIZE, because in this case it depends on architectures, config options and it will be reduced, if someone adds a new fields in struct pid or struct upid. I suggest to set MAX_PIS_NS_LEVEL = 32, because it saves ability to expand "struct pid" and it's more than enough for all known for me use-cases. When someone finds a reasonable use case, we can add a config option or a sysctl parameter. In addition it will reduce the effect of another problem, when we have many nested namespaces and the oldest one starts dying. zap_pid_ns_processe will be called for each namespace and find_vpid will be called for each process in a namespace. find_vpid will be called minimum max_level^2 / 2 times. The reason of that is that when we found a bit in pidmap, we can't determine this pidns is top for this process or it isn't. vpid is a heavy operation, so a fork bomb, which create many nested namespace, can make a system inaccessible for a long time. For example my system becomes inaccessible for a few minutes with 4000 processes. [akpm@linux-foundation.org: return -EINVAL in response to excessive nesting, not -ENOMEM] Signed-off-by: NAndrew Vagin <avagin@openvz.org> Acked-by: NOleg Nesterov <oleg@redhat.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 20 10月, 2012 1 次提交
-
-
由 Cyrill Gorcunov 提交于
free_pid_ns() operates in a recursive fashion: free_pid_ns(parent) put_pid_ns(parent) kref_put(&ns->kref, free_pid_ns); free_pid_ns thus if there was a huge nesting of namespaces the userspace may trigger avalanche calling of free_pid_ns leading to kernel stack exhausting and a panic eventually. This patch turns the recursion into an iterative loop. Based on a patch by Andrew Vagin. [akpm@linux-foundation.org: export put_pid_ns() to modules] Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 9月, 2012 1 次提交
-
-
由 Andrew Vagin 提交于
The kernel doesn't check the pid for negative values, so if you try to write -2 to /proc/sys/kernel/ns_last_pid, you will get a kernel panic. The crash happens because the next pid is -1, and alloc_pidmap() will try to access to a nonexistent pidmap. map = &pid_ns->pidmap[pid/BITS_PER_PAGE]; Signed-off-by: NAndrew Vagin <avagin@openvz.org> Acked-by: NCyrill Gorcunov <gorcunov@openvz.org> Acked-by: NOleg Nesterov <oleg@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 8月, 2012 1 次提交
-
-
由 Eric W. Biederman 提交于
There is a least one modular user so export free_pid_ns so modules can capture and use the pid namespace on the very rare occasion when it makes sense. Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 21 6月, 2012 1 次提交
-
-
由 Eric W. Biederman 提交于
Today we have a twofold bug. Sometimes release_task on pid == 1 in a pid namespace can run before other processes in a pid namespace have had release task called. With the result that pid_ns_release_proc can be called before the last proc_flus_task() is done using upid->ns->proc_mnt, resulting in the use of a stale pointer. This same set of circumstances can lead to waitpid(...) returning for a processes started with clone(CLONE_NEWPID) before the every process in the pid namespace has actually exited. To fix this modify zap_pid_ns_processess wait until all other processes in the pid namespace have exited, even EXIT_DEAD zombies. The delay_group_leader and related tests ensure that the thread gruop leader will be the last thread of a process group to be reaped, or to become EXIT_DEAD and self reap. With the change to zap_pid_ns_processes we get the guarantee that pid == 1 in a pid namespace will be the last task that release_task is called on. With pid == 1 being the last task to pass through release_task pid_ns_release_proc can no longer be called too early nor can wait return before all of the EXIT_DEAD tasks in a pid namespace have exited. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Louis Rilling <louis.rilling@kerlabs.com> Cc: Mike Galbraith <efault@gmx.de> Acked-by: NPavel Emelyanov <xemul@parallels.com> Tested-by: NAndrew Wagin <avagin@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 6月, 2012 2 次提交
-
-
由 Cyrill Gorcunov 提交于
For those who doesn't need C/R functionality there is no need to control last pid, ie the pid for the next fork() call. Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric W. Biederman 提交于
Force SIGCHLD handling to SIG_IGN so that signals are not generated and so that the children autoreap. This increases the parallelize and in general the speed of network namespace shutdown. Note self reaping childrean can exist past zap_pid_ns_processess but they will all be reaped before we allow the pid namespace init task with pid == 1 to be reaped. [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Louis Rilling <louis.rilling@kerlabs.com> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-