- 19 11月, 2012 4 次提交
-
-
由 Eric W. Biederman 提交于
Track the number of pids in the proc hash table. When the number of pids goes to 0 schedule work to unmount the kernel mount of proc. Move the mount of proc into alloc_pid when we allocate the pid for init. Remove the surprising calls of pid_ns_release proc in fork and proc_flush_task. Those code paths really shouldn't know about proc namespace implementation details and people have demonstrated several times that finding and understanding those code paths is difficult and non-obvious. Because of the call path detach pid is alwasy called with the rtnl_lock held free_pid is not allowed to sleep, so the work to unmounting proc is moved to a work queue. This has the side benefit of not blocking the entire world waiting for the unnecessary rcu_barrier in deactivate_locked_super. In the process of making the code clear and obvious this fixes a bug reported by Gao feng <gaofeng@cn.fujitsu.com> where we would leak a mount of proc during clone(CLONE_NEWPID|CLONE_NEWNET) if copy_pid_ns succeeded and copy_net_ns failed. Acked-by: N"Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
The expressions tsk->nsproxy->pid_ns and task_active_pid_ns aka ns_of_pid(task_pid(tsk)) should have the same number of cache line misses with the practical difference that ns_of_pid(task_pid(tsk)) is released later in a processes life. Furthermore by using task_active_pid_ns it becomes trivial to write an unshare implementation for the the pid namespace. So I have used task_active_pid_ns everywhere I can. In fork since the pid has not yet been attached to the process I use ns_of_pid, to achieve the same effect. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
Now that we have s_fs_info pointing to our pid namespace the original reason for the proc root inode having a struct pid is gone. Caching a pid in the root inode has led to some complicated code. Now that we don't need the struct pid, just remove it. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
I had visions at one point of splitting proc into two filesystems. If that had happened proc/self being the the part of proc that actually deals with pids would have been a nice cleanup. As it is proc/self requires a lot of unnecessary infrastructure for a single file. The only user visible change is that a mounted /proc for a pid namespace that is dead now shows a broken proc symlink, instead of being completely invisible. I don't think anyone will notice or care. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
- 06 10月, 2012 1 次提交
-
-
由 Sachin Kamat 提交于
This cleanup also fixes the following sparse warning: fs/proc/root.c:64:45: warning: Using plain integer as NULL pointer Signed-off-by: NSachin Kamat <sachin.kamat@linaro.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 7月, 2012 2 次提交
-
-
由 David Howells 提交于
Pass mount flags to sget() so that it can use them in initialising a new superblock before the set function is called. They could also be passed to the compare function. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Just the flags; only NFS cares even about that, but there are legitimate uses for such argument. And getting rid of that completely would require splitting ->lookup() into a couple of methods (at least), so let's leave that alone for now... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 16 5月, 2012 1 次提交
-
-
由 Eric W. Biederman 提交于
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
- 06 4月, 2012 1 次提交
-
-
由 Vasiliy Kulikov 提交于
The proc_parse_options() call from proc_mount() runs only once at boot time. So on any later mount attempt, any mount options are ignored because ->s_root is already initialized. As a consequence, "mount -o <options>" will ignore the options. The only way to change mount options is "mount -o remount,<options>". To fix this, parse the mount options unconditionally. Signed-off-by: NVasiliy Kulikov <segoon@openwall.com> Reported-by: NArkadiusz Miskiewicz <a.miskiewicz@gmail.com> Tested-by: NArkadiusz Miskiewicz <a.miskiewicz@gmail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 1月, 2012 2 次提交
-
-
由 Vasiliy Kulikov 提交于
Add support for mount options to restrict access to /proc/PID/ directories. The default backward-compatible "relaxed" behaviour is left untouched. The first mount option is called "hidepid" and its value defines how much info about processes we want to be available for non-owners: hidepid=0 (default) means the old behavior - anybody may read all world-readable /proc/PID/* files. hidepid=1 means users may not access any /proc/<pid>/ directories, but their own. Sensitive files like cmdline, sched*, status are now protected against other users. As permission checking done in proc_pid_permission() and files' permissions are left untouched, programs expecting specific files' modes are not confused. hidepid=2 means hidepid=1 plus all /proc/PID/ will be invisible to other users. It doesn't mean that it hides whether a process exists (it can be learned by other means, e.g. by kill -0 $PID), but it hides process' euid and egid. It compicates intruder's task of gathering info about running processes, whether some daemon runs with elevated privileges, whether another user runs some sensitive program, whether other users run any program at all, etc. gid=XXX defines a group that will be able to gather all processes' info (as in hidepid=0 mode). This group should be used instead of putting nonroot user in sudoers file or something. However, untrusted users (like daemons, etc.) which are not supposed to monitor the tasks in the whole system should not be added to the group. hidepid=1 or higher is designed to restrict access to procfs files, which might reveal some sensitive private information like precise keystrokes timings: http://www.openwall.com/lists/oss-security/2011/11/05/3 hidepid=1/2 doesn't break monitoring userspace tools. ps, top, pgrep, and conky gracefully handle EPERM/ENOENT and behave as if the current user is the only user running processes. pstree shows the process subtree which contains "pstree" process. Note: the patch doesn't deal with setuid/setgid issues of keeping preopened descriptors of procfs files (like https://lkml.org/lkml/2011/2/7/368). We rely on that the leaked information like the scheduling counters of setuid apps doesn't threaten anybody's privacy - only the user started the setuid program may read the counters. Signed-off-by: NVasiliy Kulikov <segoon@openwall.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg KH <greg@kroah.com> Cc: Theodore Tso <tytso@MIT.EDU> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: James Morris <jmorris@namei.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vasiliy Kulikov 提交于
Add support for procfs mount options. Actual mount options are coming in the next patches. Signed-off-by: NVasiliy Kulikov <segoon@openwall.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg KH <greg@kroah.com> Cc: Theodore Tso <tytso@MIT.EDU> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: James Morris <jmorris@namei.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 12月, 2011 1 次提交
-
-
由 Al Viro 提交于
kern_mount() doesn't pair with plain mntput()... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 28 7月, 2011 1 次提交
-
-
由 David Howells 提交于
Since __proc_create() appends the name it is given to the end of the PDE structure that it allocates, there isn't a need to store a name pointer. Instead we can just replace the name pointer with a terminal char array of _unspecified_ length. The compiler will simply append the string to statically defined variables of PDE type overlapping any hole at the end of the structure and, unlike specifying an explicitly _zero_ length array, won't give a warning if you try to statically initialise it with a string of more than zero length. Also, whilst we're at it: (1) Move namelen to end just prior to name and reduce it to a single byte (name shouldn't be longer than NAME_MAX). (2) Move pde_unload_lock two places further on so that if it's four bytes in size on a 64-bit machine, it won't cause an unused hole in the PDE struct. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 6月, 2011 1 次提交
-
-
由 Al Viro 提交于
set_anon_super() can fail... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 24 3月, 2011 2 次提交
-
-
由 Oleg Nesterov 提交于
After the previous cleanup in proc_get_sb() the global proc_mnt has no reasons to exists, kill it. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NDaniel Lezcano <daniel.lezcano@free.fr> Cc: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: NSerge E. Hallyn <serge@hallyn.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric W. Biederman 提交于
Reorganize proc_get_sb() so it can be called before the struct pid of the first process is allocated. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NDaniel Lezcano <daniel.lezcano@free.fr> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: NSerge E. Hallyn <serge@hallyn.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 10月, 2010 2 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
take that to kern_mount_data()-using callers Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 15 10月, 2010 1 次提交
-
-
由 Arnd Bergmann 提交于
All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode *i, struct file *f) { <+... ( nonseekable_open(...) | nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... }; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... +.llseek = no_llseek, /* nonseekable */ }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, /* open uses nonseekable */ }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, /* we have seq_read */ }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, /* readdir is present */ }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, /* read accesses f_pos */ }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, /* write accesses f_pos */ }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... +.llseek = noop_llseek, /* read and write both use no f_pos */ }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, /* write uses no f_pos */ }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, /* read uses no f_pos */ }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, /* no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: NArnd Bergmann <arnd@arndb.de> Cc: Julia Lawall <julia@diku.dk> Cc: Christoph Hellwig <hch@infradead.org>
-
- 28 5月, 2010 1 次提交
-
-
由 Dan Carpenter 提交于
I removed 3 unused assignments. The first two get reset on the first statement of their functions. For "err" in root.c we don't return an error and we don't use the variable again. Signed-off-by: NDan Carpenter <error27@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Acked-by: NSerge Hallyn <serue@us.ibm.com> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 3月, 2010 1 次提交
-
-
由 Helight.Xu 提交于
EXPORT_SYMBOL(proc_symlink); EXPORT_SYMBOL(proc_mkdir); EXPORT_SYMBOL(create_proc_entry); EXPORT_SYMBOL(proc_create_data); EXPORT_SYMBOL(remove_proc_entry); Those EXPORT_SYMBOL shouldn't be in fs/proc/root.c, should be in fs/proc/generic.c. Signed-off-by: NHelight.Xu <helight.xu@gmail.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 09 5月, 2009 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 28 3月, 2009 1 次提交
-
-
由 Sukadev Bhattiprolu 提交于
simple_set_mnt() is defined as returning 'int' but always returns 0. Callers assume simple_set_mnt() never fails and don't properly cleanup if it were to _ever_ fail. For instance, get_sb_single() and get_sb_nodev() should: up_write(sb->s_unmount); deactivate_super(sb); if simple_set_mnt() fails. Since simple_set_mnt() never fails, would be cleaner if it did not return anything. [akpm@linux-foundation.org: fix build] Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: NSerge Hallyn <serue@us.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 05 1月, 2009 1 次提交
-
-
由 Alexey Dobriyan 提交于
There are four BKL users in proc: de_put(), proc_lookup_de(), proc_readdir_de(), proc_root_readdir(), 1) de_put() ----------- de_put() is classic atomic_dec_and_test() refcount wrapper -- no BKL needed. BKL doesn't matter to possible refcount leak as well. 2) proc_lookup_de() ------------------- Walking PDE list is protected by proc_subdir_lock(), proc_get_inode() is potentially blocking, all callers of proc_lookup_de() eventually end up from ->lookup hooks which is protected by directory's ->i_mutex -- BKL doesn't protect anything. 3) proc_readdir_de() -------------------- "." and ".." part doesn't need BKL, walking PDE list is under proc_subdir_lock, calling filldir callback is potentially blocking because it writes to luserspace. All proc_readdir_de() callers eventually come from ->readdir hook which is under directory's ->i_mutex -- BKL doesn't protect anything. 4) proc_root_readdir_de() ------------------------- proc_root_readdir_de is ->readdir hook, see (3). Since readdir hooks doesn't use BKL anymore, switch to generic_file_llseek, since it also takes directory's i_mutex. Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
-
- 23 10月, 2008 2 次提交
-
-
由 Alexey Dobriyan 提交于
Now that everything was moved to their more or less expected places, apply rm(1). Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
-
由 Alexey Dobriyan 提交于
kmem_cache creation code will panic, don't return anything. Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
-
- 29 4月, 2008 5 次提交
-
-
由 Denis V. Lunev 提交于
This set of patches fixes an proc ->open'less usage due to ->proc_fops flip in the most part of the kernel code. The original OOPS is described in the commit 2d3a4e36: Typical PDE creation code looks like: pde = create_proc_entry("foo", 0, NULL); if (pde) pde->proc_fops = &foo_proc_fops; Notice that PDE is first created, only then ->proc_fops is set up to final value. This is a problem because right after creation a) PDE is fully visible in /proc , and b) ->proc_fops are proc_file_operations which do not have ->open callback. So, it's possible to ->read without ->open (see one class of oopses below). The fix is new API called proc_create() which makes sure ->proc_fops are set up before gluing PDE to main tree. Typical new code looks like: pde = proc_create("foo", 0, NULL, &foo_proc_fops); if (!pde) return -ENOMEM; Fix most networking users for a start. In the long run, create_proc_entry() for regular files will go. In addition to this, proc_create_data is introduced to fix reading from proc without PDE->data. The race is basically the same as above. create_proc_entries is replaced in the entire kernel code as new method is also simply better. This patch: The problem is the same as for de->proc_fops. Right now PDE becomes visible without data set. So, the entry could be looked up without data. This, in most cases, will simply OOPS. proc_create_data call is created to address this issue. proc_create now becomes a wrapper around it. Signed-off-by: NDenis V. Lunev <den@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Chris Mason <chris.mason@oracle.com> Acked-by: NDavid Howells <dhowells@redhat.com> Cc: Dmitry Torokhov <dtor@mail.ru> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Jaroslav Kysela <perex@suse.cz> Cc: Jeff Garzik <jgarzik@pobox.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Karsten Keil <kkeil@suse.de> Cc: Kyle McMartin <kyle@parisc-linux.org> Cc: Len Brown <lenb@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Mikael Starvik <starvik@axis.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Neil Brown <neilb@suse.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Osterlund <petero2@telia.com> Cc: Pierre Peiffer <peifferp@gmail.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Takashi Iwai <tiwai@suse.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
Remove proc_root export. Creation and removal works well if parent PDE is supplied as NULL -- it worked always that way. So, one useless export removed and consistency added, some drivers created PDEs with &proc_root as parent but removed them as NULL and so on. Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
Use creation by full path: "driver/foo". Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
Use creation by full path instead: "fs/foo". Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
Remove proc_bus export and variable itself. Using pathnames works fine and is slightly more understandable and greppable. Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 2月, 2008 1 次提交
-
-
由 Alexey Dobriyan 提交于
Typical PDE creation code looks like: pde = create_proc_entry("foo", 0, NULL); if (pde) pde->proc_fops = &foo_proc_fops; Notice that PDE is first created, only then ->proc_fops is set up to final value. This is a problem because right after creation a) PDE is fully visible in /proc , and b) ->proc_fops are proc_file_operations which do not have ->open callback. So, it's possible to ->read without ->open (see one class of oopses below). The fix is new API called proc_create() which makes sure ->proc_fops are set up before gluing PDE to main tree. Typical new code looks like: pde = proc_create("foo", 0, NULL, &foo_proc_fops); if (!pde) return -ENOMEM; Fix most networking users for a start. In the long run, create_proc_entry() for regular files will go. BUG: unable to handle kernel NULL pointer dereference at virtual address 00000024 printing eip: c1188c1b *pdpt = 000000002929e001 *pde = 0000000000000000 Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC last sysfs file: /sys/block/sda/sda1/dev Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom Pid: 24679, comm: cat Not tainted (2.6.24-rc3-mm1 #2) EIP: 0060:[<c1188c1b>] EFLAGS: 00210002 CPU: 0 EIP is at mutex_lock_nested+0x75/0x25d EAX: 000006fe EBX: fffffffb ECX: 00001000 EDX: e9340570 ESI: 00000020 EDI: 00200246 EBP: e9340570 ESP: e8ea1ef8 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process cat (pid: 24679, ti=E8EA1000 task=E9340570 task.ti=E8EA1000) Stack: 00000000 c106f7ce e8ee05b4 00000000 00000001 458003d0 f6fb6f20 fffffffb 00000000 c106f7aa 00001000 c106f7ce 08ae9000 f6db53f0 00000020 00200246 00000000 00000002 00000000 00200246 00200246 e8ee05a0 fffffffb e8ee0550 Call Trace: [<c106f7ce>] seq_read+0x24/0x28a [<c106f7aa>] seq_read+0x0/0x28a [<c106f7ce>] seq_read+0x24/0x28a [<c106f7aa>] seq_read+0x0/0x28a [<c10818b8>] proc_reg_read+0x60/0x73 [<c1081858>] proc_reg_read+0x0/0x73 [<c105a34f>] vfs_read+0x6c/0x8b [<c105a6f3>] sys_read+0x3c/0x63 [<c10025f2>] sysenter_past_esp+0x5f/0xa5 [<c10697a7>] destroy_inode+0x24/0x33 ======================= INFO: lockdep is turned off. Code: 75 21 68 e1 1a 19 c1 68 87 00 00 00 68 b8 e8 1f c1 68 25 73 1f c1 e8 84 06 e9 ff e8 52 b8 e7 ff 83 c4 10 9c 5f fa e8 28 89 ea ff <f0> fe 4e 04 79 0a f3 90 80 7e 04 00 7e f8 eb f0 39 76 34 74 33 EIP: [<c1188c1b>] mutex_lock_nested+0x75/0x25d SS:ESP 0068:e8ea1ef8 [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NAlexey Dobriyan <adobriyan@sw.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 12月, 2007 1 次提交
-
-
由 Alexey Dobriyan 提交于
Creating PDEs with refcount 0 and "deleted" flag has problems (see below). Switch to usual scheme: * PDE is created with refcount 1 * every de_get does +1 * every de_put() and remove_proc_entry() do -1 * once refcount reaches 0, PDE is freed. This elegantly fixes at least two following races (both observed) without introducing new locks, without abusing old locks, without spreading lock_kernel(): 1) PDE leak remove_proc_entry de_put ----------------- ------ [refcnt = 1] if (atomic_read(&de->count) == 0) if (atomic_dec_and_test(&de->count)) if (de->deleted) /* also not taken! */ free_proc_entry(de); else de->deleted = 1; [refcount=0, deleted=1] 2) use after free remove_proc_entry de_put ----------------- ------ [refcnt = 1] if (atomic_dec_and_test(&de->count)) if (atomic_read(&de->count) == 0) free_proc_entry(de); /* boom! */ if (de->deleted) free_proc_entry(de); BUG: unable to handle kernel paging request at virtual address 6b6b6b6b printing eip: c10acdda *pdpt = 00000000338f8001 *pde = 0000000000000000 Oops: 0000 [#1] PREEMPT SMP Modules linked in: af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom Pid: 23161, comm: cat Not tainted (2.6.24-rc2-8c086340 #4) EIP: 0060:[<c10acdda>] EFLAGS: 00210097 CPU: 1 EIP is at strnlen+0x6/0x18 EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 6b6b6b6b EDX: fffffffe ESI: c128fa3b EDI: f380bf34 EBP: ffffffff ESP: f380be44 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process cat (pid: 23161, ti=f380b000 task=f38f2570 task.ti=f380b000) Stack: c10ac4f0 00000278 c12ce000 f43cd2a8 00000163 00000000 7da86067 00000400 c128fa20 00896b18 f38325a8 c128fe20 ffffffff 00000000 c11f291e 00000400 f75be300 c128fa20 f769c9a0 c10ac779 f380bf34 f7bfee70 c1018e6b f380bf34 Call Trace: [<c10ac4f0>] vsnprintf+0x2ad/0x49b [<c10ac779>] vscnprintf+0x14/0x1f [<c1018e6b>] vprintk+0xc5/0x2f9 [<c10379f1>] handle_fasteoi_irq+0x0/0xab [<c1004f44>] do_IRQ+0x9f/0xb7 [<c117db3b>] preempt_schedule_irq+0x3f/0x5b [<c100264e>] need_resched+0x1f/0x21 [<c10190ba>] printk+0x1b/0x1f [<c107c8ad>] de_put+0x3d/0x50 [<c107c8f8>] proc_delete_inode+0x38/0x41 [<c107c8c0>] proc_delete_inode+0x0/0x41 [<c1066298>] generic_delete_inode+0x5e/0xc6 [<c1065aa9>] iput+0x60/0x62 [<c1063c8e>] d_kill+0x2d/0x46 [<c1063fa9>] dput+0xdc/0xe4 [<c10571a1>] __fput+0xb0/0xcd [<c1054e49>] filp_close+0x48/0x4f [<c1055ee9>] sys_close+0x67/0xa5 [<c10026b6>] sysenter_past_esp+0x5f/0x85 ======================= Code: c9 74 0c f2 ae 74 05 bf 01 00 00 00 4f 89 fa 5f 89 d0 c3 85 c9 57 89 c7 89 d0 74 05 f2 ae 75 01 4f 89 f8 5f c3 89 c1 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 c3 90 90 90 57 83 c9 EIP: [<c10acdda>] strnlen+0x6/0x18 SS:ESP 0068:f380be44 Also, remove broken usage of ->deleted from reiserfs: if sget() succeeds, module is already pinned and remove_proc_entry() can't happen => nobody can mark PDE deleted. Dummy proc root in netns code is not marked with refcount 1. AFAICS, we never get it, it's just for proper /proc/net removal. I double checked CLONE_NETNS continues to work. Patch survives many hours of modprobe/rmmod/cat loops without new bugs which can be attributed to refcounting. Signed-off-by: NAlexey Dobriyan <adobriyan@sw.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 30 11月, 2007 1 次提交
-
-
由 Alexey Dobriyan 提交于
proc_kill_inodes() can clear ->i_fop in the middle of vfs_readdir resulting in NULL dereference during "file->f_op->readdir(file, buf, filler)". The solution is to remove proc_kill_inodes() completely: a) we don't have tricky modules implementing their tricky readdir hooks which could keeping this revoke from hell. b) In a situation when module is gone but PDE still alive, standard readdir will return only "." and "..", because pde->next was cleared by remove_proc_entry(). c) the race proc_kill_inode() destined to prevent is not completely fixed, just race window made smaller, because vfs_readdir() is run without sb_lock held and without file_list_lock held. Effectively, ->i_fop is cleared at random moment, which can't fix properly anything. BUG: unable to handle kernel NULL pointer dereference at virtual address 00000018 printing eip: c1061205 *pdpt = 0000000005b22001 *pde = 0000000000000000 Oops: 0000 [#1] PREEMPT SMP Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw sr_mod k8temp cdrom hwmon amd_rng Pid: 2033, comm: find Not tainted (2.6.24-rc1-b1d08ac0 #2) EIP: 0060:[<c1061205>] EFLAGS: 00010246 CPU: 0 EIP is at vfs_readdir+0x47/0x74 EAX: c6b6a780 EBX: 00000000 ECX: c1061040 EDX: c5decf94 ESI: c6b6a780 EDI: fffffffe EBP: c9797c54 ESP: c5decf78 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process find (pid: 2033, ti=c5dec000 task=c64bba90 task.ti=c5dec000) Stack: c5decf94 c1061040 fffffff7 0805ffbc 00000000 c6b6a780 c1061295 0805ffbc 00000000 00000400 00000000 00000004 0805ffbc 4588eff4 c5dec000 c10026ba 00000004 0805ffbc 00000400 0805ffbc 4588eff4 bfdc6c70 000000dc 0000007b Call Trace: [<c1061040>] filldir64+0x0/0xc5 [<c1061295>] sys_getdents64+0x63/0xa5 [<c10026ba>] sysenter_past_esp+0x5f/0x85 ======================= Code: 49 83 78 18 00 74 43 8d 6b 74 bf fe ff ff ff 89 e8 e8 b8 c0 12 00 f6 83 2c 01 00 00 10 75 22 8b 5e 10 8b 4c 24 04 89 f0 8b 14 24 <ff> 53 18 f6 46 1a 04 89 c7 75 0b 8b 56 0c 8b 46 08 e8 c8 66 00 EIP: [<c1061205>] vfs_readdir+0x47/0x74 SS:ESP 0068:c5decf78 hch: "Nice, getting rid of this is a very good step formwards. Unfortunately we have another copy of this junk in security/selinux/selinuxfs.c:sel_remove_entries() which would need the same treatment." Signed-off-by: NAlexey Dobriyan <adobriyan@sw.ru> Acked-by: NChristoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 11月, 2007 1 次提交
-
-
由 Eric W. Biederman 提交于
It appears we overlooked support for removing generic proc files when we added support for multiple proc super blocks. Handle that now. [akpm@linux-foundation.org: coding-style cleanups] Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Acked-by: NPavel Emelyanov <xemul@openvz.org> Cc: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: NSukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 20 10月, 2007 2 次提交
-
-
由 Pavel Emelyanov 提交于
The namespace's proc_mnt must be kern_mount-ed to make this pointer always valid, independently of whether the user space mounted the proc or not. This solves raced in proc_flush_task, etc. with the proc_mnt switching from NULL to not-NULL. The initialization is done after the init's pid is created and hashed to make proc_get_sb() finr it and get for root inode. Sice the namespace holds the vfsmnt, vfsmnt holds the superblock and the superblock holds the namespace we must explicitly break this circle to destroy all the stuff. This is done after the init of the namespace dies. Running a few steps forward - when init exits it will kill all its children, so no proc_mnt will be needed after its death. Signed-off-by: NPavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Pavel Emelyanov 提交于
Each pid namespace have to be visible through its own proc mount. Thus we need to have per-namespace proc trees with their own superblocks. We cannot easily show different pid namespace via one global proc tree, since each pid refers to different tasks in different namespaces. E.g. pid 1 refers to the init task in the initial namespace and to some other task when seeing from another namespace. Moreover - pid, exisintg in one namespace may not exist in the other. This approach has one move advantage is that the tasks from the init namespace can see what tasks live in another namespace by reading entries from another proc tree. Signed-off-by: NPavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 10月, 2007 1 次提交
-
-
由 Eric W. Biederman 提交于
This patch makes /proc/net per network namespace. It modifies the global variables proc_net and proc_net_stat to be per network namespace. The proc_net file helpers are modified to take a network namespace argument, and all of their callers are fixed to pass &init_net for that argument. This ensures that all of the /proc/net files are only visible and usable in the initial network namespace until the code behind them has been updated to be handle multiple network namespaces. Making /proc/net per namespace is necessary as at least some files in /proc/net depend upon the set of network devices which is per network namespace, and even more files in /proc/net have contents that are relevant to a single network namespace. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 4月, 2007 1 次提交
-
-
由 Andrew Morton 提交于
We're using #ifdef CONFIG_SYSCTL, but we should be using CONFIG_PROC_SYSCTL, so we get fs/built-in.o: In function `proc_root_init': /usr/src/linux/fs/proc/root.c:83: undefined reference to `proc_sys_init' Fix that up and remove an ifdef-in-C. Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Helge Hafting <helgehaf@aitel.hist.no> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 2月, 2007 1 次提交
-
-
由 Eric W. Biederman 提交于
With this change the sysctl inodes can be cached and nothing needs to be done when removing a sysctl table. For a cost of 2K code we will save about 4K of static tables (when we remove de from ctl_table) and 70K in proc_dir_entries that we will not allocate, or about half that on a 32bit arch. The speed feels about the same, even though we can now cache the sysctl dentries :( We get the core advantage that we don't need to have a 1 to 1 mapping between ctl table entries and proc files. Making it possible to have /proc/sys vary depending on the namespace you are in. The currently merged namespaces don't have an issue here but the network namespace under /proc/sys/net needs to have different directories depending on which network adapters are visible. By simply being a cache different directories being visible depending on who you are is trivial to implement. [akpm@osdl.org: fix uninitialised var] [akpm@osdl.org: fix ARM build] [bunk@stusta.de: make things static] Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-