- 01 8月, 2008 1 次提交
-
-
由 Al Viro 提交于
New primitive: alloc_fd(start, flags). get_unused_fd() and get_unused_fd_flags() become wrappers on top of it. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 27 7月, 2008 9 次提交
-
-
由 Al Viro 提交于
* dup2() should return -EBADF on exceeded sysctl_nr_open * dup() should *not* return -EINVAL even if you have rlimit set to 0; it should get -EMFILE instead. Check for orig_start exceeding rlimit taken to sys_fcntl(). Failing expand_files() in dup{2,3}() now gets -EMFILE remapped to -EBADF. Consequently, remaining checks for rlimit are taken to expand_files(). Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
* do not pass nameidata; struct path is all the callers want. * switch to new helpers: user_path_at(dfd, pathname, flags, &path) user_path(pathname, &path) user_lpath(pathname, &path) user_path_dir(pathname, &path) (fail if not a directory) The last 3 are trivial macro wrappers for the first one. * remove nameidata in callers. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Almost all users __user_walk_fd() and friends care only about struct path. Get rid of the few that do not. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Miklos Szeredi 提交于
Move the immutable and append-only checks from chmod, chown and utimes into notify_change(). Checks for immutable and append-only files are always performed by the VFS and not by the filesystem (see permission() and may_...() in namei.c), so these belong in notify_change(), and not in inode_change_ok(). This should be completely equivalent. CC: Ulrich Drepper <drepper@redhat.com> CC: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
* MAY_CHDIR is redundant - it's an equivalent of MAY_ACCESS * MAY_ACCESS on fuse should affect only the last step of pathname resolution * fchdir() and chroot() should pass MAY_ACCESS, for the same reason why chdir() needs that. * now that we pass MAY_ACCESS explicitly in all cases, LOOKUP_ACCESS can be removed; it has no business being in nameidata. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
long overdue... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
... so we ought to pass MAY_CHDIR to vfs_permission() instead of having it triggered on every step of preceding pathname resolution. LOOKUP_CHDIR is killed by that. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Miklos Szeredi 提交于
vfs_permission(MAY_WRITE) already checked for the inode being immutable, so no need to repeat it. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Acked-by: NChristoph Hellwig <hch@infradead.org>
-
- 25 7月, 2008 1 次提交
-
-
由 Jon Tollefson 提交于
Adds a check for an overflow in the filesystem size so if someone is checking with statfs() on a 16G blocksize hugetlbfs in a 32bit binary that it will report back EOVERFLOW instead of a size of 0. Acked-by: NNishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: NJon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 7月, 2008 1 次提交
-
-
由 Andrew G. Morgan 提交于
This commit includes a bugfix for the fragile setuid fixup code in the case that filesystem capabilities are supported (in access()). The effect of this fix is gated on filesystem capability support because changing securebits is only supported when filesystem capabilities support is configured.) [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NAndrew G. Morgan <morgan@kernel.org> Acked-by: NSerge Hallyn <serue@us.ibm.com> Acked-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 5月, 2008 1 次提交
-
-
由 Al Viro 提交于
Initial splitoff of the low-level stuff; taken to fdtable.h Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 28 4月, 2008 1 次提交
-
-
由 Nick Piggin 提交于
Convert XIP to support non-struct page backed memory, using VM_MIXEDMAP for the user mappings. This requires the get_xip_page API to be changed to an address based one. Improve the API layering a little bit too, while we're here. This is required in order to support XIP filesystems on memory that isn't backed with struct page (but memory with struct page is still supported too). Signed-off-by: NNick Piggin <npiggin@suse.de> Acked-by: NCarsten Otte <cotte@de.ibm.com> Cc: Jared Hulbert <jaredeh@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 4月, 2008 7 次提交
-
-
由 Dave Hansen 提交于
There have been a few oopses caused by 'struct file's with NULL f_vfsmnts. There was also a set of potentially missed mnt_want_write()s from dentry_open() calls. This patch provides a very simple debugging framework to catch these kinds of bugs. It will WARN_ON() them, but should stop us from having any oopses or mnt_writer count imbalances. I'm quite convinced that this is a good thing because it found bugs in the stuff I was working on as soon as I wrote it. [hch: made it conditional on a debug option. But it's still a little bit too ugly] [hch: merged forced remount r/o fix from Dave and akpm's fix for the fix] Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
It is OK to let access() go without using a mnt_want/drop_write() pair because it doesn't actually do writes to the filesystem, and it is inherently racy anyway. This is a rare case when it is OK to use __mnt_is_readonly() directly. Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@infradead.org> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
chown/chmod,etc... don't call permission in the same way that the normal "open for write" calls do. They still write to the filesystem, so bump the write count during these operations. Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
This is the first really tricky patch in the series. It elevates the writer count on a mount each time a non-special file is opened for write. We used to do this in may_open(), but Miklos pointed out that __dentry_open() is used as well to create filps. This will cover even those cases, while a call in may_open() would not have. There is also an elevated count around the vfs_create() call in open_namei(). See the comments for more details, but we need this to fix a 'create, remount, fail r/w open()' race. Some filesystems forego the use of normal vfs calls to create struct files. Make sure that these users elevate the mnt writer count because they will get __fput(), and we need to make sure they're balanced. Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
open_namei() will, in the future, need to take mount write counts over its creation and truncation (via may_open()) operations. It needs to keep these write counts until any potential filp that is created gets __fput()'d. This gets complicated in the error handling and becomes very murky as to how far open_namei() actually got, and whether or not that mount write count was taken. That makes it a bad interface. All that the current do_filp_open() really does is allocate the nameidata on the stack, then call open_namei(). So, this merges those two functions and moves filp_open() over to namei.c so it can be close to its buddy: do_filp_open(). It also gets a kerneldoc comment in the process. Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Dave Hansen 提交于
My end goal here is to make sure all users of may_open() return filps. This will ensure that we properly release mount write counts which were taken for the filp in may_open(). This patch moves the sys_open flags to namei flags calculation into fs/namei.c. We'll shortly be moving the nameidata_to_filp() calls into namei.c, and this gets the sys_open flags to a place where we can get at them when we need them. Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 11 4月, 2008 1 次提交
-
-
由 Roland McGrath 提交于
The prevent_tail_call() macro works around the problem of the compiler clobbering argument words on the stack, which for asmlinkage functions is the caller's (user's) struct pt_regs. The tail/sibling-call optimization is not the only way that the compiler can decide to use stack argument words as scratch space, which we have to prevent. Other optimizations can do it too. Until we have new compiler support to make "asmlinkage" binding on the compiler's own use of the stack argument frame, we have work around all the manifestations of this issue that crop up. More cases seem to be prevented by also keeping the incoming argument variables live at the end of the function. This makes their original stack slots attractive places to leave those variables, so the compiler tends not clobber them for something else. It's still no guarantee, but it handles some observed cases that prevent_tail_call() did not. Signed-off-by: NRoland McGrath <roland@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 3月, 2008 1 次提交
-
-
由 Christoph Hellwig 提交于
Make sure no-one calls dentry_open with a NULL vfsmount argument and crap out with a stacktrace otherwise. A NULL file->f_vfsmnt has always been problematic, but with the per-mount r/o tracking we can't accept anymore at all. [AV] the last place that passed NULL had been eliminated by the previous patch (reiserfs xattr stuff) Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 15 2月, 2008 3 次提交
-
-
由 Jan Blunck 提交于
In nearly all cases the set_fs_{root,pwd}() calls work on a struct path. Change the function to reflect this and use path_get() here. Signed-off-by: NJan Blunck <jblunck@suse.de> Signed-off-by: NAndreas Gruenbacher <agruen@suse.de> Acked-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Blunck 提交于
* Add path_put() functions for releasing a reference to the dentry and vfsmount of a struct path in the right order * Switch from path_release(nd) to path_put(&nd->path) * Rename dput_path() to path_put_conditional() [akpm@linux-foundation.org: fix cifs] Signed-off-by: NJan Blunck <jblunck@suse.de> Signed-off-by: NAndreas Gruenbacher <agruen@suse.de> Acked-by: NChristoph Hellwig <hch@lst.de> Cc: <linux-fsdevel@vger.kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Steven French <sfrench@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Blunck 提交于
This is the central patch of a cleanup series. In most cases there is no good reason why someone would want to use a dentry for itself. This series reflects that fact and embeds a struct path into nameidata. Together with the other patches of this series - it enforced the correct order of getting/releasing the reference count on <dentry,vfsmount> pairs - it prepares the VFS for stacking support since it is essential to have a struct path in every place where the stack can be traversed - it reduces the overall code size: without patch series: text data bss dec hex filename 5321639 858418 715768 6895825 6938d1 vmlinux with patch series: text data bss dec hex filename 5320026 858418 715768 6894212 693284 vmlinux This patch: Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix cifs] [akpm@linux-foundation.org: fix smack] Signed-off-by: NJan Blunck <jblunck@suse.de> Signed-off-by: NAndreas Gruenbacher <agruen@suse.de> Acked-by: NChristoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 2月, 2008 2 次提交
-
-
由 Arjan van de Ven 提交于
These exports (which aren't used and which are in fact dangerous to use because they pretty much form a security hole to use) have been marked _UNUSED since 2.6.24 with removal in 2.6.25. This patch is their final departure from the Linux kernel tree. Signed-off-by: NArjan van de Ven <arjan@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Harvey Harrison 提交于
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 11月, 2007 1 次提交
-
-
由 Arjan van de Ven 提交于
sys_open / sys_read were used in the early 1.2 days to load firmware from disk inside drivers. Since 2.0 or so this was deprecated behavior, but several drivers still were using this. Since a few years we have a request_firmware() API that implements this in a nice, consistent way. Only some old ISA sound drivers (pre-ALSA) still straggled along for some time.... however with commit c2b1239a the last user is now gone. This is a good thing, since using sys_open / sys_read etc for firmware is a very buggy to dangerous thing to do; these operations put an fd in the process file descriptor table.... which then can be tampered with from other threads for example. For those who don't want the firmware loader, filp_open()/vfs_read are the better APIs to use, without this security issue. The patch below marks sys_open and sys_read unused now that they're really not used anymore, and for deletion in the 2.6.25 timeframe. Signed-off-by: NArjan van de Ven <arjan@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 10月, 2007 1 次提交
-
-
由 Al Viro 提交于
makes caller simpler *and* allows to scan ancestors Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 17 10月, 2007 3 次提交
-
-
由 Serge E. Hallyn 提交于
Implement file posix capabilities. This allows programs to be given a subset of root's powers regardless of who runs them, without having to use setuid and giving the binary all of root's powers. This version works with Kaigai Kohei's userspace tools, found at http://www.kaigai.gr.jp/index.php. For more information on how to use this patch, Chris Friedhoff has posted a nice page at http://www.friedhoff.org/fscaps.html. Changelog: Nov 27: Incorporate fixes from Andrew Morton (security-introduce-file-caps-tweaks and security-introduce-file-caps-warning-fix) Fix Kconfig dependency. Fix change signaling behavior when file caps are not compiled in. Nov 13: Integrate comments from Alexey: Remove CONFIG_ ifdef from capability.h, and use %zd for printing a size_t. Nov 13: Fix endianness warnings by sparse as suggested by Alexey Dobriyan. Nov 09: Address warnings of unused variables at cap_bprm_set_security when file capabilities are disabled, and simultaneously clean up the code a little, by pulling the new code into a helper function. Nov 08: For pointers to required userspace tools and how to use them, see http://www.friedhoff.org/fscaps.html. Nov 07: Fix the calculation of the highest bit checked in check_cap_sanity(). Nov 07: Allow file caps to be enabled without CONFIG_SECURITY, since capabilities are the default. Hook cap_task_setscheduler when !CONFIG_SECURITY. Move capable(TASK_KILL) to end of cap_task_kill to reduce audit messages. Nov 05: Add secondary calls in selinux/hooks.c to task_setioprio and task_setscheduler so that selinux and capabilities with file cap support can be stacked. Sep 05: As Seth Arnold points out, uid checks are out of place for capability code. Sep 01: Define task_setscheduler, task_setioprio, cap_task_kill, and task_setnice to make sure a user cannot affect a process in which they called a program with some fscaps. One remaining question is the note under task_setscheduler: are we ok with CAP_SYS_NICE being sufficient to confine a process to a cpuset? It is a semantic change, as without fsccaps, attach_task doesn't allow CAP_SYS_NICE to override the uid equivalence check. But since it uses security_task_setscheduler, which elsewhere is used where CAP_SYS_NICE can be used to override the uid equivalence check, fixing it might be tough. task_setscheduler note: this also controls cpuset:attach_task. Are we ok with CAP_SYS_NICE being used to confine to a cpuset? task_setioprio task_setnice sys_setpriority uses this (through set_one_prio) for another process. Need same checks as setrlimit Aug 21: Updated secureexec implementation to reflect the fact that euid and uid might be the same and nonzero, but the process might still have elevated caps. Aug 15: Handle endianness of xattrs. Enforce capability version match between kernel and disk. Enforce that no bits beyond the known max capability are set, else return -EPERM. With this extra processing, it may be worth reconsidering doing all the work at bprm_set_security rather than d_instantiate. Aug 10: Always call getxattr at bprm_set_security, rather than caching it at d_instantiate. [morgan@kernel.org: file-caps clean up for linux/capability.h] [bunk@kernel.org: unexport cap_inode_killpriv] Signed-off-by: NSerge E. Hallyn <serue@us.ibm.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Andrew Morgan <morgan@kernel.org> Signed-off-by: NAndrew Morgan <morgan@kernel.org> Signed-off-by: NAdrian Bunk <bunk@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alan Cox 提交于
The early LFS work that Linux uses favours EFBIG in various places. SuSv3 specifically uses EOVERFLOW for this as noted by Michael (Bug 7253) [EOVERFLOW] The named file is a regular file and the size of the file cannot be represented correctly in an object of type off_t. We should therefore transition to the proper error return code Signed-off-by: NAlan Cox <alan@redhat.com> Cc: Theodore Tso <tytso@mit.edu> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yuichi Nakamura 提交于
It reduces the selinux overhead on read/write by only revalidating permissions in selinux_file_permission if the task or inode labels have changed or the policy has changed since the open-time check. A new LSM hook, security_dentry_open, is added to capture the necessary state at open time to allow this optimization. (see http://marc.info/?l=selinux&m=118972995207740&w=2) Signed-off-by: Yuichi Nakamura<ynakam@hitachisoft.jp> Acked-by: NStephen Smalley <sds@tycho.nsa.gov> Signed-off-by: NJames Morris <jmorris@namei.org>
-
- 01 8月, 2007 1 次提交
-
-
由 david m. richter 提交于
It is possible that another process could acquire a new file lease right after break_lease() is called during a truncate, but before lease-granting is disabled by the subsequent get_write_access(). Merely switching the order of the break_lease() and get_write_access() calls prevents this race. Signed-off-by: NDavid M. Richter <richterd@citi.umich.edu> Signed-off-by: N"J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 7月, 2007 1 次提交
-
-
由 Ulrich Drepper 提交于
The fallocate syscall returns ENOSYS in case the filesystem does not support the operation and expects the userlevel code to fill in. This is good in concept. The problem is that the libc code for old kernels should be able to distinguish the case where the syscall is not at all available vs not functioning for a specific mount point. As is this is not possible and we always have to invoke the syscall even if the kernel doesn't support it. I suggest the following patch. Using EOPNOTSUPP is IMO the right thing to do. Cc: Amit Arora <aarora@in.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 7月, 2007 1 次提交
-
-
由 Amit Arora 提交于
fallocate() is a new system call being proposed here which will allow applications to preallocate space to any file(s) in a file system. Each file system implementation that wants to use this feature will need to support an inode operation called ->fallocate(). Applications can use this feature to avoid fragmentation to certain level and thus get faster access speed. With preallocation, applications also get a guarantee of space for particular file(s) - even if later the the system becomes full. Currently, glibc provides an interface called posix_fallocate() which can be used for similar cause. Though this has the advantage of working on all file systems, but it is quite slow (since it writes zeroes to each block that has to be preallocated). Without a doubt, file systems can do this more efficiently within the kernel, by implementing the proposed fallocate() system call. It is expected that posix_fallocate() will be modified to call this new system call first and incase the kernel/filesystem does not implement it, it should fall back to the current implementation of writing zeroes to the new blocks. ToDos: 1. Implementation on other architectures (other than i386, x86_64, and ppc). Patches for s390(x) and ia64 are already available from previous posts, but it was decided that they should be added later once fallocate is in the mainline. Hence not including those patches in this take. 2. Changes to glibc, a) to support fallocate() system call b) to make posix_fallocate() and posix_fallocate64() call fallocate() Signed-off-by: NAmit Arora <aarora@in.ibm.com>
-
- 17 7月, 2007 2 次提交
-
-
由 Ulrich Drepper 提交于
Part two in the O_CLOEXEC saga: adding support for file descriptors received through Unix domain sockets. The patch is once again pretty minimal, it introduces a new flag for recvmsg and passes it just like the existing MSG_CMSG_COMPAT flag. I think this bit is not used otherwise but the networking people will know better. This new flag is not recognized by recvfrom and recv. These functions cannot be used for that purpose and the asymmetry this introduces is not worse than the already existing MSG_CMSG_COMPAT situations. The patch must be applied on the patch which introduced O_CLOEXEC. It has to remove static from the new get_unused_fd_flags function but since scm.c cannot live in a module the function still hasn't to be exported. Here's a test program to make sure the code works. It's so much longer than the actual patch... #include <errno.h> #include <error.h> #include <fcntl.h> #include <stdio.h> #include <string.h> #include <unistd.h> #include <sys/socket.h> #include <sys/un.h> #ifndef O_CLOEXEC # define O_CLOEXEC 02000000 #endif #ifndef MSG_CMSG_CLOEXEC # define MSG_CMSG_CLOEXEC 0x40000000 #endif int main (int argc, char *argv[]) { if (argc > 1) { int fd = atol (argv[1]); printf ("child: fd = %d\n", fd); if (fcntl (fd, F_GETFD) == 0 || errno != EBADF) { puts ("file descriptor valid in child"); return 1; } return 0; } struct sockaddr_un sun; strcpy (sun.sun_path, "./testsocket"); sun.sun_family = AF_UNIX; char databuf[] = "hello"; struct iovec iov[1]; iov[0].iov_base = databuf; iov[0].iov_len = sizeof (databuf); union { struct cmsghdr hdr; char bytes[CMSG_SPACE (sizeof (int))]; } buf; struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 1, .msg_control = buf.bytes, .msg_controllen = sizeof (buf) }; struct cmsghdr *cmsg = CMSG_FIRSTHDR (&msg); cmsg->cmsg_level = SOL_SOCKET; cmsg->cmsg_type = SCM_RIGHTS; cmsg->cmsg_len = CMSG_LEN (sizeof (int)); msg.msg_controllen = cmsg->cmsg_len; pid_t child = fork (); if (child == -1) error (1, errno, "fork"); if (child == 0) { int sock = socket (PF_UNIX, SOCK_STREAM, 0); if (sock < 0) error (1, errno, "socket"); if (bind (sock, (struct sockaddr *) &sun, sizeof (sun)) < 0) error (1, errno, "bind"); if (listen (sock, SOMAXCONN) < 0) error (1, errno, "listen"); int conn = accept (sock, NULL, NULL); if (conn == -1) error (1, errno, "accept"); *(int *) CMSG_DATA (cmsg) = sock; if (sendmsg (conn, &msg, MSG_NOSIGNAL) < 0) error (1, errno, "sendmsg"); return 0; } /* For a test suite this should be more robust like a barrier in shared memory. */ sleep (1); int sock = socket (PF_UNIX, SOCK_STREAM, 0); if (sock < 0) error (1, errno, "socket"); if (connect (sock, (struct sockaddr *) &sun, sizeof (sun)) < 0) error (1, errno, "connect"); unlink (sun.sun_path); *(int *) CMSG_DATA (cmsg) = -1; if (recvmsg (sock, &msg, MSG_CMSG_CLOEXEC) < 0) error (1, errno, "recvmsg"); int fd = *(int *) CMSG_DATA (cmsg); if (fd == -1) error (1, 0, "no descriptor received"); char fdname[20]; snprintf (fdname, sizeof (fdname), "%d", fd); execl ("/proc/self/exe", argv[0], fdname, NULL); puts ("execl failed"); return 1; } [akpm@linux-foundation.org: Fix fastcall inconsistency noted by Michael Buesch] [akpm@linux-foundation.org: build fix] Signed-off-by: NUlrich Drepper <drepper@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Michael Buesch <mb@bu3sch.de> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ulrich Drepper 提交于
The problem is as follows: in multi-threaded code (or more correctly: all code using clone() with CLONE_FILES) we have a race when exec'ing. thread #1 thread #2 fd=open() fork + exec fcntl(fd,F_SETFD,FD_CLOEXEC) In some applications this can happen frequently. Take a web browser. One thread opens a file and another thread starts, say, an external PDF viewer. The result can even be a security issue if that open file descriptor refers to a sensitive file and the external program can somehow be tricked into using that descriptor. Just adding O_CLOEXEC support to open() doesn't solve the whole set of problems. There are other ways to create file descriptors (socket, epoll_create, Unix domain socket transfer, etc). These can and should be addressed separately though. open() is such an easy case that it makes not much sense putting the fix off. The test program: #include <errno.h> #include <fcntl.h> #include <stdio.h> #include <unistd.h> #ifndef O_CLOEXEC # define O_CLOEXEC 02000000 #endif int main (int argc, char *argv[]) { int fd; if (argc > 1) { fd = atol (argv[1]); printf ("child: fd = %d\n", fd); if (fcntl (fd, F_GETFD) == 0 || errno != EBADF) { puts ("file descriptor valid in child"); return 1; } return 0; } fd = open ("/proc/self/exe", O_RDONLY | O_CLOEXEC); printf ("in parent: new fd = %d\n", fd); char buf[20]; snprintf (buf, sizeof (buf), "%d", fd); execl ("/proc/self/exe", argv[0], buf, NULL); puts ("execl failed"); return 1; } [kyle@parisc-linux.org: parisc fix] Signed-off-by: NUlrich Drepper <drepper@redhat.com> Acked-by: NIngo Molnar <mingo@elte.hu> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: Chris Zankel <chris@zankel.net> Signed-off-by: NKyle McMartin <kyle@parisc-linux.org> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 5月, 2007 2 次提交
-
-
由 Linus Torvalds 提交于
.. to match what we do on write(). This way, people who write to files by using [f]truncate + writable mmap have the same semantics as if they were using the write() family of system calls. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Randy Dunlap 提交于
Remove includes of <linux/smp_lock.h> where it is not used/needed. Suggested by Al Viro. Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc, sparc64, and arm (all 59 defconfigs). Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-