- 23 1月, 2015 1 次提交
-
-
由 Al Viro 提交于
... and don't bother with new struct filename when we already have one Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 14 12月, 2014 1 次提交
-
-
由 David Drysdale 提交于
This patchset adds execveat(2) for x86, and is derived from Meredydd Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528). The primary aim of adding an execveat syscall is to allow an implementation of fexecve(3) that does not rely on the /proc filesystem, at least for executables (rather than scripts). The current glibc version of fexecve(3) is implemented via /proc, which causes problems in sandboxed or otherwise restricted environments. Given the desire for a /proc-free fexecve() implementation, HPA suggested (https://lkml.org/lkml/2006/7/11/556) that an execveat(2) syscall would be an appropriate generalization. Also, having a new syscall means that it can take a flags argument without back-compatibility concerns. The current implementation just defines the AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other flags could be added in future -- for example, flags for new namespaces (as suggested at https://lkml.org/lkml/2006/7/11/474). Related history: - https://lkml.org/lkml/2006/12/27/123 is an example of someone realizing that fexecve() is likely to fail in a chroot environment. - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered documenting the /proc requirement of fexecve(3) in its manpage, to "prevent other people from wasting their time". - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a problem where a process that did setuid() could not fexecve() because it no longer had access to /proc/self/fd; this has since been fixed. This patch (of 4): Add a new execveat(2) system call. execveat() is to execve() as openat() is to open(): it takes a file descriptor that refers to a directory, and resolves the filename relative to that. In addition, if the filename is empty and AT_EMPTY_PATH is specified, execveat() executes the file to which the file descriptor refers. This replicates the functionality of fexecve(), which is a system call in other UNIXen, but in Linux glibc it depends on opening "/proc/self/fd/<fd>" (and so relies on /proc being mounted). The filename fed to the executed program as argv[0] (or the name of the script fed to a script interpreter) will be of the form "/dev/fd/<fd>" (for an empty filename) or "/dev/fd/<fd>/<filename>", effectively reflecting how the executable was found. This does however mean that execution of a script in a /proc-less environment won't work; also, script execution via an O_CLOEXEC file descriptor fails (as the file will not be accessible after exec). Based on patches by Meredydd Luff. Signed-off-by: NDavid Drysdale <drysdale@google.com> Cc: Meredydd Luff <meredydd@senatehouse.org> Cc: Shuah Khan <shuah.kh@samsung.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Rich Felker <dalias@aerifal.cx> Cc: Christoph Hellwig <hch@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 12月, 2014 4 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
All callers of path_init() proceed to do the identical cleanup when they are done with nameidata. Don't open-code it... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 11 12月, 2014 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 31 10月, 2014 1 次提交
-
-
由 Eric Rannaud 提交于
The man page for open(2) indicates that when O_CREAT is specified, the 'mode' argument applies only to future accesses to the file: Note that this mode applies only to future accesses of the newly created file; the open() call that creates a read-only file may well return a read/write file descriptor. The man page for open(2) implies that 'mode' is treated identically by O_CREAT and O_TMPFILE. O_TMPFILE, however, behaves differently: int fd = open("/tmp", O_TMPFILE | O_RDWR, 0); assert(fd == -1); assert(errno == EACCES); int fd = open("/tmp", O_TMPFILE | O_RDWR, 0600); assert(fd > 0); For O_CREAT, do_last() sets acc_mode to MAY_OPEN only: if (*opened & FILE_CREATED) { /* Don't check for write permission, don't truncate */ open_flag &= ~O_TRUNC; will_truncate = false; acc_mode = MAY_OPEN; path_to_nameidata(path, nd); goto finish_open_created; } But for O_TMPFILE, do_tmpfile() passes the full op->acc_mode to may_open(). This patch lines up the behavior of O_TMPFILE with O_CREAT. After the inode is created, may_open() is called with acc_mode = MAY_OPEN, in do_tmpfile(). A different, but related glibc bug revealed the discrepancy: https://sourceware.org/bugzilla/show_bug.cgi?id=17523 The glibc lazily loads the 'mode' argument of open() and openat() using va_arg() only if O_CREAT is present in 'flags' (to support both the 2 argument and the 3 argument forms of open; same idea for openat()). However, the glibc ignores the 'mode' argument if O_TMPFILE is in 'flags'. On x86_64, for open(), it magically works anyway, as 'mode' is in RDX when entering open(), and is still in RDX on SYSCALL, which is where the kernel looks for the 3rd argument of a syscall. But openat() is not quite so lucky: 'mode' is in RCX when entering the glibc wrapper for openat(), while the kernel looks for the 4th argument of a syscall in R10. Indeed, the syscall calling convention differs from the regular calling convention in this respect on x86_64. So the kernel sees mode = 0 when trying to use glibc openat() with O_TMPFILE, and fails with EACCES. Signed-off-by: NEric Rannaud <e@nanocritical.com> Acked-by: NAndy Lutomirski <luto@amacapital.net> Cc: stable@vger.kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 10月, 2014 1 次提交
-
-
由 Miklos Szeredi 提交于
In an overlay directory that shadows an empty lower directory, say /mnt/a/empty102, do: touch /mnt/a/empty102/x unlink /mnt/a/empty102/x rmdir /mnt/a/empty102 It's actually harmless, but needs another level of nesting between I_MUTEX_CHILD and I_MUTEX_NORMAL. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Tested-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 24 10月, 2014 5 次提交
-
-
由 Miklos Szeredi 提交于
This adds a new RENAME_WHITEOUT flag. This flag makes rename() create a whiteout of source. The whiteout creation is atomic relative to the rename. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Miklos Szeredi 提交于
Whiteout isn't actually a new file type, but is represented as a char device (Linus's idea) with 0/0 device number. This has several advantages compared to introducing a new whiteout file type: - no userspace API changes (e.g. trivial to make backups of upper layer filesystem, without losing whiteouts) - no fs image format changes (you can boot an old kernel/fsck without whiteout support and things won't break) - implementation is trivial Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Miklos Szeredi 提交于
It's already duplicated in btrfs and about to be used in overlayfs too. Move the sticky bit check to an inline helper and call the out-of-line helper only in the unlikly case of the sticky bit being set. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Miklos Szeredi 提交于
We need to be able to check inode permissions (but not filesystem implied permissions) for stackable filesystems. Expose this interface for overlayfs. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Miklos Szeredi 提交于
Add a new inode operation i_op->dentry_open(). This is for stacked filesystems that want to return a struct file from a different filesystem. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
- 13 10月, 2014 1 次提交
-
-
由 Al Viro 提交于
As it is, path_lookupat() and path_mounpoint() might end up leaking struct file reference in some cases. Spotted-by: NEric Biggers <ebiggers3@gmail.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 09 10月, 2014 3 次提交
-
-
由 Eric W. Biederman 提交于
Now that d_invalidate can no longer fail, stop returning a useless return code. For the few callers that checked the return code update remove the handling of d_invalidate failure. Reviewed-by: NMiklos Szeredi <miklos@szeredi.hu> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Eric W. Biederman 提交于
With the introduction of mount namespaces and bind mounts it became possible to access files and directories that on some paths are mount points but are not mount points on other paths. It is very confusing when rm -rf somedir returns -EBUSY simply because somedir is mounted somewhere else. With the addition of user namespaces allowing unprivileged mounts this condition has gone from annoying to allowing a DOS attack on other users in the system. The possibility for mischief is removed by updating the vfs to support rename, unlink and rmdir on a dentry that is a mountpoint and by lazily unmounting mountpoints on deleted dentries. In particular this change allows rename, unlink and rmdir system calls on a dentry without a mountpoint in the current mount namespace to succeed, and it allows rename, unlink, and rmdir performed on a distributed filesystem to update the vfs cache even if when there is a mount in some namespace on the original dentry. There are two common patterns of maintaining mounts: Mounts on trusted paths with the parent directory of the mount point and all ancestory directories up to / owned by root and modifiable only by root (i.e. /media/xxx, /dev, /dev/pts, /proc, /sys, /sys/fs/cgroup/{cpu, cpuacct, ...}, /usr, /usr/local). Mounts on unprivileged directories maintained by fusermount. In the case of mounts in trusted directories owned by root and modifiable only by root the current parent directory permissions are sufficient to ensure a mount point on a trusted path is not removed or renamed by anyone other than root, even if there is a context where the there are no mount points to prevent this. In the case of mounts in directories owned by less privileged users races with users modifying the path of a mount point are already a danger. fusermount already uses a combination of chdir, /proc/<pid>/fd/NNN, and UMOUNT_NOFOLLOW to prevent these races. The removable of global rename, unlink, and rmdir protection really adds nothing new to consider only a widening of the attack window, and fusermount is already safe against unprivileged users modifying the directory simultaneously. In principle for perfect userspace programs returning -EBUSY for unlink, rmdir, and rename of dentires that have mounts in the local namespace is actually unnecessary. Unfortunately not all userspace programs are perfect so retaining -EBUSY for unlink, rmdir and rename of dentries that have mounts in the current mount namespace plays an important role of maintaining consistency with historical behavior and making imperfect userspace applications hard to exploit. v2: Remove spurious old_dentry. v3: Optimized shrink_submounts_and_drop Removed unsued afs label v4: Simplified the changes to check_submounts_and_drop Do not rename check_submounts_and_drop shrink_submounts_and_drop Document what why we need atomicity in check_submounts_and_drop Rely on the parent inode mutex to make d_revalidate and d_invalidate an atomic unit. v5: Refcount the mountpoint to detach in case of simultaneous renames. Reviewed-by: NMiklos Szeredi <miklos@szeredi.hu> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Eric W. Biederman 提交于
In preparation for allowing mountpoints to be renamed and unlinked in remote filesystems and in other mount namespaces test if on a dentry there is a mount in the local mount namespace before allowing it to be renamed or unlinked. The primary motivation here are old versions of fusermount unmount which is not safe if the a path can be renamed or unlinked while it is verifying the mount is safe to unmount. More recent versions are simpler and safer by simply using UMOUNT_NOFOLLOW when unmounting a mount in a directory owned by an arbitrary user. Miklos Szeredi <miklos@szeredi.hu> reports this is approach is good enough to remove concerns about new kernels mixed with old versions of fusermount. A secondary motivation for restrictions here is that it removing empty directories that have non-empty mount points on them appears to violate the rule that rmdir can not remove empty directories. As Linus Torvalds pointed out this is useful for programs (like git) that test if a directory is empty with rmdir. Therefore this patch arranges to enforce the existing mount point semantics for local mount namespace. v2: Rewrote the test to be a drop in replacement for d_mountpoint v3: Use bool instead of int as the return type of is_local_mountpoint Reviewed-by: NMiklos Szeredi <miklos@szeredi.hu> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 16 9月, 2014 2 次提交
-
-
由 James Hogan 提交于
Commit d6bb3e90 ("vfs: simplify and shrink stack frame of link_path_walk()") introduced build problems with GCC versions older than 4.6 due to the initialisation of a member of an anonymous union in struct qstr without enclosing braces. This hits GCC bug 10676 [1] (which was fixed in GCC 4.6 by [2]), and causes the following build error: fs/namei.c: In function 'link_path_walk': fs/namei.c:1778: error: unknown field 'hash_len' specified in initializer This is worked around by adding explicit braces. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10676 [2] https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=159206 Fixes: d6bb3e90 (vfs: simplify and shrink stack frame of link_path_walk()) Signed-off-by: NJames Hogan <james.hogan@imgtec.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: linux-fsdevel@vger.kernel.org Cc: linux-metag@vger.kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
Commit 9226b5b4 ("vfs: avoid non-forwarding large load after small store in path lookup") made link_path_walk() always access the "hash_len" field as a single 64-bit entity, in order to avoid mixed size accesses to the members. However, what I didn't notice was that that effectively means that the whole "struct qstr this" is now basically redundant. We already explicitly track the "const char *name", and if we just use "u64 hash_len" instead of "long len", there is nothing else left of the "struct qstr". We do end up wanting the "struct qstr" if we have a filesystem with a "d_hash()" function, but that's a rare case, and we might as well then just squirrell away the name and hash_len at that point. End result: fewer live variables in the loop, a smaller stack frame, and better code generation. And we don't need to pass in pointers variables to helper functions any more, because the return value contains all the relevant information. So this removes more lines than it adds, and the source code is clearer too. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 9月, 2014 3 次提交
-
-
由 Linus Torvalds 提交于
The performance regression that Josef Bacik reported in the pathname lookup (see commit 99d263d4 "vfs: fix bad hashing of dentries") made me look at performance stability of the dcache code, just to verify that the problem was actually fixed. That turned up a few other problems in this area. There are a few cases where we exit RCU lookup mode and go to the slow serializing case when we shouldn't, Al has fixed those and they'll come in with the next VFS pull. But my performance verification also shows that link_path_walk() turns out to have a very unfortunate 32-bit store of the length and hash of the name we look up, followed by a 64-bit read of the combined hash_len field. That screws up the processor store to load forwarding, causing an unnecessary hickup in this critical routine. It's caused by the ugly calling convention for the "hash_name()" function, and easily fixed by just making hash_name() fill in the whole 'struct qstr' rather than passing it a pointer to just the hash value. With that, the profile for this function looks much smoother. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Al Viro 提交于
in the former we simply check if dentry is still valid after picking its ->d_inode; in the latter we fetch ->d_inode in the same places where we fetch dentry and its ->d_seq, under the same checks. Cc: stable@vger.kernel.org # 2.6.38+ Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
return the value instead, and have path_init() do the assignment. Broken by "vfs: Fix absolute RCU path walk failures due to uninitialized seq number", which was Cc-stable with 2.6.38+ as destination. This one should go where it went. To avoid dummy value returned in case when root is already set (it would do no harm, actually, since the only caller that doesn't ignore the return value is guaranteed to have nd->root *not* set, but it's more obvious that way), lift the check into callers. And do the same to set_root(), to keep them in sync. Cc: stable@vger.kernel.org # 2.6.38+ Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 14 9月, 2014 2 次提交
-
-
由 Al Viro 提交于
read_seqretry() returns true on mismatch, not on match... Cc: stable@vger.kernel.org # 3.15+ Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Linus Torvalds 提交于
Josef Bacik found a performance regression between 3.2 and 3.10 and narrowed it down to commit bfcfaa77 ("vfs: use 'unsigned long' accesses for dcache name comparison and hashing"). He reports: "The test case is essentially for (i = 0; i < 1000000; i++) mkdir("a$i"); On xfs on a fio card this goes at about 20k dir/sec with 3.2, and 12k dir/sec with 3.10. This is because we spend waaaaay more time in __d_lookup on 3.10 than in 3.2. The new hashing function for strings is suboptimal for < sizeof(unsigned long) string names (and hell even > sizeof(unsigned long) string names that I've tested). I broke out the old hashing function and the new one into a userspace helper to get real numbers and this is what I'm getting: Old hash table had 1000000 entries, 0 dupes, 0 max dupes New hash table had 12628 entries, 987372 dupes, 900 max dupes We had 11400 buckets with a p50 of 30 dupes, p90 of 240 dupes, p99 of 567 dupes for the new hash My test does the hash, and then does the d_hash into a integer pointer array the same size as the dentry hash table on my system, and then just increments the value at the address we got to see how many entries we overlap with. As you can see the old hash function ended up with all 1 million entries in their own bucket, whereas the new one they are only distributed among ~12.5k buckets, which is why we're using so much more CPU in __d_lookup". The reason for this hash regression is two-fold: - On 64-bit architectures the down-mixing of the original 64-bit word-at-a-time hash into the final 32-bit hash value is very simplistic and suboptimal, and just adds the two 32-bit parts together. In particular, because there is no bit shuffling and the mixing boundary is also a byte boundary, similar character patterns in the low and high word easily end up just canceling each other out. - the old byte-at-a-time hash mixed each byte into the final hash as it hashed the path component name, resulting in the low bits of the hash generally being a good source of hash data. That is not true for the word-at-a-time case, and the hash data is distributed among all the bits. The fix is the same in both cases: do a better job of mixing the bits up and using as much of the hash data as possible. We already have the "hash_32|64()" functions to do that. Reported-by: NJosef Bacik <jbacik@fb.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Chris Mason <clm@fb.com> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 9月, 2014 1 次提交
-
-
由 Dmitry Kasatkin 提交于
Empty files and missing xattrs do not guarantee that a file was just created. This patch passes FILE_CREATED flag to IMA to reliably identify new files. Signed-off-by: NDmitry Kasatkin <d.kasatkin@samsung.com> Signed-off-by: NMimi Zohar <zohar@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> 3.14+
-
- 08 8月, 2014 3 次提交
-
-
由 J. Bruce Fields 提交于
Looks like the directory loop check is actually done in renameat? Whatever, leave this out rather than trying to keep it up to date with the code. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 NeilBrown 提交于
In REF-walk mode, ->d_manage can return -EISDIR to indicate that the dentry is not really a mount trap (or even a mount point) and that any mounts or any DCACHE_NEED_AUTOMOUNT flag should be ignored. RCU-walk mode doesn't currently support this, so if there is a dentry with DCACHE_NEED_AUTOMOUNT set but which shouldn't be a mount-trap, lookup_fast() will always drop in REF-walk mode. With this patch, an -EISDIR from ->d_manage will always cause mounts and automounts to be ignored, both in REF-walk and RCU-walk. Bug-fixed-by: NDan Carpenter <dan.carpenter@oracle.com> Cc: Ian Kent <raven@themaw.net> Signed-off-by: NNeilBrown <neilb@suse.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Miklos Szeredi 提交于
Christoph Hellwig suggests: 1) make vfs_rename call ->rename2 if it exists instead of ->rename 2) switch all filesystems that you're adding NOREPLACE support for to use ->rename2 3) see how many ->rename instances we'll have left after a few iterations of 2. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 24 7月, 2014 1 次提交
-
-
由 Vasily Averin 提交于
Currently umount on symlink blocks following umount: /vz is separate mount # ls /vz/ -al | grep test drwxr-xr-x. 2 root root 4096 Jul 19 01:14 testdir lrwxrwxrwx. 1 root root 11 Jul 19 01:16 testlink -> /vz/testdir # umount -l /vz/testlink umount: /vz/testlink: not mounted (expected) # lsof /vz # umount /vz umount: /vz: device is busy. (unexpected) In this case mountpoint_last() gets an extra refcount on path->mnt Signed-off-by: NVasily Averin <vvs@openvz.org> Acked-by: NIan Kent <raven@themaw.net> Acked-by: NJeff Layton <jlayton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 11 6月, 2014 1 次提交
-
-
由 Andy Lutomirski 提交于
The kernel has no concept of capabilities with respect to inodes; inodes exist independently of namespaces. For example, inode_capable(inode, CAP_LINUX_IMMUTABLE) would be nonsense. This patch changes inode_capable to check for uid and gid mappings and renames it to capable_wrt_inode_uidgid, which should make it more obvious what it does. Fixes CVE-2014-4014. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Chinner <david@fromorbit.com> Cc: stable@vger.kernel.org Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 20 4月, 2014 1 次提交
-
-
由 Al Viro 提交于
in non-lazy walk we need to be careful about dentry switching from negative to positive - both ->d_flags and ->d_inode are updated, and in some places we might see only one store. The cases where dentry has been obtained by dcache lookup with ->i_mutex held on parent are safe - ->d_lock and ->i_mutex provide all the barriers we need. However, there are several places where we run into trouble: * do_last() fetches ->d_inode, then checks ->d_flags and assumes that inode won't be NULL unless d_is_negative() is true. Race with e.g. creat() - we might have fetched the old value of ->d_inode (still NULL) and new value of ->d_flags (already not DCACHE_MISS_TYPE). Lin Ming has observed and reported the resulting oops. * a bunch of places checks ->d_inode for being non-NULL, then checks ->d_flags for "is it a symlink". Race with symlink(2) in case if our CPU sees ->d_inode update first - we see non-NULL there, but ->d_flags still contains DCACHE_MISS_TYPE instead of DCACHE_SYMLINK_TYPE. Result: false negative on "should we follow link here?", with subsequent unpleasantness. Cc: stable@vger.kernel.org # 3.13 and 3.14 need that one Reported-and-tested-by: NLin Ming <minggr@gmail.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 02 4月, 2014 3 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 01 4月, 2014 5 次提交
-
-
由 Miklos Szeredi 提交于
If flags contain RENAME_EXCHANGE then exchange source and destination files. There's no restriction on the type of the files; e.g. a directory can be exchanged with a symlink. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Reviewed-by: NJan Kara <jack@suse.cz> Reviewed-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Miklos Szeredi 提交于
Add flags to security_path_rename() and security_inode_rename() hooks. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Reviewed-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Miklos Szeredi 提交于
If this flag is specified and the target of the rename exists then the rename syscall fails with EEXIST. The VFS does the existence checking, so it is trivial to enable for most local filesystems. This patch only enables it in ext4. For network filesystems the VFS check is not enough as there may be a race between a remote create and the rename, so these filesystems need to handle this flag in their ->rename() implementations to ensure atomicity. Andy writes about why this is useful: "The trivial answer: to eliminate the race condition from 'mv -i'. Another answer: there's a common pattern to atomically create a file with contents: open a temporary file, write to it, optionally fsync it, close it, then link(2) it to the final name, then unlink the temporary file. The reason to use link(2) is because it won't silently clobber the destination. This is annoying: - It requires an extra system call that shouldn't be necessary. - It doesn't work on (IMO sensible) filesystems that don't support hard links (e.g. vfat). - It's not atomic -- there's an intermediate state where both files exist. - It's ugly. The new rename flag will make this totally sensible. To be fair, on new enough kernels, you can also use O_TMPFILE and linkat to achieve the same thing even more cleanly." Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Reviewed-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Miklos Szeredi 提交于
Add new renameat2 syscall, which is the same as renameat with an added flags argument. Pass flags to vfs_rename() and to i_op->rename() as well. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Reviewed-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Miklos Szeredi 提交于
There's actually very little difference between vfs_rename_dir() and vfs_rename_other() so move both inline into vfs_rename() which still stays reasonably readable. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Reviewed-by: NJ. Bruce Fields <bfields@redhat.com>
-