Commits · 5a460275ef3c14602040e5dc581a0d8771ce6b43 · openeuler / Kernel

11 May, 2015 14 commits

namei: expand nested_symlink() in its only caller · 5a460275
Committed by Al Viro on Apr 17, 2015
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
5a460275

do_last: move path there from caller's stack frame · 896475d5

Committed by Al Viro on Apr 22, 2015

We used to need it to feed to follow_link().  No more...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

896475d5

namei: introduce nameidata->link · caa85634

Committed by Al Viro on Apr 22, 2015

shares space with nameidata->next, walk_component() et al. store
the struct path of symlink instead of returning it into a variable
passed by caller.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

caa85634

namei: don't bother with ->follow_link() if ->i_link is set · d4dee48b

Committed by Al Viro on Apr 30, 2015

with new calling conventions it's trivial
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Conflicts:
	fs/namei.c

d4dee48b

namei.c: separate the parts of follow_link() that find the link body · 0a959df5

Committed by Al Viro on Apr 18, 2015

Split the piece of fs/namei.c:follow_link() that obtains the link
body into a separate function.  follow_link() itself is converted to
calling get_link() and then doing the body traversal (if any).

The next step will expand follow_link() call in link_path_walk()
and this helps to keep the size down...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0a959df5

new ->follow_link() and ->put_link() calling conventions · 680baacb

Committed by Al Viro on May 02, 2015

a) instead of storing the symlink body (via nd_set_link()) and returning
an opaque pointer later passed to ->put_link(), ->follow_link() _stores_
that opaque pointer (into void * passed by address by caller) and returns
the symlink body.  Returning ERR_PTR() on error, NULL on jump (procfs magic
symlinks) and pointer to symlink body for normal symlinks.  Stored pointer
is ignored in all cases except the last one.

Storing NULL for opaque pointer (or not storing it at all) means no call
of ->put_link().

b) the body used to be passed to ->put_link() implicitly (via nameidata).
Now only the opaque pointer is.  In the cases when we used the symlink body
to free stuff, ->follow_link() now should store it as opaque pointer in addition
to returning it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

680baacb
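
The calling convention described in 680baacb can be illustrated with a small user-space sketch. This is not kernel code: `toy_link`, `toy_follow_link()`, `toy_put_link()` and `toy_walk()` are hypothetical names, and the error path is simplified (the kernel would return an ERR_PTR() value rather than NULL on failure). It only shows the shape of the contract: the method returns the symlink body and stores an opaque cookie through a pointer supplied by the caller, and the caller invokes the put hook only when a non-NULL cookie was stored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a symlink; not the kernel's struct inode. */
struct toy_link {
    const char *target;   /* text the link points to */
};

/*
 * New-style method: return the body, store the cookie.
 * The body is heap-allocated here, so the same pointer doubles as the
 * cookie that the put hook frees later (case (b) in the message above).
 */
static const char *toy_follow_link(const struct toy_link *link, void **cookie)
{
    size_t len;
    char *body;

    assert(link && link->target && cookie);
    len = strlen(link->target) + 1;
    body = malloc(len);
    if (!body)
        return NULL;      /* simplified; the kernel would use ERR_PTR(-ENOMEM) */
    memcpy(body, link->target, len);
    *cookie = body;       /* stored only on the normal-symlink path */
    return body;
}

/* The put hook receives only the opaque cookie, never the body. */
static void toy_put_link(void *cookie)
{
    free(cookie);
}

/* Caller side: a NULL cookie means "no put call needed". */
static size_t toy_walk(const struct toy_link *link)
{
    void *cookie = NULL;
    const char *body = toy_follow_link(link, &cookie);
    size_t n = body ? strlen(body) : 0;

    if (cookie)
        toy_put_link(cookie);
    return n;
}
```

A method whose body lives in memory that needs no cleanup would simply leave the cookie untouched, and `toy_walk()` would skip the put call.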

namei: lift nameidata into filename_mountpoint() · 46afd6f6

Committed by Al Viro on May 01, 2015

when we go for on-demand allocation of saved state in
link_path_walk(), we'll want nameidata to stay around
for all 3 calls of path_mountpoint().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

46afd6f6

name: shift nameidata down into user_path_walk() · f5beed75

Committed by Al Viro on Apr 30, 2015

that avoids having nameidata on stack during the calls of
->rmdir()/->unlink() and *two* of those during the calls
of ->rename().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f5beed75

namei: get rid of lookup_hash() · 6a9f40d6

Committed by Al Viro on Apr 30, 2015

it's a convenient helper, but we'll want to shift nameidata
down the call chain, so it won't be available there...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6a9f40d6

do_last: regularize the logics around following symlinks · a5cfe2d5

Committed by Al Viro on Apr 22, 2015

With LOOKUP_FOLLOW we unlazy and return 1; without it we either
fail with ELOOP or, for O_PATH opens, succeed.  No need to mix
those cases...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a5cfe2d5

do_last: kill symlink_ok · fd2805be

Committed by Al Viro on Apr 22, 2015

When O_PATH is present, O_CREAT isn't, so symlink_ok is always equal to
(open_flags & O_PATH) && !(nd->flags & LOOKUP_FOLLOW).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fd2805be

namei: take O_NOFOLLOW treatment into do_last() · f488443d
Committed by Al Viro on Apr 22, 2015
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
f488443d

uninline walk_component() · 34b128f3

Committed by Al Viro on Apr 19, 2015

seriously improves the stack *and* I-cache footprint...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

34b128f3

SECURITY: remove nameidata arg from inode_follow_link. · 37882db0

Committed by NeilBrown on Mar 23, 2015

No ->inode_follow_link() methods use the nameidata arg, and
it is about to become private to namei.c.
So remove it from all inode_follow_link() functions.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

37882db0

09 May, 2015 2 commits

path_openat(): fix double fput() · f15133df

Committed by Al Viro on May 08, 2015

path_openat() jumps to the wrong place after do_tmpfile() - it has
already done path_cleanup() (as part of path_lookupat() called by
do_tmpfile()), so doing that again can lead to double fput().

Cc: stable@vger.kernel.org	# v3.11+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f15133df

namei: d_is_negative() should be checked before ->d_seq validation · 766c4cbf

Committed by Al Viro on May 07, 2015

Fetching ->d_inode, verifying ->d_seq and finding d_is_negative() to
be true does *not* mean that inode we'd fetched had been NULL - that
holds only while ->d_seq is still unchanged.

Shift d_is_negative() checks into lookup_fast() prior to ->d_seq
verification.
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

766c4cbf

25 Apr, 2015 1 commit

RCU pathwalk breakage when running into a symlink overmounting something · 3cab989a

Committed by Al Viro on Apr 24, 2015

Calling unlazy_walk() in walk_component() and do_last() when we find
a symlink that needs to be followed doesn't acquire a reference to vfsmount.
That's fine when the symlink is on the same vfsmount as the parent directory
(which is almost always the case), but it's not always true - one _can_
manage to bind a symlink on top of something.  And in such cases we end up
with excessive mntput().

Cc: stable@vger.kernel.org # since 2.6.39
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3cab989a

16 Apr, 2015 2 commits

VFS: Make pathwalk use d_is_reg() rather than S_ISREG() · 4bbcbd3b

Committed by David Howells on Mar 17, 2015

Make pathwalk use d_is_reg() rather than S_ISREG() to determine whether to
honour O_TRUNC.  Since this occurs after complete_walk(), the dentry type
field cannot change and the inode pointer cannot change as we hold a ref on
the dentry, so this should be safe.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

4bbcbd3b

VFS: Combine inode checks with d_is_negative() and d_is_positive() in pathwalk · 698934df

Committed by David Howells on Mar 17, 2015

Where we have:

    	if (!dentry->d_inode || d_is_negative(dentry)) {

type constructions in pathwalk we should be able to eliminate the check of
d_inode and rely solely on the result of d_is_negative() or d_is_positive().

What we do have to take care to do is to read d_inode after calling a
d_is_xxx() typecheck function to get the barriering right.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

698934df

12 Apr, 2015 3 commits

remove incorrect comment in lookup_one_len() · 9e7543e9
Committed by Al Viro on Feb 23, 2015
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
9e7543e9
namei.c: fold do_path_lookup() into both callers · 74eb8cc5
Committed by Al Viro on Feb 23, 2015
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
74eb8cc5

kill struct filename.separate · fd2f7cb5

Committed by Al Viro on Feb 22, 2015

just make const char iname[] the last member and compare name->name with
name->iname instead of checking name->separate

We need to make sure that out-of-line name doesn't end up allocated adjacent
to struct filename referring to it; fortunately, it's easy to achieve - just
allocate that struct filename with one byte in ->iname[], so that ->iname[0]
will be inside the same object and thus have an address different from that
of out-of-line name [spotted by Boqun Feng <boqun.feng@gmail.com>]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fd2f7cb5
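
The layout trick in fd2f7cb5 can be sketched in user space as follows. The names (`toy_filename`, `toy_getname()`, `toy_putname()`, `TOY_INLINE_MAX`) are hypothetical and the allocation policy is an assumption made for illustration; the kernel's struct filename has more fields and declares `iname` as const. The point is only that comparing `name` with `iname` replaces a separate flag, and that the long-name header is allocated with one byte of `iname[]` so that `&iname[0]` lies inside the header object and cannot equal the address of the out-of-line buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_INLINE_MAX 32     /* hypothetical inline capacity, including NUL */

struct toy_filename {
    const char *name;         /* points at iname[] or at a separate buffer */
    char iname[];             /* flexible array member, must be last */
};

/* Replaces the old "separate" flag: the name is inline iff it points at iname. */
static bool toy_name_is_inline(const struct toy_filename *f)
{
    return f->name == f->iname;
}

static struct toy_filename *toy_getname(const char *s)
{
    struct toy_filename *f;
    size_t len;

    assert(s);
    len = strlen(s) + 1;
    if (len <= TOY_INLINE_MAX) {
        /* Short name: store it right behind the header. */
        f = malloc(sizeof(*f) + len);
        if (!f)
            return NULL;
        memcpy(f->iname, s, len);
        f->name = f->iname;
    } else {
        /* Long name: header gets one byte of iname[], name goes out of line. */
        char *buf;

        f = malloc(sizeof(*f) + 1);
        if (!f)
            return NULL;
        buf = malloc(len);
        if (!buf) {
            free(f);
            return NULL;
        }
        memcpy(buf, s, len);
        f->iname[0] = '\0';
        f->name = buf;
    }
    return f;
}

static void toy_putname(struct toy_filename *f)
{
    if (!f)
        return;
    if (!toy_name_is_inline(f))
        free((char *)f->name);   /* out-of-line buffer is owned by f */
    free(f);
}
```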

25 Mar, 2015 4 commits
- switch path_init() to struct filename · 6e8a1f87
  Committed by Al Viro on Feb 22, 2015
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
  6e8a1f87
- switch path_mountpoint() to struct filename · 668696dc
  Committed by Al Viro on Feb 22, 2015
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
  668696dc
- switch path_lookupat() to struct filename · 5eb6b495
  Committed by Al Viro on Feb 22, 2015
```
all callers were passing it ->name of some struct filename
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
  5eb6b495
- getname_flags(): clean up a bit · 94b5d262
  Committed by Al Viro on Feb 22, 2015
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
  94b5d262
23 Feb, 2015 1 commit

VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) · e36cb0b8

Committed by David Howells on Jan 29, 2015

Convert the following where appropriate:

 (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).

 (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).

 (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry).  This is actually more
     complicated than it appears as some calls should be converted to
     d_can_lookup() instead.  The difference is whether the directory in
     question is a real dir with a ->lookup op or whether it's a fake dir with
     a ->d_automount op.

In some circumstances, we can subsume checks for dentry->d_inode not being
NULL into this, provided the code isn't in a filesystem that expects
d_inode to be NULL if the dirent really *is* negative (i.e. if we're going to
use d_inode() rather than d_backing_inode() to get the inode pointer).

Note that the dentry type field may be set to something other than
DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
manages the fall-through from a negative dentry to a lower layer.  In such a
case, the dentry type of the negative union dentry is set to the same as the
type of the lower dentry.

However, if you know d_inode is not NULL at the call site, then you can use
the d_is_xxx() functions even in a filesystem.

There is one further complication: a 0,0 chardev dentry may be labelled
DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE.  Strictly, this was
intended for special directory entry types that don't have attached inodes.

The following perl+coccinelle script was used:

use strict;

my @callers;
open(my $fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
    die "Can't grep for S_ISDIR and co. callers";
@callers = <$fd>;
close($fd);
unless (@callers) {
    print "No matches\n";
    exit(0);
}

my @cocci = (
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISLNK(E->d_inode->i_mode)',
    '+ d_is_symlink(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISDIR(E->d_inode->i_mode)',
    '+ d_is_dir(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISREG(E->d_inode->i_mode)',
    '+ d_is_reg(E)' );

my $coccifile = "tmp.sp.cocci";
open($fd, ">$coccifile") || die $coccifile;
print($fd "$_\n") || die $coccifile foreach (@cocci);
close($fd);

foreach my $file (@callers) {
    chomp $file;
    print "Processing ", $file, "\n";
    system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
	die "spatch failed";
}

[AV: overlayfs parts skipped]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e36cb0b8

23 Jan, 2015 6 commits

audit: replace getname()/putname() hacks with reference counters · 55422d0b

Committed by Paul Moore on Jan 22, 2015

In order to ensure that filenames are not released before the audit
subsystem is done with the strings there are a number of hacks built
into the fs and audit subsystems around getname() and putname().  To
say these hacks are "ugly" would be kind.

This patch removes the filename hackery in favor of a more
conventional reference count based approach.  The diffstat below tells
most of the story; lots of audit/fs specific code is replaced with a
traditional reference count based approach that is easily understood,
even by those not familiar with the audit and/or fs subsystems.

CC: viro@zeniv.linux.org.uk
CC: linux-fsdevel@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

55422d0b
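
The reference-count approach that 55422d0b describes can be sketched in user space like this. The names (`toy_name`, `toy_name_get()`, `toy_name_hold()`, `toy_name_put()`) are hypothetical and the sketch is single-threaded; it is an illustration of the general pattern, not the patch itself. A second user, such as an audit record in the commit's setting, takes its own reference, and the string is freed only when the last holder drops it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted name object; not the kernel's struct filename. */
struct toy_name {
    char *name;
    int refcnt;
};

/* Create a name with one reference held by the caller. */
static struct toy_name *toy_name_get(const char *s)
{
    struct toy_name *n;
    size_t len;

    assert(s);
    len = strlen(s) + 1;
    n = malloc(sizeof(*n));
    if (!n)
        return NULL;
    n->name = malloc(len);
    if (!n->name) {
        free(n);
        return NULL;
    }
    memcpy(n->name, s, len);
    n->refcnt = 1;
    return n;
}

/* A second user takes its own reference. */
static void toy_name_hold(struct toy_name *n)
{
    assert(n && n->refcnt > 0);
    n->refcnt++;
}

/* Drop one reference; returns 1 if this call freed the object, else 0. */
static int toy_name_put(struct toy_name *n)
{
    assert(n && n->refcnt > 0);
    if (--n->refcnt > 0)
        return 0;
    free(n->name);
    free(n);
    return 1;
}
```

With this shape neither side needs to know whether the other is finished with the string: each simply drops the reference it took.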

audit: enable filename recording via getname_kernel() · fd3522fd

Committed by Paul Moore on Jan 22, 2015

Enable recording of filenames in getname_kernel() and remove the
kludgy workaround in __audit_inode() now that we have proper filename
logging for kernel users.

CC: viro@zeniv.linux.org.uk
CC: linux-fsdevel@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fd3522fd

simpler calling conventions for filename_mountpoint() · cbaab2db

Committed by Al Viro on Jan 22, 2015

a) make it accept ERR_PTR() as filename (and return its PTR_ERR() in that case)
b) make it putname() the sucker in the end otherwise

simplifies life for callers...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

cbaab2db

fs: create proper filename objects using getname_kernel() · 51689104

Committed by Paul Moore on Jan 22, 2015

There are several areas in the kernel that create temporary filename
objects using the following pattern:

	int func(const char *name)
	{
		struct filename file = { .name = name };
		...
		return 0;
	}

... which for the most part works okay, but it causes havoc within the
audit subsystem as the filename object does not persist beyond the
lifetime of the function.  This patch converts all of these temporary
filename objects into proper filename objects using getname_kernel()
and putname() which ensure that the filename object persists until the
audit subsystem is finished with it.

Also, a special thanks to Al Viro, Guenter Roeck, and Sabrina Dubroca
for helping resolve a difficult kernel panic on boot related to a
use-after-free problem in kern_path_create(); the thread can be seen
at the link below:

 * https://lkml.org/lkml/2015/1/20/710

This patch includes code that was either based on, or directly written
by Al in the above thread.

CC: viro@zeniv.linux.org.uk
CC: linux@roeck-us.net
CC: sd@queasysnail.net
CC: linux-fsdevel@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

51689104

fs: rework getname_kernel to handle up to PATH_MAX sized filenames · 08518549

Committed by Paul Moore on Jan 21, 2015

In preparation for expanded use in the kernel, make getname_kernel()
more useful by allowing it to handle any legal filename length.

Thanks to Guenter Roeck for his suggestion to substitute memcpy() for
strlcpy().

CC: linux@roeck-us.net
CC: viro@zeniv.linux.org.uk
CC: linux-fsdevel@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

08518549

cut down the number of do_path_lookup() callers · fa14a0b8

Committed by Al Viro on Jan 22, 2015

... and don't bother with new struct filename when we already have one
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fa14a0b8

14 Dec, 2014 1 commit

syscalls: implement execveat() system call · 51f39a1f

Committed by David Drysdale on Dec 12, 2014

This patchset adds execveat(2) for x86, and is derived from Meredydd
Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).

The primary aim of adding an execveat syscall is to allow an
implementation of fexecve(3) that does not rely on the /proc filesystem,
at least for executables (rather than scripts).  The current glibc version
of fexecve(3) is implemented via /proc, which causes problems in sandboxed
or otherwise restricted environments.

Given the desire for a /proc-free fexecve() implementation, HPA suggested
(https://lkml.org/lkml/2006/7/11/556) that an execveat(2) syscall would be
an appropriate generalization.

Also, having a new syscall means that it can take a flags argument without
back-compatibility concerns.  The current implementation just defines the
AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other flags could be
added in future -- for example, flags for new namespaces (as suggested at
https://lkml.org/lkml/2006/7/11/474).

Related history:
 - https://lkml.org/lkml/2006/12/27/123 is an example of someone
   realizing that fexecve() is likely to fail in a chroot environment.
 - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered
   documenting the /proc requirement of fexecve(3) in its manpage, to
   "prevent other people from wasting their time".
 - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a
   problem where a process that did setuid() could not fexecve()
   because it no longer had access to /proc/self/fd; this has since
   been fixed.

This patch (of 4):

Add a new execveat(2) system call.  execveat() is to execve() as openat()
is to open(): it takes a file descriptor that refers to a directory, and
resolves the filename relative to that.

In addition, if the filename is empty and AT_EMPTY_PATH is specified,
execveat() executes the file to which the file descriptor refers.  This
replicates the functionality of fexecve(), which is a system call in other
UNIXen, but in Linux glibc it depends on opening "/proc/self/fd/<fd>" (and
so relies on /proc being mounted).

The filename fed to the executed program as argv[0] (or the name of the
script fed to a script interpreter) will be of the form "/dev/fd/<fd>"
(for an empty filename) or "/dev/fd/<fd>/<filename>", effectively
reflecting how the executable was found.  This does however mean that
execution of a script in a /proc-less environment won't work; also, script
execution via an O_CLOEXEC file descriptor fails (as the file will not be
accessible after exec).

Based on patches by Meredydd Luff.
Signed-off-by: David Drysdale <drysdale@google.com>
Cc: Meredydd Luff <meredydd@senatehouse.org>
Cc: Shuah Khan <shuah.kh@samsung.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Rich Felker <dalias@aerifal.cx>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

51f39a1f
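
The naming rule that 51f39a1f gives for the path handed to the executed program can be written out as a small user-space helper. `toy_exec_pathname()` is a hypothetical function for illustration only and is not how the kernel builds the string; it just reproduces the two forms stated in the message, "/dev/fd/<fd>" for an empty filename and "/dev/fd/<fd>/<filename>" otherwise.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Compose the name described in the commit message into buf.
 * Returns the string length, or -1 if buf is too small.
 */
static int toy_exec_pathname(char *buf, size_t size, int fd, const char *filename)
{
    int n;

    assert(buf && filename);
    if (filename[0] == '\0')
        n = snprintf(buf, size, "/dev/fd/%d", fd);               /* AT_EMPTY_PATH case */
    else
        n = snprintf(buf, size, "/dev/fd/%d/%s", fd, filename);  /* relative to a directory fd */
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

A script interpreter that receives such a name has to reopen it, which is why the message notes that script execution breaks when the path is not resolvable in the environment or when the descriptor was opened with O_CLOEXEC.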

12 Dec, 2014 4 commits
- fs/namei.c: fold link_path_walk() call into path_init() · d465887f
  Committed by Al Viro on Nov 20, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
  d465887f
- path_init(): don't bother with LOOKUP_PARENT in argument · 980f3ea2
  Committed by Al Viro on Nov 20, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
  980f3ea2
- fs/namei.c: new helper (path_cleanup()) · 893b7775
  Committed by Al Viro on Nov 20, 2014
```
All callers of path_init() proceed to do the identical cleanup when
they are done with nameidata.  Don't open-code it...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
  893b7775
- path_init(): store the "base" pointer to file in nameidata itself · 5e53084d
  Committed by Al Viro on Nov 20, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
  5e53084d
11 Dec, 2014 1 commit
- make nameidata completely opaque outside of fs/namei.c · 1f55a6ec
  Committed by Al Viro on Nov 01, 2014
```
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
```
  1f55a6ec
31 Oct, 2014 1 commit

fs: allow open(dir, O_TMPFILE|..., 0) with mode 0 · 69a91c23

Committed by Eric Rannaud on Oct 30, 2014

The man page for open(2) indicates that when O_CREAT is specified, the
'mode' argument applies only to future accesses to the file:

	Note that this mode applies only to future accesses of the newly
	created file; the open() call that creates a read-only file
	may well return a read/write file descriptor.

The man page for open(2) implies that 'mode' is treated identically by
O_CREAT and O_TMPFILE.

O_TMPFILE, however, behaves differently:

	int fd = open("/tmp", O_TMPFILE | O_RDWR, 0);
	assert(fd == -1);
	assert(errno == EACCES);

	int fd = open("/tmp", O_TMPFILE | O_RDWR, 0600);
	assert(fd > 0);

For O_CREAT, do_last() sets acc_mode to MAY_OPEN only:

	if (*opened & FILE_CREATED) {
		/* Don't check for write permission, don't truncate */
		open_flag &= ~O_TRUNC;
		will_truncate = false;
		acc_mode = MAY_OPEN;
		path_to_nameidata(path, nd);
		goto finish_open_created;
	}

But for O_TMPFILE, do_tmpfile() passes the full op->acc_mode to
may_open().

This patch lines up the behavior of O_TMPFILE with O_CREAT. After the
inode is created, may_open() is called with acc_mode = MAY_OPEN, in
do_tmpfile().

A different, but related glibc bug revealed the discrepancy:
https://sourceware.org/bugzilla/show_bug.cgi?id=17523

The glibc lazily loads the 'mode' argument of open() and openat() using
va_arg() only if O_CREAT is present in 'flags' (to support both the 2
argument and the 3 argument forms of open; same idea for openat()).
However, the glibc ignores the 'mode' argument if O_TMPFILE is in
'flags'.

On x86_64, for open(), it magically works anyway, as 'mode' is in
RDX when entering open(), and is still in RDX on SYSCALL, which is where
the kernel looks for the 3rd argument of a syscall.

But openat() is not quite so lucky: 'mode' is in RCX when entering the
glibc wrapper for openat(), while the kernel looks for the 4th argument
of a syscall in R10. Indeed, the syscall calling convention differs from
the regular calling convention in this respect on x86_64. So the kernel
sees mode = 0 when trying to use glibc openat() with O_TMPFILE, and
fails with EACCES.
Signed-off-by: Eric Rannaud <e@nanocritical.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

69a91c23

openeuler / Kernel: synced successfully about 1 year ago