Commits · f60dc3db6e24b7c36445cf1feb56b34c799074b3 · openeuler / raspberrypi-kernel

14 Jul, 2012 12 commits

vfs: do_last(): clean up retry · f60dc3db

Authored by Miklos Szeredi on Jun 05, 2012

Move the lookup retry logic to the bottom of the function to make the normal
case simpler to read.
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f60dc3db

vfs: do_last(): clean up bool · 77d660a8

Authored by Miklos Szeredi on Jun 05, 2012

Consistently use bool for boolean values in do_last().
Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

77d660a8

vfs: do_last(): clean up labels · e83db167

Authored by Miklos Szeredi on Jun 05, 2012

Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e83db167

vfs: do_last(): clean up error handling · aa4caadb

Authored by Miklos Szeredi on Jun 05, 2012

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

aa4caadb

vfs: remove open intents from nameidata · 015c3bbc

Authored by Miklos Szeredi on Jun 05, 2012

All users of open intents have been converted to use ->atomic_{open,create}.

This patch gets rid of nd->intent.open and related infrastructure.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

015c3bbc

vfs: add i_op->atomic_open() · d18e9008

Authored by Miklos Szeredi on Jun 05, 2012

Add a new inode operation which is called on the last component of an open.
Using this the filesystem can look up, possibly create and open the file in one
atomic operation. If it cannot perform this (e.g. the file type turned out to
be wrong) it may signal this by returning NULL instead of an open struct file
pointer.

i_op->atomic_open() is only called if the last component is negative or needs
lookup. Handling cached positive dentries here doesn't add much value: these
can be opened using f_op->open(). If the cached file turns out to be invalid,
the open can be retried, this time using ->atomic_open() with a fresh dentry.

For now leave the old way of using open intents in lookup and revalidate in
place. This will be removed once all the users are converted.

David Howells noticed that if ->atomic_open() opens the file but does not create
it, handle_truncate() will be called on it even if it is not a regular file.
Fix this by checking the file type in this case too.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d18e9008
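
The fallback contract described in this commit message (the filesystem either opens the file in one step or returns NULL so the caller takes the ordinary path) can be modelled in user space. The following is only a toy sketch: every `toy_*` name, the callback signature, and the rule for when the filesystem declines are assumptions made for illustration, not the kernel's actual `->atomic_open()` interface.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an open file. */
struct toy_file {
    const char *name;
    int created;   /* 1 if the open also created the file */
};

/* Hypothetical stand-in for i_op->atomic_open(): returns an open file,
 * or NULL to say "could not do this in one step, use the normal path". */
typedef struct toy_file *(*toy_atomic_open_fn)(const char *name,
                                               int want_create,
                                               struct toy_file *slot);

/* A toy filesystem: it declines names starting with "dev", pretending
 * that the file type turned out to be wrong. */
static struct toy_file *toy_fs_atomic_open(const char *name, int want_create,
                                           struct toy_file *slot)
{
    if (strncmp(name, "dev", 3) == 0)
        return NULL;
    slot->name = name;
    slot->created = want_create;
    return slot;
}

/* Ordinary path: separate lookup followed by a plain open (this toy
 * never creates anything here). */
static struct toy_file *toy_lookup_then_open(const char *name,
                                             struct toy_file *slot)
{
    slot->name = name;
    slot->created = 0;
    return slot;
}

/* Last-component handling: try the atomic operation first, fall back
 * to the ordinary path when it is absent or returns NULL. */
static struct toy_file *toy_open_last(toy_atomic_open_fn atomic_open,
                                      const char *name, int want_create,
                                      struct toy_file *slot)
{
    struct toy_file *f = atomic_open ? atomic_open(name, want_create, slot)
                                     : NULL;
    if (f)
        return f;
    return toy_lookup_then_open(name, slot);
}
```

In this model a filesystem without the callback behaves exactly as before, which mirrors the message's point that the old path is left in place for now.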

vfs: lookup_open(): expand lookup_hash() · 54ef4872

Authored by Miklos Szeredi on Jun 05, 2012

Copy __lookup_hash() into lookup_open().  The next patch will insert the atomic
open call just before the real lookup.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

54ef4872

vfs: add lookup_open() · d58ffd35

Authored by Miklos Szeredi on Jun 05, 2012

Split out lookup + maybe create from do_last().  This is the part under i_mutex
protection.

The function is called lookup_open() and returns a filp even though the open
part is not used yet.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d58ffd35

vfs: do_last(): common slow lookup · 71574865

Authored by Miklos Szeredi on Jun 05, 2012

Make the slow lookup part of O_CREAT and non-O_CREAT opens common.

This allows atomic_open to be hooked into the slow lookup part.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

71574865

vfs: do_last(): separate O_CREAT specific code · b6183df7

Authored by Miklos Szeredi on Jun 05, 2012

Check O_CREAT on the slow lookup paths where necessary.  This allows the rest to
be shared with plain open.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b6183df7

vfs: do_last(): inline lookup_slow() · 37d7fffc

Authored by Miklos Szeredi on Jun 05, 2012

Copy lookup_slow() into do_last().
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

37d7fffc

namei.c: let follow_link() do put_link() on failure · 6d7b5aae

Authored by Al Viro on Jun 10, 2012

no need for kludgy "set cookie to ERR_PTR(...) because we failed
before we did actual ->follow_link() and want to suppress put_link()",
no pointless check in put_link() itself.

Callers checked if follow_link() has failed anyway; might as well
break out of their loops if that happened, without bothering
to call put_link() first.

[AV: folded fixes from hch]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6d7b5aae

02 Jun, 2012 11 commits

vfs: retry last component if opening stale dentry · 16b1c1cd

Authored by Miklos Szeredi on May 21, 2012

NFS optimizes away d_revalidates for last component of open.  This means that
open itself can find the dentry stale.

This patch allows the filesystem to return EOPENSTALE and the VFS will retry the
lookup on just the last component if possible.

If the lookup was done using RCU mode, including the last component, then this
is not possible since the parent dentry is lost.  In this case fall back to
non-RCU lookup.  Currently this is not used since NFS will always leave RCU
mode.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

16b1c1cd
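
The retry described above can be pictured with a small user-space model: the open reports a stale dentry, the last component is looked up again, and the open is attempted a second time. This is a sketch under stated assumptions only. The `toy_*` types, the generation counter used to model staleness, and the numeric value of `TOY_EOPENSTALE` are all invented for illustration and say nothing about how do_last() implements the retry.

```c
/* Placeholder error code for this sketch; the value is arbitrary. */
#define TOY_EOPENSTALE 1000

/* The "server" owns the current generation of the file; a cached
 * dentry remembers the generation it was looked up against. */
struct toy_server { int generation; int lookups; };
struct toy_dentry { int generation; };

/* Fresh lookup of the last component only: refreshes the dentry. */
static void toy_lookup_last(struct toy_server *srv, struct toy_dentry *d)
{
    srv->lookups++;
    d->generation = srv->generation;
}

/* Filesystem open: discovers staleness only at open time, because
 * revalidation of the last component was skipped. */
static int toy_fs_open(const struct toy_server *srv,
                       const struct toy_dentry *d)
{
    return d->generation == srv->generation ? 0 : -TOY_EOPENSTALE;
}

/* Caller: on a stale open, redo the lookup of the last component and
 * try the open once more. */
static int toy_open_with_retry(struct toy_server *srv,
                               struct toy_dentry *cached)
{
    int err = toy_fs_open(srv, cached);

    if (err == -TOY_EOPENSTALE) {
        toy_lookup_last(srv, cached);
        err = toy_fs_open(srv, cached);
    }
    return err;
}
```

The model keeps the parent implicit; the message's caveat about RCU mode is exactly that the real code may no longer have the parent dentry and must then fall back to a non-RCU walk.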

vfs: do_last() common post lookup · 5f5daac1

Authored by Miklos Szeredi on May 21, 2012

Now the post lookup code can be shared between O_CREAT and plain opens since
they are essentially the same.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5f5daac1

vfs: do_last(): add audit_inode before open · d7fdd7f6

Authored by Miklos Szeredi on May 21, 2012

This allows this code to be shared between O_CREAT and plain opens.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d7fdd7f6

vfs: do_last(): only return EISDIR for O_CREAT · 050ac841

Authored by Miklos Szeredi on May 21, 2012

This allows this code to be shared between O_CREAT and plain opens.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

050ac841

vfs: do_last(): check LOOKUP_DIRECTORY · af2f5542

Authored by Miklos Szeredi on May 21, 2012

Check for ENOTDIR before finishing open.  This allows this code to be shared
between O_CREAT and plain opens.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

af2f5542

vfs: do_last(): make ENOENT exit RCU safe · 54c33e7f

Authored by Miklos Szeredi on May 21, 2012

This will allow this code to be used in RCU mode.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

54c33e7f

vfs: make follow_link check RCU safe · d45ea867

Authored by Miklos Szeredi on May 21, 2012

This will allow this code to be used in RCU mode.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d45ea867

vfs: do_last(): use inode variable · decf3400

Authored by Miklos Szeredi on May 21, 2012

Use helper variable instead of path->dentry->d_inode before complete_walk().
This will allow this code to be used in RCU mode.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

decf3400

vfs: do_last(): inline walk_component() · a1eb3315

Authored by Miklos Szeredi on May 21, 2012

Copy walk_component() into do_last().
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a1eb3315

vfs: do_last(): make exit RCU safe · e276ae67

Authored by Miklos Szeredi on May 21, 2012

Allow returning from do_last() with LOOKUP_RCU still set on the "out:" and
"exit:" labels.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e276ae67

vfs: split do_lookup() · 697f514d

Authored by Miklos Szeredi on May 21, 2012

Split do_lookup() into two functions:

  lookup_fast() - does cached lookup without i_mutex
  lookup_slow() - does lookup with i_mutex

Both follow managed dentries.

The new functions are needed by atomic_open.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

697f514d

30 May, 2012 1 commit

brlocks/lglocks: API cleanups · 962830df

Authored by Andi Kleen on May 08, 2012

lglocks and brlocks are currently generated with some complicated macros
in lglock.h.  But there's no reason to not just use common utility
functions and put all the data into a common data structure.

In preparation, this patch changes the API to look more like normal
function calls with pointers, not magic macros.

The patch is rather large because I move over all users in one go to keep
it bisectable.  This impacts the VFS somewhat in terms of lines changed.
But no actual behaviour change.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

962830df

27 May, 2012 1 commit

word-at-a-time: make the interfaces truly generic · 36126f8f

Authored by Linus Torvalds on May 26, 2012

This changes the interfaces in <asm/word-at-a-time.h> to be a bit more
complicated, but a lot more generic.

In particular, it allows us to really do the operations efficiently on
both little-endian and big-endian machines, pretty much regardless of
machine details.  For example, if you can rely on a fast population
count instruction on your architecture, this will allow you to make your
optimized <asm/word-at-a-time.h> file with that.

NOTE! The "generic" version in include/asm-generic/word-at-a-time.h is
not truly generic, it actually only works on big-endian.  Why? Because
on little-endian the generic algorithms are wasteful, since you can
inevitably do better. The x86 implementation is an example of that.

(The only truly non-generic part of the asm-generic implementation is
the "find_zero()" function, and you could make a little-endian version
of it.  And if the Kbuild infrastructure allowed us to pick a particular
header file, that would be lovely)

The <asm/word-at-a-time.h> functions are as follows:

 - WORD_AT_A_TIME_CONSTANTS: specific constants that the algorithm
   uses.

 - has_zero(): take a word, and determine if it has a zero byte in it.
   It gets the word, the pointer to the constant pool, and a pointer to
   an intermediate "data" field it can set.

   This is the "quick-and-dirty" zero tester: it's what is run inside
   the hot loops.

 - "prep_zero_mask()": take the word, the data that has_zero() produced,
   and the constant pool, and generate an *exact* mask of which byte had
   the first zero.  This is run directly *outside* the loop, and allows
   the "has_zero()" function to answer the "is there a zero byte"
   question without necessarily getting exactly *which* byte is the
   first one to contain a zero.

   If you do multiple byte lookups concurrently (eg "hash_name()", which
   looks for both NUL and '/' bytes), after you've done the prep_zero_mask()
   phase, the result of those can be or'ed together to get the "either
   or" case.

 - The result from "prep_zero_mask()" can then be fed into "find_zero()"
   (to find the byte offset of the first byte that was zero) or into
   "zero_bytemask()" (to find the bytemask of the bytes preceding the
   zero byte).

   The existence of zero_bytemask() is optional, and is not necessary
   for the normal string routines.  But dentry name hashing needs it, so
   if you enable DENTRY_WORD_AT_A_TIME you need to expose it.

This changes the generic strncpy_from_user() function and the dentry
hashing functions to use these modified word-at-a-time interfaces.  This
gets us back to the optimized state of the x86 strncpy that we lost in
the previous commit when moving over to the generic version.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

36126f8f
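
To make the division of labour between the helpers listed above more concrete, here is a user-space sketch for a 64-bit little-endian host. It is not the kernel header: the `wat_*` names, the `struct wat_consts` layout, and the use of `memcpy()` on a padded buffer to load words are assumptions chosen so the example is self-contained. The zero test itself is the classic "subtract ones, mask with the complement, keep the high bits" trick.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Constant pool, in the spirit of WORD_AT_A_TIME_CONSTANTS. */
struct wat_consts { uint64_t one_bits, high_bits; };
static const struct wat_consts WAT = {
    0x0101010101010101ULL, 0x8080808080808080ULL
};

/* Quick test for the hot loop: nonzero iff some byte of 'a' is zero.
 * Only the lowest flagged byte is guaranteed to be exact. */
static uint64_t wat_has_zero(uint64_t a, uint64_t *data,
                             const struct wat_consts *c)
{
    uint64_t mask = ((a - c->one_bits) & ~a) & c->high_bits;
    *data = mask;
    return mask;
}

/* On little-endian the lowest flag is already exact, so nothing more
 * is needed before masks are combined. */
static uint64_t wat_prep_zero_mask(uint64_t a, uint64_t data,
                                   const struct wat_consts *c)
{
    (void)a; (void)c;
    return data;
}

/* Byte offset of the first flagged byte; 'mask' must be nonzero. */
static size_t wat_find_zero(uint64_t mask)
{
    size_t n = 0;
    while (!(mask & 0x80)) { mask >>= 8; n++; }
    return n;
}

/* 0xff in every byte that precedes the first flagged byte. */
static uint64_t wat_zero_bytemask(uint64_t mask)
{
    return ((mask - 1) & ~mask) >> 7;
}

/* strlen() over a buffer whose capacity is a multiple of 8 bytes. */
static size_t wat_strlen(const char *buf, size_t cap)
{
    assert(cap % sizeof(uint64_t) == 0);
    for (size_t off = 0; off < cap; off += sizeof(uint64_t)) {
        uint64_t word, data;
        memcpy(&word, buf + off, sizeof word);
        if (wat_has_zero(word, &data, &WAT))
            return off + wat_find_zero(wat_prep_zero_mask(word, data, &WAT));
    }
    return cap;
}

/* Length of the first path component: stops at NUL or '/', or'ing the
 * two prepared masks together as the message describes. */
static size_t wat_component_len(const char *buf, size_t cap)
{
    const uint64_t slashes = 0x2f2f2f2f2f2f2f2fULL;
    assert(cap % sizeof(uint64_t) == 0);
    for (size_t off = 0; off < cap; off += sizeof(uint64_t)) {
        uint64_t word, d_nul, d_slash;
        memcpy(&word, buf + off, sizeof word);
        uint64_t hit = wat_has_zero(word, &d_nul, &WAT) |
                       wat_has_zero(word ^ slashes, &d_slash, &WAT);
        if (hit) {
            uint64_t mask = wat_prep_zero_mask(word, d_nul, &WAT) |
                            wat_prep_zero_mask(word ^ slashes, d_slash, &WAT);
            return off + wat_find_zero(mask);
        }
    }
    return cap;
}
```

The padded-buffer requirement sidesteps the question of reading past the end of a string, which the kernel has to handle separately.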

25 May, 2012 1 commit

kernel: Move REPEAT_BYTE definition into linux/kernel.h · 44696908

Authored by David S. Miller on May 23, 2012

And make sure that everything using it explicitly includes
that header file.
Signed-off-by: David S. Miller <davem@davemloft.net>

44696908

05 May, 2012 1 commit

vfs: clean up __d_lookup_rcu() and dentry_cmp() interfaces · 12f8ad4b

Authored by Linus Torvalds on May 04, 2012

The calling conventions for __d_lookup_rcu() and dentry_cmp() are
annoying in different ways, and there is actually one single underlying
reason for both of the annoyances.

The fundamental reason is that we do the returned dentry sequence number
check inside __d_lookup_rcu() instead of doing it in the caller.  This
results in two annoyances:

 - __d_lookup_rcu() now not only needs to return the dentry and the
   sequence number that goes along with the lookup, it also needs to
   return the inode pointer that was validated by that sequence number
   check.

 - and because we did the sequence number check early (to validate the
   name pointer and length) we also couldn't just pass the dentry itself
   to dentry_cmp(), we had to pass the counted string that contained the
   name.

So that sequence number decision caused two separate ugly calling
conventions.

Both of these problems would be solved if we just did the sequence
number check in the caller instead.  There's only one caller, and that
caller already has to do the sequence number check for the parent
anyway, so just do that.

That allows us to stop returning the dentry->d_inode in that in-out
argument (pointer-to-pointer-to-inode), so we can make the inode
argument just a regular input inode pointer.  The caller can just load
the inode from dentry->d_inode, and then do the sequence number check
after that to make sure that it's synchronized with the name we looked
up.

And it allows us to just pass in the dentry to dentry_cmp(), which is
what all the callers really wanted.  Sure, dentry_cmp() has to be a bit
careful about the dentry (which is not stable during RCU lookup), but
that's actually very simple.

And now that dentry_cmp() can clearly see that the first string argument
is a dentry, we can use the direct word access for that, instead of the
careful unaligned zero-padding.  The dentry name is always properly
aligned, since it is a single path component that is either embedded
into the dentry itself, or was allocated with kmalloc() (see __d_alloc).

Finally, this also uninlines the nasty slow-case for dentry comparisons:
that one *does* need to do a sequence number check, since it will call
in to the low-level filesystems, and we want to give those a stable
inode pointer and path component length/start arguments.  Doing an extra
sequence check for that slow case is not a problem, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

12f8ad4b
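
The calling convention argued for above (the lookup hands back the dentry together with the sequence number it sampled, and the caller loads the inode and only then validates the sequence) can be sketched in user space with C11 atomics. This is an illustrative model, not the dcache code: the `toy_*` names and the single-field "dentry" are assumptions, and default sequentially consistent atomics are used to keep the sketch simple rather than fast.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy dentry: a sequence counter guarding one field. */
struct toy_dentry {
    atomic_uint seq;   /* odd while a writer is mid-update */
    atomic_int  ino;   /* stands in for dentry->d_inode */
};

/* Sample the sequence number, waiting out any in-progress writer. */
static unsigned toy_seq_begin(struct toy_dentry *d)
{
    unsigned s;
    do {
        s = atomic_load(&d->seq);
    } while (s & 1);
    return s;
}

/* True if anything changed since 'start' was sampled. */
static bool toy_seq_retry(struct toy_dentry *d, unsigned start)
{
    return atomic_load(&d->seq) != start;
}

/* Lookup: returns the candidate plus the sequence number it sampled,
 * and leaves validation to the caller. */
static struct toy_dentry *toy_d_lookup_rcu(struct toy_dentry *candidate,
                                           unsigned *seqp)
{
    *seqp = toy_seq_begin(candidate);
    return candidate;
}

/* Caller: load the inode from the dentry, then check the sequence so
 * the inode is known to match the name that was looked up. */
static bool toy_caller_get_inode(struct toy_dentry *candidate, int *ino)
{
    unsigned seq;
    struct toy_dentry *d = toy_d_lookup_rcu(candidate, &seq);

    *ino = atomic_load(&d->ino);
    return !toy_seq_retry(d, seq);
}

/* Writer side, for completeness: bump, update, bump. */
static void toy_rename(struct toy_dentry *d, int new_ino)
{
    atomic_fetch_add(&d->seq, 1);
    atomic_store(&d->ino, new_ino);
    atomic_fetch_add(&d->seq, 1);
}
```

Because the lookup no longer validates anything itself, it has no reason to return the inode through an in-out argument, which is the simplification the message is after.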

04 May, 2012 1 commit

vfs: make word-at-a-time accesses handle a non-existing page · e419b4cc

Authored by Linus Torvalds on May 03, 2012

It turns out that there are more cases than CONFIG_DEBUG_PAGEALLOC that
can have holes in the kernel address space: it seems to happen easily
with Xen, and it looks like the AMD gart64 code will also punch holes
dynamically.

Actually hitting that case is still very unlikely, so just do the
access, and take an exception and fix it up for the very unlikely case
of it being a page-crosser with no next page.

And hey, this abstraction might even help other architectures that have
other issues with unaligned word accesses than the possible missing next
page.  IOW, this could do the byte order magic too.

Peter Anvin fixed a thinko in the shifting for the exception case.
Reported-and-tested-by: Jana Saout <jana@saout.de>
Cc: Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e419b4cc

03 May, 2012 1 commit

userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs · 8e96e3b7

Authored by Eric W. Biederman on Mar 03, 2012

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

8e96e3b7

29 Apr, 2012 1 commit

VFS: clean up and simplify getname_flags() · 3f9f0aa6

Authored by Linus Torvalds on Apr 28, 2012

This removes a number of silly games around strncpy_from_user() in
do_getname(), and removes that helper function entirely.  We instead
make getname_flags() just use strncpy_from_user() properly directly.

Removing the wrapper function simplifies things noticeably, mostly
because we no longer play the unnecessary games with segments (x86
strncpy_from_user() no longer needs the hack), but also because the
empty path handling is just much more obvious.  The return value of
"strncpy_from_user()" is much more obvious than checking an odd error
return case from do_getname().

[ non-x86 architectures were notified of this change several weeks ago,
  since it is possible that they have copied the old broken x86
  strncpy_from_user. But nobody reacted, so .. See

    http://www.spinics.net/lists/linux-arch/msg17313.html

  for details ]

Cc: linux-arch@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3f9f0aa6

08 Apr, 2012 1 commit

userns: Replace the hard to write inode_userns with inode_capable. · 1a48e2ac

Authored by Eric W. Biederman on Nov 14, 2011

This represents a change in strategy of how to handle user namespaces.
Instead of tagging everything explicitly with a user namespace and bulking
up all of the comparisons of uids and gids in the kernel, all uids and gids
in use will have a mapping to flat kuid and kgid spaces respectively.  This
allows much more of the existing logic to be preserved and in general
allows for faster code.

In this new and improved world we allow someone to utilize capabilities
over an inode if the inode's owner maps into the capability holder's user
namespace and the user has capabilities in their user namespace, which
is simple and efficient.

Moving the fs uid comparisons to be comparisons in a flat kuid space
follows in later patches, something that is only significant if you
are using user namespaces.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

1a48e2ac
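
The rule stated in the second paragraph of this message is a conjunction of two checks, which a toy model makes easy to see. The sketch below is an assumption-laden illustration: the `toy_*` names, the single contiguous id range per namespace, and the boolean capability flag are all invented here and do not reflect the kernel's data structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flat kernel-wide uid, wrapped in a struct so it cannot be mixed up
 * with a namespace-relative uid by accident. */
typedef struct { uint32_t val; } toy_kuid_t;

/* Toy user namespace: maps one contiguous range of the flat kuid
 * space. */
struct toy_userns {
    uint32_t first_kuid;
    uint32_t count;
};

/* Toy task: lives in a namespace and may hold the capability there. */
struct toy_task {
    const struct toy_userns *ns;
    bool has_cap_in_ns;
};

/* Does this flat kuid have a name inside the namespace? */
static bool toy_kuid_has_mapping(const struct toy_userns *ns,
                                 toy_kuid_t kuid)
{
    return kuid.val >= ns->first_kuid &&
           kuid.val - ns->first_kuid < ns->count;
}

/* Capability over an inode: the task holds the capability in its own
 * namespace, and the inode's owner maps into that namespace. */
static bool toy_inode_capable(const struct toy_task *t,
                              toy_kuid_t inode_owner)
{
    return t->has_cap_in_ns && toy_kuid_has_mapping(t->ns, inode_owner);
}
```

Under this model a container root gets no power over inodes owned by ids outside its mapped range, such as the host's uid 0.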

07 Apr, 2012 1 commit

Make the "word-at-a-time" helper functions more commonly usable · f68e556e

Authored by Linus Torvalds on Apr 06, 2012

I have a new optimized x86 "strncpy_from_user()" that will use these
same helper functions for all the same reasons the name lookup code uses
them.  This is preparation for that.

This moves them into an architecture-specific header file.  It's
architecture-specific for two reasons:

 - some of the functions are likely to want architecture-specific
   implementations.  Even if the current code happens to be "generic" in
   the sense that it should work on any little-endian machine, it's
   likely that the "multiply by a big constant and shift" implementation
   is less than optimal for an architecture that has a guaranteed fast
   bit count instruction, for example.

 - I expect that if architectures like sparc want to start playing
   around with this, we'll need to abstract out a few more details (in
   particular the actual unaligned accesses).  So we're likely to have
   more architecture-specific stuff if non-x86 architectures start using
   this.

   (and if it turns out that non-x86 architectures don't start using
   this, then having it in an architecture-specific header is still the
   right thing to do, of course)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f68e556e

03 Apr, 2012 1 commit

vfs: Don't allow a user namespace root to make device nodes · 975d6b39

Authored by Eric W. Biederman on Nov 13, 2011

Safely making device nodes in a container is solvable but simply
having the capability in a user namespace is not sufficient to make
this work.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

975d6b39

01 Apr, 2012 7 commits

vfs: fix out-of-date dentry_unhash() comment · c0d02594

Authored by J. Bruce Fields on Feb 15, 2012

64252c75 "vfs: remove dget() from dentry_unhash()" changed the
implementation but not the comment.

Cc: Sage Weil <sage@newdream.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c0d02594

vfs: split __lookup_hash · bad61189

Authored by Miklos Szeredi on Mar 26, 2012

Split __lookup_hash into two component functions:

 lookup_dcache - tries cached lookup, returns whether real lookup is needed
 lookup_real - calls i_op->lookup

This eliminates code duplication between d_alloc_and_lookup() and
d_inode_lookup().
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

bad61189
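
The shape of the split can be shown with a toy directory cache: one function answers from the cache and reports whether the filesystem still has to be asked, the other does the real lookup, and the combined helper is just the two in sequence. Everything here (`toy_*` names, the fixed-size cache, the made-up inode numbers) is a hypothetical model for illustration, not the kernel's lookup_dcache() or lookup_real().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_CACHE_SLOTS 4

struct toy_entry { const char *name; int ino; };

struct toy_dir {
    struct toy_entry cache[TOY_CACHE_SLOTS];
    size_t ncached;
    int fs_lookups;   /* how often the "filesystem" was asked */
};

/* Cached lookup: returns the entry if present and tells the caller
 * through *need_lookup whether a real lookup is still required. */
static const struct toy_entry *toy_lookup_dcache(struct toy_dir *dir,
                                                 const char *name,
                                                 bool *need_lookup)
{
    *need_lookup = false;
    for (size_t i = 0; i < dir->ncached; i++)
        if (strcmp(dir->cache[i].name, name) == 0)
            return &dir->cache[i];
    *need_lookup = true;
    return NULL;
}

/* Real lookup: stands in for calling i_op->lookup, then caches the
 * result. Returns NULL when the toy cache is full. */
static const struct toy_entry *toy_lookup_real(struct toy_dir *dir,
                                               const char *name)
{
    dir->fs_lookups++;
    if (dir->ncached == TOY_CACHE_SLOTS)
        return NULL;
    struct toy_entry *e = &dir->cache[dir->ncached++];
    e->name = name;
    e->ino = 100 + (int)dir->ncached;   /* made-up inode number */
    return e;
}

/* The combined helper: cache first, filesystem only when needed. */
static const struct toy_entry *toy_lookup_hash(struct toy_dir *dir,
                                               const char *name)
{
    bool need_lookup;
    const struct toy_entry *e = toy_lookup_dcache(dir, name, &need_lookup);

    if (!need_lookup)
        return e;
    return toy_lookup_real(dir, name);
}
```

Keeping the "need real lookup" answer separate from the returned pointer lets other callers reuse the cached half on its own.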

untangling do_lookup() - take __lookup_hash()-calling case out of line. · 81e6f520

Authored by Al Viro on Mar 30, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

81e6f520

untangling do_lookup() - switch to calling __lookup_hash() · a3255546

Authored by Al Viro on Mar 30, 2012

now we have __lookup_hash() open-coded if !dentry case;
just call the damn thing instead...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a3255546

untangling do_lookup() - merge d_alloc_and_lookup() callers · a6ecdfcf

Authored by Al Viro on Mar 30, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a6ecdfcf

untangling do_lookup() - merge failure exits in !dentry case · ec335e91

Authored by Al Viro on Mar 30, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ec335e91

untangling do_lookup() - massage !dentry case towards __lookup_hash() · d774a058

Authored by Al Viro on Mar 30, 2012

Reorder if-else cases for starters...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d774a058