提交 · 10556cb21a0d0b24d95f00ea6df16f599a3345b2 · openeuler / raspberrypi-kernel

20 7月, 2011 8 次提交

A
->permission() sanitizing: don't pass flags to ->permission() · 10556cb2
由 Al Viro 提交于 6月 20, 2011
```
not used by the instances anymore.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
10556cb2

->permission() sanitizing: don't pass flags to generic_permission() · 2830ba7f

由 Al Viro 提交于 6月 20, 2011

redundant; all callers get it duplicated in mask & MAY_NOT_BLOCK and none of
them removes that bit.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

2830ba7f

A
->permission() sanitizing: don't pass flags to ->check_acl() · 7e40145e
由 Al Viro 提交于 6月 20, 2011
```
not used in the instances anymore.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
7e40145e

->permission() sanitizing: MAY_NOT_BLOCK · 1fc0f78c

由 Al Viro 提交于 6月 20, 2011

Duplicate the flags argument into mask bitmap.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

1fc0f78c

kill check_acl callback of generic_permission() · 178ea735

由 Al Viro 提交于 6月 20, 2011

its value depends only on inode and does not change; we might as
well store it in ->i_op->check_acl and be done with that.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

178ea735

lockless get_write_access/deny_write_access · 07b8ce1e

由 Al Viro 提交于 6月 20, 2011

new helpers: atomic_inc_unless_negative()/atomic_dec_unless_positive()
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

07b8ce1e

kill file_permission() completely · 3bfa784a

由 Al Viro 提交于 6月 19, 2011

convert the last remaining caller to inode_permission()
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

3bfa784a

new helper: iterate_supers_type() · 43e15cdb

由 Al Viro 提交于 6月 03, 2011

Call the given function for all superblocks of given type.  Function
gets a superblock (with s_umount locked shared) and (void *) argument
supplied by caller of iterator.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

43e15cdb

28 6月, 2011 1 次提交

mm: fix assertion mapping->nrpages == 0 in end_writeback() · 08142579

由 Jan Kara 提交于 6月 27, 2011

Under heavy memory and filesystem load, users observe the assertion
mapping->nrpages == 0 in end_writeback() trigger.  This can be caused by
page reclaim reclaiming the last page from a mapping in the following
race:

	CPU0				CPU1
  ...
  shrink_page_list()
    __remove_mapping()
      __delete_from_page_cache()
        radix_tree_delete()
					evict_inode()
					  truncate_inode_pages()
					    truncate_inode_pages_range()
					      pagevec_lookup() - finds nothing
					  end_writeback()
					    mapping->nrpages != 0 -> BUG
        page->mapping = NULL
        mapping->nrpages--

Fix the problem by doing a reliable check of mapping->nrpages under
mapping->tree_lock in end_writeback().

Analyzed by Jay <jinshan.xiong@whamcloud.com>, lost in LKML, and dug out
by Miklos Szeredi <mszeredi@suse.de>.

Cc: Jay <jinshan.xiong@whamcloud.com>
Cc: Miklos Szeredi <mszeredi@suse.de>
Signed-off-by: NJan Kara <jack@suse.cz>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

08142579

21 6月, 2011 1 次提交

vfs: i_state needs to be 'unsigned long' for now · 79568f5b

由 Linus Torvalds 提交于 6月 20, 2011

Commit 13e12d14 ("vfs: reorganize 'struct inode' layout a bit")
moved things around a bit changed i_state to be unsigned int instead of
unsigned long.  That was to help structure layout for the 64-bit case,
and shrink 'struct inode' a bit (admittedly that only happened when
spinlock debugging was on and i_flags didn't pack with i_lock).

However, Meelis Roos reports that this results in unaligned exceptions
on sprc, and it turns out that the bit-locking primitives that we use
for the I_NEW bit want to use the bitops.  Which want 'unsigned long',
not 'unsigned int'.

We really should fix the bit locking code to not have that kind of
requirement, but that's a much bigger change.  So for now, revert that
field back to 'unsigned long' (but keep the other re-ordering changes
from the commit that caused this).

Andi points out that we have played games with this in 'struct page', so
it's solvable with other hacks too, but since right now the struct inode
size advantage only happens with some rare config options, it's not
worth fighting.

It _would_ be worth fixing the bitlocking code, though.  Especially
since there is no type safety in the bitlocking code (this never caused
any warnings, and worked fine on x86-64, because the bitlocks take a
'void *' and x86-64 doesn't care that deeply about alignment).  So it's
currently a very easy problem to trigger by mistake and never notice.
Reported-by: NMeelis Roos <mroos@linux.ee>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

79568f5b

09 6月, 2011 1 次提交

vfs: reorganize 'struct inode' layout a bit · 13e12d14

由 Linus Torvalds 提交于 6月 08, 2011

This tries to make the 'struct inode' accesses denser in the data cache
by moving a commonly accessed field (i_security) closer to other fields
that are accessed often.

It also makes 'i_state' just an 'unsigned int' rather than 'unsigned
long', since we only use a few bits of that field, and moves it next to
the existing 'i_flags' so that we potentially get better structure
layout (although depending on config options, i_flags may already have
packed in the same word as i_lock, so this improves packing only for the
case of spinlock debugging)

Out 'struct inode' is still way too big, and we should probably move
some other fields around too (the acl fields in particular) for better
data cache access density. Other fields (like the inode hash) are
likely to be entirely irrelevant under most loads.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

13e12d14

04 6月, 2011 1 次提交

more conservative S_NOSEC handling · 9e1f1de0

由 Al Viro 提交于 6月 03, 2011

Caching "we have already removed suid/caps" was overenthusiastic as merged.
On network filesystems we might have had suid/caps set on another client,
silently picked by this client on revalidate, all of that *without* clearing
the S_NOSEC flag.

AFAICS, the only reasonably sane way to deal with that is
	* new superblock flag; unless set, S_NOSEC is not going to be set.
	* local block filesystems set it in their ->mount() (more accurately,
mount_bdev() does, so does btrfs ->mount(), users of mount_bdev() other than
local block ones clear it)
	* if any network filesystem (or a cluster one) wants to use S_NOSEC,
it'll need to set MS_NOSEC in sb->s_flags *AND* take care to clear S_NOSEC when
inode attribute changes are picked from other clients.

It's not an earth-shattering hole (anybody that can set suid on another client
will almost certainly be able to write to the file before doing that anyway),
but it's a bug that needs fixing.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

9e1f1de0

29 5月, 2011 1 次提交

Cache xattr security drop check for write v2 · 69b45732

由 Andi Kleen 提交于 5月 28, 2011

Some recent benchmarking on btrfs showed that a major scaling bottleneck
on large systems on btrfs is currently the xattr lookup on every write.

Why xattr lookup on every write I hear you ask?

write wants to drop suid and security related xattrs that could set o
capabilities for executables.  To do that it currently looks up
security.capability on EVERY write (even for non executables) to decide
whether to drop it or not.

In btrfs this causes an additional tree walk, hitting some per file system
locks and quite bad scalability. In a simple read workload on a 8S
system I saw over 90% CPU time in spinlocks related to that.

Chris Mason tells me this is also a problem in ext4, where it hits
the global mbcache lock.

This patch adds a simple per inode to avoid this problem.  We only
do the lookup once per file and then if there is no xattr cache
the decision. All xattr changes clear the flag.

I also used the same flag to avoid the suid check, although
that one is pretty cheap.

A file system can also set this flag when it creates the inode,
if it has a cheap way to do so.  This is done for some common file systems
in followon patches.

With this patch a major part of the lock contention disappears
for btrfs. Some testing on smaller systems didn't show significant
performance changes, but at least it helps the larger systems
and is generally more efficient.

v2: Rename is_sgid. add file system helper.
Cc: chris.mason@oracle.com
Cc: josef@redhat.com
Cc: viro@zeniv.linux.org.uk
Cc: agruen@linbit.com
Cc: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

69b45732

27 5月, 2011 2 次提交

fs: pass exact type of data dirties to ->dirty_inode · aa385729

由 Christoph Hellwig 提交于 5月 27, 2011

Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or
anything else, so that the filesystem can track internally if it
needs to push out a transaction for fdatasync or not.

This is just the prototype change with no user for it yet.  I plan
to push large XFS changes for the next merge window, and getting
this trivial infrastructure in this window would help a lot to avoid
tree interdependencies.

Also remove incorrect comments that ->dirty_inode can't block.  That
has been changed a long time ago, and many implementations rely on it.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

aa385729

fs: add field to superblock to support cleancache · 9fdfdcf1

由 Dan Magenheimer 提交于 5月 26, 2011

This second patch of eight in this cleancache series adds a field to
the generic superblock to squirrel away a pool identifier that is
dynamically provided by cleancache-enabled filesystems at mount time
to uniquely identify files and pages belonging to this mounted filesystem.

Details and a FAQ can be found in Documentation/vm/cleancache.txt

[v8: trivial merge conflict update]
Signed-off-by: NDan Magenheimer <dan.magenheimer@oracle.com>
Reviewed-by: NJeremy Fitzhardinge <jeremy@goop.org>
Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik Van Riel <riel@redhat.com>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Andreas Dilger <adilger@sun.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Nitin Gupta <ngupta@vflare.org>

9fdfdcf1

25 5月, 2011 3 次提交

ulimit: raise default hard ulimit on number of files to 4096 · 0ac1ee0b

由 Tim Gardner 提交于 5月 24, 2011

Apps are increasingly using more than 1024 file descriptors.  See
discussion in several distro bug trackers, e.g.  BugLink:
http://bugs.launchpad.net/bugs/663090
https://issues.rpath.com/browse/RPL-2054

You don't want to raise the default soft limit, since that might break
apps that use select(), but it's safe to raise the default hard limit;
that way, apps that know they need lots of file descriptors can raise
their soft limit without needing root, and without user intervention.

Ubuntu is doing this with a kernel change because they have a policy of
not changing kernel defaults in userland.

While 4096 might not be enough for *all* apps, it seems to be plenty for
the apps I've seen lately that are unhappy with 1024.
Signed-off-by: NTim Gardner <tim.gardner@canonical.com>
Cc: Dan Kegel <dank@kegel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0ac1ee0b

mm: Convert i_mmap_lock to a mutex · 3d48ae45

由 Peter Zijlstra 提交于 5月 24, 2011

Straightforward conversion of i_mmap_lock to a mutex.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NHugh Dickins <hughd@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Tony Luck <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3d48ae45

mm: Remove i_mmap_lock lockbreak · 97a89413

由 Peter Zijlstra 提交于 5月 24, 2011

Hugh says:
 "The only significant loser, I think, would be page reclaim (when
  concurrent with truncation): could spin for a long time waiting for
  the i_mmap_mutex it expects would soon be dropped? "

Counter points:
 - cpu contention makes the spin stop (need_resched())
 - zap pages should be freeing pages at a higher rate than reclaim
   ever can

I think the simplification of the truncate code is definitely worth it.

Effectively reverts: 2aa15890 ("mm: prevent concurrent
unmap_mapping_range() on the same inode") and takes out the code that
caused its problem.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

97a89413

15 5月, 2011 1 次提交

fs: remove FS_COW_FL · e1e8fb6a

由 Li Zefan 提交于 4月 15, 2011

FS_COW_FL and FS_NOCOW_FL were newly introduced to control per file
COW in btrfs, but FS_NOCOW_FL is sufficient.

The fact is we don't have corresponding BTRFS_INODE_COW flag.

COW is default, and FS_NOCOW_FL can be used to switch off COW for
a single file.

If we mount btrfs with nodatacow, a newly created file will be set with
the FS_NOCOW_FL flag. So to turn on COW for it, we can just clear the
FS_NOCOW_FL flag.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

e1e8fb6a

13 4月, 2011 2 次提交

vfs: Re-introduce s_uuid in the superblock · 0bba0169

由 Linus Torvalds 提交于 4月 12, 2011

Gaah.  When commit be85bcca reverted the export of file system uuid
via /proc/<pid>/mountinfo, it also unintentionally removed the s_uuid
field in struct super_block.

I didn't mean to do that, since filesystems have been taught to fill it
in (and we want to keep it for future re-introduction in the mountinfo
file).

Stupid of me. This adds it back in.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0bba0169

Revert "vfs: Export file system uuid via /proc/<pid>/mountinfo" · be85bcca

由 Linus Torvalds 提交于 4月 12, 2011

This reverts commit 93f1c20b.

It turns out that libmount misparses it because it adds a '-' character
in the uuid string, which libmount then incorrectly confuses with the
separator string (" - ") at the end of all the optional arguments.

Upstream libmount (in the util-linux tree) has been fixed, but until
that fix actually percolates up to users, we'd better not expose this
change in the kernel.

Let's revisit this later (possibly by exposing the UUID without any '-'
characters in it, avoiding the user-space bug).
Reported-by: NDave Jones <davej@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Karel Zak <kzak@redhat.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Miklos Szeredi <mszeredi@suse.cz>
Cc: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

be85bcca

06 4月, 2011 1 次提交

fs: export empty_aops · 7dcda1c9

由 Jens Axboe 提交于 4月 05, 2011

With the ->sync_page() hook gone, we have a few users that
add their own static address_space_operations without any
functions defined.

fs/inode.c already has an empty_aops that it uses for init
purposes. Lets export that and use it in the places where
an otherwise empty aops was defined.
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

7dcda1c9

31 3月, 2011 1 次提交

Fix common misspellings · 25985edc

由 Lucas De Marchi 提交于 3月 30, 2011

Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi>

25985edc

28 3月, 2011 1 次提交

Btrfs: add datacow flag in inode flag · 32471f6e

由 liubo 提交于 3月 21, 2011

For datacow control, the corresponding inode flags are needed.
This is for btrfs use.

v1->v2:
Change FS_COW_FL to another bit due to conflict with the upstream e2fsprogs
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

32471f6e

25 3月, 2011 1 次提交

fs: protect inode->i_state with inode->i_lock · 250df6ed

由 Dave Chinner 提交于 3月 22, 2011

Protect inode state transitions and validity checks with the
inode->i_lock. This enables us to make inode state transitions
independently of the inode_lock and is the first step to peeling
away the inode_lock from the code.

This requires that __iget() is done atomically with i_state checks
during list traversals so that we don't race with another thread
marking the inode I_FREEING between the state check and grabbing the
reference.

Also remove the unlock_new_inode() memory barrier optimisation
required to avoid taking the inode_lock when clearing I_NEW.
Simplify the code by simply taking the inode->i_lock around the
state change and wakeup. Because the wakeup is no longer tricky,
remove the wake_up_inode() function and open code the wakeup where
necessary.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

250df6ed

24 3月, 2011 2 次提交

userns: rename is_owner_or_cap to inode_owner_or_capable · 2e149670

由 Serge E. Hallyn 提交于 3月 23, 2011

And give it a kernel-doc comment.

[akpm@linux-foundation.org: btrfs changed in linux-next]
Signed-off-by: NSerge E. Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: NDavid Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2e149670

userns: userns: check user namespace for task->file uid equivalence checks · e795b717

由 Serge E. Hallyn 提交于 3月 23, 2011

Cheat for now and say all files belong to init_user_ns.  Next step will be
to let superblocks belong to a user_ns, and derive inode_userns(inode)
from inode->i_sb->s_user_ns.  Finally we'll introduce more flexible
arrangements.

Changelog:
	Feb 15: make is_owner_or_cap take const struct inode
	Feb 23: make is_owner_or_cap bool

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NSerge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: NDaniel Lezcano <daniel.lezcano@free.fr>
Acked-by: NDavid Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e795b717

23 3月, 2011 1 次提交

fs.h: remove 8 bytes of padding from block_device on 64bit builds · f3ccfcda

由 Richard Kennedy 提交于 3月 22, 2011

Re-ordering struct block_inode to remove 8 bytes of padding on 64 bit
builds, which also shrinks bdev_inode by 8 bytes (776 -> 768) allowing it
to fit into one fewer cache lines.
Signed-off-by: NRichard Kennedy <richard@rsk.demon.co.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f3ccfcda

18 3月, 2011 1 次提交

kill simple_set_mnt() · 474a00ee

由 Al Viro 提交于 3月 17, 2011

not needed anymore, since all users (->get_sb() instances) are gone.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

474a00ee

17 3月, 2011 2 次提交

vfs: bury ->get_sb() · 1a102ff9

由 Al Viro 提交于 3月 16, 2011

This is an ex-parrot.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

1a102ff9

vfs: new superblock methods to override /proc/*/mount{s,info} · c7f404b4

由 Al Viro 提交于 3月 16, 2011

a) ->show_devname(m, mnt) - what to put into devname columns in mounts,
mountinfo and mountstats
b) ->show_path(m, mnt) - what to put into relative path column in mountinfo

Leaving those NULL gives old behaviour.  NFS switched to using those.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

c7f404b4

15 3月, 2011 3 次提交

New kind of open files - "location only". · 1abf0c71

由 Al Viro 提交于 3月 13, 2011

New flag for open(2) - O_PATH.  Semantics:
	* pathname is resolved, but the file itself is _NOT_ opened
as far as filesystem is concerned.
	* almost all operations on the resulting descriptors shall
fail with -EBADF.  Exceptions are:
	1) operations on descriptors themselves (i.e.
		close(), dup(), dup2(), dup3(), fcntl(fd, F_DUPFD),
		fcntl(fd, F_DUPFD_CLOEXEC, ...), fcntl(fd, F_GETFD),
		fcntl(fd, F_SETFD, ...))
	2) fcntl(fd, F_GETFL), for a common non-destructive way to
		check if descriptor is open
	3) "dfd" arguments of ...at(2) syscalls, i.e. the starting
		points of pathname resolution
	* closing such descriptor does *NOT* affect dnotify or
posix locks.
	* permissions are checked as usual along the way to file;
no permission checks are applied to the file itself.  Of course,
giving such thing to syscall will result in permission checks (at
the moment it means checking that starting point of ....at() is
a directory and caller has exec permissions on it).

fget() and fget_light() return NULL on such descriptors; use of
fget_raw() and fget_raw_light() is needed to get them.  That protects
existing code from dealing with those things.

There are two things still missing (they come in the next commits):
one is handling of symlinks (right now we refuse to open them that
way; see the next commit for semantics related to those) and another
is descriptor passing via SCM_RIGHTS datagrams.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

1abf0c71

vfs: Export file system uuid via /proc/<pid>/mountinfo · 93f1c20b

由 Aneesh Kumar K.V 提交于 1月 29, 2011

We add a per superblock uuid field. File systems should
update the uuid in the fill_super callback
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

93f1c20b

vfs: Add name to file handle conversion support · 990d6c2d

由 Aneesh Kumar K.V 提交于 1月 29, 2011

The syscall also return mount id which can be used
to lookup file system specific information such as uuid
in /proc/<pid>/mountinfo
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

990d6c2d

14 3月, 2011 3 次提交

clean statfs-like syscalls up · c8b91acc

由 Al Viro 提交于 3月 12, 2011

New helpers: user_statfs() and fd_statfs(), taking userland pathname and
descriptor resp. and filling struct kstatfs.  Syscalls of statfs family
(native, compat and foreign - osf and hpux on alpha and parisc resp.)
switched to those.  Removes some boilerplate code, simplifies cleanup
on errors...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

c8b91acc

open-style analog of vfs_path_lookup() · 73d049a4

由 Al Viro 提交于 3月 11, 2011

new function: file_open_root(dentry, mnt, name, flags) opens the file
vfs_path_lookup would arrive to.

Note that name can be empty; in that case the usual requirement that
dentry should be a directory is lifted.

open-coded equivalents switched to it, may_open() got down exactly
one caller and became static.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

73d049a4

switch do_filp_open() to struct open_flags · 47c805dc

由 Al Viro 提交于 2月 23, 2011

take calculation of open_flags by open(2) arguments into new helper
in fs/open.c, move filp_open() over there, have it and do_sys_open()
use that helper, switch exec.c callers of do_filp_open() to explicit
(and constant) struct open_flags.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

47c805dc

10 3月, 2011 2 次提交

block: kill off REQ_UNPLUG · 721a9602

由 Jens Axboe 提交于 3月 09, 2011

With the plugging now being explicitly controlled by the
submitter, callers need not pass down unplugging hints
to the block layer. If they want to unplug, it's because they
manually plugged on their own - in which case, they should just
unplug at will.
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

721a9602

block: remove per-queue plugging · 7eaceacc

由 Jens Axboe 提交于 3月 10, 2011

Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

7eaceacc

24 2月, 2011 1 次提交

Fix over-zealous flush_disk when changing device size. · 93b270f7

由 NeilBrown 提交于 2月 24, 2011

There are two cases when we call flush_disk.
In one, the device has disappeared (check_disk_change) so any
data will hold becomes irrelevant.
In the oter, the device has changed size (check_disk_size_change)
so data we hold may be irrelevant.

In both cases it makes sense to discard any 'clean' buffers,
so they will be read back from the device if needed.

In the former case it makes sense to discard 'dirty' buffers
as there will never be anywhere safe to write the data.  In the
second case it *does*not* make sense to discard dirty buffers
as that will lead to file system corruption when you simply enlarge
the containing devices.

flush_disk calls __invalidate_devices.
__invalidate_device calls both invalidate_inodes and invalidate_bdev.

invalidate_inodes *does* discard I_DIRTY inodes and this does lead
to fs corruption.

invalidate_bev *does*not* discard dirty pages, but I don't really care
about that at present.

So this patch adds a flag to __invalidate_device (calling it
__invalidate_device2) to indicate whether dirty buffers should be
killed, and this is passed to invalidate_inodes which can choose to
skip dirty inodes.

flusk_disk then passes true from check_disk_change and false from
check_disk_size_change.

dm avoids tripping over this problem by calling i_size_write directly
rathher than using check_disk_size_change.

md does use check_disk_size_change and so is affected.

This regression was introduced by commit 608aeef1 which causes
check_disk_size_change to call flush_disk, so it is suitable for any
kernel since 2.6.27.

Cc: stable@kernel.org
Acked-by: NJeff Moyer <jmoyer@redhat.com>
Cc: Andrew Patterson <andrew.patterson@hp.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: NNeilBrown <neilb@suse.de>

93b270f7