提交 · 1aed3e4204dd787d53b3cd6363eb63bb4900c38e · xiphi1978 / linux

18 3月, 2011 1 次提交
- A
  lose 'mounting_here' argument in ->d_manage() · 1aed3e42
  由 Al Viro 提交于 3月 18, 2011
```
it's always false...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  1aed3e42
17 3月, 2011 1 次提交

由 Al Viro 提交于 3月 16, 2011

This is an ex-parrot.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

1a102ff9

16 1月, 2011 4 次提交

Unexport do_add_mount() and add in follow_automount(), not ->d_automount() · ea5b778a

由 David Howells 提交于 1月 14, 2011

Unexport do_add_mount() and make ->d_automount() return the vfsmount to be
added rather than calling do_add_mount() itself. follow_automount() will then
do the addition.

This slightly complicates things as ->d_automount() normally wants to add the
new vfsmount to an expiration list and start an expiration timer. The problem
with that is that the vfsmount will be deleted if it has a refcount of 1 and
the timer will not repeat if the expiration list is empty.

To this end, we require the vfsmount to be returned from d_automount() with a
refcount of (at least) 2. One of these refs will be dropped unconditionally.
In addition, follow_automount() must get a 3rd ref around the call to
do_add_mount() lest it eat a ref and return an error, leaving the mount we
have open to being expired as we would otherwise have only 1 ref on it.

d_automount() should also add the the vfsmount to the expiration list (by
calling mnt_set_expiry()) and start the expiration timer before returning, if
this mechanism is to be used. The vfsmount will be unlinked from the
expiration list by follow_automount() if do_add_mount() fails.

This patch also fixes the call to do_add_mount() for AFS to propagate the mount
flags from the parent vfsmount.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

ea5b778a

Allow d_manage() to be used in RCU-walk mode · ab90911f

由 David Howells 提交于 1月 14, 2011

Allow d_manage() to be called from pathwalk when it is in RCU-walk mode as well
as when it is in Ref-walk mode.  This permits __follow_mount_rcu() to call
d_manage() directly.  d_manage() needs a parameter to indicate that it is in
RCU-walk mode as it isn't allowed to sleep if in that mode (but should return
-ECHILD instead).

autofs4_d_manage() can then be set to retain RCU-walk mode if the daemon
accesses it and otherwise request dropping back to ref-walk mode.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

ab90911f

Add a dentry op to allow processes to be held during pathwalk transit · cc53ce53

由 David Howells 提交于 1月 14, 2011

Add a dentry op (d_manage) to permit a filesystem to hold a process and make it
sleep when it tries to transit away from one of that filesystem's directories
during a pathwalk.  The operation is keyed off a new dentry flag
(DCACHE_MANAGE_TRANSIT).

The filesystem is allowed to be selective about which processes it holds and
which it permits to continue on or prohibits from transiting from each flagged
directory.  This will allow autofs to hold up client processes whilst letting
its userspace daemon through to maintain the directory or the stuff behind it
or mounted upon it.

The ->d_manage() dentry operation:

	int (*d_manage)(struct path *path, bool mounting_here);

takes a pointer to the directory about to be transited away from and a flag
indicating whether the transit is undertaken by do_add_mount() or
do_move_mount() skipping through a pile of filesystems mounted on a mountpoint.

It should return 0 if successful and to let the process continue on its way;
-EISDIR to prohibit the caller from skipping to overmounted filesystems or
automounting, and to use this directory; or some other error code to return to
the user.

->d_manage() is called with namespace_sem writelocked if mounting_here is true
and no other locks held, so it may sleep.  However, if mounting_here is true,
it may not initiate or wait for a mount or unmount upon the parameter
directory, even if the act is actually performed by userspace.

Within fs/namei.c, follow_managed() is extended to check with d_manage() first
on each managed directory, before transiting away from it or attempting to
automount upon it.

follow_down() is renamed follow_down_one() and should only be used where the
filesystem deliberately intends to avoid management steps (e.g. autofs).

A new follow_down() is added that incorporates the loop done by all other
callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS
and CIFS do use it, their use is removed by converting them to use
d_automount()).  The new follow_down() calls d_manage() as appropriate.  It
also takes an extra parameter to indicate if it is being called from mount code
(with namespace_sem writelocked) which it passes to d_manage().  follow_down()
ignores automount points so that it can be used to mount on them.

__follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with
DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to
sleep.  It would be possible to enter d_manage() in rcu-walk mode too, and have
that determine whether to abort or not itself.  That would allow the autofs
daemon to continue on in rcu-walk mode.

Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't
required as every tranist from that directory will cause d_manage() to be
invoked.  It can always be set again when necessary.

==========================
WHAT THIS MEANS FOR AUTOFS
==========================

Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to
trigger the automounting of indirect mounts, and both of these can be called
with i_mutex held.

autofs knows that the i_mutex will be held by the caller in lookup(), and so
can drop it before invoking the daemon - but this isn't so for d_revalidate(),
since the lock is only held on _some_ of the code paths that call it.  This
means that autofs can't risk dropping i_mutex from its d_revalidate() function
before it calls the daemon.

The bug could manifest itself as, for example, a process that's trying to
validate an automount dentry that gets made to wait because that dentry is
expired and needs cleaning up:

	mkdir         S ffffffff8014e05a     0 32580  24956
	Call Trace:
	 [<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897
	 [<ffffffff80127f7d>] avc_has_perm+0x46/0x58
	 [<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e
	 [<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b
	 [<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149
	 [<ffffffff80036d96>] __lookup_hash+0xa0/0x12f
	 [<ffffffff80057a2f>] lookup_create+0x46/0x80
	 [<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4

versus the automount daemon which wants to remove that dentry, but can't
because the normal process is holding the i_mutex lock:

	automount     D ffffffff8014e05a     0 32581      1              32561
	Call Trace:
	 [<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b
	 [<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1
	 [<ffffffff80063c89>] .text.lock.mutex+0xf/0x14
	 [<ffffffff800e6d55>] do_rmdir+0x77/0xde
	 [<ffffffff8005d229>] tracesys+0x71/0xe0
	 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

which means that the system is deadlocked.

This patch allows autofs to hold up normal processes whilst the daemon goes
ahead and does things to the dentry tree behind the automouter point without
risking a deadlock as almost no locks are held in d_manage() and none in
d_automount().
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Was-Acked-by: NIan Kent <raven@themaw.net>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

cc53ce53

Add a dentry op to handle automounting rather than abusing follow_link() · 9875cf80

由 David Howells 提交于 1月 14, 2011

Add a dentry op (d_automount) to handle automounting directories rather than
abusing the follow_link() inode operation.  The operation is keyed off a new
dentry flag (DCACHE_NEED_AUTOMOUNT).

This also makes it easier to add an AT_ flag to suppress terminal segment
automount during pathwalk and removes the need for the kludge code in the
pathwalk algorithm to handle directories with follow_link() semantics.

The ->d_automount() dentry operation:

	struct vfsmount *(*d_automount)(struct path *mountpoint);

takes a pointer to the directory to be mounted upon, which is expected to
provide sufficient data to determine what should be mounted.  If successful, it
should return the vfsmount struct it creates (which it should also have added
to the namespace using do_add_mount() or similar).  If there's a collision with
another automount attempt, NULL should be returned.  If the directory specified
by the parameter should be used directly rather than being mounted upon,
-EISDIR should be returned.  In any other case, an error code should be
returned.

The ->d_automount() operation is called with no locks held and may sleep.  At
this point the pathwalk algorithm will be in ref-walk mode.

Within fs/namei.c itself, a new pathwalk subroutine (follow_automount()) is
added to handle mountpoints.  It will return -EREMOTE if the automount flag was
set, but no d_automount() op was supplied, -ELOOP if we've encountered too many
symlinks or mountpoints, -EISDIR if the walk point should be used without
mounting and 0 if successful.  The path will be updated to point to the mounted
filesystem if a successful automount took place.

__follow_mount() is replaced by follow_managed() which is more generic
(especially with the patch that adds ->d_manage()).  This handles transits from
directories during pathwalk, including automounting and skipping over
mountpoints (and holding processes with the next patch).

__follow_mount_rcu() will jump out of RCU-walk mode if it encounters an
automount point with nothing mounted on it.

follow_dotdot*() does not handle automounts as you don't want to trigger them
whilst following "..".

I've also extracted the mount/don't-mount logic from autofs4 and included it
here.  It makes the mount go ahead anyway if someone calls open() or creat(),
tries to traverse the directory, tries to chdir/chroot/etc. into the directory,
or sticks a '/' on the end of the pathname.  If they do a stat(), however,
they'll only trigger the automount if they didn't also say O_NOFOLLOW.

I've also added an inode flag (S_AUTOMOUNT) so that filesystems can mark their
inodes as automount points.  This flag is automatically propagated to the
dentry as DCACHE_NEED_AUTOMOUNT by __d_instantiate().  This saves NFS and could
save AFS a private flag bit apiece, but is not strictly necessary.  It would be
preferable to do the propagation in d_set_d_op(), but that doesn't normally
have access to the inode.

[AV: fixed breakage in case if __follow_mount_rcu() fails and nameidata_drop_rcu()
succeeds in RCU case of do_lookup(); we need to fall through to non-RCU case after
that, rather than just returning with ungrabbed *path]
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Was-Acked-by: NIan Kent <raven@themaw.net>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

9875cf80

14 1月, 2011 1 次提交
- N
  fs: small rcu-walk documentation fixes · a82416da
  由 Nick Piggin 提交于 1月 14, 2011
```
Signed-off-by: NNick Piggin <npiggin@kernel.dk>
```
  a82416da
07 1月, 2011 5 次提交

N
fs: provide rcu-walk aware permission i_ops · b74c79e9
由 Nick Piggin 提交于 1月 07, 2011
```
Signed-off-by: NNick Piggin <npiggin@kernel.dk>
```
b74c79e9

fs: rcu-walk aware d_revalidate method · 34286d66

由 Nick Piggin 提交于 1月 07, 2011

Require filesystems be aware of .d_revalidate being called in rcu-walk
mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning
-ECHILD from all implementations.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

34286d66

fs: change d_hash for rcu-walk · b1e6a015

由 Nick Piggin 提交于 1月 07, 2011

Change d_hash so it may be called from lock-free RCU lookups. See similar
patch for d_compare for details.

For in-tree filesystems, this is just a mechanical change.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

b1e6a015

fs: change d_compare for rcu-walk · 621e155a

由 Nick Piggin 提交于 1月 07, 2011

Change d_compare so it may be called from lock-free RCU lookups. This
does put significant restrictions on what may be done from the callback,
however there don't seem to have been any problems with in-tree fses.
If some strange use case pops up that _really_ cannot cope with the
rcu-walk rules, we can just add new rcu-unaware callbacks, which would
cause name lookup to drop out of rcu-walk mode.

For in-tree filesystems, this is just a mechanical change.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

621e155a

fs: change d_delete semantics · fe15ce44

由 Nick Piggin 提交于 1月 07, 2011

Change d_delete from a dentry deletion notification to a dentry caching
advise, more like ->drop_inode. Require it to be constant and idempotent,
and not take d_lock. This is how all existing filesystems use the callback
anyway.

This makes fine grained dentry locking of dput and dentry lru scanning
much simpler.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

fe15ce44

03 12月, 2010 1 次提交

Documentation/filesystems/vfs.txt: fix ->repeasepage() description · 4fe65cab

由 Andrew Morton 提交于 12月 02, 2010

->releasepage() does not remove the page from the mapping.
Acked-by: NNeil Brown <neilb@suse.de>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4fe65cab

02 12月, 2010 1 次提交

Call the filesystem back whenever a page is removed from the page cache · 6072d13c

由 Linus Torvalds 提交于 12月 01, 2010

NFS needs to be able to release objects that are stored in the page
cache once the page itself is no longer visible from the page cache.

This patch adds a callback to the address space operations that allows
filesystems to perform page cleanups once the page has been removed
from the page cache.

Original patch by: Linus Torvalds <torvalds@linux-foundation.org>
[trondmy: cover the cases of invalidate_inode_pages2() and
          truncate_inode_pages()]
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

6072d13c

14 8月, 2010 1 次提交

bkl: Remove locked .ioctl file operation · b19dd42f

由 Arnd Bergmann 提交于 7月 04, 2010

The last user is gone, so we can safely remove this
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Cc: John Kacur <jkacur@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

b19dd42f

28 5月, 2010 2 次提交

fs: introduce new truncate sequence · 7bb46a67

由 npiggin@suse.de 提交于 5月 27, 2010

Introduce a new truncate calling sequence into fs/mm subsystems. Rather than
setattr > vmtruncate > truncate, have filesystems call their truncate sequence
from ->setattr if filesystem specific operations are required. vmtruncate is
deprecated, and truncate_pagecache and inode_newsize_ok helpers introduced
previously should be used.

simple_setattr is introduced for simple in-ram filesystems to implement
the new truncate sequence. Eventually all filesystems should be converted
to implement a setattr, and the default code in notify_change should go
away.

simple_setsize is also introduced to perform just the ATTR_SIZE portion
of simple_setattr (ie. changing i_size and trimming pagecache).

To implement the new truncate sequence:
- filesystem specific manipulations (eg freeing blocks) must be done in
  the setattr method rather than ->truncate.
- vmtruncate can not be used by core code to trim blocks past i_size in
  the event of write failure after allocation, so this must be performed
  in the fs code.
- convert usage of helpers block_write_begin, nobh_write_begin,
  cont_write_begin, and *blockdev_direct_IO* to use _newtrunc postfixed
  variants. These avoid calling vmtruncate to trim blocks (see previous).
- inode_setattr should not be used. generic_setattr is a new function
  to be used to copy simple attributes into the generic inode.
- make use of the better opportunity to handle errors with the new sequence.

Big problem with the previous calling sequence: the filesystem is not called
until i_size has already changed.  This means it is not allowed to fail the
call, and also it does not know what the previous i_size was. Also, generic
code calling vmtruncate to truncate allocated blocks in case of error had
no good way to return a meaningful error (or, for example, atomically handle
block deallocation).

Cc: Christoph Hellwig <hch@lst.de>
Acked-by: NJan Kara <jack@suse.cz>
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

7bb46a67

drop unused dentry argument to ->fsync · 7ea80859

由 Christoph Hellwig 提交于 5月 26, 2010

Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

7ea80859

23 4月, 2010 1 次提交

Documentation/: it's -> its where appropriate · a33f3224

由 Francis Galiegue 提交于 4月 23, 2010

Fix obvious cases of "it's" being used when "its" was meant.
Signed-off-by: NFrancis Galiegue <fgaliegue@gmail.com>
Acked-by: NRandy Dunlap <rdunlap@xenotime.net>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

a33f3224

10 12月, 2009 1 次提交

kill wait_on_page_writeback_range · 94004ed7

由 Christoph Hellwig 提交于 9月 30, 2009

All callers really want the more logical filemap_fdatawait_range interface,
so convert them to use it and merge wait_on_page_writeback_range into
filemap_fdatawait_range.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJan Kara <jack@suse.cz>

94004ed7

16 9月, 2009 1 次提交

HWPOISON: Define a new error_remove_page address space op for async truncation · 25718736

由 Andi Kleen 提交于 9月 16, 2009

Truncating metadata pages is not safe right now before
we haven't audited all file systems.

To enable truncation only for data address space define
a new address_space callback error_remove_page.

This is used for memory_failure.c memory error handling.

This can be then set to truncate_inode_page()

This patch just defines the new operation and adds documentation.

Callers and users come in followon patches.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>

25718736

21 4月, 2009 1 次提交

Documentation/filesystems: remove out of date reference to BKL being held · 66672fef

由 Adrian McMenamin 提交于 4月 20, 2009

Documentation/filesystems/vfs.txt incorrectly states that the kernel is
locked during the call to statfs (Documentation/filesystems/Locking
correctly says it is not). This patch removes the offending sentence.

remove reference to BKL being held in statfs
Signed-off-by: NAdrian McMenamin <adrian@mcmen.demon.co.uk>
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

66672fef

10 1月, 2009 1 次提交

filesystem freeze: add error handling of write_super_lockfs/unlockfs · c4be0c1d

由 Takashi Sato 提交于 1月 09, 2009

Currently, ext3 in mainline Linux doesn't have the freeze feature which
suspends write requests.  So, we cannot take a backup which keeps the
filesystem's consistency with the storage device's features (snapshot and
replication) while it is mounted.

In many case, a commercial filesystem (e.g.  VxFS) has the freeze feature
and it would be used to get the consistent backup.

If Linux's standard filesystem ext3 has the freeze feature, we can do it
without a commercial filesystem.

So I have implemented the ioctls of the freeze feature.
I think we can take the consistent backup with the following steps.
1. Freeze the filesystem with the freeze ioctl.
2. Separate the replication volume or create the snapshot
   with the storage device's feature.
3. Unfreeze the filesystem with the unfreeze ioctl.
4. Take the backup from the separated replication volume
   or the snapshot.

This patch:

VFS:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that they can return an error.
Rename write_super_lockfs and unlockfs of the super block operation
freeze_fs and unfreeze_fs to avoid a confusion.

ext3, ext4, xfs, gfs2, jfs:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that write_super_lockfs returns an error if needed,
and unlockfs always returns 0.

reiserfs:
Changed the type of write_super_lockfs and unlockfs from "void"
to "int" so that they always return 0 (success) to keep a current behavior.
Signed-off-by: NTakashi Sato <t-sato@yk.jp.nec.com>
Signed-off-by: NMasayuki Hamaguchi <m-hamaguchi@ys.jp.nec.com>
Cc: <xfs-masters@oss.sgi.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Alasdair G Kergon <agk@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c4be0c1d

01 1月, 2009 2 次提交

kill ->dir_notify() · 6badd79b

由 Al Viro 提交于 12月 26, 2008

Remove the hopelessly misguided ->dir_notify(). The only instance (cifs)
has been broken by design from the very beginning; the objects it creates
are never destroyed, keep references to struct file they can outlive, nothing
that could possibly evict them exists on close(2) path *and* no locking
whatsoever is done to prevent races with close(), should the previous, er,
deficiencies someday be dealt with.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

6badd79b

correct wrong function name of d_put in kernel document and source comment · be42c4c4

由 Zhaolei 提交于 12月 01, 2008

no function named d_put(), it should be dput().

Impact: fix document and comment, no functionality changed
Signed-off-by: NZhao Lei <zhaolei@cn.fuijtsu.com>
Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

be42c4c4

31 10月, 2008 1 次提交

fs: remove prepare_write/commit_write · 4e02ed4b

由 Nick Piggin 提交于 10月 29, 2008

Nothing uses prepare_write or commit_write. Remove them from the tree
completely.

[akpm@linux-foundation.org: schedule simple_prepare_write() for unexporting]
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4e02ed4b

27 7月, 2008 1 次提交

Documentation cleanup: trivial misspelling, punctuation, and grammar corrections. · d9195881

由 Matt LaPlante 提交于 7月 25, 2008

Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d9195881

07 5月, 2008 1 次提交

[PATCH] kill ->put_inode · 33dcdac2

由 Christoph Hellwig 提交于 4月 29, 2008

And with that last patch to affs killing the last put_inode instance we
can finally, after many years of transition kill this racy and awkward
interface.

(It's kinda funny that even the description in
Documentation/filesystems/vfs.txt was entirely wrong..)

Also remove a very misleading comment above the defintion of
struct super_operations.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

33dcdac2

09 2月, 2008 1 次提交

mount options: add documentation · f84e3f52

由 Miklos Szeredi 提交于 2月 08, 2008

This series addresses the problem of showing mount options in
/proc/mounts.

Several filesystems which use mount options, have not implemented a
.show_options superblock operation.  Several others have implemented
this callback, but have not kept it fully up to date with the parsed
options.

Q: Why do we need correct option showing in /proc/mounts?
A: We want /proc/mounts to fully replace /etc/mtab.  The reasons for
   this are:
    - unprivileged mounters won't be able to update /etc/mtab
    - /etc/mtab doesn't work with private mount namespaces
    - /etc/mtab can become out-of-sync with reality

Q: Can't this be done, so that filesystems need not bother with
   implementing a .show_mounts callback, and keeping it up to date?
A: Only in some cases.  Certain filesystems allow modification of a
   subset of options in their remount_fs method.  It is not possible
   to take this into account without knowing exactly how the
   filesystem handles options.

For the simple case (no remount or remount resets all options) the
patchset introduces two helpers:

  generic_show_options()
  save_mount_options()

These can also be used to emulate the old /etc/mtab behavior, until
proper support is added.  Even if this is not 100% correct, it's still
better than showing no options at all.

The following patches fix up most in-tree filesystems, some have been
compile tested only, some have been reviewed and acked by the
maintainer.

Table displaying status of all in-kernel filesystems:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
legend:

  none - fs has options, but doesn't define ->show_options()
  some - fs defines ->show_options(), but some only options are shown
  good - fs shows all options
  noopt - fs does not have options
  patch - a patch will be posted
  merged - a patch has been merged by subsystem maintainer

9p          good
adfs        patch
affs        patch
afs         patch
autofs      patch
autofs4     patch
befs        patch
bfs         noopt
cifs        some
coda        noopt
configfs    noopt
cramfs      noopt
debugfs     noopt
devpts      patch
ecryptfs    good
efs         noopt
ext2        patch
ext3        good
ext4        merged
fat         patch
freevxfs    noopt
fuse        patch
fusectl	    noopt
gfs2        good
gfs2meta    noopt
hfs         good
hfsplus     good
hostfs      patch
hpfs        patch
hppfs       noopt
hugetlbfs   patch
isofs       patch
jffs2       noopt
jfs         merged
minix       noopt
msdos       ->fat
ncpfs       patch
nfs         some
nfsd        noopt
ntfs        good
ocfs2       good
ocfs2/dlmfs noopt
openpromfs  noopt
proc        noopt
qnx4        noopt
ramfs       noopt
reiserfs    patch
romfs       noopt
smbfs       good
sysfs       noopt
sysv        noopt
udf         patch
ufs         good
vfat        ->fat
xfs         good

mm/shmem.c                                    patch
drivers/oprofile/oprofilefs.c                 noopt
drivers/infiniband/hw/ipath/ipath_fs.c        noopt
drivers/misc/ibmasm/ibmasmfs.c                noopt
drivers/usb/core (usbfs)                      merged
drivers/usb/gadget (gadgetfs)                 noopt
drivers/isdn/capi/capifs.c                    patch
kernel/cpuset.c                               noopt
fs/binfmt_misc.c                              noopt
net/sunrpc/rpc_pipe.c                         noopt
arch/powerpc/platforms/cell/spufs             patch
arch/s390/hypfs                               good
ipc/mqueue.c                                  noopt
security (securityfs)                         noopt
security/selinux/selinuxfs.c                  noopt
kernel/cgroup.c                               good
security/smack/smackfs.c                      noopt

in -mm:

reiser4     some
unionfs     good
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

This patch:

Document the rules for handling mount options in the .show_options
super operation.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f84e3f52

08 2月, 2008 1 次提交

iget: remove iget() and the read_inode() super op as being obsolete · 12debc42

由 David Howells 提交于 2月 07, 2008

Remove the old iget() call and the read_inode() superblock operation it uses
as these are really obsolete, and the use of read_inode() does not produce
proper error handling (no distinction between ENOMEM and EIO when marking an
inode bad).

Furthermore, this removes the temptation to use iget() to find an inode by
number in a filesystem from code outside that filesystem.

iget_locked() should be used instead.  A new function is added in an earlier
patch (iget_failed) that is to be called to mark an inode as bad, unlock it
and release it should the get routine fail.  Mark iget() and read_inode() as
being obsolete and remove references to them from the documentation.

Typically a filesystem will be modified such that the read_inode function
becomes an internal iget function, for example the following:

	void thingyfs_read_inode(struct inode *inode)
	{
		...
	}

would be changed into something like:

	struct inode *thingyfs_iget(struct super_block *sp, unsigned long ino)
	{
		struct inode *inode;
		int ret;

		inode = iget_locked(sb, ino);
		if (!inode)
			return ERR_PTR(-ENOMEM);
		if (!(inode->i_state & I_NEW))
			return inode;

		...
		unlock_new_inode(inode);
		return inode;
	error:
		iget_failed(inode);
		return ERR_PTR(ret);
	}

and then thingyfs_iget() would be called rather than iget(), for example:

	ret = -EINVAL;
	inode = iget(sb, ino);
	if (!inode || is_bad_inode(inode))
		goto error;

becomes:

	inode = thingyfs_iget(sb, ino);
	if (IS_ERR(inode)) {
		ret = PTR_ERR(inode);
		goto error;
	}

Note that is_bad_inode() does not need to be called.  The error returned by
thingyfs_iget() should render it unnecessary.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

12debc42

20 10月, 2007 1 次提交

Documentation/filesystems/vfs.txt: typo fix · bc5b1d55

由 Shaun Zinck 提交于 10月 20, 2007

Signed-off-by: NShaun Zinck <shaun.zinck@gmail.com>
Signed-off-by: NAdrian Bunk <bunk@kernel.org>

bc5b1d55

17 10月, 2007 2 次提交

fs: remove some AOP_TRUNCATED_PAGE · 55144768

由 Nick Piggin 提交于 10月 16, 2007

prepare/commit_write no longer returns AOP_TRUNCATED_PAGE since OCFS2 and
GFS2 were converted to the new aops, so we can make some simplifications
for that.

[michal.k.k.piotrowski@gmail.com: fix warning]
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: NMichal Piotrowski <michal.k.k.piotrowski@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

55144768

fs: introduce write_begin, write_end, and perform_write aops · afddba49

由 Nick Piggin 提交于 10月 16, 2007

These are intended to replace prepare_write and commit_write with more
flexible alternatives that are also able to avoid the buffered write
deadlock problems efficiently (which prepare_write is unable to do).

[mark.fasheh@oracle.com: API design contributions, code review and fixes]
[akpm@linux-foundation.org: various fixes]
[dmonakhov@sw.ru: new aop block_write_begin fix]
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: NDmitriy Monakhov <dmonakhov@openvz.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

afddba49

17 7月, 2007 2 次提交

update Documentation/filesystems/vfs.txt · 422b14c2

由 Borislav Petkov 提交于 7月 15, 2007

Update Documentation/filesystems/vfs.txt
Signed-off-by: NBorislav Petkov <bbpetkov@yahoo.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

422b14c2

update description in Documentation/filesystems/vfs.txt · 0746aec3

由 Borislav Petkov 提交于 7月 15, 2007

Update the description of struct file_system_type and get_sb() in
Documentation/filesystems/vfs.txt to match the current code.
Signed-off-by: NBorislav Petkov <bbpetkov@yahoo.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0746aec3

09 5月, 2007 1 次提交

VFS: delay the dentry name generation on sockets and pipes · c23fbb6b

由 Eric Dumazet 提交于 5月 08, 2007

1) Introduces a new method in 'struct dentry_operations'.  This method
   called d_dname() might be called from d_path() to build a pathname for
   special filesystems.  It is called without locks.

   Future patches (if we succeed in having one common dentry for all
   pipes/sockets) may need to change prototype of this method, but we now
   use : char *d_dname(struct dentry *dentry, char *buffer, int buflen);

2) Adds a dynamic_dname() helper function that eases d_dname() implementations

3) Defines d_dname method for sockets : No more sprintf() at socket
   creation.  This is delayed up to the moment someone does an access to
   /proc/pid/fd/...

4) Defines d_dname method for pipes : No more sprintf() at pipe
   creation.  This is delayed up to the moment someone does an access to
   /proc/pid/fd/...

A benchmark consisting of 1.000.000 calls to pipe()/close()/close() gives a
*nice* speedup on my Pentium(M) 1.6 Ghz :

3.090 s instead of 3.450 s
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Acked-by: NChristoph Hellwig <hch@infradead.org>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c23fbb6b

21 2月, 2007 1 次提交

[PATCH] fs: fix libfs data leak · 955eff5a

由 Nick Piggin 提交于 2月 20, 2007

simple_prepare_write leaks uninitialised kernel data. This happens because
the it leaves an uninitialised "hole" over the part of the page that the
write is expected to go to. This is fine, but it then marks the page
uptodate, which means a concurrent read can come in and copy the
uninitialised memory into userspace before it written to.

Fix it by simply marking it uptodate in simple_commit_write instead, after
the hole has been filled in. This could theoretically break an fs that
uses simple_prepare_write and not simple_commit_write, and that relies on
the incorrect simple_prepare_write behaviour. Luckily, none of those
exists in the tree.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

955eff5a

04 10月, 2006 1 次提交

Documentation: remove duplicated words · 670e9f34

由 Paolo Ornati 提交于 10月 03, 2006

Remove many duplicated words under Documentation/ and do other small
cleanups.

Examples:
        "and and" --> "and"
        "in in" --> "in"
        "the the" --> "the"
        "the the" --> "to the"
        ...
Signed-off-by: NPaolo Ornati <ornati@fastwebnet.it>
Signed-off-by: NAdrian Bunk <bunk@stusta.de>

670e9f34

01 10月, 2006 1 次提交

[PATCH] Vectorize aio_read/aio_write fileop methods · 027445c3

由 Badari Pulavarty 提交于 9月 30, 2006

This patch vectorizes aio_read() and aio_write() methods to prepare for
collapsing all aio & vectored operations into one interface - which is
aio_read()/aio_write().
Signed-off-by: NBadari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Cc: Michael Holzheu <HOLZHEU@de.ibm.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

027445c3

11 7月, 2006 1 次提交

[PATCH] VFS documentation tweak · 5d8b2ebf

由 Jonathan Corbet 提交于 7月 10, 2006

As I was looking over the get_sb() changes, I stumbled across a little
mistake in the documentation updates.  Unless we're getting into an
interesting new object-oriented realm, I doubt that get_sb() should really
return "struct int"...
Signed-off-by: NJonathan Corbet <corbet@lwn.net>
Acked-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

5d8b2ebf

23 6月, 2006 1 次提交

[PATCH] VFS: Permit filesystem to perform statfs with a known root dentry · 726c3342

由 David Howells 提交于 6月 23, 2006

Give the statfs superblock operation a dentry pointer rather than a superblock
pointer.

This complements the get_sb() patch.  That reduced the significance of
sb->s_root, allowing NFS to place a fake root there.  However, NFS does
require a dentry to use as a target for the statfs operation.  This permits
the root in the vfsmount to be used instead.

linux/mount.h has been added where necessary to make allyesconfig build
successfully.

Interest has also been expressed for use with the FUSE and XFS filesystems.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

726c3342