Commits · 2bfc96a127bc1cc94d26bfaa40159966064f9c8c · openeuler / Kernel

18 Aug, 2010 1 commit

Committed by Nick Piggin on Aug 18, 2010

fs: brlock vfsmount_lock

Use a brlock for the vfsmount lock. It must be taken for write whenever
modifying the mount hash or associated fields, and may be taken for read when
performing mount hash lookups.

A new lock is added for the mnt-id allocator, so it doesn't need to take
the heavy vfsmount write-lock.

The number of atomics should remain the same for fastpath rlock cases, though
code would be slightly slower due to per-cpu access. Scalability will not be
much improved in common cases yet, due to other locks (i.e. dcache_lock) getting
in the way. However path lookups crossing mountpoints should be one case where
scalability is improved (currently requiring the global lock).

The slowpath is slower due to use of brlock. On a 64 core, 64 socket, 32 node
Altix system (high latency to remote nodes), a simple umount microbenchmark
(mount --bind mnt mnt2 ; umount mnt2 loop 1000 times), before this patch it
took 6.8s, afterwards took 7.1s, about 5% slower.
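
The read/write asymmetry described above can be sketched in user space. This is a minimal illustration and not the kernel's implementation: `brlock_sketch`, `NR_SLOTS`, and the helper names are hypothetical, and C11 `atomic_flag` spinlocks stand in for per-CPU locks. A reader takes only its own slot, while a writer must take every slot, which is why the write side gets slower as the machine grows.

```c
#include <stdatomic.h>

#define NR_SLOTS 4 /* hypothetical stand-in for the number of CPUs */

/* One spinlock per slot; the idea is one lock per CPU. */
struct brlock_sketch {
    atomic_flag slot[NR_SLOTS];
};

static void slot_lock(atomic_flag *f)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(f, memory_order_acquire))
        ;
}

static void slot_unlock(atomic_flag *f)
{
    atomic_flag_clear_explicit(f, memory_order_release);
}

static void br_init(struct brlock_sketch *l)
{
    for (int i = 0; i < NR_SLOTS; i++)
        atomic_flag_clear(&l->slot[i]);
}

/* Fast path: a reader touches only the slot of the CPU it runs on. */
static void br_read_lock(struct brlock_sketch *l, unsigned int cpu)
{
    slot_lock(&l->slot[cpu % NR_SLOTS]);
}

static void br_read_unlock(struct brlock_sketch *l, unsigned int cpu)
{
    slot_unlock(&l->slot[cpu % NR_SLOTS]);
}

/* Slow path: a writer takes every slot, always in the same order. */
static void br_write_lock(struct brlock_sketch *l)
{
    for (int i = 0; i < NR_SLOTS; i++)
        slot_lock(&l->slot[i]);
}

static void br_write_unlock(struct brlock_sketch *l)
{
    for (int i = NR_SLOTS - 1; i >= 0; i--)
        slot_unlock(&l->slot[i]);
}
```

In this sketch a mount hash lookup would be bracketed by `br_read_lock()`/`br_read_unlock()`, and any change to the hash by `br_write_lock()`/`br_write_unlock()`.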

Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

99b7db7b

11 Aug, 2010 2 commits

vfs: remove unused MNT_STRICTATIME · 532490f0

Committed by Miklos Szeredi on Aug 02, 2010

Commit d0adde57 added MNT_STRICTATIME but it isn't actually used
(MS_STRICTATIME clears MNT_RELATIME and MNT_NOATIME rather than setting
any mount flag).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

532490f0

vfs: add helpers to get root and pwd · f7ad3c6b

Committed by Miklos Szeredi on Aug 10, 2010

Add three helpers that retrieve a refcounted copy of the root and cwd
from the supplied fs_struct.

 get_fs_root()
 get_fs_pwd()
 get_fs_root_and_pwd()
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f7ad3c6b

10 Aug, 2010 1 commit

Fix sget() race with failing mount · 7a4dec53

Committed by Al Viro on Aug 09, 2010

If sget() finds a matching superblock being set up, it'll
grab an active reference to it and grab s_umount.  That's
fine - we'll wait for completion of foofs_get_sb() that way.
However, if said foofs_get_sb() fails we'll end up holding
the halfway-created superblock.  deactivate_locked_super()
called by foofs_get_sb() will just unlock the sucker since
we are holding another active reference to it.

What we need is a way to tell if superblock has been successfully
set up.  Unfortunately, neither ->s_root nor the check for
MS_ACTIVE quite fit.  Cheap and easy way, suitable for backport:
new flag set by the (only) caller of ->get_sb().  If that flag
isn't present by the time sget() grabbed s_umount on preexisting
superblock it has found, it's seeing a stillborn and should
just bury it with deactivate_locked_super() (and repeat the search).

Longer term we want to set that flag in ->get_sb() instances (and
check for it to distinguish between "sget() found us a live sb"
and "sget() has allocated an sb, we need to set it up" in there,
instead of checking ->s_root as we do now).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org

7a4dec53

28 Jul, 2010 2 commits

fsnotify: Infrastructure for per-mount watches · ca9c726e

Committed by Andreas Gruenbacher on Dec 17, 2009

Per-mount watches allow groups to listen to fsnotify events on an entire
mount.  This patch simply adds and initializes the fields needed in the
vfsmount struct to make this happen.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>

ca9c726e

fsnotify/vfsmount: add fsnotify fields to struct vfsmount · 2504c5d6

Committed by Andreas Gruenbacher on Dec 17, 2009

This patch adds the list and mask fields needed to support vfsmount marks.
These are the same fields fsnotify needs on an inode.  They are not used,
just declared, and we note where the cleanup hook should be (the function is
not yet defined).
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>

2504c5d6

15 May, 2010 1 commit

Fix the regression created by "set S_DEAD on unlink()..." commit · d83c49f3

Committed by Al Viro on Apr 30, 2010

1) i_flags simply doesn't work for mount/unlink race prevention;
we may have many links to file and rm on one of those obviously
shouldn't prevent bind on top of another later on.  To fix it
right way we need to mark _dentry_ as unsuitable for mounting
upon; new flag (DCACHE_CANT_MOUNT) is protected by d_flags and
i_mutex on the inode in question.  Set it (with dont_mount(dentry))
in unlink/rmdir/etc., check (with cant_mount(dentry)) in places
in namespace.c that used to check for S_DEAD.  Setting S_DEAD
is still needed in places where we used to set it (for directories
getting killed), since we rely on it for readdir/rmdir race
prevention.

2) rename()/mount() protection has another bogosity - we unhash
the target before we'd checked that it's not a mountpoint.  Fixed.

3) ancient bogosity in pivot_root() - we locked i_mutex on the
right directory, but checked S_DEAD on the different (and wrong)
one.  Noticed and fixed.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d83c49f3

12 Apr, 2010 6 commits

security: remove dead hook sb_post_pivotroot · 91a9420f

Committed by Eric Paris on Apr 07, 2010

Unused hook.  Remove.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>

91a9420f

security: remove dead hook sb_post_addmount · 3db29101

Committed by Eric Paris on Apr 07, 2010

Unused hook.  Remove.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>

3db29101

security: remove dead hook sb_post_remount · 82dab104

Committed by Eric Paris on Apr 07, 2010

Unused hook.  Remove.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>

82dab104

security: remove dead hook sb_umount_busy · 4b61d12c

Committed by Eric Paris on Apr 07, 2010

Unused hook.  Remove.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>

4b61d12c

security: remove dead hook sb_umount_close · 231923bd

Committed by Eric Paris on Apr 07, 2010

Unused hook.  Remove.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>

231923bd

security: remove sb_check_sb hooks · 35363310

Committed by Eric Paris on Apr 07, 2010

Unused hook.  Remove it.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>

35363310

04 Mar, 2010 7 commits

vfs: add NOFOLLOW flag to umount(2) · db1f05bb

Committed by Miklos Szeredi on Feb 10, 2010

Add a new UMOUNT_NOFOLLOW flag to umount(2).  This is needed to prevent
symlink attacks in unprivileged unmounts (fuse, samba, ncpfs).

Additionally, return -EINVAL if an unknown flag is used (and specify
an explicitly unused flag: UMOUNT_UNUSED).  This makes it possible for
the caller to determine if a flag is supported or not.
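
The flag check described above can be sketched as a small user-space function. This is an illustration under stated assumptions rather than the patch itself: `check_umount_flags()` is a hypothetical name, and the numeric flag values are the ones I understand the kernel headers to use, so treat them as assumptions.

```c
#include <errno.h>

/* Assumed flag values; verify against the kernel headers before relying on them. */
#define MNT_FORCE       0x00000001u
#define MNT_DETACH      0x00000002u
#define MNT_EXPIRE      0x00000004u
#define UMOUNT_NOFOLLOW 0x00000008u
#define UMOUNT_UNUSED   0x80000000u /* deliberately never accepted */

/* Hypothetical helper: reject any bit outside the known set. */
static int check_umount_flags(unsigned int flags)
{
    const unsigned int known =
        MNT_FORCE | MNT_DETACH | MNT_EXPIRE | UMOUNT_NOFOLLOW;

    return (flags & ~known) ? -EINVAL : 0;
}
```

Because `UMOUNT_UNUSED` is guaranteed to be rejected, a caller can probe with it: if the probe succeeds rather than failing with `EINVAL`, the running kernel does not validate flags, and so cannot be assumed to honor `UMOUNT_NOFOLLOW` either.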

CC: Eugene Teo <eugene@redhat.com>
CC: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

db1f05bb

Mirror MS_KERNMOUNT in ->mnt_flags · 8089352a

Committed by Al Viro on Feb 05, 2010

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

8089352a

get rid of useless vfsmount_lock use in put_mnt_ns() · d498b25a

Committed by Al Viro on Feb 05, 2010

It hadn't been needed since we'd sanitized the logics in
mark_mounts_for_expiry() (which, in turn, used to be a
rudiment of bad old times when namespace_sem was per-ns).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d498b25a

take check for new events in namespace (guts of mounts_poll()) to namespace.c · 9f5596af

Committed by Al Viro on Feb 05, 2010

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9f5596af

new helper: iterate_mounts() · 1f707137

Committed by Al Viro on Jan 30, 2010

apply function to vfsmounts in set returned by collect_mounts(),
stop if it returns non-zero.
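
The iteration contract above (apply a callback to each mount, stop at the first non-zero return) can be sketched as follows. The types and names are hypothetical stand-ins: `mnt_sketch` replaces the real vfsmount, and a plain singly linked list replaces whatever collect_mounts() actually returns.

```c
#include <stddef.h>

/* Hypothetical stand-in for a vfsmount linked into a collected set. */
struct mnt_sketch {
    int id;
    struct mnt_sketch *next;
};

/* Call f on each element; return the first non-zero result, else 0. */
static int iterate_mounts_sketch(int (*f)(struct mnt_sketch *, void *),
                                 void *arg, struct mnt_sketch *first)
{
    for (struct mnt_sketch *m = first; m; m = m->next) {
        int res = f(m, arg);
        if (res)
            return res;
    }
    return 0;
}

/* Example callback: count visited mounts and stop at the one with id 2. */
static int count_until_two(struct mnt_sketch *m, void *arg)
{
    int *visited = arg;

    (*visited)++;
    return m->id == 2;
}
```

With a three-element list whose ids are 1, 2, 3, the example callback is invoked twice and the iterator returns 1 without visiting the third element.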
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1f707137

VFS: Clean up shared mount flag propagation · 495d6c9c

Committed by Valerie Aurora on Jan 26, 2010

The handling of mount flags in set_mnt_shared() got a little tangled
up during previous cleanups, with the following problems:

* MNT_PNODE_MASK is defined as a literal constant when it should be a
bitwise xor of other MNT_* flags
* set_mnt_shared() clears and then sets MNT_SHARED (part of MNT_PNODE_MASK)
* MNT_PNODE_MASK could use a comment in mount.h
* MNT_PNODE_MASK is a terrible name, change to MNT_SHARED_MASK

This patch fixes these problems.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

495d6c9c

Kill CL_PROPAGATION, sanitize fs/pnode.c:get_source() · 796a6b52

Committed by Al Viro on Jan 16, 2010

First of all, get_source() never results in CL_PROPAGATION
alone.  We either get CL_MAKE_SHARED (for the continuation
of peer group) or CL_SLAVE (slave that is not shared) or both
(beginning of peer group among slaves).  Massage the code to
make that explicit, kill CL_PROPAGATION test in clone_mnt()
(nothing sets CL_MAKE_SHARED without CL_PROPAGATION and in
clone_mnt() we are checking CL_PROPAGATION after we'd found
that there's no CL_SLAVE, so the check for CL_MAKE_SHARED
would do just as well).

Fix comments, while we are at it...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

796a6b52

17 Jan, 2010 4 commits

do_add_mount() should sanitize mnt_flags · 27d55f1f

Committed by Al Viro on Jan 16, 2010

MNT_WRITE_HOLD shouldn't leak into new vfsmount and neither
should MNT_SHARED (the latter will be set properly, along with
the rest of shared-subtree data structures)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

27d55f1f

mnt_flags fixes in do_remount() · 7b43a79f

Committed by Al Viro on Jan 16, 2010

* need vfsmount_lock over modifying it
* need to preserve MNT_SHARED/MNT_UNBINDABLE
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7b43a79f

attach_recursive_mnt() needs to hold vfsmount_lock over set_mnt_shared() · df1a1ad2

Committed by Al Viro on Jan 16, 2010

race in mnt_flags update
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

df1a1ad2

may_umount() needs namespace_sem · 8ad08d8a

Committed by Al Viro on Jan 16, 2010

otherwise it races with clone_mnt() changing mnt_share/mnt_slaves
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

8ad08d8a

18 Dec, 2009 1 commit

Revert "fix mismerge with Trond's stuff (create_mnt_ns() export is gone now)" · a2770d86

Committed by Linus Torvalds on Dec 17, 2009

This reverts commit e9496ff4. Quoth Al:

 "it's dependent on a lot of other stuff not currently in mainline
  and badly broken with current fs/namespace.c.  Sorry, badly
  out-of-order cherry-pick from old queue.

  PS: there's a large pending series reworking the refcounting and
  lifetime rules for vfsmounts that will, among other things, allow to
  rip a subtree away _without_ dissolving connections in it, to be
  garbage-collected when all active references are gone.  It's
  considerably saner wrt "is the subtree busy" logics, but it's nowhere
  near being ready for merge at the moment; this changeset is one of the
  things becoming possible with that sucker, but it certainly shouldn't
  have been picked during this cycle.  My apologies..."
Noticed-by: Eric Paris <eparis@redhat.com>
Requested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a2770d86

17 Dec, 2009 1 commit

fix mismerge with Trond's stuff (create_mnt_ns() export is gone now) · e9496ff4

Committed by Al Viro on Aug 09, 2009

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e9496ff4

12 Oct, 2009 1 commit

LSM: Pass original mount flags to security_sb_mount(). · a27ab9f2

Committed by Tetsuo Handa on Oct 04, 2009

This patch allows LSM modules to determine based on original mount flags
passed to mount(). A LSM module can get masked mount flags (if needed) by

	flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE |
		   MS_NOATIME | MS_NODIRATIME | MS_RELATIME| MS_KERNMOUNT |
		   MS_STRICTATIME);
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: James Morris <jmorris@namei.org>

a27ab9f2

24 Sep, 2009 1 commit

fs: fix overflow in sys_mount() for in-kernel calls · eca6f534

Committed by Vegard Nossum on Sep 18, 2009

sys_mount() reads/copies a whole page for its "type" parameter.  When
do_mount_root() passes a kernel address that points to an object which is
smaller than a whole page, copy_mount_options() will happily go past this
memory object, possibly dereferencing "wild" pointers that could be in any
state (hence the kmemcheck warning, which shows that parts of the next
page are not even allocated).

(The likelihood of something going wrong here is pretty low -- first of
all this only applies to kernel calls to sys_mount(), which are mostly
found in the boot code.  Secondly, I guess if the page was not mapped,
exact_copy_from_user() _would_ in fact handle it correctly because of its
access_ok(), etc.  checks.)

But it is much nicer to avoid the dubious reads altogether, by stopping as
soon as we find a NUL byte.  Is there a good reason why we can't do
something like this, using the already existing strndup_from_user()?
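
The "stop at the first NUL byte" idea can be sketched in user space. This is an analogy only and not the patch's code: `dup_bounded()` is a hypothetical helper, and plain `memchr()`/`malloc()` stand in for the kernel's user-copy primitives. The point is that only the string and its terminator are read, never a full page past the end of a small object.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: duplicate a NUL-terminated string, scanning at most
 * max bytes.  Returns NULL if no NUL is found within max bytes or if the
 * allocation fails.  The caller frees the result.
 */
static char *dup_bounded(const char *src, size_t max)
{
    const char *nul = memchr(src, '\0', max);
    size_t len;
    char *copy;

    if (!nul)
        return NULL;

    len = (size_t)(nul - src);
    copy = malloc(len + 1);
    if (!copy)
        return NULL;

    memcpy(copy, src, len + 1); /* includes the terminating NUL */
    return copy;
}
```

Called as `dup_bounded("ext4", 4096)`, the helper copies five bytes even though the limit is a whole page.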

[akpm@linux-foundation.org: make copy_mount_string() static]
[AV: fix compat mount breakage, which involves undoing akpm's change above]
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: al <al@dizzy.pdmi.ras.ru>

eca6f534

08 Aug, 2009 1 commit

vfs: mnt_want_write_file(): fix special file handling · 2d8dd38a

Committed by OGAWA Hirofumi on Aug 06, 2009

I suspect that mnt_want_write_file() may have wrong assumption.  I think
mnt_want_write_file() is assuming it increments ->mnt_writers if
(file->f_mode & FMODE_WRITE).  But, if it's special_file(), it is false?
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Acked-by: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2d8dd38a

09 Jul, 2009 1 commit

headers: mnt_namespace.h redux · b43f3cbd

Committed by Alexey Dobriyan on Jul 08, 2009

Fix various silly problems wrt mnt_namespace.h:

 - exit_mnt_ns() isn't used, remove it
 - done that, sched.h and nsproxy.h inclusions aren't needed
 - mount.h inclusion was needed for vfsmount_lock, but no longer
 - remove mnt_namespace.h inclusion from files which don't use anything
   from mnt_namespace.h
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b43f3cbd

24 Jun, 2009 2 commits

... and the same for vfsmount id/mount group id · f21f6220

Committed by Al Viro on Jun 24, 2009

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f21f6220

VFS: Switch init_mount_tree() to use the new create_mnt_ns() helper · 3b22edc5

Committed by Trond Myklebust on Jun 23, 2009

Eliminates some duplicated code...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3b22edc5

23 Jun, 2009 2 commits

VFS: Add VFS helper functions for setting up private namespaces · cf8d2c11

Committed by Trond Myklebust on Jun 22, 2009

The purpose of this patch is to improve the remote mount path lookup
support for distributed filesystems such as the NFSv4 client.

When given a mount command of the form "mount server:/foo/bar /mnt", the
NFSv4 client is required to look up the filehandle for "server:/", and
then look up each component of the remote mount path "foo/bar" in order
to find the directory that is actually going to be mounted on /mnt.
Following that remote mount path may involve following symlinks,
crossing server-side mount points and even following referrals to
filesystem volumes on other servers.

Since the standard VFS path lookup code already supports walking paths
that contain all these features (using in-kernel automounts for
following referrals) we would like to be able to reuse that rather than
duplicate the full path traversal functionality in the NFSv4 client code.

This patch therefore defines a VFS helper function create_mnt_ns(), that
sets up a temporary filesystem namespace and attaches a root filesystem to
it. It exports the create_mnt_ns() and put_mnt_ns() function for use by
filesystem modules.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cf8d2c11

VFS: Uninline the function put_mnt_ns() · 616511d0

Committed by Trond Myklebust on Jun 22, 2009

In order to allow modules to use it without having to export vfsmount_lock.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

616511d0

12 Jun, 2009 6 commits

Push BKL down into do_remount_sb() · 4aa98cf7

Committed by Al Viro on May 08, 2009

[folded fix from Jiri Slaby]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

4aa98cf7

Push BKL down beyond VFS-only parts of do_mount() · 7f78d4cd

Committed by Al Viro on May 08, 2009

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7f78d4cd

Push BKL into do_mount() · 6fac98dd

Committed by Al Viro on May 08, 2009

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6fac98dd

dcache: extract and use d_unlinked() · f3da392e

Committed by Alexey Dobriyan on May 04, 2009

d_unlinked() will be used in middle-term to ban checkpointing when opened
but unlinked file is detected, and in long term, to detect such situation
and special case on it.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f3da392e

fs: introduce mnt_clone_write · 96029c4e

Committed by npiggin@suse.de on Apr 26, 2009

This patch speeds up lmbench lat_mmap test by about another 2% after the
first patch.

Before:
 avg = 462.286
 std = 5.46106

After:
 avg = 453.12
 std = 9.58257

(50 runs of each, stddev gives a reasonable confidence)

It does this by introducing mnt_clone_write, which avoids some heavyweight
operations of mnt_want_write if called on a vfsmount which we know already
has a write count; and mnt_want_write_file, which can call mnt_clone_write
if the file is open for write.

After these two patches, mnt_want_write and mnt_drop_write go from 7% on
the profile down to 1.3% (including mnt_clone_write).

[AV: mnt_want_write_file() should take file alone and derive mnt from it;
not only all callers have that form, but that's the only mnt about which
we know that it's already held for write if file is opened for write]

Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

96029c4e

fs: mnt_want_write speedup · d3ef3d73

Committed by npiggin@suse.de on Apr 26, 2009

This patch speeds up lmbench lat_mmap test by about 8%. lat_mmap is set up
basically to mmap a 64MB file on tmpfs, fault in its pages, then unmap it.
A microbenchmark yes, but it exercises some important paths in the mm.

Before:
 avg = 501.9
 std = 14.7773

After:
 avg = 462.286
 std = 5.46106

(50 runs of each, stddev gives a reasonable confidence, but there is quite
a bit of variation there still)

It does this by removing the complex per-cpu locking and counter-cache and
replaces it with a percpu counter in struct vfsmount. This makes the code
much simpler, and avoids spinlocks (although the msync is still pretty
costly, unfortunately). It results in about 900 bytes smaller code too. It
does increase the size of a vfsmount, however.
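
The per-CPU counter idea can be sketched like this. It is a simplified user-space illustration with hypothetical names (`mnt_writers_sketch`, `NR_SLOTS`, the `writers_*` helpers) and it omits all of the synchronization the kernel needs. Each CPU adjusts only its own slot, and the true writer count is the sum over all slots; a single slot may go negative when a write reference is taken on one CPU and dropped on another.

```c
#define NR_SLOTS 4 /* hypothetical stand-in for the number of CPUs */

/* Hypothetical stand-in for the per-CPU writer counter in a vfsmount. */
struct mnt_writers_sketch {
    long writers[NR_SLOTS];
};

/* Fast path: touch only the local slot, with no shared atomic. */
static void writers_inc(struct mnt_writers_sketch *m, unsigned int cpu)
{
    m->writers[cpu % NR_SLOTS]++;
}

static void writers_dec(struct mnt_writers_sketch *m, unsigned int cpu)
{
    m->writers[cpu % NR_SLOTS]--;
}

/* Slow path (e.g. a read-only remount): sum every slot. */
static long writers_sum(const struct mnt_writers_sketch *m)
{
    long sum = 0;

    for (int i = 0; i < NR_SLOTS; i++)
        sum += m->writers[i];
    return sum;
}
```

The trade-off matches the commit text: the common take/drop path becomes cheap and local, at the cost of a larger structure and a slower path whenever the exact total is needed.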

It should also give a speedup on large systems if CPUs are frequently operating
on different mounts (because the existing scheme has to operate on an atomic in
the struct vfsmount when switching between mounts). But I'm most interested in
the single threaded path performance for the moment.

[AV: minor cleanup]

Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d3ef3d73

openeuler / Kernel synced successfully about 1 year ago