Commits · 3d733633a633065729c9e4e254b2e5442c00ef7e · openeuler / raspberrypi-kernel

19 Apr, 2008 2 commits

[PATCH] r/o bind mounts: track numbers of writers to mounts · 3d733633

Committed by Dave Hansen on Feb 15, 2008

This is the real meat of the entire series.  It actually
implements the tracking of the number of writers to a mount.
However, it causes scalability problems because there can be
hundreds of cpus doing open()/close() on files on the same mnt at
the same time.  Even an atomic_t in the mnt has massive scaling
problems because the cacheline gets so terribly contended.

This uses a statically-allocated percpu variable.  All want/drop
operations are local to a cpu as long as that cpu operates on the same
mount, and there are no writer count imbalances.  Writer count
imbalances happen when a write is taken on one cpu, and released
on another, like when an open/close pair is performed on two different cpus.

Upon a remount,ro request, all of the data from the percpu
variables is collected (expensive, but very rare) and we determine
if there are any outstanding writers to the mount.
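
The scheme described above can be sketched in userspace C. This is a minimal, single-threaded model, not the kernel code: the names (mount_sketch, cpu_writer, want_write_sketch, ...) and the fixed NR_CPUS are assumptions made for illustration, and the locking and real percpu accessors the kernel needs are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4  /* hypothetical CPU count for this sketch */

struct mount_sketch {
    long writers;   /* counts already flushed out of the per-CPU slots */
    bool readonly;
};

/* One slot per CPU: the mount it is counting for, plus a local delta. */
struct cpu_writer {
    struct mount_sketch *mnt;
    long count;     /* may be negative if this CPU only saw the drop */
};

static struct cpu_writer cpu_writers[NR_CPUS];

/* Fold a CPU's local delta into its mount and reset the slot. */
static void flush_cpu(struct cpu_writer *cw)
{
    if (cw->mnt)
        cw->mnt->writers += cw->count;
    cw->mnt = NULL;
    cw->count = 0;
}

/* Return the CPU's slot, flushing it only when the mount changes. */
static struct cpu_writer *slot_for(int cpu, struct mount_sketch *mnt)
{
    struct cpu_writer *cw;

    assert(cpu >= 0 && cpu < NR_CPUS);
    cw = &cpu_writers[cpu];
    if (cw->mnt != mnt) {
        flush_cpu(cw);
        cw->mnt = mnt;
    }
    return cw;
}

/* Take a write on mnt from the given CPU; fails on a read-only mount. */
static int want_write_sketch(int cpu, struct mount_sketch *mnt)
{
    if (mnt->readonly)
        return -1;
    slot_for(cpu, mnt)->count++;
    return 0;
}

/* Release a write; it may run on a different CPU than the take. */
static void drop_write_sketch(int cpu, struct mount_sketch *mnt)
{
    slot_for(cpu, mnt)->count--;
}

/* The rare, expensive path: collect every CPU's delta, then decide. */
static int remount_readonly_sketch(struct mount_sketch *mnt)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (cpu_writers[cpu].mnt == mnt)
            flush_cpu(&cpu_writers[cpu]);
    if (mnt->writers > 0)
        return -1;  /* outstanding writers, refuse */
    mnt->readonly = true;
    return 0;
}
```

A write taken on cpu 0 and dropped on cpu 1 leaves +1 and -1 in two different slots; only the collection step in remount_readonly_sketch() sees that they cancel out.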

I've written a little benchmark to sit in a loop for a couple of
seconds in several cpus in parallel doing open/write/close loops.

http://sr71.net/~dave/linux/openbench.c

The code in here is a worst-possible case for this patch.  It
does opens on a _pair_ of files in two different mounts in parallel.
This should cause my code to lose its "operate on the same mount"
optimization completely.  This worst-case scenario causes a 3%
degradation in the benchmark.

I could probably get rid of even this 3%, but it would be more
complex than what I have here, and I think this is getting into
acceptable territory.  In practice, I expect writing more than 3
bytes to a file, as well as disk I/O to mask any effects that this
has.

(To get rid of that 3%, we could have an #defined number of mounts
in the percpu variable.  So, instead of a CPU getting to operate only
on percpu data when it accesses only one mount, it could stay on
percpu data when it only accesses N or fewer mounts.)

[AV] merged fix for __clear_mnt_mount() stepping on freed vfsmount
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

[PATCH] r/o bind mounts: stub functions · 8366025e

Committed by Dave Hansen on Feb 15, 2008

This patch adds two functions, mnt_want_write() and mnt_drop_write().  These are
used like a lock pair around any fs operations that might cause a write to the
filesystem.

Before these can become useful, we must first cover each place in the VFS
where writes are performed with a want/drop pair.  When that is complete, we
can actually introduce code that will safely check the counts before allowing
r/w<->r/o transitions to occur.
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
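
How a caller might wrap a write in the want/drop pair can be sketched as below. This is a userspace model under stated assumptions: struct vfsmount_sketch, its writers counter, and touch_file_sketch() are hypothetical, and only the two function names come from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct vfsmount. */
struct vfsmount_sketch {
    bool readonly;
    int writers;    /* assumed counter, only here to make pairing visible */
};

/* Ask for write access; refuse with -EROFS on a read-only mount. */
static int mnt_want_write(struct vfsmount_sketch *mnt)
{
    if (mnt->readonly)
        return -EROFS;
    mnt->writers++;
    return 0;
}

/* Give write access back; must pair with a successful mnt_want_write(). */
static void mnt_drop_write(struct vfsmount_sketch *mnt)
{
    assert(mnt->writers > 0);
    mnt->writers--;
}

/* Hypothetical fs operation that modifies something on the mount. */
static int touch_file_sketch(struct vfsmount_sketch *mnt, long *mtime, long now)
{
    int err = mnt_want_write(mnt);

    if (err)
        return err;     /* nothing to drop when want failed */
    *mtime = now;       /* the actual write */
    mnt_drop_write(mnt);
    return 0;
}
```

As with a lock, the drop is issued only on the path where the want succeeded.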

28 Mar, 2008 2 commits

[PATCH] do shrink_submounts() for all fs types · c35038be

Committed by Al Viro on Mar 22, 2008

... and take it out of ->umount_begin() instances.  Call with all locks
already taken (by do_umount()) and leave calling release_mounts() to
caller (it will do release_mounts() anyway, so we can just put into
the same list).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

[PATCH] count ghost references to vfsmounts · 7c4b93d8

Committed by Al Viro on Mar 21, 2008

make propagate_mount_busy() exclude references from the vfsmounts
that had been isolated by umount_tree() and are just waiting for
release_mounts() to dispose of their ->mnt_parent/->mnt_mountpoint.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

09 May, 2007 1 commit

Fix misspellings collected by members of KJ list. · beb7dd86

Committed by Robert P. J. Day on May 09, 2007

Fix the misspellings of "propogate", "writting" and (oh, the shame
:-) "kenrel" in the source tree.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

12 Feb, 2007 1 commit

[PATCH] struct vfsmount: keep mnt_count & mnt_expiry_mark away from mnt_flags · 4ba4d4c0

Committed by Eric Dumazet on Feb 10, 2007

I noticed cache misses in touch_atime() that can be avoided if we keep
mnt_count & mnt_expiry_mark in a different cache line than mnt_flags
(mostly read)

mnt_count & mnt_expiry_mark are modified each time a file is opened/closed
in a file system.

touch_atime() is called each time a file is read, and generally needs to
read mnt_flags.

Other fields of struct vfsmount are mostly read so I chose to move
mnt_count & mnt_expiry_mark at the end of struct vfsmount.  And adding a
comment so that nobody tries to re-arrange fields to fill the holes :)

On 64-bit platforms, the new offsetof(mnt_count) is 0xC0
On 32-bit platforms, it is 0x60, so I did not add a
____cacheline_aligned_in_smp because it would have too big an impact on the
size of this object (in particular if CONFIG_X86_L1_CACHE_SHIFT=7)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
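
The layout idea can be illustrated with a small struct. This is only a sketch: the struct, its field names, the 16-pointer filler, and the 64-byte CACHE_LINE are assumptions, not the real struct vfsmount.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumed cache line size in bytes */

/* Hypothetical mount structure, reduced to the fields that matter here. */
struct layout_sketch {
    int   flags;            /* mostly read, e.g. on every atime check */
    void *read_mostly[16];  /* stand-in for the other mostly-read fields */

    /*
     * Written on every open/close.  Kept at the end on purpose so that
     * they do not share a cache line with flags; do not move them up to
     * fill holes.
     */
    int   count;
    int   expiry_mark;
};

/*
 * Two fields are guaranteed to sit on different cache lines, whatever the
 * alignment of the object, when their offsets differ by at least one line.
 */
static int fields_on_separate_lines(size_t off_a, size_t off_b)
{
    size_t dist = off_a > off_b ? off_a - off_b : off_b - off_a;

    return dist >= CACHE_LINE;
}
```

Checking the offsets rather than forcing alignment mirrors the trade-off in the message: separation comes from field order alone, without growing the object.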

14 Dec, 2006 1 commit

[PATCH] relative atime · 47ae32d6

Committed by Valerie Henson on Dec 13, 2006

Add "relatime" (relative atime) support.  Relative atime only updates the
atime if the previous atime is older than the mtime or ctime.  Like
noatime, but useful for applications like mutt that need to know when a
file has been read since it was last modified.
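
The rule stated above fits in one predicate. A minimal sketch, assuming timestamps are plain seconds and that equal timestamps count as "not older" (the message does not say how ties are handled); the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Relative atime: update the access time only if the previous atime is
 * older than the modification time or the change time.
 * Assumption: strict comparison, so equal timestamps do not trigger an
 * update in this sketch.
 */
static bool relatime_needs_update(long atime, long mtime, long chg_time)
{
    return atime < mtime || atime < chg_time;
}
```

A mail reader such as mutt can then still compare atime against mtime to tell whether a file has been read since it was last modified.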

A corresponding patch against mount(8) is available at
http://userweb.kernel.org/~akpm/mount-relative-atime.txt
Signed-off-by: Valerie Henson <val_henson@linux.intel.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Karel Zak <kzak@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

09 Dec, 2006 1 commit

[PATCH] rename struct namespace to struct mnt_namespace · 6b3286ed

Committed by Kirill Korotaev on Dec 08, 2006

Rename 'struct namespace' to 'struct mnt_namespace' to avoid confusion with
other namespaces being developed for containers: pid, uts, ipc, etc.
'namespace' variables and attributes are also renamed to 'mnt_ns'
Signed-off-by: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

23 Jun, 2006 1 commit

[PATCH] VFS: Permit filesystem to perform statfs with a known root dentry · 726c3342

Committed by David Howells on Jun 23, 2006

Give the statfs superblock operation a dentry pointer rather than a superblock
pointer.

This complements the get_sb() patch.  That reduced the significance of
sb->s_root, allowing NFS to place a fake root there.  However, NFS does
require a dentry to use as a target for the statfs operation.  This permits
the root in the vfsmount to be used instead.

linux/mount.h has been added where necessary to make allyesconfig build
successfully.

Interest has also been expressed for use with the FUSE and XFS filesystems.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Nathan Scott <nathans@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

09 Jun, 2006 2 commits

VFS: Add shrink_submounts() · 5528f911

Committed by Trond Myklebust on Jun 09, 2006

Allow a submount to be marked as being 'shrinkable' by means of the
vfsmount->mnt_flags, and then add a function 'shrink_submounts()' which
attempts to recursively unmount these submounts.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

VFS: Add GPL_EXPORTED function vfs_kern_mount() · bb4a58bf

Committed by Trond Myklebust on Jun 09, 2006

do_kern_mount() does not allow the kernel to use private mount interfaces
without exposing the same interfaces to userland. The problem is that the
filesystem is referenced by name, thus meaning that it and its mount
interface must be registered in the global filesystem list.

vfs_kern_mount() passes the struct file_system_type as an explicit
parameter in order to overcome this limitation.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

11 Jan, 2006 1 commit

[PATCH] per-mountpoint noatime/nodiratime · fc33a7bb

Committed by Christoph Hellwig on Jan 09, 2006

Turn noatime and nodiratime into per-mount instead of per-sb flags.

After all the preparations this is a rather trivial patch. The mount code
needs to treat the two options as per-mount instead of per-superblock, and
touch_atime needs to be changed to check the new MNT_ flags in addition to
the MS_ flags that are kept for filesystems that are always
noatime/nodiratime but not user settable anymore. Besides that core code
only nfs needed an update because it's leaving atime updates to the server
and thus sets the S_NOATIME flag on every inode, but needs to know whether
it's a real noatime mount for a getattr optimization.

While we're at it I've killed the IS_NOATIME/IS_NODIRATIME macros that were
only used by touch_atime.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

09 Jan, 2006 1 commit

[PATCH] shared mounts: cleanup · bf066c7d

Committed by Miklos Szeredi on Jan 08, 2006

Small cleanups in shared mounts code.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: <viro@parcelfarce.linux.theplanet.co.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

08 Nov, 2005 5 commits

[PATCH] unbindable mounts · 9676f0c6

Committed by Ram Pai on Nov 07, 2005

An unbindable mount does not forward or receive propagation.  Also, an
unbindable mount disallows bind mounts.  The semantics are as follows.

Bind semantics:
  It is invalid to bind mount an unbindable mount.

Move semantics:
  It is invalid to move an unbindable mount under a shared mount.

Clone-namespace semantics:
  If a mount is unbindable in the parent namespace, the corresponding
  cloned mount in the child namespace becomes unbindable too.  Note:
  there is a subtle difference, unbindable mounts cannot be bind mounted
  but can be cloned during clone-namespace.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] introduce slave mounts · a58b0eb8

Committed by Ram Pai on Nov 07, 2005

A slave mount always has a master mount from which it receives
mount/umount events.  Unlike a shared mount, the event propagation does not
flow from the slave mount to the master.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] introduce shared mounts · 03e06e68

Committed by Ram Pai on Nov 07, 2005

This creates shared mounts.  When a shared mount is bind-mounted to some
mountpoint, the two mounts propagate mount/umount events to each other.  All the
shared mounts that propagate events to each other belong to the same
peer-group.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] beginning of the shared-subtree proper · 07b20889

Committed by Ram Pai on Nov 07, 2005

A private mount does not forward or receive propagation.  This patch
gives the user the ability to convert any mount to private.
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
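
The four propagation types introduced by this series (private, shared, slave, unbindable) can be summarised as a small table of predicates. This is a sketch built only from the descriptions in these commit messages; the enum and function names are hypothetical and do not correspond to kernel identifiers, and a "slave" here means a pure slave that is not also shared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the propagation types described in the series. */
enum propagation_sketch {
    PROP_PRIVATE,
    PROP_SHARED,
    PROP_SLAVE,
    PROP_UNBINDABLE
};

/* Shared mounts send mount/umount events to the rest of their peer group. */
static bool forwards_events(enum propagation_sketch p)
{
    return p == PROP_SHARED;
}

/* Shared mounts hear from their peers; slaves hear from their master. */
static bool receives_events(enum propagation_sketch p)
{
    return p == PROP_SHARED || p == PROP_SLAVE;
}

/* Only unbindable mounts refuse to be the source of a bind mount. */
static bool can_be_bind_mounted(enum propagation_sketch p)
{
    return p != PROP_UNBINDABLE;
}
```

Private and unbindable mounts behave the same for propagation; they differ only in whether a bind mount of them is allowed.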

[PATCH] saner handling of auto_acct_off() and DQUOT_OFF() in umount · 7b7b1ace

Committed by Al Viro on Nov 07, 2005

The way we currently deal with quota and process accounting that might
keep vfsmount busy at umount time is inherently broken; we try to turn
them off just in case (not quite correctly, at that) and

  a) pray umount doesn't fail (otherwise they'll stay turned off)
  b) pray nobody does anything funny just as we turn quota off

Moreover, LSM provides hooks for doing the same sort of broken logic.

The proper way to deal with that is to introduce the second kind of
reference to vfsmount.  Semantics:

 - when the last normal reference is dropped, all special ones are
   converted to normal ones and if there had been any, cleanup is done.
 - normal reference can be cloned into a special one
 - special reference can be converted to normal one; that's a no-op if
   we'd already passed the point of no return (i.e.  mntput() had
   converted special references to normal and started cleanup).

The way it works: e.g. starting process accounting converts the vfsmount
reference pinned by the opened file into special one and turns it back
to normal when it gets shut down; acct_auto_close() is done when no
normal references are left.  That way it does *not* obstruct umount(2)
and it silently gets turned off when the last normal reference to
vfsmount is gone.  Which is exactly what we want...

The same should be done by LSM module that holds some internal
references to vfsmount and wants to shut them down on umount - it should
make them special and security_sb_umount_close() will be called exactly
when the last normal reference to vfsmount is gone.

quota handling is even simpler - we don't use normal file IO anymore, so
there's no need to hold vfsmounts at all.  DQUOT_OFF() is done from
deactivate_super(), where it really belongs.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
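
The normal/special reference semantics listed above can be modelled in a few lines. This is a single-threaded userspace sketch, not the kernel implementation: the struct, its fields, and the *_sketch function names are assumptions, locking is omitted, and the cleanup helper is a stand-in for what acct_auto_close() or an LSM hook would do.

```c
#include <assert.h>

/* Hypothetical stand-in for struct vfsmount with two kinds of references. */
struct refs_sketch {
    int count;      /* normal references */
    int pinned;     /* special references */
    int cleanups;   /* how many times the cleanup step ran */
    int freed;      /* set once the last reference is gone */
};

static void mntput_sketch(struct refs_sketch *m);

/* Clone a normal reference into a special one; the caller keeps its own. */
static void mnt_pin_sketch(struct refs_sketch *m)
{
    m->pinned++;
}

/* Convert a special reference back to normal; a no-op after conversion. */
static void mnt_unpin_sketch(struct refs_sketch *m)
{
    if (m->pinned) {
        m->count++;
        m->pinned--;
    }
}

/* Stand-in cleanup: each former special holder lets go of its reference. */
static void cleanup_holders_sketch(struct refs_sketch *m, int holders)
{
    m->cleanups++;
    for (int i = 0; i < holders; i++) {
        mnt_unpin_sketch(m);    /* already converted, so this does nothing */
        mntput_sketch(m);
    }
}

/* Drop a normal reference. */
static void mntput_sketch(struct refs_sketch *m)
{
    for (;;) {
        int holders;

        assert(m->count > 0);
        if (--m->count > 0)
            return;
        if (m->pinned == 0) {
            m->freed = 1;       /* truly the last reference */
            return;
        }
        /*
         * Last normal reference gone: turn every special reference into
         * a normal one, keep one extra for ourselves, run the cleanup,
         * then loop to drop that extra reference.
         */
        holders = m->pinned;
        m->count = holders + 1;
        m->pinned = 0;
        cleanup_holders_sketch(m, holders);
    }
}
```

Because special references do not count as normal ones, a pinned holder such as process accounting does not keep the final drop from starting; it is shut down as part of that drop instead.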

13 Jul, 2005 1 commit

[PATCH] name_to_dev_t warning fix · d53d9f16

Committed by Andrew Morton on Jul 12, 2005

kernel/power/disk.c needs a declaration of name_to_dev_t() in scope.  mount.h
seems like an appropriate choice.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

08 Jul, 2005 2 commits

[PATCH] namespace: rename _mntput to mntput_no_expire · 751c404b

Committed by Miklos Szeredi on Jul 07, 2005

This patch renames _mntput() to something a little more descriptive:
mntput_no_expire().
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] namespace: rename mnt_fslink to mnt_expire · 55e700b9

Committed by Miklos Szeredi on Jul 07, 2005

This patch renames vfsmount->mnt_fslink to something a little more
descriptive: vfsmount->mnt_expire.
Signed-off-by: Mike Waychison <michael.waychison@sun.com>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

17 Apr, 2005 1 commit

Linux-2.6.12-rc2 · 1da177e4

Committed by Linus Torvalds on Apr 16, 2005

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!
