Commits · 63997e98a3be68d7cec806d22bf9b02b2e1daabb · openeuler / Kernel

26 Oct, 2010 40 commits

Committed by Al Viro on Oct 25, 2010

Pull removal of fsnotify marks into generic_shutdown_super().
Split umount-time work into a new function - evict_inodes().
Make sure that invalidate_inodes() will be able to cope with
I_FREEING once we change locking in iput().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

63997e98

fs: skip I_FREEING inodes in writeback_sb_inodes · 9843b76a

Committed by Christoph Hellwig on Oct 24, 2010

Skip I_FREEING inodes just like I_WILL_FREE and I_NEW when walking the
writeback lists.  Currently this can't happen, but once we move from
inode_lock to more fine-grained locking we can have an inode that's
still on the writeback lists but has I_FREEING set, and we absolutely
need to skip it here, just like we do for all other inode list walks.

Based on a patch from Dave Chinner.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9843b76a

fs: fold invalidate_list into invalidate_inodes · a0318786

Committed by Christoph Hellwig on Oct 24, 2010

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a0318786

fs: do not drop inode_lock in dispose_list · d895a1c9

Committed by Christoph Hellwig on Oct 24, 2010

Despite the comment above it we cannot safely drop the lock here.
invalidate_list is called from many other places than just umount.
Also switch to proper list macros now that we never drop the lock.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d895a1c9

fs: inode split IO and LRU lists · 7ccf19a8

Committed by Nick Piggin on Oct 21, 2010

The use of the same inode list structure (inode->i_list) for two
different list constructs with different lifecycles and purposes
makes it impossible to separate the locking of the different
operations. Therefore, to enable the separation of the locking of
the writeback and reclaim lists, split the inode->i_list into two
separate lists dedicated to their specific tracking functions.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7ccf19a8

fs: switch bdev inode bdi's correctly · a5491e0c

Committed by Dave Chinner on Oct 21, 2010

bdev inodes can remain dirty even after their last close. Hence the
BDI associated with the bdev->inode gets modified during the last
close to point to the default BDI. However, the bdev inode still
needs to be moved to the dirty lists of the new BDI, otherwise it
will corrupt the writeback list it was left on.

Add a new function bdev_inode_switch_bdi() to move all the bdi state
from the old bdi to the new one safely. This is only a temporary
measure until the bdev inode<->bdi lifecycle problems are sorted
out.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a5491e0c

fs: fix buffer invalidation in invalidate_list · 99a38919

Committed by Christoph Hellwig on Oct 23, 2010

We must not call invalidate_inode_buffers in invalidate_list unless the
inode can be reclaimed.  If we remove the buffer association of a busy
inode, fsync won't find the buffers anymore.  As invalidate_inode_buffers
is called from various sources other than umount, this actually does
matter in practice.

While at it, change the loop to a more natural form and remove the
WARN_ON for I_NEW, which we already tested a few lines above.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

99a38919

fsnotify: use dget_parent · 4d4eb366

Committed by Christoph Hellwig on Oct 10, 2010

Use dget_parent instead of opencoding it.  This simplifies the code, but
more importantly prepares for the more complicated locking for a parent
dget in the dcache scale patch series.

It means we do grab a reference to the parent now if it needs to be watched,
but not with the specified mask.  If this turns out to be a problem
we'll have to revisit it, but for now let's keep as much as possible
dcache internals inside dcache.[ch].
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

4d4eb366

smbfs: use dget_parent · be9eee2e

Committed by Christoph Hellwig on Oct 10, 2010

Use dget_parent instead of opencoding it.  This simplifies the code, but
more importantly prepares for the more complicated locking for a parent
dget in the dcache scale patch series.

Note that the d_time assignment in smb_renew_times moves out of d_lock,
but it's a single atomic 32-bit value, and that's what other sites
setting it do already.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

be9eee2e

exportfs: use dget_parent · 0461ee26

Committed by Christoph Hellwig on Oct 13, 2010

Use dget_parent instead of opencoding it.  This simplifies the code, but
more importantly prepares for the more complicated locking for a parent
dget in the dcache scale patch series.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0461ee26

fs: use RCU read side protection in d_validate · 3825bdb7

Committed by Christoph Hellwig on Oct 10, 2010

d_validate does a purely read lookup in the dentry hash, so use RCU read side
locking instead of dcache_lock.  Split out from a larger patch by
Nick Piggin <npiggin@suse.de>.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3825bdb7

fs: clean up dentry lru modification · a4633357

Committed by Christoph Hellwig on Oct 10, 2010

Always do a list_del_init on the LRU to make sure the list_empty invariant for
not being on the LRU always holds true, and fold dentry_lru_del_init into
dentry_lru_del. Replace the dentry_lru_add_tail primitive with a
dentry_lru_move_tail operation that is simpler when the dentry is already on
the list, which it always is. Move the list_empty into dentry_lru_add to
fit the scheme of the other lru helpers, and simplify locking once we
move to a separate LRU lock.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a4633357

fs: split __shrink_dcache_sb · 3049cfe2

Committed by Christoph Hellwig on Oct 10, 2010

Currently __shrink_dcache_sb has an extremely awkward calling convention
because it tries to please very different callers. Split out the
main loop into a shrink_dentry_list helper, which gets called directly
from shrink_dcache_sb for the cases where all dentries need to be pruned,
or from __shrink_dcache_sb for pruning only a certain number of dentries.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3049cfe2

fs: improve DCACHE_REFERENCED usage · 265ac902

Committed by Nick Piggin on Oct 10, 2010

The dentry referenced bit is only set when installing the dentry back
onto the LRU. However, with lazy LRU, the dentry can already be on
the LRU list at dput time, thus missing out on setting the referenced
bit. Fix this.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

265ac902

fs: use percpu counter for nr_dentry and nr_dentry_unused · 312d3ca8

Committed by Christoph Hellwig on Oct 10, 2010

The nr_dentry stat is a globally touched cacheline and atomic operation
twice over the lifetime of a dentry. It is used for the benefit of userspace
only. Turn it into a per-cpu counter and always decrement it in d_free instead
of doing various batching operations to reduce lock hold times in the callers.

Based on an earlier patch from Nick Piggin <npiggin@suse.de>.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

312d3ca8

fs: simplify __d_free · 9c82ab9c

Committed by Christoph Hellwig on Oct 10, 2010

Remove d_callback and always call __d_free with an RCU head.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9c82ab9c

fs: take dcache_lock inside __d_path · be148247

Committed by Christoph Hellwig on Oct 10, 2010

All callers take dcache_lock just around the call to __d_path, so
take the lock into it in preparation for getting rid of dcache_lock.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

be148247

fs: do not assign default i_ino in new_inode · 85fe4025

Committed by Christoph Hellwig on Oct 23, 2010

Instead of always assigning an increasing inode number in new_inode,
move the call to assign it into those callers that actually need it.
For now the set of callers that need it is estimated conservatively, that is,
the call is added to all filesystems that do not assign an i_ino
by themselves.  For a few more filesystems we can avoid assigning
any inode number given that they aren't user visible, and for others
it could be done lazily when an inode number is actually needed,
but that's left for later patches.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

85fe4025

fs: introduce a per-cpu last_ino allocator · f991bd2e

Committed by Eric Dumazet on Oct 23, 2010

new_inode() dirties a contended cache line to get increasing
inode numbers. This limits performance on workloads that cause
significant parallel inode allocation.

Solve this problem by using a per_cpu variable fed by the shared
last_ino in batches of 1024 allocations.  This reduces contention on
the shared last_ino, and gives the same spread of inode numbers as before
(i.e. same wraparound after 2^32 allocations).
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f991bd2e
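
The batching scheme described in f991bd2e can be sketched in user-space C roughly as follows. This is an illustration only, not the kernel code: `slot_last_ino`, `next_ino_sketch()` and the fixed-size `NR_SLOTS` array are hypothetical stand-ins for real per-CPU variables, and C11 atomics stand in for the kernel's atomic helpers.

```c
#include <assert.h>
#include <stdatomic.h>

#define INO_BATCH 1024u   /* batch size named in the commit message */
#define NR_SLOTS  4       /* hypothetical: stands in for the number of CPUs */

/* Shared counter, touched only once per batch. */
static atomic_uint shared_last_ino;
/* One private counter per slot (a stand-in for a per-CPU variable). */
static unsigned int slot_last_ino[NR_SLOTS];

/* Return the next inode number for the given slot. */
static unsigned int next_ino_sketch(unsigned int slot)
{
    assert(slot < NR_SLOTS);
    unsigned int res = slot_last_ino[slot];

    /* Private batch used up (or never filled): reserve a new one.
     * atomic_fetch_add() returns the old value, i.e. the batch start. */
    if ((res & (INO_BATCH - 1)) == 0)
        res = atomic_fetch_add(&shared_last_ino, INO_BATCH);

    slot_last_ino[slot] = ++res;
    return res;
}
```

In this sketch the first call on slot 0 yields 1 and the first call on slot 1 yields 1025; each slot returns to the shared counter only after it has handed out 1024 numbers.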

new helper: ihold() · 7de9c6ee

Committed by Al Viro on Oct 23, 2010

Clones an existing reference to inode; caller must already hold one.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7de9c6ee

fs: remove inode_add_to_list/__inode_add_to_list · 646ec461

Committed by Christoph Hellwig on Oct 23, 2010

Split up inode_add_to_list/__inode_add_to_list.  Locking for the two
lists will be split soon so these helpers really don't buy us much
anymore.

The __ prefixes for the sb list helpers will go away soon, but until
inode_lock is gone we'll need them to distinguish between the locked
and unlocked variants.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

646ec461

fs: move i_count increments into find_inode/find_inode_fast · f7899bd5

Committed by Christoph Hellwig on Oct 23, 2010

Now that iunique is not abusing find_inode anymore we can move the i_count
increment back to where it belongs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f7899bd5

fs: Stop abusing find_inode_fast in iunique · ad5e195a

Committed by Christoph Hellwig on Oct 23, 2010

Stop abusing find_inode_fast for iunique and opencode the inode hash walk.
Introduce a new iunique_lock to protect the iunique counters once inode_lock
is removed.

Based on a patch originally from Nick Piggin.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ad5e195a

fs: Factor inode hash operations into functions · 4c51acbc

Committed by Dave Chinner on Oct 23, 2010

Before replacing the inode hash locking with a more scalable
mechanism, factor the removal of the inode from the hashes rather
than open coding it in several places.

Based on a patch originally from Nick Piggin.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

4c51acbc

fs: Implement lazy LRU updates for inodes · 9e38d86f

Committed by Nick Piggin on Oct 23, 2010

Convert the inode LRU to use lazy updates to reduce lock and
cacheline traffic.  We avoid moving inodes around in the LRU list
during iget/iput operations so these frequent operations don't need
to access the LRUs. Instead, we defer the refcount checks to
reclaim-time and use a per-inode state flag, I_REFERENCED, to tell
reclaim that iget has touched the inode in the past. This means that
only reclaim should be touching the LRU with any frequency, hence
significantly reducing lock acquisitions and the amount of contention
on LRU updates.

This also removes the inode_in_use list, which means we now only
have one list for tracking the inode LRU status. This makes it much
simpler to split out the LRU list operations under its own lock.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9e38d86f
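
One way to picture the reclaim-time decision from 9e38d86f is the small user-space sketch below. It is a simplified reading of the commit message, not the kernel implementation: `struct obj_sketch`, `enum lru_action` and `reclaim_decide()` are hypothetical names, the `referenced` field merely plays the role of I_REFERENCED, and where that flag gets set is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for an inode sitting on the LRU. */
struct obj_sketch {
    int  refcount;    /* number of active users */
    bool referenced;  /* plays the role of I_REFERENCED */
};

enum lru_action {
    LRU_REMOVE,  /* still in use: take it off the LRU, do not free it */
    LRU_ROTATE,  /* recently touched: clear the flag, give a second chance */
    LRU_EVICT    /* unused and not recently touched: reclaim it */
};

/* The refcount check happens here, at reclaim time, not in get/put. */
static enum lru_action reclaim_decide(struct obj_sketch *o)
{
    if (o->refcount > 0)
        return LRU_REMOVE;
    if (o->referenced) {
        o->referenced = false;
        return LRU_ROTATE;
    }
    return LRU_EVICT;
}
```

Because the hot get/put paths never move entries on the list in this model, only the reclaim scan needs the LRU lock with any frequency, which is the effect the commit message describes.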

fs: Convert nr_inodes and nr_unused to per-cpu counters · cffbc8aa

Committed by Dave Chinner on Oct 23, 2010

The number of inodes allocated does not need to be tied to the
addition or removal of an inode to/from a list. If we are not tied
to a list lock, we could update the counters when inodes are
initialised or destroyed, but to do that we need to convert the
counters to be per-cpu (i.e. independent of a lock). This means that
we have the freedom to change the list/locking implementation
without needing to care about the counters.

Based on a patch originally from Eric Dumazet.

[AV: cleaned up a bit, fixed build breakage on weird configs]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

cffbc8aa
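
The idea of a lock-independent, per-CPU counter in cffbc8aa can be illustrated with the following user-space sketch. The names `nr_objs`, `count_inc()`, `count_dec()` and `count_sum()` are hypothetical, a plain array stands in for real per-CPU storage, and clamping the total at zero is an assumption made here to keep a racy read from reporting a negative count.

```c
#include <assert.h>

#define NR_SLOTS 4  /* hypothetical stand-in for the number of CPUs */

/* One counter per slot; an individual slot may legitimately go negative
 * when an object is created on one slot and destroyed on another. */
static long nr_objs[NR_SLOTS];

static void count_inc(unsigned int slot)
{
    assert(slot < NR_SLOTS);
    nr_objs[slot]++;
}

static void count_dec(unsigned int slot)
{
    assert(slot < NR_SLOTS);
    nr_objs[slot]--;
}

/* Readers pay the cost: sum all slots, never report less than zero. */
static long count_sum(void)
{
    long sum = 0;

    for (unsigned int i = 0; i < NR_SLOTS; i++)
        sum += nr_objs[i];
    return sum < 0 ? 0 : sum;
}
```

Updates touch only the local slot, so they no longer depend on whichever lock protects the lists; the trade-off is that reading the total becomes a loop over all slots.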

vfs: fix infinite loop caused by clone_mnt race · be1a16a0

Committed by Miklos Szeredi on Oct 05, 2010

If clone_mnt() happens while mnt_make_readonly() is running, the
cloned mount might have MNT_WRITE_HOLD flag set, which results in
mnt_want_write() spinning forever on this mount.

Needs CAP_SYS_ADMIN to trigger deliberately and unlikely to happen
accidentally.  But if it does happen it can hang the machine.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

be1a16a0

switch hfs to hlist_add_fake() · 89b0fc38

Committed by Al Viro on Oct 23, 2010

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

89b0fc38

list.h: new helper - hlist_add_fake() · 756acc2d

Committed by Al Viro on Oct 23, 2010

Make node look as if it was on hlist, with hlist_del()
working correctly.  Usable without any locking...

Convert a couple of places where we want to do that to
inode->i_hash.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

756acc2d
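
The trick behind 756acc2d can be shown with a minimal user-space model of an hlist node. The `hnode_*` names below are hypothetical and only mirror the shape of the kernel's hlist helpers; the point is that a node whose `pprev` points at its own `next` field looks hashed and can be deleted safely without belonging to any real list.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of an hlist node. */
struct hnode {
    struct hnode *next;
    struct hnode **pprev;  /* address of the pointer that points at us */
};

/* A node counts as "unhashed" when nothing points at it. */
static int hnode_unhashed(const struct hnode *n)
{
    return n->pprev == NULL;
}

/* Fake insertion: make the node point at itself. */
static void hnode_add_fake(struct hnode *n)
{
    n->pprev = &n->next;
}

/* Ordinary unlink; for a fake node, *pprev is just n->next itself. */
static void hnode_del(struct hnode *n)
{
    struct hnode *next = n->next;
    struct hnode **pprev = n->pprev;

    *pprev = next;
    if (next)
        next->pprev = pprev;
}
```

Since the fake node never touches shared list heads, no lock is needed to set it up or tear it down, which matches the "usable without any locking" remark in the commit message.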

new helper: inode_unhashed() · 1d3382cb

Committed by Al Viro on Oct 23, 2010

Note: for race-free uses you need inode_lock held.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1d3382cb

unexport invalidate_inodes · a8dade34

Committed by Al Viro on Oct 24, 2010

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a8dade34

smbfs never retains inodes with zero refcount in the first place · 61ebdb42

Committed by Al Viro on Oct 24, 2010

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

61ebdb42

ntfs: don't call invalidate_inodes() · 70fd136e

Committed by Al Viro on Oct 24, 2010

We are in fill_super(); again, no inodes with zero i_count could
be around until we set MS_ACTIVE.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

70fd136e

gfs2: invalidate_inodes() is no-op there · 9dcefee5

Committed by Al Viro on Oct 24, 2010

In fill_super() we hadn't MS_ACTIVE set yet, so there won't
be any inodes with zero i_count sitting around.

In put_super() we already have MS_ACTIVE removed *and* we
had called invalidate_inodes() since then.  So again there
won't be any inodes with zero i_count...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9dcefee5

ext2_remount: don't bother with invalidate_inodes() · 8e3b9a07

Committed by Al Viro on Oct 24, 2010

It's pointless - we *do* have busy inodes (root directory,
for one), so that call will fail and attempt to change
XIP flag will be ignored.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

8e3b9a07

fs/buffer.c: call __block_write_begin() if we have page · 309f77ad

Committed by Namhyung Kim on Oct 25, 2010

If we have the appropriate page already, call __block_write_begin()
directly instead of releasing and regrabbing it inside of
block_write_begin().
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

309f77ad

lockdep: fixup checking of dir inode annotation · a3314a0e

Committed by Namhyung Kim on Oct 11, 2010

Since inode->i_mode shares its bits for S_IFMT, S_ISDIR should be
used to distinguish whether it is a dir or not.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a3314a0e
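
The pitfall behind a3314a0e is easy to reproduce in user space. The sketch below assumes the old check was a plain bit test against S_IFDIR (the commit message implies this but does not quote the code); `is_dir_bit_test()` and `is_dir_macro()` are hypothetical helper names. On Linux, S_IFDIR is 0040000 and S_IFBLK is 0060000, so a block device shares the directory bit.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

/* Buggy pattern: tests one bit that several file types share. */
static bool is_dir_bit_test(mode_t mode)
{
    return (mode & S_IFDIR) != 0;
}

/* Correct pattern: S_ISDIR() compares the whole S_IFMT field. */
static bool is_dir_macro(mode_t mode)
{
    return S_ISDIR(mode);
}
```

Both helpers agree for a real directory mode such as `S_IFDIR | 0755`, but for `S_IFBLK | 0600` the bit test wrongly reports a directory while `S_ISDIR()` does not.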

aio: bump i_count instead of using igrab · 306fb097

Committed by Chris Mason on Aug 23, 2010

The aio batching code is using igrab to get an extra reference on the
inode so it can safely batch.  igrab will go ahead and take the global
inode spinlock, which can be a bottleneck on large machines doing lots
of AIO.

In this case, igrab isn't required because we already have a reference
on the file handle.  It is safe to just bump the i_count directly
on the inode.

Benchmarking shows this patch brings IOP/s on tons of flash up by about
2.5X.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

306fb097

update block_device_operations documentation · e1455d1b

Committed by Christoph Hellwig on Oct 06, 2010

Updated Documentation/filesystems/Locking to match the code.
Signed-off-by: Christoph Hellwig <hch@lst.de>

e1455d1b

fs/buffer.c: remove duplicated assignment on b_private · 8358e7d7

Committed by Namhyung Kim on Oct 16, 2010

bh->b_private is initialized within init_buffer(), thus the
assignment should be redundant. Remove it.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

8358e7d7
