Commits · ff0c7d15f9787b7e8c601533c015295cc68329f8 · OpenXiangShan / riscv-linux

07 Jan, 2011 4 commits

fs: avoid inode RCU freeing for pseudo fs · ff0c7d15

Authored by Nick Piggin on Jan 07, 2011

Pseudo filesystems that don't put inodes on an RCU list or make them reachable
by rcu-walk dentries do not need to RCU free their inodes.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>

ff0c7d15

fs: icache RCU free inodes · fa0d7e3d

Authored by Nick Piggin on Jan 07, 2011

RCU free the struct inode. This will allow:

- Subsequent store-free path walking patch. The inode must be consulted for
  permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
  to take i_lock no longer need to take sb_inode_list_lock to walk the list in
  the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
  page lock to follow page->mapping.

The downside of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.

In cases where inode lifetimes are longer (ie. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.

The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>

fa0d7e3d

fs: use fast counters for vfs caches · 3e880fb5

Authored by Nick Piggin on Jan 07, 2011

The percpu_counter library generates quite nasty code, so unless you need
to dynamically allocate counters or take a fast approximate value, a
simple per-CPU set of counters is much better.

The percpu_counter can never be made to work as well, because it has an
indirection from pointer to percpu memory, and it can't use direct
this_cpu_inc interfaces because it doesn't use static PER_CPU data, so
code will always be worse.

In the fastpath, it is the difference between this:

        incl %gs:nr_dentry      # nr_dentry

and this:

        movl    percpu_counter_batch(%rip), %edx        # percpu_counter_batch,
        movl    $1, %esi        #,
        movq    $nr_dentry, %rdi        #,
        call    __percpu_counter_add    # (plus I clobber registers)

__percpu_counter_add:
        pushq   %rbp    #
        movq    %rsp, %rbp      #,
        subq    $32, %rsp       #,
        movq    %rbx, -24(%rbp) #,
        movq    %r12, -16(%rbp) #,
        movq    %r13, -8(%rbp)  #,
        movq    %rdi, %rbx      # fbc, fbc
#APP
# 216 "/home/npiggin/usr/src/linux-2.6/arch/x86/include/asm/thread_info.h" 1
        movq %gs:kernel_stack,%rax      #, pfo_ret__
# 0 "" 2
#NO_APP
        incl    -8124(%rax)     # <variable>.preempt_count
        movq    32(%rdi), %r12  # <variable>.counters, tcp_ptr__
#APP
# 78 "lib/percpu_counter.c" 1
        add %gs:this_cpu_off, %r12      # this_cpu_off, tcp_ptr__
# 0 "" 2
#NO_APP
        movslq  (%r12),%r13     #* tcp_ptr__, tmp73
        movslq  %edx,%rax       # batch, batch
        addq    %rsi, %r13      # amount, count
        cmpq    %rax, %r13      # batch, count
        jge     .L27    #,
        negl    %edx    # tmp76
        movslq  %edx,%rdx       # tmp76, tmp77
        cmpq    %rdx, %r13      # tmp77, count
        jg      .L28    #,
.L27:
        movq    %rbx, %rdi      # fbc,
        call    _raw_spin_lock  #
        addq    %r13, 8(%rbx)   # count, <variable>.count
        movq    %rbx, %rdi      # fbc,
        movl    $0, (%r12)      #,* tcp_ptr__
        call    _raw_spin_unlock        #
.L29:
#APP
# 216 "/home/npiggin/usr/src/linux-2.6/arch/x86/include/asm/thread_info.h" 1
        movq %gs:kernel_stack,%rax      #, pfo_ret__
# 0 "" 2
#NO_APP
        decl    -8124(%rax)     # <variable>.preempt_count
        movq    -8136(%rax), %rax       #, D.14625
        testb   $8, %al #, D.14625
        jne     .L32    #,
.L31:
        movq    -24(%rbp), %rbx #,
        movq    -16(%rbp), %r12 #,
        movq    -8(%rbp), %r13  #,
        leave
        ret
        .p2align 4,,10
        .p2align 3
.L28:
        movl    %r13d, (%r12)   # count,*
        jmp     .L29    #
.L32:
        call    preempt_schedule        #
        .p2align 4,,6
        jmp     .L31    #
        .size   __percpu_counter_add, .-__percpu_counter_add
        .p2align 4,,15
Signed-off-by: Nick Piggin <npiggin@kernel.dk>

3e880fb5
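
As a rough illustration of the "simple per cpu set of counters" idea described in the commit above, here is a user-space sketch. It is not the kernel code: NR_CPUS, the explicit cpu argument and the clamping of a negative sum are assumptions made for this example, and a plain array stands in for statically allocated per-CPU data.

```c
#include <assert.h>

#define NR_CPUS 4   /* hypothetical fixed CPU count for this sketch */

/* One statically allocated slot per CPU; stands in for static per-CPU data. */
static long nr_dentry[NR_CPUS];

/* Fast path: one increment on the local slot, no lock, no batching. */
static void nr_dentry_inc(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    nr_dentry[cpu]++;
}

/* Fast path: one decrement on the local slot. */
static void nr_dentry_dec(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    nr_dentry[cpu]--;
}

/*
 * Slow path: sum every slot. A single slot may be negative when an object
 * is counted on one CPU and uncounted on another; only the sum is meaningful.
 * Clamping a transiently negative sum to zero is an assumption of this sketch.
 */
static long get_nr_dentry(void)
{
    long sum = 0;

    for (int i = 0; i < NR_CPUS; i++)
        sum += nr_dentry[i];
    return sum < 0 ? 0 : sum;
}
```

The trade-off is the one the commit message names: updates are as cheap as possible, while reading the total costs a loop over all slots.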

vfs: revert per-cpu nr_unused counters for dentry and inodes · 86c8749e

Authored by Nick Piggin on Jan 07, 2011

The nr_unused counters count the number of objects on an LRU, and as such they
are synchronized with LRU object insertion and removal and scanning, and
protected under the LRU lock.

Making it per-cpu does not actually get any concurrency improvements because of
this lock, and summing the counter is much slower, and
incrementing/decrementing it costs more code size and is slower too.

These counters should stay per-LRU, which currently means global.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>

86c8749e

27 Oct, 2010 1 commit

IMA: move read counter into struct inode · a178d202

Authored by Eric Paris on Oct 25, 2010

IMA currently allocates an inode integrity structure for every inode in
core.  This structure is about 120 bytes long.  Most files however
(especially on a system which doesn't make use of IMA) will never need
any of this space.  The problem is that if IMA is enabled we need to
know information about the number of readers and the number of writers
for every inode on the box.  At the moment we collect that information
in the per inode iint structure and waste the rest of the space.  This
patch moves those counters into the struct inode so we can eventually
stop allocating an IMA integrity structure except when absolutely
needed.

This patch does the minimum needed to move the location of the data.
Further cleanups, especially the location of counter updates, may still
be possible.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a178d202

26 Oct, 2010 18 commits

split invalidate_inodes() · 63997e98

Authored by Al Viro on Oct 25, 2010

Pull removal of fsnotify marks into generic_shutdown_super().
Split umount-time work into a new function - evict_inodes().
Make sure that invalidate_inodes() will be able to cope with
I_FREEING once we change locking in iput().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

63997e98

fs: fold invalidate_list into invalidate_inodes · a0318786

Authored by Christoph Hellwig on Oct 24, 2010

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a0318786

fs: do not drop inode_lock in dispose_list · d895a1c9

Authored by Christoph Hellwig on Oct 24, 2010

Despite the comment above it we cannot safely drop the lock here.
invalidate_list is called from many places other than just umount.
Also switch to proper list macros now that we never drop the lock.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d895a1c9

fs: inode split IO and LRU lists · 7ccf19a8

Authored by Nick Piggin on Oct 21, 2010

The use of the same inode list structure (inode->i_list) for two
different list constructs with different lifecycles and purposes
makes it impossible to separate the locking of the different
operations. Therefore, to enable the separation of the locking of
the writeback and reclaim lists, split the inode->i_list into two
separate lists dedicated to their specific tracking functions.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7ccf19a8

fs: fix buffer invalidation in invalidate_list · 99a38919

Authored by Christoph Hellwig on Oct 23, 2010

We must not call invalidate_inode_buffers in invalidate_list unless the
inode can be reclaimed.  If we remove the buffer association of a busy
inode fsync won't find the buffers anymore.  As invalidate_inode_buffers
is called from various sources other than umount this actually does
matter in practice.

While at it change the loop to a more natural form and remove the
WARN_ON for I_NEW, which we already tested a few lines above.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

99a38919

fs: do not assign default i_ino in new_inode · 85fe4025

Authored by Christoph Hellwig on Oct 23, 2010

Instead of always assigning an increasing inode number in new_inode
move the call to assign it into those callers that actually need it.
For now the set of callers that need it is estimated conservatively, that is
the call is added to all filesystems that do not assign an i_ino
by themselves.  For a few more filesystems we can avoid assigning
any inode number given that they aren't user visible, and for others
it could be done lazily when an inode number is actually needed,
but that's left for later patches.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

85fe4025

fs: introduce a per-cpu last_ino allocator · f991bd2e

Authored by Eric Dumazet on Oct 23, 2010

new_inode() dirties a contended cache line to get increasing
inode numbers. This limits performance on workloads that cause
significant parallel inode allocation.

Solve this problem by using a per_cpu variable fed by the shared
last_ino in batches of 1024 allocations.  This reduces contention on
the shared last_ino, and gives the same spread of inode numbers as before
(i.e. same wraparound after 2^32 allocations).
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f991bd2e
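
The batching scheme described in the commit above can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel implementation: NR_CPUS and the explicit cpu argument are hypothetical, a plain array stands in for the per_cpu variable, and C11 atomics stand in for the kernel's atomic operations. Each slot is assumed to be touched only by its own CPU.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS        4      /* hypothetical CPU count for this sketch */
#define LAST_INO_BATCH 1024u  /* batch size named in the commit message */

/* Shared counter; touched once per batch instead of once per allocation. */
static atomic_uint shared_last_ino;

/* Per-CPU cursor into the batch currently owned by that CPU. */
static unsigned int last_ino[NR_CPUS];

static unsigned int get_next_ino(int cpu)
{
    unsigned int res;

    assert(cpu >= 0 && cpu < NR_CPUS);
    res = last_ino[cpu];

    /* Local batch exhausted (or never filled): reserve the next 1024 numbers. */
    if ((res & (LAST_INO_BATCH - 1)) == 0) {
        /* atomic_fetch_add returns the old value, i.e. the start of our batch. */
        res = atomic_fetch_add(&shared_last_ino, LAST_INO_BATCH);
    }

    last_ino[cpu] = ++res;
    return res;
}
```

Because the shared counter is unsigned and only ever advanced in whole batches, the numbers still wrap around after 2^32 allocations, which matches the behaviour the commit message describes.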

new helper: ihold() · 7de9c6ee

Authored by Al Viro on Oct 23, 2010

Clones an existing reference to inode; caller must already hold one.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7de9c6ee

fs: remove inode_add_to_list/__inode_add_to_list · 646ec461

Authored by Christoph Hellwig on Oct 23, 2010

Split up inode_add_to_list/__inode_add_to_list.  Locking for the two
lists will be split soon so these helpers really don't buy us much
anymore.

The __ prefixes for the sb list helpers will go away soon, but until
inode_lock is gone we'll need them to distinguish between the locked
and unlocked variants.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

646ec461

fs: move i_count increments into find_inode/find_inode_fast · f7899bd5

Authored by Christoph Hellwig on Oct 23, 2010

Now that iunique is not abusing find_inode anymore we can move the i_ref
increment back to where it belongs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f7899bd5

fs: Stop abusing find_inode_fast in iunique · ad5e195a

Authored by Christoph Hellwig on Oct 23, 2010

Stop abusing find_inode_fast for iunique and opencode the inode hash walk.
Introduce a new iunique_lock to protect the iunique counters once inode_lock
is removed.

Based on a patch originally from Nick Piggin.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ad5e195a

fs: Factor inode hash operations into functions · 4c51acbc

Authored by Dave Chinner on Oct 23, 2010

Before replacing the inode hash locking with a more scalable
mechanism, factor the removal of the inode from the hashes rather
than open coding it in several places.

Based on a patch originally from Nick Piggin.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

4c51acbc

fs: Implement lazy LRU updates for inodes · 9e38d86f

Authored by Nick Piggin on Oct 23, 2010

Convert the inode LRU to use lazy updates to reduce lock and
cacheline traffic.  We avoid moving inodes around in the LRU list
during iget/iput operations so these frequent operations don't need
to access the LRUs. Instead, we defer the refcount checks to
reclaim-time and use a per-inode state flag, I_REFERENCED, to tell
reclaim that iget has touched the inode in the past. This means that
only reclaim should be touching the LRU with any frequency, hence
significantly reducing lock acquisitions and the amount of contention
on LRU updates.

This also removes the inode_in_use list, which means we now only
have one list for tracking the inode LRU status. This makes it much
simpler to split out the LRU list operations under its own lock.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9e38d86f
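
A simplified user-space sketch of the lazy LRU idea from the commit above may help. Everything here is an assumption made for illustration: the struct layout, the I_REFERENCED bit value, the on_lru flag standing in for real list membership, and the array scan standing in for a list walk. Locking is omitted entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define I_REFERENCED 0x1u   /* hypothetical bit value for this sketch */

struct inode {
    int count;        /* active references */
    unsigned state;   /* flag bits */
    bool on_lru;      /* stands in for LRU list membership */
};

/* Taking a reference never touches the LRU. */
static void iget(struct inode *inode)
{
    assert(inode != NULL);
    inode->count++;
}

/* Dropping the last reference marks the inode as recently used. */
static void iput(struct inode *inode)
{
    assert(inode != NULL && inode->count > 0);
    if (--inode->count == 0) {
        inode->state |= I_REFERENCED;
        inode->on_lru = true;   /* no reordering if it is already there */
    }
}

/* One reclaim pass; the refcount check is deferred to this point. */
static size_t prune(struct inode **lru, size_t n)
{
    size_t reclaimed = 0;

    for (size_t i = 0; i < n; i++) {
        struct inode *inode = lru[i];

        if (!inode->on_lru)
            continue;
        if (inode->count > 0) {
            inode->on_lru = false;            /* in use: drop from LRU lazily */
        } else if (inode->state & I_REFERENCED) {
            inode->state &= ~I_REFERENCED;    /* second chance */
        } else {
            inode->on_lru = false;            /* unused and not referenced */
            reclaimed++;
        }
    }
    return reclaimed;
}
```

In this sketch only prune() ever removes an entry from the LRU, which is the property the commit message relies on to cut LRU lock traffic in the iget/iput paths.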

fs: Convert nr_inodes and nr_unused to per-cpu counters · cffbc8aa

Authored by Dave Chinner on Oct 23, 2010

The number of inodes allocated does not need to be tied to the
addition or removal of an inode to/from a list. If we are not tied
to a list lock, we could update the counters when inodes are
initialised or destroyed, but to do that we need to convert the
counters to be per-cpu (i.e. independent of a lock). This means that
we have the freedom to change the list/locking implementation
without needing to care about the counters.

Based on a patch originally from Eric Dumazet.

[AV: cleaned up a bit, fixed build breakage on weird configs]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

cffbc8aa

new helper: inode_unhashed() · 1d3382cb

Authored by Al Viro on Oct 23, 2010

note: for race-free uses you need inode_lock held
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1d3382cb

unexport invalidate_inodes · a8dade34

Authored by Al Viro on Oct 24, 2010

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a8dade34

lockdep: fixup checking of dir inode annotation · a3314a0e

Authored by Namhyung Kim on Oct 11, 2010

Since inode->i_mode shares its bits for S_IFMT, S_ISDIR should be
used to distinguish whether it is a dir or not.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a3314a0e
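
To see why a plain bit test on the mode is not enough, consider the sketch below. The two helper names are hypothetical and exist only for this example. It relies on the conventional Linux encoding, where the file type is a multi-bit field and the block device type value contains the directory bit.

```c
#define _XOPEN_SOURCE 700   /* expose S_IFDIR, S_IFBLK and friends */
#include <stdbool.h>
#include <sys/stat.h>

/* Fragile check: tests a single bit that other file types also set. */
static bool is_dir_by_bit(mode_t mode)
{
    return (mode & S_IFDIR) != 0;
}

/* Robust check: S_ISDIR compares the whole S_IFMT field against S_IFDIR. */
static bool is_dir(mode_t mode)
{
    return S_ISDIR(mode);
}
```

With the usual values (S_IFDIR is 0040000 and S_IFBLK is 0060000), is_dir_by_bit() reports a block device as a directory, while is_dir() does not.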

fs: mark destroy_inode static · 56b0dacf

Authored by Christoph Hellwig on Oct 06, 2010

Hugetlbfs used to need it, but after the destroy_inode and evict_inode
changes it's not required anymore.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

56b0dacf

10 Aug, 2010 12 commits

All filesystems that need invalidate_inode_buffers() are doing that explicitly · b70a3e07

Authored by Al Viro on Jun 07, 2010

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b70a3e07

convert remaining ->clear_inode() to ->evict_inode() · b57922d9

Authored by Al Viro on Jun 07, 2010

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b57922d9

Make ->drop_inode() just return whether inode needs to be dropped · 45321ac5

Authored by Al Viro on Jun 07, 2010

... and let iput_final() do the actual eviction or retention
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

45321ac5

fs/inode.c:clear_inode() is gone · 30140837

Authored by Al Viro on Jun 07, 2010

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

30140837

fs/inode.c:evict() doesn't care about delete vs. non-delete paths now · 644da596

Authored by Al Viro on Jun 07, 2010

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

644da596

->delete_inode() is gone · 07958f9f

Authored by Al Viro on Jun 07, 2010

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

07958f9f

new helper: end_writeback() · b0683aa6

Authored by Al Viro on Jun 04, 2010

Essentially, the minimal variant of ->evict_inode().  It's
a trimmed-down clear_inode(), sans any fs callbacks.  Once
it returns we know that no async writeback will be happening;
every ->evict_inode() instance should do that once and do that
before doing anything ->write_inode() could interfere with
(e.g. freeing the on-disk inode).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b0683aa6

Take ->i_bdev/->i_cdev handling out of clear_inode() · 661074e9

Authored by Al Viro on Jun 04, 2010

All call chains to clear_inode() pass through evict_inode() and
clear_inode() should be called by evict_inode() exactly once.
So we can pull i_bdev/i_cdev detaching up to evict_inode() itself.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

661074e9

generic_detach_inode() can be static now · c6287315

Authored by Al Viro on Jun 04, 2010

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c6287315

New method - evict_inode() · be7ce416

Authored by Al Viro on Jun 04, 2010

Hybrid of ->clear_inode() and ->delete_inode(); if present, does
all fs work to be done when in-core inode is about to be gone,
for whatever reason.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

be7ce416

unify fs/inode.c callers of clear_inode() · b4272d4c

Authored by Al Viro on Jun 04, 2010

For now, just a straightforward merge
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b4272d4c

simplify checks for I_CLEAR/I_FREEING · a4ffdde6

Authored by Al Viro on Jun 02, 2010

Add I_CLEAR instead of replacing I_FREEING with it.  I_CLEAR is
equivalent to I_FREEING for almost all code looking at either;
it's there to keep track of having called clear_inode() exactly
once per inode lifetime, at some point after having set I_FREEING.
I_CLEAR and I_FREEING never get set at the same time with the
current code, so we can switch to setting i_flags to I_FREEING | I_CLEAR
instead of I_CLEAR without loss of information.  As a result of
this change, checks become simpler and the amount of code that needs
to know about I_CLEAR shrinks a lot.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a4ffdde6

28 Jul, 2010 2 commits

fsnotify: rename fsnotify_mark_entry to just fsnotify_mark · e61ce867

Authored by Eric Paris on Dec 17, 2009

The name is long and it serves no real purpose.  So rename
fsnotify_mark_entry to just fsnotify_mark.
Signed-off-by: Eric Paris <eparis@redhat.com>

e61ce867

inotify: remove inotify in kernel interface · 2dfc1cae

Authored by Eric Paris on Dec 17, 2009

nothing uses inotify in the kernel, drop it!
Signed-off-by: Eric Paris <eparis@redhat.com>

2dfc1cae

19 Jul, 2010 1 commit

mm: add context argument to shrinker callback · 7f8275d0

Authored by Dave Chinner on Jul 19, 2010

The current shrinker implementation requires the registered callback
to have global state to work from. This makes it difficult to shrink
caches that are not global (e.g. per-filesystem caches). Pass the shrinker
structure to the callback so that users can embed the shrinker structure
in the context the shrinker needs to operate on and get back to it in the
callback via container_of().
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

7f8275d0
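
The embedding pattern described in the commit above can be sketched in user space like this. The struct fs_cache type, its fields and fs_cache_shrink() are hypothetical names for this example, the callback signature is simplified, and the container_of macro is a plain-C stand-in for the kernel's version.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified shrinker: the callback receives the shrinker itself. */
struct shrinker {
    int (*shrink)(struct shrinker *s, int nr_to_scan);
};

/* Hypothetical per-filesystem cache that embeds its own shrinker. */
struct fs_cache {
    int nr_objects;
    struct shrinker shrinker;
};

/* Recover the owning cache from the shrinker pointer; no global state needed. */
static int fs_cache_shrink(struct shrinker *s, int nr_to_scan)
{
    struct fs_cache *cache;
    int freed;

    assert(s != NULL);
    cache = container_of(s, struct fs_cache, shrinker);
    freed = nr_to_scan < cache->nr_objects ? nr_to_scan : cache->nr_objects;
    cache->nr_objects -= freed;
    return cache->nr_objects;   /* objects remaining in this cache */
}
```

Two fs_cache instances can register the same callback and each call still operates only on the cache that owns the shrinker passed in, which is what makes non-global caches shrinkable.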

22 May, 2010 2 commits

vfs: Add inode uid,gid,mode init helper · a1bd120d

Authored by Dmitry Monakhov on Mar 04, 2010

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a1bd120d

fs: inode.c use atomic_inc_return in __iget · 2e147f1e

Authored by Richard Kennedy on May 14, 2010

Using atomic_inc_return in __iget(struct inode *inode) makes the intent
of this code clearer and generates less code on processors that have
this operation.

On x86_64 this patch reduces the text size of inode.o by 12 bytes.
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>

----
patch against 2.6.34-rc7
compiled & tested on x86_64 AMD X2

I've been running with this patch applied for several weeks with no
obvious problems.
regards
Richard
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

2e147f1e

OpenXiangShan / riscv-linux synced successfully about 1 year ago