1. 07 Jan 2011 (7 commits)
    • fs: rcu-walk aware d_revalidate method · 34286d66
      Committed by Nick Piggin
      Require filesystems be aware of .d_revalidate being called in rcu-walk
      mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning
      -ECHILD from all implementations.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
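      For illustration, the push-down looks like this in a hypothetical
      filesystem ("examplefs" is not from the patch; the nameidata-based
      signature matches the 2.6.38-era API):

      static int examplefs_d_revalidate(struct dentry *dentry,
                                        struct nameidata *nd)
      {
              /* In rcu-walk mode we may not block, take locks, or touch
               * refcounts; punt back so the VFS retries in ref-walk mode. */
              if (nd && (nd->flags & LOOKUP_RCU))
                      return -ECHILD;
              /* ... ordinary, possibly sleeping, revalidation ... */
              return 1;
      }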
    • fs: dcache reduce branches in lookup path · fb045adb
      Committed by Nick Piggin
      Reduce some branches and memory accesses in dcache lookup by adding dentry
      flags to indicate common d_ops are set, rather than having to check them.
      This saves a pointer memory access (dentry->d_op) in common path lookup
      situations, and saves another pointer load and branch in cases where we
      have d_op but not the particular operation.
      
      Patched with:
      
      git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
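      The effect of the conversion, roughly (the DCACHE_OP_* flag names
      come from this series; the bodies are an illustrative sketch, not
      the exact upstream code):

      void d_set_d_op(struct dentry *dentry, const struct dentry_operations *op)
      {
              dentry->d_op = op;
              if (!op)
                      return;
              /* Cache which methods exist as flag bits... */
              if (op->d_hash)
                      dentry->d_flags |= DCACHE_OP_HASH;
              if (op->d_compare)
                      dentry->d_flags |= DCACHE_OP_COMPARE;
              if (op->d_revalidate)
                      dentry->d_flags |= DCACHE_OP_REVALIDATE;
              if (op->d_delete)
                      dentry->d_flags |= DCACHE_OP_DELETE;
      }

      /* ...so the lookup fast path tests d_flags (already hot in cache)
       * instead of two dependent pointer loads through dentry->d_op. */
      static inline int d_need_revalidate(struct dentry *dentry)
      {
              return dentry->d_flags & DCACHE_OP_REVALIDATE;
      }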
    • fs: icache RCU free inodes · fa0d7e3d
      Committed by Nick Piggin
      RCU free the struct inode. This will allow:
      
      - Subsequent store-free path walking patch. The inode must be consulted for
        permissions when walking, so an RCU inode reference is a must.
      - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
        to take i_lock no longer need to take sb_inode_list_lock to walk the list in
        the first place. This will simplify and optimize locking.
      - Could remove some nested trylock loops in dcache code
      - Could potentially simplify things a bit in VM land. Do not need to take the
        page lock to follow page->mapping.
      
      The downside of this is the performance cost of using RCU. In a simple
      creat/unlink microbenchmark, performance drops by about 10% due to inability to
      reuse cache-hot slab objects. As iterations increase and RCU freeing starts
      kicking over, this increases to about 20%.
      
      In cases where inode lifetimes are longer (i.e. many inodes may be allocated
      during the average life span of a single inode), a lot of this cache reuse is
      not applicable, so the regression caused by this patch is smaller.
      
      The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
      however this adds some complexity to list walking and store-free path walking,
      so I prefer to implement this at a later date, if it is shown to be a win in
      real situations. I haven't found a regression in any non-micro benchmark so I
      doubt it will be a problem.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
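      The per-filesystem pattern this enables looks roughly as follows
      ("examplefs" names are hypothetical; i_rcu is the rcu_head this
      patch adds to struct inode):

      static void examplefs_i_callback(struct rcu_head *head)
      {
              struct inode *inode = container_of(head, struct inode, i_rcu);
              /* Runs only after a grace period, so rcu-walk lookups that
               * still hold a pointer to the inode remain safe. */
              kmem_cache_free(examplefs_inode_cachep, EXAMPLEFS_I(inode));
      }

      static void examplefs_destroy_inode(struct inode *inode)
      {
              call_rcu(&inode->i_rcu, examplefs_i_callback);
      }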
    • fs: dcache remove dcache_lock · b5c84bf6
      Committed by Nick Piggin
      dcache_lock no longer protects anything. Remove it.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
    • fs: dcache scale subdirs · 2fd6b7f5
      Committed by Nick Piggin
      Protect d_subdirs and d_child with d_lock, except in filesystems that aren't
      using dcache_lock for these anyway (e.g. using i_mutex).
      
      Note: if we change the locking rule in future so that ->d_child protection is
      provided only with ->d_parent->d_lock, it may allow us to reduce some locking.
      But it would be an exception to an otherwise regular locking scheme, so we'd
      have to see some good results. Probably not worthwhile.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
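      Child traversal under the new rule, roughly (DENTRY_D_LOCK_NESTED
      comes from this series; the walk body is illustrative):

      static void walk_children(struct dentry *parent)
      {
              struct dentry *child;

              spin_lock(&parent->d_lock);
              list_for_each_entry(child, &parent->d_subdirs, d_u.d_child) {
                      /* Nested annotation keeps lockdep happy: the
                       * parent's d_lock first, then each child's. */
                      spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
                      /* ... inspect child ... */
                      spin_unlock(&child->d_lock);
              }
              spin_unlock(&parent->d_lock);
      }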
    • fs: dcache scale d_unhashed · da502956
      Committed by Nick Piggin
      Protect the d_unhashed(dentry) condition with d_lock. This means keeping the
      DCACHE_UNHASHED bit in sync with hash manipulations.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
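      A minimal sketch of the rule (the helper is hypothetical):

      static bool dentry_hashed_stable(struct dentry *dentry)
      {
              bool hashed;

              spin_lock(&dentry->d_lock);
              /* DCACHE_UNHASHED now only changes under d_lock, so the
               * answer cannot go stale while the lock is held. */
              hashed = !d_unhashed(dentry);
              spin_unlock(&dentry->d_lock);
              return hashed;
      }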
    • fs: dcache scale dentry refcount · b7ab39f6
      Committed by Nick Piggin
      Make d_count non-atomic and protect it with d_lock. This allows us to ensure a
      0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when
      we start protecting many other dentry members with d_lock.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
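      Taking a reference then becomes a plain increment under d_lock
      (sketch of the pattern; "example_dget" is illustrative):

      static void example_dget(struct dentry *dentry)
      {
              spin_lock(&dentry->d_lock);
              dentry->d_count++;      /* plain int, serialized by d_lock */
              spin_unlock(&dentry->d_lock);
      }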
  2. 18 Dec 2010 (2 commits)
  3. 16 Dec 2010 (1 commit)
  4. 07 Dec 2010 (1 commit)
  5. 02 Dec 2010 (4 commits)
  6. 19 Nov 2010 (1 commit)
  7. 18 Nov 2010 (1 commit)
  8. 12 Nov 2010 (2 commits)
  9. 10 Nov 2010 (2 commits)
  10. 09 Nov 2010 (2 commits)
    • ceph: fix update of ctime from MDS · d8672d64
      Committed by Sage Weil
      The client can have a newer ctime than the MDS due to AUTH_EXCL and
      XATTR_EXCL caps as well; update the check in ceph_fill_file_time
      appropriately.
      
      This fixes cases where ctime/mtime goes backward under the right sequence
      of local updates (e.g. chmod) and mds replies (e.g. subsequent stat that
      goes to the MDS).
      Signed-off-by: Sage Weil <sage@newdream.net>
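      The updated branch is shaped roughly like this (the cap masks are
      real ceph constants; the helper is a paraphrase of the ctime
      handling in ceph_fill_file_time, not the exact upstream code):

      static void fill_ctime(struct inode *inode, struct timespec *ctime,
                             int issued)
      {
              /* With an *_EXCL cap we may hold local updates the MDS has
               * not seen yet, so only let ctime move forward. */
              if (issued & (CEPH_CAP_FILE_EXCL | CEPH_CAP_AUTH_EXCL |
                            CEPH_CAP_XATTR_EXCL)) {
                      if (timespec_compare(ctime, &inode->i_ctime) > 0)
                              inode->i_ctime = *ctime;
              } else {
                      inode->i_ctime = *ctime;  /* MDS is authoritative */
              }
      }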
    • ceph: fix version check on racing inode updates · 8bd59e01
      Committed by Sage Weil
      We may get updates on the same inode from multiple MDSs; generally we only
      pay attention if the update is newer than what we already have.  The
      exception is when an MDS sends unstable information, in which case we
      always update.
      
      The old > check got this wrong when our version was odd (e.g. 3) and the
      reply version was even (e.g. 2): the older stale (v2) info would be
      applied.  Fixed and clarified the comment.
      Signed-off-by: Sage Weil <sage@newdream.net>
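      Paraphrasing the corrected guard (the helper name is illustrative;
      the wire struct and field are real):

      /* Return true if the reply carries nothing newer than what we
       * have.  The low bit of ci->i_version marks locally unstable
       * info, so mask it off; the old test used '>' here, which let a
       * stale even reply (v2) be applied over an odd cached version
       * (v3). */
      static bool reply_is_stale(struct ceph_inode_info *ci,
                                 struct ceph_mds_reply_inode *info)
      {
              return le64_to_cpu(info->version) > 0 &&
                     (ci->i_version & ~1) >= le64_to_cpu(info->version);
      }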
  11. 08 Nov 2010 (6 commits)
    • ceph: fix uid/gid on resent mds requests · cb4276cc
      Committed by Sage Weil
      MDS requests can be rebuilt and resent in non-process context, but were
      filling in uid/gid from current_fsuid/gid.  Put that information in the
      request struct on request setup.
      
      This fixes incorrect (and root) uid/gid getting set for requests that
      are forwarded between MDSs, usually due to metadata migrations.
      Signed-off-by: Sage Weil <sage@newdream.net>
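      The gist of the fix (r_uid/r_gid are the fields this commit adds;
      the surrounding code is paraphrased):

      /* At request creation, still in the caller's process context: */
      req->r_uid = current_fsuid();
      req->r_gid = current_fsgid();

      /* Later, when (re)building the wire message, possibly from a
       * workqueue where current_fsuid() is meaningless: */
      head->caller_uid = cpu_to_le32(req->r_uid);
      head->caller_gid = cpu_to_le32(req->r_gid);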
    • ceph: fix rdcache_gen usage and invalidate · cd045cb4
      Committed by Sage Weil
      We used to use rdcache_gen to indicate whether we "might" have cached
      pages.  Now we just look at the mapping to determine that.  However, some
      old behavior remains from that transition.
      
      First, rdcache_gen == 0 no longer means we have no pages.  That can happen
      at any time (presumably when we carry FILE_CACHE).  We should not reset it
      to zero, and we should not check that it is zero.
      
      That means that the only purpose for rdcache_revoking is to resolve races
      between new issues of FILE_CACHE and an async invalidate.  If they are
      equal, we should invalidate.  On success, we decrement rdcache_revoking,
      so that it is no longer equal to rdcache_gen.  Similarly, if we succeed
      in doing a sync invalidate, set revoking = gen - 1.  (This is a small
      optimization to avoid doing unnecessary invalidate work and does not
      affect correctness.)
      Signed-off-by: Sage Weil <sage@newdream.net>
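      The async-invalidate race check then reduces to an equality test
      (simplified sketch, locking elided; the fields are real
      ceph_inode_info members):

      /* Only proceed if no new FILE_CACHE issue has bumped
       * i_rdcache_gen since this invalidate was queued. */
      if (ci->i_rdcache_revoking == ci->i_rdcache_gen) {
              invalidate_mapping_pages(&inode->i_data, 0, -1);
              ci->i_rdcache_revoking--;   /* no longer equal to gen */
      }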
    • ceph: re-request max_size if cap auth changes · feb4cc9b
      Committed by Sage Weil
      If the auth cap migrates to another MDS, clear requested_max_size so that
      we resend any pending max_size increase requests.  This fixes potential
      hangs on writes that extend a file and race with a cap migration between
      MDSs.
      Signed-off-by: Sage Weil <sage@newdream.net>
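      The fix amounts to forgetting the old request on migration
      (sketch; the field is a real ceph_inode_info member):

      /* Auth cap migrated: the new auth MDS knows nothing of the
       * max_size we asked the old one for, so clear the record and let
       * any pending increase be re-requested from the new auth MDS. */
      ci->i_requested_max_size = 0;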
    • ceph: only let auth caps update max_size · 912a9b03
      Committed by Sage Weil
      Only the auth MDS has a meaningful max_size value for us, so only update it
      in fill_inode if we're being issued an auth cap.  Otherwise, a random
      stat result from a non-auth MDS can clobber a meaningful max_size, get
      the client<->mds cap state out of sync, and make writes hang.
      
      Specifically, even if the client re-requests a larger max_size (which it
      will), the MDS won't respond because as far as it knows we already have a
      sufficiently large value.
      Signed-off-by: Sage Weil <sage@newdream.net>
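      In fill_inode the guard looks roughly like this (CEPH_CAP_FLAG_AUTH
      is the real wire flag; "cap_flags" is an illustrative local):

      /* Only the auth MDS tracks max_size for us; a stat reply served
       * by a non-auth MDS must not clobber it. */
      if (cap_flags & CEPH_CAP_FLAG_AUTH)
              ci->i_max_size = le64_to_cpu(info->max_size);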
    • ceph: fix open for write on clustered mds · 7421ab80
      Committed by Sage Weil
      Normally when we open a file we already have a cap, and simply update the
      wanted set.  However, if we open a file for write, but don't have an auth
      cap, that doesn't work; we need to open a new cap with the auth MDS.  Only
      reuse existing caps if we are opening for read or the existing cap is auth.
      Signed-off-by: Sage Weil <sage@newdream.net>
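      The reuse condition becomes, in spirit (paraphrased; the helper
      names and locals are illustrative, CEPH_FILE_MODE_WR is real):

      /* Reuse a cap we already hold only if this is not a write open,
       * or the cap actually came from the auth MDS; otherwise ask the
       * auth MDS for a proper write cap. */
      if (have_cap && (!(fmode & CEPH_FILE_MODE_WR) || cap_is_auth))
              return reuse_existing_cap(inode, file, fmode);
      return open_via_auth_mds(inode, file, fmode);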
    • ceph: fix bad pointer dereference in ceph_fill_trace · d8b16b3d
      Committed by Sage Weil
      We dereference *in a few lines down, but only set it on rename.  It is
      apparently pretty rare for this to trigger, but I have been hitting it
      with clustered MDSs.
      Signed-off-by: Sage Weil <sage@newdream.net>
  12. 29 Oct 2010 (1 commit)
  13. 28 Oct 2010 (1 commit)
    • Revert "ceph: update issue_seq on cap grant" · 2f56f56a
      Committed by Sage Weil
      This reverts commit d91f2438.
      
      The intent of issue_seq is to distinguish between mds->client messages that
      (re)create the cap and those that do not, which means we should _only_ be
      updating that value in the create paths.  By updating it in handle_cap_grant,
      we reset it to zero, which then breaks release.
      
      The larger question is what workload/problem made me think it should be
      updated here...
      Signed-off-by: Sage Weil <sage@newdream.net>
  14. 27 Oct 2010 (1 commit)
    • writeback: remove nonblocking/encountered_congestion references · 1b430bee
      Committed by Wu Fengguang
      This removes more dead code that was somehow missed by commit 0d99519e
      (writeback: remove unused nonblocking and congestion checks).  There is
      no behavior change except for the removal of two entries from one of the
      ext4 tracing interfaces.
      
      The nonblocking checks in ->writepages are no longer used because the
      flusher now prefers to block on get_request_wait() rather than skip inodes
      on IO congestion.  The latter would lead to more seeky IO.
      
      The nonblocking checks in ->writepage are no longer used because they are
      redundant with the WB_SYNC_NONE check.
      
      We no longer set ->nonblocking in VM page-out and page migration, because:
      a) it is effectively redundant with WB_SYNC_NONE in the current code;
      b) its old semantic of "don't get stuck on request queues" is a misbehavior:
         it would skip some dirty inodes on congestion and page out others, which
         is unfair in terms of LRU age.
      
      Inspired by Christoph Hellwig. Thanks!
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Sage Weil <sage@newdream.net>
      Cc: Steve French <sfrench@samba.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
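      For reference, this is the now-dead pattern being deleted from the
      various ->writepages implementations (illustrative reconstruction):

      /* Pre-removal pattern: skip the inode entirely when the backing
       * device is congested and the caller asked not to block. */
      if (wbc->nonblocking && bdi_write_congested(bdi)) {
              wbc->encountered_congestion = 1;
              return 0;       /* unfairly skips this inode's dirty pages */
      }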
  15. 21 Oct 2010 (8 commits)