Commits · b415bf4f9fe25f39934f5c464125e4a2dffb6d08 · openanolis / cloud-kernel

04 Jul, 2013 · 1 commit

ceph: fix pending vmtruncate race · b415bf4f

Committed by Yan, Zheng on Jul 02, 2013

The locking order for pending vmtruncate is wrong; it can lead to the
following race:

        write                  vmtruncate work
------------------------    ----------------------
lock i_mutex
check i_truncate_pending   check i_truncate_pending
truncate_inode_pages()     lock i_mutex (blocked)
copy data to page cache
unlock i_mutex
                           truncate_inode_pages()

The fix is to take i_mutex before calling __ceph_do_pending_vmtruncate().

Fixes: http://tracker.ceph.com/issues/5453
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>

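The ordering in the fix can be illustrated with a small userspace model. This is a hypothetical sketch, not the fs/ceph code: `struct demo_inode`, its fields and the `demo_*` functions are invented for illustration, a pthread mutex stands in for i_mutex, and a single integer stands in for the extent of the page cache. Both the write path and the truncate worker take the mutex before looking at the pending flag, so the check, the truncation and the copy cannot interleave.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a ceph inode; not the kernel structure. */
struct demo_inode {
	pthread_mutex_t i_mutex;   /* models inode->i_mutex */
	bool truncate_pending;     /* models ci->i_truncate_pending */
	long truncate_size;        /* size the cache must be cut back to */
	long cached_size;          /* models how far the page cache extends */
};

/* Caller must hold i_mutex: check for and apply a pending truncate. */
static void demo_do_pending_vmtruncate_locked(struct demo_inode *inode)
{
	if (!inode->truncate_pending)
		return;
	if (inode->cached_size > inode->truncate_size)
		inode->cached_size = inode->truncate_size; /* "truncate_inode_pages()" */
	inode->truncate_pending = false;                   /* clear only after truncating */
}

/* Write path: lock first, then check, so the worker cannot truncate
 * between our check and our copy into the cache. */
static void demo_write(struct demo_inode *inode, long end)
{
	pthread_mutex_lock(&inode->i_mutex);
	demo_do_pending_vmtruncate_locked(inode);
	if (end > inode->cached_size)
		inode->cached_size = end;                  /* "copy data to page cache" */
	pthread_mutex_unlock(&inode->i_mutex);
}

/* Truncate worker: same ordering, lock first, then check. */
static void demo_vmtruncate_work(struct demo_inode *inode)
{
	pthread_mutex_lock(&inode->i_mutex);
	demo_do_pending_vmtruncate_locked(inode);
	pthread_mutex_unlock(&inode->i_mutex);
}
```

With the old ordering, both sides could see the flag set, the writer could truncate and copy, and the worker's later truncate would throw away the freshly written data. Holding the mutex across the check and the action closes that window.
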
02 May, 2013 · 4 commits

ceph: fix symlink inode operations · 0b932672

Committed by Yan, Zheng on Apr 07, 2013

Add getattr/setattr and xattr-related methods.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>

ceph: use i_release_count to indicate dir's completeness · 2f276c51

Committed by Yan, Zheng on Mar 13, 2013

Current ceph code tracks a directory's completeness in two places.
ceph_readdir() checks i_release_count to decide if it can set the
I_COMPLETE flag in i_ceph_flags. All other places check the I_COMPLETE
flag. This indirection introduces locking complexity.

This patch adds a new variable i_complete_count to ceph_inode_info.
Set i_release_count's value to it when marking a directory complete.
By comparing the two variables, we know whether a directory is complete.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

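A minimal sketch of the counter comparison, assuming readdir samples i_release_count before it starts listing and passes that sample when it marks the directory complete. `struct demo_dir` and the `demo_dir_*` helpers are hypothetical names, C11 atomics stand in for the kernel's atomic_t, and the initial values (1 and 0) are an illustrative choice so that a new directory starts out incomplete. "Complete" becomes a derived property, so any release that races with readdir invalidates the snapshot without extra locking.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the two counters; not the kernel's ceph_inode_info. */
struct demo_dir {
	atomic_int release_count;   /* bumped whenever cached dentries may be dropped */
	atomic_int complete_count;  /* snapshot of release_count when marked complete */
};

/* Assumption: start the counters unequal so a new dir is not complete. */
static void demo_dir_init(struct demo_dir *d)
{
	atomic_init(&d->release_count, 1);
	atomic_init(&d->complete_count, 0);
}

/* "Clearing" completeness just invalidates any earlier snapshot. */
static void demo_dir_clear_complete(struct demo_dir *d)
{
	atomic_fetch_add(&d->release_count, 1);
}

/* readdir passes the release_count value it sampled before listing. */
static void demo_dir_set_complete(struct demo_dir *d, int sampled_release)
{
	atomic_store(&d->complete_count, sampled_release);
}

static bool demo_dir_is_complete(struct demo_dir *d)
{
	return atomic_load(&d->complete_count) == atomic_load(&d->release_count);
}
```

For example, if readdir samples the counter, a concurrent release then bumps it, and readdir finally stores its stale sample, `demo_dir_is_complete()` still returns false.
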
ceph: acquire i_mutex in __ceph_do_pending_vmtruncate · 3f99969f

Committed by Yan, Zheng on Mar 01, 2013

Make __ceph_do_pending_vmtruncate() acquire the i_mutex if the caller
does not hold the i_mutex, so that ceph_aio_read() can call it safely.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>

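The conditional locking described here can be sketched with a flag that tells the helper whether it must take the mutex itself. This is a hypothetical userspace model: the `needlock` parameter, `struct demo_inode` and the `demo_*` functions are illustrative assumptions rather than the actual kernel signature, and a pthread mutex stands in for i_mutex.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a ceph inode. */
struct demo_inode {
	pthread_mutex_t i_mutex;   /* models inode->i_mutex */
	bool truncate_pending;
	long truncate_size;
	long cached_size;
};

/* needlock == true: caller does NOT hold i_mutex, so take it here.
 * needlock == false: caller already holds i_mutex. */
static void demo_do_pending_vmtruncate(struct demo_inode *inode, bool needlock)
{
	if (needlock)
		pthread_mutex_lock(&inode->i_mutex);
	if (inode->truncate_pending) {
		if (inode->cached_size > inode->truncate_size)
			inode->cached_size = inode->truncate_size;
		inode->truncate_pending = false;
	}
	if (needlock)
		pthread_mutex_unlock(&inode->i_mutex);
}

/* A read path that does not hold i_mutex asks the helper to lock. */
static long demo_aio_read(struct demo_inode *inode)
{
	demo_do_pending_vmtruncate(inode, true);
	return inode->cached_size;
}
```

The flag keeps a single helper usable from both locked and unlocked call sites; the cost is that every caller must pass the right value, or the mutex is either taken twice or not at all.
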
ceph: use I_COMPLETE inode flag instead of D_COMPLETE flag · a8673d61

Committed by Yan, Zheng on Feb 18, 2013

Commit c6ffe100 moved the flag that tracks whether the dcache contents
for a directory are complete to the dentry. The problem is that there are
lots of places that use ceph_dir_{set,clear,test}_complete() while
holding i_ceph_lock, but ceph_dir_{set,clear,test}_complete() may
sleep because they call dput().

This patch basically reverts that commit. ceph_d_prune() is called
with both the dentry to prune and the parent dentry locked, so it is
safe to access the parent dentry's d_inode and clear the I_COMPLETE flag.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

26 Feb, 2013 · 1 commit

ceph: prepopulate inodes only when request is aborted · 79f9f99a

Committed by Sage Weil on Jan 29, 2013

If r_aborted is true, we do not hold the dir i_mutex, and cannot touch
the dcache.  However, we still need to update the inodes with the state
returned by the MDS.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

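The split described above (always apply the inode state from the MDS, touch the dcache only when the directory i_mutex is held) can be sketched as follows. The request and state structures and `demo_handle_reply()` are hypothetical and heavily simplified; a counter stands in for "dcache updates" so the effect is observable.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified request/reply model. */
struct demo_request {
	bool aborted;            /* models req->r_aborted: no dir i_mutex held */
	int  mds_inode_version;  /* inode state returned by the MDS */
};

struct demo_state {
	int inode_version;   /* cached inode state */
	int dcache_updates;  /* how many times we touched the dcache */
};

static void demo_handle_reply(struct demo_state *st, const struct demo_request *req)
{
	/* Always prepopulate the inode with what the MDS returned... */
	st->inode_version = req->mds_inode_version;

	/* ...but only link dentries when the directory i_mutex is held. */
	if (req->aborted)
		return;
	st->dcache_updates++;
}
```
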
12 Feb, 2013 · 2 commits

ceph: Convert kuids and kgids before printing them. · bd2bae6a

Committed by Eric W. Biederman on Jan 31, 2013

Before printing kuid and kgid values, convert them into
the initial user namespace.

Cc: Sage Weil <sage@inktank.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

ceph: Translate inode uid and gid attributes to/from kuids and kgids. · ab871b90

Committed by Eric W. Biederman on Jan 31, 2013

- In fill_inode(), translate uids and gids in the initial user namespace
  into kuids and kgids stored in inode->i_uid and inode->i_gid.

- In ceph_setattr(), if they have changed, convert inode->i_uid and
  inode->i_gid into initial user namespace uids and gids for
  transmission.

Cc: Sage Weil <sage@inktank.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

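Conceptually, the conversion is a range lookup in a user namespace's id map. The sketch below is a userspace model, not the kernel API: `struct demo_idmap` with a single extent and the `demo_make_kuid()`/`demo_from_kuid()` helpers are hypothetical stand-ins for make_kuid()/from_kuid(). The initial user namespace maps every id to itself, which is why converting through it yields the raw ids that go over the wire.

```c
#include <stdint.h>

#define DEMO_INVALID_ID UINT32_MAX   /* models INVALID_UID, i.e. (uid_t)-1 */

/* One extent of a hypothetical uid_map: ids [first, first + count) in the
 * namespace map to [lower_first, lower_first + count) in the kernel. */
struct demo_idmap {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

/* Namespace-local id -> kernel-internal id (like make_kuid()). */
static uint32_t demo_make_kuid(const struct demo_idmap *m, uint32_t uid)
{
	if (uid - m->first < m->count)   /* unsigned wrap rejects uid < first */
		return m->lower_first + (uid - m->first);
	return DEMO_INVALID_ID;
}

/* Kernel-internal id -> namespace-local id (like from_kuid()). */
static uint32_t demo_from_kuid(const struct demo_idmap *m, uint32_t kuid)
{
	if (kuid - m->lower_first < m->count)
		return m->first + (kuid - m->lower_first);
	return DEMO_INVALID_ID;
}
```

For instance, with a map of `{ 0, 100000, 65536 }`, uid 1000 in the namespace corresponds to kernel id 101000, and a kernel id outside the extent has no representation there.
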
13 Dec, 2012 · 1 commit

ceph: Fix __ceph_do_pending_vmtruncate · a85f50b6

Committed by Yan, Zheng on Nov 19, 2012

We should set i_truncate_pending to 0 after the page cache is truncated
to i_truncate_size.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>

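The rule in this commit (clear the pending flag only once the cache has really been cut to the current target) can be sketched as a retry loop. This is an illustrative model under stated assumptions, not the fs/ceph implementation: the `demo_*` names and the `race_hook` test hook are hypothetical, and the truncation runs with the lock dropped on the assumption that it may sleep. If another truncate arrives while the lock is dropped, the sampled target no longer matches and the loop runs again.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model; field names loosely follow the commit text. */
struct demo_inode {
	pthread_mutex_t lock;     /* models i_ceph_lock */
	int  truncate_pending;    /* models ci->i_truncate_pending */
	long truncate_size;       /* models ci->i_truncate_size */
	long cached_size;         /* extent of the "page cache" */
	void (*race_hook)(struct demo_inode *); /* test hook: simulates a racing truncate */
};

/* Test helper: a second truncate to size 10 arrives exactly once. */
static void demo_race_once(struct demo_inode *inode)
{
	pthread_mutex_lock(&inode->lock);
	inode->truncate_size = 10;
	inode->truncate_pending++;
	inode->race_hook = NULL;
	pthread_mutex_unlock(&inode->lock);
}

/* Returns how many truncation passes were needed. */
static int demo_do_pending_vmtruncate(struct demo_inode *inode)
{
	int passes = 0;

	pthread_mutex_lock(&inode->lock);
	while (inode->truncate_pending) {
		long to = inode->truncate_size;   /* sample the target under the lock */
		pthread_mutex_unlock(&inode->lock);

		if (inode->race_hook)             /* a racing truncate may land here */
			inode->race_hook(inode);
		if (inode->cached_size > to)
			inode->cached_size = to;  /* truncate with the lock dropped */
		passes++;

		pthread_mutex_lock(&inode->lock);
		if (to == inode->truncate_size)   /* target unchanged: done */
			inode->truncate_pending = 0;
		/* otherwise loop and truncate again to the newer size */
	}
	pthread_mutex_unlock(&inode->lock);
	return passes;
}
```

Clearing the flag before the truncation (or without re-checking the target) would let a racing size change go unapplied while the inode claims nothing is pending.
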
27 Sep, 2012 · 1 commit

ceph: don't abuse d_delete() on failure exits · 2744c171

Committed by Al Viro on Sep 26, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

22 Aug, 2012 · 1 commit

ceph: tolerate (and warn on) extraneous dentry from mds · 6c5e50fa

Committed by Sage Weil on Aug 21, 2012

If the MDS gives us a dentry and we weren't prepared to handle it,
WARN_ON_ONCE instead of crashing.
Reported-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

22 Mar, 2012 · 1 commit

ceph: avoid panic with mismatched symlink sizes in fill_inode() · 810339ec

Committed by Xi Wang on Feb 03, 2012

Return -EINVAL rather than panic if iinfo->symlink_len and inode->i_size
do not match.

Also use kstrndup rather than kmalloc/memcpy.
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Reviewed-by: Alex Elder <elder@dreamhost.com>

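The check-then-copy pattern from this commit can be sketched in userspace, with strndup() playing the role of kstrndup(). `demo_fill_symlink()` and its parameters are hypothetical; the sketch only shows the idea of rejecting a length mismatch with -EINVAL instead of crashing, and letting a single call allocate, copy and NUL-terminate.

```c
#define _POSIX_C_SOURCE 200809L  /* for strndup() */
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a symlink target received from the server.
 * Returns 0 and sets *out on success, or a negative errno on failure. */
static int demo_fill_symlink(const char *target, size_t target_len,
			     long long i_size, char **out)
{
	char *s;

	if (i_size < 0 || (unsigned long long)i_size != target_len)
		return -EINVAL;            /* mismatch: fail instead of panicking */

	s = strndup(target, target_len);   /* allocate, copy and NUL-terminate */
	if (!s)
		return -ENOMEM;
	*out = s;
	return 0;
}
```
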
11 Jan, 2012 · 1 commit

ceph: dereference pointer after checking for NULL · b8cd952b

Committed by Yehuda Sadeh on Dec 13, 2011

Moved the dereference after the BUG_ON.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

04 Jan, 2012 · 1 commit

vfs: fix the stupidity with i_dentry in inode destructors · 6b520e05

Committed by Al Viro on Dec 12, 2011

Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
the cost of taking it into inode_init_always() will be negligible for pipes
and sockets and negative for everything else. Not to mention the removal of
boilerplate code from ->destroy_inode() instances...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

08 Dec, 2011 · 1 commit

ceph: use i_ceph_lock instead of i_lock · be655596

Committed by Sage Weil on Nov 30, 2011

We have been using i_lock to protect all kinds of data structures in the
ceph_inode_info struct, including lists of inodes that we need to iterate
over while avoiding races with inode destruction.  That requires grabbing
a reference to the inode with the list lock protected, but igrab() now
takes i_lock to check the inode flags.

Changing the list lock ordering would be a painful process.

However, using a ceph-specific i_ceph_lock in the ceph inode instead of
i_lock is a simple mechanical change and avoids the ordering constraints
imposed by igrab().
Reported-by: Amon Ott <a.ott@m-privacy.de>
Signed-off-by: Sage Weil <sage@newdream.net>

06 Nov, 2011 · 2 commits

ceph: fix iput race when queueing inode work · 15a2015f

Committed by Sage Weil on Nov 05, 2011

If we queue a work item that calls iput(), make sure we ihold() before
attempting to queue work. Otherwise our queued work might miraculously run
before we notice the queue_work() succeeded and call ihold(), allowing the
inode to be destroyed.

That is, instead of

	if (queue_work(...))
		ihold();

we need to do

	ihold();
	if (!queue_work(...))
		iput();
Reported-by: Amon Ott <a.ott@m-privacy.de>
Signed-off-by: Sage Weil <sage@newdream.net>

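The reference accounting of the corrected pattern can be modelled in a few lines. This single-threaded sketch cannot reproduce the race itself; it only shows that the work item always owns a reference of its own. `struct demo_inode`, the one-slot queue and all `demo_*` functions are hypothetical stand-ins for the inode refcount, ihold()/iput() and queue_work().

```c
#include <stdbool.h>

/* Hypothetical model: a refcounted inode and a one-slot work queue. */
struct demo_inode {
	int refs;          /* models the inode reference count */
	bool work_queued;  /* the work item is already pending */
};

static void demo_ihold(struct demo_inode *inode) { inode->refs++; }
static void demo_iput(struct demo_inode *inode)  { inode->refs--; }

/* Like queue_work(): returns false if the item was already pending. */
static bool demo_queue_work(struct demo_inode *inode)
{
	if (inode->work_queued)
		return false;
	inode->work_queued = true;
	return true;
}

/* The work function drops the reference taken on its behalf. */
static void demo_run_work(struct demo_inode *inode)
{
	inode->work_queued = false;
	demo_iput(inode);
}

/* Pattern from the commit: take the ref first, undo it if not queued. */
static void demo_queue_inode_work(struct demo_inode *inode)
{
	demo_ihold(inode);
	if (!demo_queue_work(inode))
		demo_iput(inode);
}
```

Queueing twice leaves exactly one extra reference (the second attempt is undone), and running the work drops it again. With the old order, the work could run and call iput() before the late ihold(), dropping a reference the caller never took.
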
ceph: use new D_COMPLETE dentry flag · c6ffe100

Committed by Sage Weil on Nov 03, 2011

We used to use a flag on the directory inode to track whether the dcache
contents for a directory were a complete cached copy. Switch to a dentry
flag CEPH_D_COMPLETE that is safely updated by ->d_prune().
Signed-off-by: Sage Weil <sage@newdream.net>

02 Nov, 2011 · 1 commit

filesystems: add set_nlink() · bfe86848

Committed by Miklos Szeredi on Oct 28, 2011

Replace remaining direct i_nlink updates with a new set_nlink()
updater function.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

26 Oct, 2011 · 1 commit

Revert "ceph: don't truncate dirty pages in invalidate work thread" · 83eaea22

Committed by Sage Weil on Aug 24, 2011

This reverts commit c9af9fb6.

We need to block and truncate all pages in order to reliably invalidate
them.  Otherwise, we could:

 - have some uptodate pages in the cache
 - queue an invalidate
 - write(2) locks some pages
 - invalidate_work skips them
 - write(2) only overwrites part of the page
 - page now dirty and uptodate
 -> partial leakage of invalidated data

It's not entirely clear why we started skipping locked pages in the first
place.  I just ran this through fsx and didn't see any problems.
Signed-off-by: Sage Weil <sage@newdream.net>

27 Jul, 2011 · 4 commits

ceph: document locking for ceph_set_dentry_offset · 4f177264

Committed by Sage Weil on Jul 26, 2011

Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: protect access to d_parent · 5f21c96d

Committed by Sage Weil on Jul 26, 2011

d_parent is protected by d_lock: use it when looking up a dentry's parent
directory inode.  Also take a reference and drop it in the caller to avoid
a use-after-free.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: set dir complete frag after adding capability · dfabbed6

Committed by Sage Weil on Jul 26, 2011

Currently ceph_add_cap clears the complete bit if we are newly issued the
FILE_SHARED cap, which is normally the case for a newly issued cap on a new
directory.  That means we clear the just-set bit.  Move the check that sets
the flag to after the cap is added/updated.
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: ignore lease mask · 2f90b852

Committed by Sage Weil on Jul 26, 2011

The lease mask is no longer used (and it changed a while back).  Instead,
use a non-zero duration to indicate that there is a lease being issued.
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

20 Jul, 2011 · 3 commits

->permission() sanitizing: don't pass flags to ->permission() · 10556cb2

Committed by Al Viro on Jun 20, 2011

Not used by the instances anymore.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

->permission() sanitizing: don't pass flags to generic_permission() · 2830ba7f

Committed by Al Viro on Jun 20, 2011

Redundant; all callers get it duplicated in mask & MAY_NOT_BLOCK, and none of
them removes that bit.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

kill check_acl callback of generic_permission() · 178ea735

Committed by Al Viro on Jun 20, 2011

Its value depends only on the inode and does not change; we might as
well store it in ->i_op->check_acl and be done with that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

08 Jun, 2011 · 1 commit

ceph: use ihold when we already have an inode ref · 70b666c3

Committed by Sage Weil on May 27, 2011

We should use ihold whenever we already have a stable inode ref, even
when we aren't holding i_lock.  This avoids adding new and unnecessary
locking dependencies.
Signed-off-by: Sage Weil <sage@newdream.net>

12 May, 2011 · 1 commit

ceph: do not use i_wrbuffer_ref as refcount for Fb cap · d3d0720d

Committed by Henry C Chang on May 11, 2011

We increment i_wrbuffer_ref when taking the Fb cap. This breaks
the dirty page accounting and causes looping in
__ceph_do_pending_vmtruncate, and the ceph client hangs.

This bug can be reproduced occasionally by running blogbench.

Add a new field i_wb_ref to inode and dedicate it to Fb reference
counting.
Signed-off-by: Henry C Chang <henry.cy.chang@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>

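The effect of giving the Fb cap its own counter can be sketched as two independent counts. This is a hypothetical model: `struct demo_inode` and the `demo_*` helpers are invented names, and it assumes, as the description above suggests, that the pending-truncate path keeps retrying while the dirty-page count is non-zero. With a dedicated counter, merely holding the Fb cap no longer looks like outstanding dirty pages.

```c
#include <stdbool.h>

/* Hypothetical model of the two counters after the fix. */
struct demo_inode {
	int wrbuffer_ref;  /* models i_wrbuffer_ref: dirty pages in the cache */
	int wb_ref;        /* models i_wb_ref: references on the Fb cap */
};

static void demo_get_fb_cap(struct demo_inode *i)   { i->wb_ref++; }
static void demo_put_fb_cap(struct demo_inode *i)   { i->wb_ref--; }
static void demo_dirty_page(struct demo_inode *i)   { i->wrbuffer_ref++; }
static void demo_page_written(struct demo_inode *i) { i->wrbuffer_ref--; }

/* Assumption: a truncate waiting on writeback only needs dirty pages to
 * drain; Fb cap references are tracked separately and do not block it. */
static bool demo_truncate_can_proceed(const struct demo_inode *i)
{
	return i->wrbuffer_ref == 0;
}
```
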
05 May, 2011 · 1 commit

ceph: do not call __mark_dirty_inode under i_lock · fca65b4a

Committed by Sage Weil on May 04, 2011

The __mark_dirty_inode helper now takes i_lock as of 250df6ed. Fix the
one ceph caller that held i_lock (__ceph_mark_dirty_caps) to return the
flags value so that the callers can do it outside of i_lock.
Signed-off-by: Sage Weil <sage@newdream.net>

22 Mar, 2011 · 1 commit

ceph: add ino32 mount option · ad1fee96

Committed by Yehuda Sadeh on Jan 21, 2011

The ino32 mount option forces the ceph fs to report 32-bit
ino values.  This is useful for 64-bit kernels with 32-bit userspace.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

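One way to squeeze 64-bit ino values into 32 bits is to fold the two halves together. The sketch below is an assumption about the technique, not a quote of the ceph helper: `demo_ino_to_ino32()` is a hypothetical name, and both the XOR folding and the remapping of 0 to 2 are illustrative choices (the latter avoids ever reporting a zero inode number). Any such folding is lossy, so distinct 64-bit inodes can collide in the 32-bit view.

```c
#include <stdint.h>

/* Hypothetical sketch: fold a 64-bit inode number into 32 bits by
 * XOR-ing the high half into the low half. */
static uint32_t demo_ino_to_ino32(uint64_t ino)
{
	uint32_t ino32 = (uint32_t)(ino & 0xffffffffu) ^ (uint32_t)(ino >> 32);

	if (ino32 == 0)   /* illustrative choice: never report inode 0 */
		ino32 = 2;
	return ino32;
}
```

For example, 0x100000002 folds to 3, while 0x100000001 folds to 0 and is therefore reported as 2.
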
16 Mar, 2011 · 1 commit

ceph: preserve I_COMPLETE across rename · 09adc80c

Committed by Sage Weil on Feb 04, 2011

d_move puts the renamed dentry at the end of d_subdirs, screwing with our
cached dentry directory offsets. We were just clearing I_COMPLETE to avoid
any possibility of trouble. However, assigning the renamed dentry an
offset at the end of the directory (to match its new d_subdirs position)
is sufficient to maintain correct behavior and hold onto I_COMPLETE.

This is especially important for workloads like rsync, which renames files
into place. Before, we would lose I_COMPLETE and do MDS lookups for each
file. With this patch we only talk to the MDS on create and rename.
Signed-off-by: Sage Weil <sage@newdream.net>

04 Mar, 2011 · 1 commit

ceph: do not set I_COMPLETE · b545cc15

Committed by Sage Weil on Feb 28, 2011

Do not set the I_COMPLETE flag on directories until we resolve races with
dcache pruning.
Signed-off-by: Sage Weil <sage@newdream.net>

14 Jan, 2011 · 1 commit

ceph: fix getattr on directory when using norbytes · 1c1266bb

Committed by Yehuda Sadeh on Jan 12, 2011

The norbytes mount option was broken: when doing getattr
on a directory, it returned the rbytes instead of the number of
entities. This commit fixes it.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

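The intended behaviour can be sketched as a choice of what to report as a directory's st_size. This is a hypothetical model: `struct demo_dir_stats` and `demo_dir_st_size()` are invented names, and it assumes that "number of entities" means the files plus subdirectories directly inside the directory.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical directory stats; names loosely follow the commit text. */
struct demo_dir_stats {
	uint64_t rbytes;   /* recursive byte count reported by the MDS */
	uint64_t files;    /* files directly in the directory */
	uint64_t subdirs;  /* subdirectories directly in the directory */
};

/* st_size to report for a directory, depending on the mount option. */
static uint64_t demo_dir_st_size(const struct demo_dir_stats *s, bool rbytes_opt)
{
	if (rbytes_opt)
		return s->rbytes;
	return s->files + s->subdirs;   /* norbytes: number of entries */
}
```
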
13 Jan, 2011 · 2 commits

ceph: implement DIRLAYOUTHASH feature to get dir layout from MDS · 14303d20

Committed by Sage Weil on Dec 14, 2010

This implements the DIRLAYOUTHASH protocol feature, which passes the dir
layout over the wire from the MDS. This gives the client knowledge
of the correct hash function to use for mapping dentries among dir
fragments.

Note that if this feature is _not_ present on the client but is on the
MDS, the client may misdirect requests. This will result in a forward
and degrade performance. It may also result in inaccurate NFS filehandle
generation, which will prevent fh resolution when the inode is not present
in the client cache and the parent directories have been fragmented.
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: add dir_layout to inode · 6c0f3af7

Committed by Sage Weil on Nov 16, 2010

Add a ceph_dir_layout to the inode, and calculate dentry hash values based
on the parent directory's specified dir_hash function. This is needed
because the old default Linux dcache hash function is extremely weak and
leads to a poor distribution of files among dir fragments.
Signed-off-by: Sage Weil <sage@newdream.net>

07 Jan, 2011 · 5 commits

fs: provide rcu-walk aware permission i_ops · b74c79e9

Committed by Nick Piggin on Jan 07, 2011

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

fs: icache RCU free inodes · fa0d7e3d

Committed by Nick Piggin on Jan 07, 2011

RCU free the struct inode. This will allow:

- Subsequent store-free path walking patch. The inode must be consulted for
  permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
  to take i_lock no longer need to take sb_inode_list_lock to walk the list in
  the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
  page lock to follow page->mapping.

The downside of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.

In cases where inode lifetimes are longer (i.e. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.

The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>

fs: dcache remove dcache_lock · b5c84bf6

Committed by Nick Piggin on Jan 07, 2011

dcache_lock no longer protects anything; remove it.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>

fs: dcache scale subdirs · 2fd6b7f5

Committed by Nick Piggin on Jan 07, 2011

Protect d_subdirs and d_child with d_lock, except in filesystems that aren't
using dcache_lock for these anyway (eg. using i_mutex).

Note: if we change the locking rule in future so that ->d_child protection is
provided only with ->d_parent->d_lock, it may allow us to reduce some locking.
But it would be an exception to an otherwise regular locking scheme, so we'd
have to see some good results. Probably not worthwhile.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>

fs: dcache scale dentry refcount · b7ab39f6

Committed by Nick Piggin on Jan 07, 2011

Make d_count non-atomic and protect it with d_lock. This allows us to ensure a
0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when
we start protecting many other dentry members with d_lock.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>

openanolis / cloud-kernel · synced successfully about 1 year ago