Commits · b415bf4f9fe25f39934f5c464125e4a2dffb6d08 · openanolis / cloud-kernel

04 Jul 2013 · 5 commits

ceph: fix pending vmtruncate race · b415bf4f

Committed by Yan, Zheng on Jul 02, 2013

The locking order for pending vmtruncate is wrong; it can lead to the
following race:

        write                  vmtruncate work
------------------------    ----------------------
lock i_mutex
check i_truncate_pending   check i_truncate_pending
truncate_inode_pages()     lock i_mutex (blocked)
copy data to page cache
unlock i_mutex
                           truncate_inode_pages()

The fix is to take i_mutex before calling __ceph_do_pending_vmtruncate().

Fixes: http://tracker.ceph.com/issues/5453
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>

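A minimal single-threaded model of the fixed ordering, assuming a hypothetical struct model_inode and helpers model_lock()/model_unlock() standing in for i_mutex; it is not the kernel code. Because the work item checks the pending count only after taking the mutex, a truncate the writer has already applied is not applied a second time over freshly written data.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of a ceph inode: 'mutex_held'
 * stands in for i_mutex and records ownership instead of blocking. */
struct model_inode {
    bool mutex_held;
    int truncate_pending;   /* plays the role of i_truncate_pending */
    long truncate_size;     /* size the page cache is truncated to */
    long cached_bytes;      /* bytes currently in the page cache */
};

static void model_lock(struct model_inode *in)
{
    assert(!in->mutex_held);  /* a real mutex would block here */
    in->mutex_held = true;
}

static void model_unlock(struct model_inode *in)
{
    assert(in->mutex_held);
    in->mutex_held = false;
}

/* After the fix, callers must hold i_mutex, so checking the pending
 * count and truncating the page cache happen in one critical section. */
static void model_do_pending_vmtruncate(struct model_inode *in)
{
    assert(in->mutex_held);
    if (in->truncate_pending == 0)
        return;
    if (in->cached_bytes > in->truncate_size)
        in->cached_bytes = in->truncate_size;  /* truncate_inode_pages() */
    in->truncate_pending = 0;
}

/* The deferred vmtruncate work takes i_mutex before doing anything. */
static void model_vmtruncate_work(struct model_inode *in)
{
    model_lock(in);
    model_do_pending_vmtruncate(in);
    model_unlock(in);
}

/* A write applies any pending truncate, then copies its data into the
 * page cache, all under i_mutex. */
static void model_write(struct model_inode *in, long nbytes)
{
    model_lock(in);
    model_do_pending_vmtruncate(in);
    in->cached_bytes += nbytes;  /* copy data to page cache */
    model_unlock(in);
}
```

Running the write first and the work item second leaves the written bytes in place, which is the outcome the race in the diagram violated.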

ceph: Reconstruct the func ceph_reserve_caps. · 93faca6e

Committed by majianpeng on Jun 26, 2013

Drop the ignored return value. Fix the allocation failure case so that it does not leak.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Reviewed-by: Sage Weil <sage@inktank.com>

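A userspace sketch of a leak-free reservation, assuming hypothetical names (struct pool, reserve_items(), and alloc_item() with fault injection through allocs_left); it does not reproduce the actual ceph_reserve_caps(). Whatever was allocated before a failure is spliced into the pool instead of being dropped, and the function returns nothing because callers ignored the result anyway.

```c
#include <assert.h>
#include <stdlib.h>

struct item { struct item *next; };

struct pool {
    struct item *head;
    int count;
};

/* Hypothetical fault injection: allocations fail once this reaches 0;
 * a negative value means "never fail". */
static int allocs_left = -1;

static struct item *alloc_item(void)
{
    if (allocs_left == 0)
        return NULL;
    if (allocs_left > 0)
        allocs_left--;
    return malloc(sizeof(struct item));
}

/* Reserve up to 'need' items. Nothing is returned, and a partial batch
 * is kept in the pool rather than leaked when an allocation fails. */
static void reserve_items(struct pool *p, int need)
{
    struct item *batch = NULL, *it;
    int got = 0;

    while (got < need && (it = alloc_item()) != NULL) {
        it->next = batch;
        batch = it;
        got++;
    }
    /* Splice everything that was allocated onto the pool. */
    while (batch) {
        it = batch->next;
        batch->next = p->head;
        p->head = batch;
        p->count++;
        batch = it;
    }
}
```

Freeing the partial batch would be another leak-free choice; keeping it simply means a later reservation has less to allocate.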

ceph: move inode to proper flushing list when auth MDS changes · 005c4697

Committed by Yan, Zheng on May 31, 2013

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>

ceph: check migrate seq before changing auth cap · b8c2f3ae

Committed by Yan, Zheng on May 31, 2013

We may receive an old request reply from the exporter MDS after receiving
the importer MDS's cap import message.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>


ceph: fix cap release race · bb137f84

Committed by Yan, Zheng on Jun 03, 2013

ceph_encode_inode_release() can race with ceph_open() and release
caps wanted by open files. So it should call __ceph_caps_wanted()
to get the wanted caps.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>


02 May 2013 · 5 commits

ceph: take i_mutex before getting Fw cap · 37505d57

Committed by Yan, Zheng on Apr 12, 2013

There is a deadlock, as illustrated below. The fix is to take i_mutex
before getting the Fw cap reference.

      write                    truncate                 MDS
---------------------     --------------------      --------------
get Fw cap
                          lock i_mutex
lock i_mutex (blocked)
                          request setattr.size  ->
                                                <-   revoke Fw cap
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

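The Fw cap reference is not a lock, but because the truncate path holds i_mutex while it waits for the MDS to revoke Fw, it behaves like one for ordering purposes. The sketch below uses hypothetical ranks and an acquire() helper, loosely in the spirit of lockdep, to show that the old write order (Fw first) inverts the order implied by the truncate path, while the fixed order does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ranks: i_mutex is ordered before the Fw cap reference,
 * because the truncate path holds i_mutex while waiting for Fw. */
enum res { RES_I_MUTEX = 1, RES_FW_CAP = 2 };

struct order_task { int highest_held; };

/* Refuse (rather than deadlock) when an acquisition would go against
 * the rank order. */
static bool acquire(struct order_task *t, enum res r)
{
    if ((int)r <= t->highest_held)
        return false;
    t->highest_held = (int)r;
    return true;
}

/* Write path after the fix: i_mutex first, then the Fw reference. */
static bool write_path_fixed(void)
{
    struct order_task t = { 0 };
    return acquire(&t, RES_I_MUTEX) && acquire(&t, RES_FW_CAP);
}

/* Write path before the fix: Fw reference first, then i_mutex. */
static bool write_path_old(void)
{
    struct order_task t = { 0 };
    return acquire(&t, RES_FW_CAP) && acquire(&t, RES_I_MUTEX);
}
```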

ceph: use i_release_count to indicate dir's completeness · 2f276c51

Committed by Yan, Zheng on Mar 13, 2013

The current ceph code tracks a directory's completeness in two places.
ceph_readdir() checks i_release_count to decide if it can set the
I_COMPLETE flag in i_ceph_flags. All other places check the I_COMPLETE
flag. This indirection introduces locking complexity.

This patch adds a new variable, i_complete_count, to ceph_inode_info.
When marking a directory complete, copy i_release_count's value into it.
By comparing the two variables, we know whether a directory is complete.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

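A minimal sketch of the counter scheme, assuming a hypothetical struct dir_info holding just the two fields. Losing completeness becomes a plain increment of release_count, and readdir marks the directory complete only if the count it sampled is still current. The starting values (release_count = 1, complete_count = 0, so a fresh directory is not complete) are an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of ceph_inode_info for a directory. */
struct dir_info {
    int release_count;   /* like i_release_count: bumped when completeness is lost */
    int complete_count;  /* like i_complete_count: snapshot taken when marked complete */
};

static bool dir_is_complete(const struct dir_info *d)
{
    return d->complete_count == d->release_count;
}

/* Losing completeness is just an increment; no flag has to be cleared. */
static void dir_clear_complete(struct dir_info *d)
{
    d->release_count++;
}

/* readdir samples release_count before listing and marks the directory
 * complete only if nothing was released while it ran. */
static void dir_finish_readdir(struct dir_info *d, int sampled)
{
    if (d->release_count == sampled)
        d->complete_count = sampled;
}
```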

ceph: use I_COMPLETE inode flag instead of D_COMPLETE flag · a8673d61

Committed by Yan, Zheng on Feb 18, 2013

Commit c6ffe100 moved the flag that tracks whether the dcache contents
for a directory are complete to the dentry. The problem is that there are
lots of places that use ceph_dir_{set,clear,test}_complete() while
holding i_ceph_lock, but ceph_dir_{set,clear,test}_complete() may
sleep because they call dput().

This patch basically reverts that commit. ceph_d_prune() is called
with both the dentry to prune and its parent dentry locked, so it's
safe to access the parent dentry's d_inode and clear the I_COMPLETE flag.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>


ceph: set mds_want according to cap import message · 964266cc

Committed by Yan, Zheng on Feb 27, 2013

The MDS ignores cap update messages if the migrate_seq does not match, so
when receiving a cap import message with a higher migrate_seq, set
mds_want according to the cap import message.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>


ceph: queue cap release when trimming cap · d40ee0dc

Committed by Yan, Zheng on Feb 18, 2013

This way, the client will later send a cap release message to the MDS.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>


12 Feb 2013 · 2 commits

ceph: Convert kuids and kgids before printing them. · bd2bae6a

Committed by Eric W. Biederman on Jan 31, 2013

Before printing kuid and kgid values, convert them into
the initial user namespace.

Cc: Sage Weil <sage@inktank.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>


ceph: Translate between uid and gids in cap messages and kuids and kgids · 05cb11c1

Committed by Eric W. Biederman on Jan 31, 2013

- Make the uid and gid arguments of send_cap_msg() used to compose
  ceph_mds_caps messages of type kuid_t and kgid_t.

- Pass inode->i_uid and inode->i_gid in __send_cap to send_cap_msg()
  through variables of type kuid_t and kgid_t.

- Modify struct ceph_cap_snap to store uids and gids in types kuid_t
  and kgid_t.  This allows capturing inode->i_uid and inode->i_gid in
  ceph_queue_cap_snap() without loss and passing them to
  __ceph_flush_snaps(), where they are removed from struct
  ceph_cap_snap and passed to send_cap_msg().

- In handle_cap_grant(), translate the uids and gids in the initial user
  namespace stored in struct ceph_mds_cap into kuids and kgids
  before setting inode->i_uid and inode->i_gid.

Cc: Sage Weil <sage@inktank.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

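A userspace sketch of the idea behind kuid_t, assuming hypothetical names (model_kuid_t, struct model_userns with a single mapping extent, model_make_kuid(), model_from_kuid()); the kernel's real make_kuid()/from_kuid() support several extents per namespace. Wrapping the id in a struct means raw and kernel-internal uids cannot be mixed silently. Cap messages carry ids as seen from the initial user namespace, where the mapping is the identity, so in the patch the conversions are made against init_user_ns.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for kuid_t: the struct wrapper makes the
 * compiler reject accidental mixing with raw uid values. */
typedef struct { uint32_t val; } model_kuid_t;

/* Hypothetical single-extent user namespace mapping. */
struct model_userns {
    uint32_t first;        /* first id as seen inside the namespace */
    uint32_t lower_first;  /* corresponding kernel-internal id */
    uint32_t count;        /* number of ids mapped */
};

#define MODEL_INVALID_UID ((uint32_t)-1)

/* Raw id as seen in 'ns' -> kernel-internal id (cf. make_kuid()). */
static model_kuid_t model_make_kuid(const struct model_userns *ns, uint32_t uid)
{
    model_kuid_t k = { MODEL_INVALID_UID };
    if (uid - ns->first < ns->count)   /* unsigned wrap rejects uid < first */
        k.val = ns->lower_first + (uid - ns->first);
    return k;
}

/* Kernel-internal id -> raw id as seen in 'ns' (cf. from_kuid()). */
static uint32_t model_from_kuid(const struct model_userns *ns, model_kuid_t k)
{
    if (k.val - ns->lower_first < ns->count)
        return ns->first + (k.val - ns->lower_first);
    return MODEL_INVALID_UID;
}
```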

18 Jan 2013 · 4 commits

ceph: check mds_wanted for imported cap · 390306c3

Committed by Yan, Zheng on Jan 04, 2013

The MDS may have incorrect wanted caps after importing caps, so the
client should check the value the MDS has and send a cap update if necessary.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>


ceph: allocate cap_release message when receiving cap import · 66f58691

Committed by Yan, Zheng on Jan 04, 2013

When the client wants to release an imported cap, it's possible that there
is no reserved cap_release message in the corresponding MDS session,
so __queue_cap_release() causes a kernel panic.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>


ceph: allow revoking duplicated caps issued by non-auth MDS · 395c312b

Committed by Yan, Zheng on Jan 04, 2013

Allow revoking duplicated caps issued by non-auth MDS if these caps
are also issued by auth MDS.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>


ceph: move dirty inode to migrating list when clearing auth caps · 8a92a119

Committed by Yan, Zheng on Jan 04, 2013

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>

13 Dec 2012 · 3 commits

ceph: call handle_cap_grant() for cap import message · 0e5e1774

Committed by Yan, Zheng on Nov 19, 2012

If the client sends a cap message that requests a new max size while
caps are being exported, the exporting MDS will drop the message quietly,
so the client may wait forever for the reply that updates the max size.
Calling handle_cap_grant() for the cap import message avoids this issue.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>


ceph: Don't add dirty inode to dirty list if caps is in migration · 0685235f

Committed by Yan, Zheng on Nov 19, 2012

Add the dirty inode to the cap_dirty_migrating list instead; this prevents
ceph_flush_dirty_caps() from entering an infinite loop.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>


ceph: Don't update i_max_size when handling non-auth cap · 5e62ad30

Committed by Yan, Zheng on Nov 19, 2012

The cap from non-auth mds doesn't have a meaningful max_size value.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>


04 Nov 2012 · 1 commit

ceph: Hold caps_list_lock when adjusting caps_{use, total}_count · 4d1d0534

Committed by Yan, Zheng on Nov 03, 2012

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>

02 Oct 2012 · 1 commit

ceph: convert to use le32_add_cpu() · b905a7f8

Committed by Wei Yongjun on Sep 28, 2012

Convert cpu_to_le32(le32_to_cpu(E1) + E2) to use le32_add_cpu().

The dpatch engine was used to auto-generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Sage Weil <sage@inktank.com>

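A userspace sketch of what the helper does, assuming a hypothetical byte-array type le32_model in place of __le32 and hypothetical *_model function names. The kernel's le32_add_cpu() performs the same round trip as the open-coded expression, so the conversion only improves readability.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __le32: four bytes, least significant first,
 * independent of the host byte order. */
typedef struct { uint8_t b[4]; } le32_model;

static uint32_t le32_to_cpu_model(le32_model v)
{
    return (uint32_t)v.b[0] | ((uint32_t)v.b[1] << 8) |
           ((uint32_t)v.b[2] << 16) | ((uint32_t)v.b[3] << 24);
}

static le32_model cpu_to_le32_model(uint32_t x)
{
    le32_model v = { { (uint8_t)x, (uint8_t)(x >> 8),
                       (uint8_t)(x >> 16), (uint8_t)(x >> 24) } };
    return v;
}

/* Named form of cpu_to_le32(le32_to_cpu(*var) + val): add a CPU-order
 * value to a little-endian field in place. */
static void le32_add_cpu_model(le32_model *var, uint32_t val)
{
    *var = cpu_to_le32_model(le32_to_cpu_model(*var) + val);
}
```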

03 Feb 2012 · 1 commit

ceph: create a new session lock to avoid lock inversion · d8fb02ab

Committed by Alex Elder on Jan 12, 2012

Lockdep was reporting a possible circular lock dependency in
dentry_lease_is_valid().  That function needs to sample the
session's s_cap_gen and s_cap_ttl fields coherently, but needs
to do so while holding a dentry lock.  The s_cap_lock field was
being used to protect the two fields, but that can't be taken while
holding a lock on a dentry within the session.

In most cases, the s_cap_gen and s_cap_ttl fields are only operated
on separately, but in three cases they need to be updated together.
Implement a new lock to protect the spots where updating both fields
atomically is required.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>

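A userspace sketch of the pattern, assuming a hypothetical struct session_model and a C11 atomic_flag spinlock in place of the kernel spinlock. The point is that the new lock covers only the two fields, so a reader that already holds an unrelated lock (a dentry lock in dentry_lease_is_valid()) can take it without creating the circular dependency lockdep reported.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical session model: gen_ttl_lock plays the role of the new,
 * dedicated lock that covers only cap_gen and cap_ttl. */
struct session_model {
    atomic_flag gen_ttl_lock;
    unsigned cap_gen;
    unsigned long cap_ttl;
};

static void session_init(struct session_model *s)
{
    atomic_flag_clear(&s->gen_ttl_lock);
    s->cap_gen = 0;
    s->cap_ttl = 0;
}

static void session_gt_lock(struct session_model *s)
{
    while (atomic_flag_test_and_set_explicit(&s->gen_ttl_lock,
                                             memory_order_acquire))
        ;  /* spin until the holder releases it */
}

static void session_gt_unlock(struct session_model *s)
{
    atomic_flag_clear_explicit(&s->gen_ttl_lock, memory_order_release);
}

/* Writer: one of the few places that must update both fields together. */
static void session_update(struct session_model *s, unsigned long new_ttl)
{
    session_gt_lock(s);
    s->cap_gen++;
    s->cap_ttl = new_ttl;
    session_gt_unlock(s);
}

/* Reader: sample both fields as one consistent pair. */
static void session_sample(struct session_model *s, unsigned *gen,
                           unsigned long *ttl)
{
    session_gt_lock(s);
    *gen = s->cap_gen;
    *ttl = s->cap_ttl;
    session_gt_unlock(s);
}
```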

04 Jan 2012 · 1 commit

ceph: propagate umode_t · 5706b27d

Committed by Al Viro on Jul 26, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

08 Dec 2011 · 1 commit

ceph: use i_ceph_lock instead of i_lock · be655596

Committed by Sage Weil on Nov 30, 2011

We have been using i_lock to protect all kinds of data structures in the
ceph_inode_info struct, including lists of inodes that we need to iterate
over while avoiding races with inode destruction.  That requires grabbing
a reference to the inode while holding the list lock, but igrab() now
takes i_lock to check the inode flags.

Changing the list lock ordering would be a painful process.

However, using a ceph-specific i_ceph_lock in the ceph inode instead of
i_lock is a simple mechanical change and avoids the ordering constraints
imposed by igrab().
Reported-by: Amon Ott <a.ott@m-privacy.de>
Signed-off-by: Sage Weil <sage@newdream.net>


06 Nov 2011 · 1 commit

ceph: use new D_COMPLETE dentry flag · c6ffe100

Committed by Sage Weil on Nov 03, 2011

We used to use a flag on the directory inode to track whether the dcache
contents for a directory were a complete cached copy. Switch to a dentry
flag CEPH_D_COMPLETE that is safely updated by ->d_prune().
Signed-off-by: Sage Weil <sage@newdream.net>


02 Nov 2011 · 1 commit

filesystems: add set_nlink() · bfe86848

Committed by Miklos Szeredi on Oct 28, 2011

Replace remaining direct i_nlink updates with a new set_nlink()
updater function.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>


26 Oct 2011 · 1 commit

libceph: don't complain on msgpool alloc failures · b61c2763

Committed by Sage Weil on Aug 09, 2011

The pool allocation failures are masked by the pool; there is no need to
spam the console about them.  (That's the whole point of having the pool
in the first place.)

Mark msg allocations whose failure is safely handled as such.
Signed-off-by: Sage Weil <sage@newdream.net>


21 Jul 2011 · 1 commit

fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers · 02c24a82

Committed by Josef Bacik on Jul 16, 2011

Btrfs needs to be able to control how filemap_write_and_wait_range() is called
in fsync to make it less of a painful operation, so push taking i_mutex and
the calling of filemap_write_and_wait() down into the ->fsync() handlers. It
seems some file systems, like ext3 and ocfs2, can drop taking i_mutex
altogether. For correctness' sake I just pushed everything down in all cases to
make sure that we keep the current behavior the same for everybody, and then
each individual fs maintainer can make up their mind about what to do from there.
Thanks,
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


08 Jun 2011 · 1 commit

ceph: use ihold when we already have an inode ref · 70b666c3

Committed by Sage Weil on May 27, 2011

We should use ihold whenever we already have a stable inode ref, even
when we aren't holding i_lock.  This avoids adding new and unnecessary
locking dependencies.
Signed-off-by: Sage Weil <sage@newdream.net>


25 May 2011 · 1 commit

ceph: fix cap flush race reentrancy · db354052

Committed by Sage Weil on May 24, 2011

In e9964c10 we changed cap flushing to do a delicate dance because some
inodes on the cap_dirty list could be in a migrating state (got EXPORT but
not IMPORT) in which we couldn't actually flush and move from
dirty->flushing, breaking the while (!empty) { process first } loop
structure.  It worked for a single sync thread, but was not reentrant and
triggered infinite loops when multiple syncers came along.

Instead, move inodes with dirty caps to a separate cap_dirty_migrating list
when in the limbo export-but-no-import state, allowing us to go back to
the simple loop structure (which was reentrant).  This is cleaner and more
robust.

Audited the cap_dirty users and this looks fine:
list_empty(&ci->i_dirty_item) is still a reliable indicator of whether we
have dirty caps (which list we're on is irrelevant) and list_del_init()
calls still do the right thing.
Signed-off-by: Sage Weil <sage@newdream.net>

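A compact model of the two-list scheme, assuming hypothetical types (struct inode_m, struct mdsc_m) and a minimal list_head-style list written out here; in the real code a flushed inode moves to a flushing list rather than simply being unlinked. Parking migrating inodes on their own list is what lets the plain "while (!empty) process first" loop terminate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of list_head. */
struct list_node { struct list_node *prev, *next; };

#define node_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_node *h) { h->prev = h->next = h; }
static bool list_is_empty(const struct list_node *h) { return h->next == h; }

static void list_add_tail_node(struct list_node *n, struct list_node *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static void list_del_init_node(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* An inode's dirty_item sits on cap_dirty or cap_dirty_migrating;
 * being on either list means "has dirty caps". */
struct inode_m {
    struct list_node dirty_item;
    bool migrating;   /* got EXPORT but not yet IMPORT */
    int flushed;      /* how many times this inode was flushed */
};

struct mdsc_m { struct list_node cap_dirty, cap_dirty_migrating; };

static void mark_dirty(struct mdsc_m *m, struct inode_m *in)
{
    if (!list_is_empty(&in->dirty_item))
        return;   /* already dirty, on one of the two lists */
    list_add_tail_node(&in->dirty_item, in->migrating ?
                       &m->cap_dirty_migrating : &m->cap_dirty);
}

/* Terminates because every inode on cap_dirty can be flushed and
 * taken off the list (here it is just unlinked). */
static void flush_dirty(struct mdsc_m *m)
{
    while (!list_is_empty(&m->cap_dirty)) {
        struct inode_m *in = node_entry(m->cap_dirty.next,
                                        struct inode_m, dirty_item);
        list_del_init_node(&in->dirty_item);
        in->flushed++;
    }
}

/* On IMPORT the inode can be flushed again, so move it to cap_dirty. */
static void cap_imported(struct mdsc_m *m, struct inode_m *in)
{
    in->migrating = false;
    if (!list_is_empty(&in->dirty_item)) {
        list_del_init_node(&in->dirty_item);
        list_add_tail_node(&in->dirty_item, &m->cap_dirty);
    }
}
```

As the commit notes, list_empty() on the inode's own item stays a reliable "is dirty" test no matter which list it is on, which the model relies on in mark_dirty() and cap_imported().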

20 May 2011 · 1 commit

ceph: fix rare potential cap leak · 3540303f

Committed by Sage Weil on May 12, 2011

If we grab new_cap, retake the lock, and find we already have a cap now
for the given mds, release new_cap.
Signed-off-by: Sage Weil <sage@newdream.net>

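The allocate-outside-the-lock pattern can be sketched as follows, assuming a hypothetical one-cap-per-MDS table (struct inode_caps, add_cap_locked()); the lock itself is not modeled, only the check made after retaking it.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_MDS 4

/* Hypothetical model: at most one cap per MDS on an inode. */
struct cap_m { int mds; };
struct inode_caps { struct cap_m *caps[MAX_MDS]; };

static int discarded;  /* preallocated caps released after losing the race */

/* Called with the (modeled) lock held. 'new_cap' was allocated after
 * dropping the lock, so another path may have added a cap for 'mds'
 * in the meantime. */
static struct cap_m *add_cap_locked(struct inode_caps *ic, int mds,
                                    struct cap_m *new_cap)
{
    assert(new_cap != NULL && mds >= 0 && mds < MAX_MDS);
    if (ic->caps[mds]) {
        free(new_cap);   /* release new_cap instead of leaking it */
        discarded++;
        return ic->caps[mds];
    }
    new_cap->mds = mds;
    ic->caps[mds] = new_cap;
    return new_cap;
}
```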

12 May 2011 · 1 commit

ceph: do not use i_wrbuffer_ref as refcount for Fb cap · d3d0720d

Committed by Henry C Chang on May 11, 2011

We increment i_wrbuffer_ref when taking the Fb cap. This breaks
the dirty page accounting and causes looping in
__ceph_do_pending_vmtruncate(), which makes the ceph client hang.

This bug can be reproduced occasionally by running blogbench.

Add a new field i_wb_ref to inode and dedicate it to Fb reference
counting.
Signed-off-by: Henry C Chang <henry.cy.chang@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>


05 May 2011 · 1 commit

ceph: do not call __mark_dirty_inode under i_lock · fca65b4a

Committed by Sage Weil on May 04, 2011

The __mark_dirty_inode helper now takes i_lock as of 250df6ed. Fix the
one ceph caller that held i_lock (__ceph_mark_dirty_caps) to return the
flags value so that its callers can do it outside of i_lock.
Signed-off-by: Sage Weil <sage@newdream.net>


04 May 2011 · 1 commit

ceph: use ihold() when i_lock is held · 3772d26d

Committed by Sage Weil on May 03, 2011

See 0444d76a.
Signed-off-by: Sage Weil <sage@newdream.net>

31 Mar 2011 · 1 commit

Fix common misspellings · 25985edc

Committed by Lucas De Marchi on Mar 30, 2011

Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>


20 Jan 2011 · 3 commits

ceph: avoid immediate cap check after import · 7e57b81c

Committed by Sage Weil on Jan 18, 2011

The NODELAY flag avoids the heuristics that delay cap (issued/wanted)
release.  There's no reason for that after we import a cap, and it kills
whatever benefit we get from those delays.
Signed-off-by: Sage Weil <sage@newdream.net>


ceph: fix flushing of caps vs cap import · 088b3f5e

Committed by Sage Weil on Jan 18, 2011

If we are mid-flush and a cap is migrated to another node, we need to
resend the cap flush message to the new MDS, and do so with the original
flush_seq to avoid leaking across a sync boundary.  Previously we didn't
redo the flush (we only flushed newly dirty data), which would cause a
later sync to hang forever.
Signed-off-by: Sage Weil <sage@newdream.net>


ceph: fix erroneous cap flush to non-auth mds · 24be0c48

Committed by Sage Weil on Jan 18, 2011

The int flushing is global and is not cleared on each iteration of the
loop, which can cause a second flush of caps to any MDSs with ids greater
than the auth MDS.
Signed-off-by: Sage Weil <sage@newdream.net>

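A reduced model of that bug, assuming a hypothetical per-MDS loop that visits MDS ids in increasing order and only counts how many cap messages would carry a flush; the real function does much more per iteration.

```c
#include <assert.h>

/* Only the auth MDS should get the dirty bits flushed. Each function
 * returns how many per-MDS cap messages would carry a flush; 'dirty'
 * is a nonzero mask when caps are dirty. */

/* Buggy shape: 'flushing' lives outside the loop and is never reset,
 * so every MDS visited after the auth one also gets a flush. */
static int flush_msgs_buggy(const int *mds, int n, int auth, int dirty)
{
    int flushing = 0, msgs = 0;
    for (int i = 0; i < n; i++) {
        if (mds[i] == auth)
            flushing = dirty;
        if (flushing)
            msgs++;
    }
    return msgs;
}

/* Fixed shape: 'flushing' is recomputed on every iteration. */
static int flush_msgs_fixed(const int *mds, int n, int auth, int dirty)
{
    int msgs = 0;
    for (int i = 0; i < n; i++) {
        int flushing = (mds[i] == auth) ? dirty : 0;
        if (flushing)
            msgs++;
    }
    return msgs;
}
```

With MDS ids 0, 1, 2 and auth MDS 1, the buggy version flushes to MDS 1 and again to MDS 2, matching the "ids greater than the auth" symptom.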

08 Nov 2010 · 2 commits

ceph: fix rdcache_gen usage and invalidate · cd045cb4

Committed by Sage Weil on Nov 04, 2010

We used to use rdcache_gen to indicate whether we "might" have cached
pages. Now we just look at the mapping to determine that. However, some
old behavior remains from that transition.

First, rdcache_gen == 0 no longer means we have no pages. That can happen
at any time (presumably when we carry FILE_CACHE). We should not reset it
to zero, and we should not check that it is zero.

That means that the only purpose for rdcache_revoking is to resolve races
between new issues of FILE_CACHE and an async invalidate. If they are
equal, we should invalidate. On success, we decrement rdcache_revoking,
so that it is no longer equal to rdcache_gen. Similarly, if we succeed
in doing a sync invalidate, set revoking = gen - 1. (This is a small
optimization to avoid doing unnecessary invalidate work and does not
affect correctness.)
Signed-off-by: Sage Weil <sage@newdream.net>

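A minimal model of the counter rules described above, assuming hypothetical names (struct rdcache_m, issue_file_cache(), revoke_file_cache(), async_invalidate(), sync_invalidate()) and ignoring locking and the actual page invalidation.

```c
#include <assert.h>

/* Hypothetical model of the two counters; not the kernel structure. */
struct rdcache_m {
    unsigned gen;       /* bumped each time FILE_CACHE is (re)issued; never reset to 0 */
    unsigned revoking;  /* set to gen when a revoke queues an async invalidate */
    int pages;          /* number of cached pages */
};

static void issue_file_cache(struct rdcache_m *c) { c->gen++; }

static void revoke_file_cache(struct rdcache_m *c) { c->revoking = c->gen; }

/* Async invalidate: act only if FILE_CACHE was not re-issued since the
 * revoke (revoking == gen). On success, decrement revoking so that it
 * no longer equals gen. Returns 1 if pages were invalidated, else 0. */
static int async_invalidate(struct rdcache_m *c)
{
    if (c->revoking != c->gen)
        return 0;
    c->pages = 0;
    c->revoking--;
    return 1;
}

/* A successful sync invalidate sets revoking = gen - 1, so a queued
 * async invalidate for the same generation has nothing left to do. */
static void sync_invalidate(struct rdcache_m *c)
{
    c->pages = 0;
    c->revoking = c->gen - 1;
}
```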

ceph: re-request max_size if cap auth changes · feb4cc9b

Committed by Sage Weil on Nov 07, 2010

If the auth cap migrates to another MDS, clear requested_max_size so that
we resend any pending max_size increase requests.  This fixes potential
hangs on writes that extend a file and race with a cap migration between
MDSs.
Signed-off-by: Sage Weil <sage@newdream.net>


openanolis / cloud-kernel · synced successfully about 1 year ago
