1. 18 May 2013, 1 commit
    • ceph: ceph_pagelist_append might sleep while atomic · 39be95e9
      Jim Schutt authored
      Ceph's encode_caps_cb() worked hard to not call __page_cache_alloc()
      while holding a lock, but it's spoiled because ceph_pagelist_addpage()
      always calls kmap(), which might sleep.  Here's the result:
      
      [13439.295457] ceph: mds0 reconnect start
      [13439.300572] BUG: sleeping function called from invalid context at include/linux/highmem.h:58
      [13439.309243] in_atomic(): 1, irqs_disabled(): 0, pid: 12059, name: kworker/1:1
          . . .
      [13439.376225] Call Trace:
      [13439.378757]  [<ffffffff81076f4c>] __might_sleep+0xfc/0x110
      [13439.384353]  [<ffffffffa03f4ce0>] ceph_pagelist_append+0x120/0x1b0 [libceph]
      [13439.391491]  [<ffffffffa0448fe9>] ceph_encode_locks+0x89/0x190 [ceph]
      [13439.398035]  [<ffffffff814ee849>] ? _raw_spin_lock+0x49/0x50
      [13439.403775]  [<ffffffff811cadf5>] ? lock_flocks+0x15/0x20
      [13439.409277]  [<ffffffffa045e2af>] encode_caps_cb+0x41f/0x4a0 [ceph]
      [13439.415622]  [<ffffffff81196748>] ? igrab+0x28/0x70
      [13439.420610]  [<ffffffffa045e9f8>] ? iterate_session_caps+0xe8/0x250 [ceph]
      [13439.427584]  [<ffffffffa045ea25>] iterate_session_caps+0x115/0x250 [ceph]
      [13439.434499]  [<ffffffffa045de90>] ? set_request_path_attr+0x2d0/0x2d0 [ceph]
      [13439.441646]  [<ffffffffa0462888>] send_mds_reconnect+0x238/0x450 [ceph]
      [13439.448363]  [<ffffffffa0464542>] ? ceph_mdsmap_decode+0x5e2/0x770 [ceph]
      [13439.455250]  [<ffffffffa0462e42>] check_new_map+0x352/0x500 [ceph]
      [13439.461534]  [<ffffffffa04631ad>] ceph_mdsc_handle_map+0x1bd/0x260 [ceph]
      [13439.468432]  [<ffffffff814ebc7e>] ? mutex_unlock+0xe/0x10
      [13439.473934]  [<ffffffffa043c612>] extra_mon_dispatch+0x22/0x30 [ceph]
      [13439.480464]  [<ffffffffa03f6c2c>] dispatch+0xbc/0x110 [libceph]
      [13439.486492]  [<ffffffffa03eec3d>] process_message+0x1ad/0x1d0 [libceph]
      [13439.493190]  [<ffffffffa03f1498>] ? read_partial_message+0x3e8/0x520 [libceph]
          . . .
      [13439.587132] ceph: mds0 reconnect success
      [13490.720032] ceph: mds0 caps stale
      [13501.235257] ceph: mds0 recovery completed
      [13501.300419] ceph: mds0 caps renewed
      
      Fix it up by encoding locks into a buffer first, and when the number
      of encoded locks is stable, copy that into a ceph_pagelist.
      
      [elder@inktank.com: abbreviated the stack info a bit.]
      
      Cc: stable@vger.kernel.org # 3.4+
      Signed-off-by: Jim Schutt <jaschut@sandia.gov>
      Reviewed-by: Alex Elder <elder@inktank.com>
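
      The fix follows a classic kernel pattern: do all work that can sleep
      (allocation, kmap) with no spinlock held, and only non-sleeping work
      under the lock.  Below is a minimal sketch of that pattern, not the
      actual patch; count_inode_locks() and encode_locks_to_buffer() are
      hypothetical helpers standing in for the real encoding code.

          static int example_encode_locks(struct inode *inode,
                                          struct ceph_pagelist *pagelist)
          {
              struct ceph_filelock *flocks;
              int num_locks, err;

              /* count and allocate with no spinlock held: kmalloc may sleep */
              lock_flocks();
              num_locks = count_inode_locks(inode);            /* hypothetical */
              unlock_flocks();

              flocks = kmalloc(num_locks * sizeof(*flocks), GFP_NOFS);
              if (!flocks)
                  return -ENOMEM;

              /* encode into plain memory under the lock: nothing here sleeps */
              lock_flocks();
              err = encode_locks_to_buffer(inode, flocks, num_locks); /* hypothetical */
              unlock_flocks();

              /*
               * Per the description above, the real change loops back and
               * retries if locks came or went between the two passes, so the
               * copy only happens once the number of encoded locks is stable.
               */

              /* no locks held: ceph_pagelist_append() may kmap() and sleep */
              if (!err)
                  err = ceph_pagelist_append(pagelist, flocks,
                                             num_locks * sizeof(*flocks));
              kfree(flocks);
              return err;
          }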
  2. 02 May 2013, 4 commits
  3. 23 Feb 2013, 1 commit
  4. 20 Feb 2013, 1 commit
  5. 12 Feb 2013, 1 commit
    • ceph: Translate between uid and gids in cap messages and kuids and kgids · 05cb11c1
      Eric W. Biederman authored
      - Make the uid and gid arguments of send_cap_msg() used to compose
        ceph_mds_caps messages of type kuid_t and kgid_t.
      
      - Pass inode->i_uid and inode->i_gid in __send_cap to send_cap_msg()
        through variables of type kuid_t and kgid_t.
      
      - Modify struct ceph_cap_snap to store uids and gids in types kuid_t
        and kgid_t.  This allows capturing inode->i_uid and inode->i_gid in
        ceph_queue_cap_snap() without loss and passing them to
        __ceph_flush_snaps() where they are removed from struct
        ceph_cap_snap and passed to send_cap_msg().
      
      - In handle_cap_grant, translate the uids and gids in the initial user
        namespace stored in struct ceph_mds_cap into kuids and kgids
        before setting inode->i_uid and inode->i_gid.
      
      Cc: Sage Weil <sage@inktank.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
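
      As a minimal sketch of the two directions of that translation (the
      helper names are hypothetical, not the patch's; the wire format is
      assumed to carry plain 32-bit ids in the initial user namespace):

          #include <linux/uidgid.h>

          /* outgoing: kernel kuid_t/kgid_t -> 32-bit wire uid/gid */
          static __le32 example_wire_uid(kuid_t uid)
          {
              return cpu_to_le32(from_kuid(&init_user_ns, uid));
          }

          /* incoming: 32-bit wire uid/gid -> kuid_t/kgid_t before storing */
          static void example_set_owner(struct inode *inode, u32 uid, u32 gid)
          {
              inode->i_uid = make_kuid(&init_user_ns, uid);
              inode->i_gid = make_kgid(&init_user_ns, gid);
          }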
  6. 03 Aug 2012, 1 commit
    • ceph: simplify+fix atomic_open · 5ef50c3b
      Sage Weil authored
      The initial ->atomic_open op was carried over from the old intent code,
      which was incomplete and didn't really work.  Replace it with a fresh
      method.  In particular:
      
       * always attempt to do an atomic open+lookup, both for the create case
         and for lookups of existing files.
       * fix symlink handling by returning 1 to the VFS so that we can follow
         the link to its destination. This fixes a longstanding ceph bug (#2392).
      Signed-off-by: Sage Weil <sage@inktank.com>
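
      A rough sketch of what a handler under this ->atomic_open() API looks
      like (illustrative only; example_lookup_open() is a hypothetical
      stand-in for the combined lookup+open request, and error handling is
      pared down):

          static int example_atomic_open(struct inode *dir, struct dentry *dentry,
                                         struct file *file, unsigned flags,
                                         umode_t mode, int *opened)
          {
              int err;

              /* one combined lookup+open, for create and plain lookup alike */
              err = example_lookup_open(dir, dentry, file, flags, mode);
              if (err)
                  return err;

              /*
               * A symlink cannot be opened here.  finish_no_open() returns 1,
               * telling the VFS the open was not completed, so it can follow
               * the link to its destination itself.
               */
              if (dentry->d_inode && S_ISLNK(dentry->d_inode->i_mode))
                  return finish_no_open(file, NULL);

              return finish_open(file, dentry, generic_file_open, opened);
          }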
  7. 31 Jul 2012, 1 commit
    • ceph: define snap counts as u32 everywhere · aa711ee3
      Alex Elder authored
      There are two structures in which a count of snapshots are
      maintained:
      
          struct ceph_snap_context {
              ...
              u32 num_snaps;
              ...
          }
      and
          struct ceph_snap_realm {
              ...
              u32 num_prior_parent_snaps;   /* had prior to parent_since */
              ...
              u32 num_snaps;
              ...
          }
      
      These fields never take on negative values (e.g., to hold special
      meaning), and so are really inherently unsigned.  Furthermore they
      take their value from over-the-wire or on-disk formatted 32-bit
      values.
      
      So change their definition to have type u32, and change some spots
      elsewhere in the code to account for this change.
      Signed-off-by: Alex Elder <elder@inktank.com>
      Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
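
      A small illustration of why u32 is the natural type (a sketch, not
      the patch: bounds checking is omitted and the function name is made
      up); libceph's ceph_decode_32()/ceph_decode_64() already return
      unsigned values taken straight off the wire:

          static void example_decode_snaps(void **p, struct ceph_snap_context *snapc)
          {
              u32 i;

              snapc->num_snaps = ceph_decode_32(p);   /* unsigned on the wire */
              for (i = 0; i < snapc->num_snaps; i++)
                  snapc->snaps[i] = ceph_decode_64(p);
          }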
  8. 14 Jul 2012, 5 commits
  9. 22 Mar 2012, 2 commits
  10. 13 Jan 2012, 1 commit
  11. 04 Jan 2012, 1 commit
  12. 08 Dec 2011, 1 commit
    • ceph: use i_ceph_lock instead of i_lock · be655596
      Sage Weil authored
      We have been using i_lock to protect all kinds of data structures in the
      ceph_inode_info struct, including lists of inodes that we need to iterate
      over while avoiding races with inode destruction.  That requires grabbing
      a reference to the inode while the list lock is held, but igrab() now
      takes i_lock to check the inode flags.
      
      Changing the list lock ordering would be a painful process.
      
      However, using a ceph-specific i_ceph_lock in the ceph inode instead of
      i_lock is a simple mechanical change and avoids the ordering constraints
      imposed by igrab().
      Reported-by: Amon Ott <a.ott@m-privacy.de>
      Signed-off-by: Sage Weil <sage@newdream.net>
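
      A sketch of the pattern this enables (illustrative, not the patch):
      with i_ceph_lock and i_lock now distinct, igrab(), which takes
      i_lock internally, is safe to call while the ceph-private lock is
      held.

          static struct inode *example_grab_inode(struct ceph_inode_info *ci)
          {
              struct inode *inode;

              spin_lock(&ci->i_ceph_lock);     /* protects ceph-private state */
              inode = igrab(&ci->vfs_inode);   /* takes i_lock; no ordering clash */
              spin_unlock(&ci->i_ceph_lock);
              return inode;                    /* NULL if inode is being torn down */
          }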
  13. 06 Nov 2011, 1 commit
  14. 04 Nov 2011, 1 commit
  15. 26 Oct 2011, 2 commits
  16. 27 Jul 2011, 6 commits
  17. 21 Jul 2011, 1 commit
    • fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers · 02c24a82
      Josef Bacik authored
      Btrfs needs to be able to control how filemap_write_and_wait_range() is
      called in fsync to make it less of a painful operation, so push the
      taking of i_mutex and the call to filemap_write_and_wait() down into the
      ->fsync() handlers.  It seems some file systems, like ext3 and ocfs2,
      can drop taking i_mutex altogether.  For correctness' sake I just pushed
      everything down in all cases to make sure that we keep the current
      behavior the same for everybody, and then each individual fs maintainer
      can make up their mind about what to do from there.
      Thanks,
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
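
      After this change a handler has roughly the following shape (a sketch
      against the new prototype, not any particular filesystem's code):

          static int example_fsync(struct file *file, loff_t start, loff_t end,
                                   int datasync)
          {
              struct inode *inode = file->f_mapping->host;
              int ret;

              /* the writeback the VFS used to do before calling ->fsync() */
              ret = filemap_write_and_wait_range(file->f_mapping, start, end);
              if (ret)
                  return ret;

              /* the i_mutex the VFS used to take for us */
              mutex_lock(&inode->i_mutex);
              /* flush filesystem-specific metadata here */
              mutex_unlock(&inode->i_mutex);
              return 0;
          }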
  18. 20 Jul 2011, 1 commit
  19. 12 May 2011, 1 commit
  20. 05 May 2011, 1 commit
  21. 22 Mar 2011, 2 commits
  22. 04 Mar 2011, 1 commit
  23. 20 Feb 2011, 1 commit
  24. 13 Jan 2011, 1 commit
    • ceph: add dir_layout to inode · 6c0f3af7
      Sage Weil authored
      Add a ceph_dir_layout to the inode, and calculate dentry hash values based
      on the parent directory's specified dir_hash function.  This is needed
      because the old default Linux dcache hash function is extremely weak and
      leads to a poor distribution of files among dir fragments.
      Signed-off-by: Sage Weil <sage@newdream.net>
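
      A sketch of what such a per-directory hash selection can look like
      (close in spirit to, but not necessarily identical with, the patch;
      ceph_str_hash() and CEPH_STR_HASH_LINUX are real libceph definitions,
      the function name is made up):

          static unsigned example_dentry_hash(struct inode *dir, struct dentry *dn)
          {
              struct ceph_inode_info *ci = ceph_inode(dir);

              switch (ci->i_dir_layout.dl_dir_hash) {
              case 0:                       /* unset: keep the dcache hash */
              case CEPH_STR_HASH_LINUX:
                  return dn->d_name.hash;
              default:
                  return ceph_str_hash(ci->i_dir_layout.dl_dir_hash,
                                       dn->d_name.name, dn->d_name.len);
              }
          }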
  25. 07 Jan 2011, 1 commit