Commits · e4339d28f640a7c0d92903bcf389a2dfa281270d · openeuler / raspberrypi-kernel

15 Oct, 2014 6 commits

libceph: reference counting pagelist · e4339d28

Committed by Yan, Zheng on Sep 16, 2014

This allows a pagelist to present data that may be sent multiple times.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>

e4339d28
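
The idea of a reference-counted pagelist can be illustrated with a small user-space sketch. This is not the kernel code: `struct demo_pagelist`, `demo_pagelist_get()`, `demo_pagelist_put()` and the `demo_freed` counter are hypothetical names, and C11 atomics stand in for whatever counter type the kernel uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pagelist; not the kernel's struct ceph_pagelist. */
struct demo_pagelist {
    atomic_int refcnt;  /* number of current holders */
    size_t length;      /* payload bytes (unused in this sketch) */
};

/* Counts freed lists so the behavior can be observed from a test. */
static int demo_freed;

/* Allocate a list that starts with a single reference owned by the caller. */
struct demo_pagelist *demo_pagelist_alloc(void)
{
    struct demo_pagelist *pl = malloc(sizeof(*pl));

    if (!pl)
        return NULL;
    atomic_init(&pl->refcnt, 1);
    pl->length = 0;
    return pl;
}

/* Take an extra reference, e.g. before queuing the same data again. */
void demo_pagelist_get(struct demo_pagelist *pl)
{
    assert(atomic_load(&pl->refcnt) > 0);
    atomic_fetch_add(&pl->refcnt, 1);
}

/* Drop one reference; the last holder frees the list. */
void demo_pagelist_put(struct demo_pagelist *pl)
{
    if (atomic_fetch_sub(&pl->refcnt, 1) == 1) {
        free(pl);
        demo_freed++;
    }
}
```

With this shape, a sender that may retransmit can hold its own reference while each in-flight message holds another, so the data stays valid until the last user drops it.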

ceph: send client metadata to MDS · dbd0c8bf

Committed by John Spray on Sep 09, 2014

Implement version 2 of CEPH_MSG_CLIENT_SESSION syntax,
which includes additional client metadata to allow
the MDS to report on clients by user-sensible names
like hostname.
Signed-off-by: John Spray <john.spray@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>

dbd0c8bf

ceph: move ceph_find_inode() outside the s_mutex · 6cd3bcad

Committed by Yan, Zheng on Sep 17, 2014

ceph_find_inode() may wait on a freeing inode, so using it inside the s_mutex
may cause deadlock (the freeing inode is waiting for an OSD read reply, but
the dispatch thread is blocked by the s_mutex).
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>

6cd3bcad

ceph: make sure request isn't in any waiting list when kicking request. · 03974e81

Committed by Yan, Zheng on Sep 11, 2014

We may corrupt the waiting list if a request in the waiting list is kicked.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>

03974e81

ceph: protect kick_requests() with mdsc->mutex · 656e4382

Committed by Yan, Zheng on Sep 11, 2014

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>

656e4382

ceph: trim unused inodes before reconnecting to recovering MDS · 5d23371f

Committed by Yan, Zheng on Sep 10, 2014

So the recovering MDS does not need to fetch these unused inodes during
cache rejoin. This may reduce MDS recovery time.
Signed-off-by: Yan, Zheng <zyan@redhat.com>

5d23371f

07 Aug, 2014 1 commit

ceph: fix kick_requests() · 282c1052

Committed by Yan, Zheng on Jul 30, 2014

__do_request() may unregister the request. So we should update
iterator 'p' before calling __do_request().
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

282c1052
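
The pattern described above (advance the iterator before calling something that may unlink the current element) can be sketched in plain C. The kernel iterates a tree of requests; this sketch uses a singly linked list instead, and `demo_req`, `demo_queue`, `demo_do_request()` and `demo_kick_requests()` are hypothetical names chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request node; the real code keeps requests in a tree. */
struct demo_req {
    int tid;
    struct demo_req *next;
};

struct demo_queue {
    struct demo_req *head;
};

/* Unlink req from the queue and clear its link, as an unregister would. */
static void demo_unregister(struct demo_queue *q, struct demo_req *req)
{
    struct demo_req **pp = &q->head;

    while (*pp && *pp != req)
        pp = &(*pp)->next;
    if (*pp) {
        *pp = req->next;
        req->next = NULL;
    }
}

/* Stand-in handler: for the demo, requests with an even tid get unregistered. */
static void demo_do_request(struct demo_queue *q, struct demo_req *req)
{
    if (req->tid % 2 == 0)
        demo_unregister(q, req);
}

/* Visit every request; returns how many were visited. */
int demo_kick_requests(struct demo_queue *q)
{
    int visited = 0;
    struct demo_req *p;

    assert(q != NULL);
    p = q->head;
    while (p) {
        struct demo_req *req = p;

        p = p->next;               /* advance first: req may be unlinked below */
        demo_do_request(q, req);
        visited++;
    }
    return visited;
}
```

If the loop read `req->next` after the handler ran, it would stop early at the first unregistered request, because that request's link has already been cleared.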

14 Jul, 2014 1 commit

ceph: reset r_resend_mds after receiving -ESTALE · 51da8e8c

Committed by Yan, Zheng on Jul 14, 2014

This makes __choose_mds() choose the MDS according to caps.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

51da8e8c

08 Jul, 2014 1 commit

ceph: include time stamp in replayed MDS requests · c5c9a0bf

Committed by Yan, Zheng on Jul 01, 2014

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

c5c9a0bf

07 Jun, 2014 1 commit

fs/ceph: replace pr_warning by pr_warn · f3ae1b97

Committed by Fabian Frederick on Jun 06, 2014

Update the last pr_warning callsites in fs branch
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Sage Weil <sage@inktank.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f3ae1b97

06 Jun, 2014 1 commit

ceph: include time stamp in every MDS request · b8e69066

Committed by Sage Weil on May 21, 2014

We recently modified the client/MDS protocol to include a timestamp in the
client request. This allows ctime updates to follow the client's clock
in most cases, which avoids subtle problems when clocks are out of sync
and timestamps are updated sometimes by the MDS clock (for most requests)
and sometimes by the client clock (for cap writeback).
Signed-off-by: Sage Weil <sage@inktank.com>

b8e69066

05 Apr, 2014 3 commits

ceph: flush cap release queue when trimming session caps · a56371d9

Committed by Yan, Zheng on Apr 01, 2014

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

a56371d9

ceph: preallocate buffer for readdir reply · 54008399

Committed by Yan, Zheng on Mar 29, 2014

Preallocate buffer for readdir reply. Limit number of entries in
readdir reply according to the buffer size.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

54008399

ceph: fix null pointer dereference in discard_cap_releases() · 00bd8edb

Committed by Yan, Zheng on Mar 24, 2014

send_mds_reconnect() may call discard_cap_releases() after all
release messages have been dropped by cleanup_cap_releases().
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>

00bd8edb

03 Apr, 2014 1 commit

ceph: do not assume r_old_dentry[_dir] always set together · 844d87c3

Committed by Sage Weil on Feb 05, 2013

Do not assume that r_old_dentry implies that r_old_dentry_dir is also
true.  Separate out the ref cleanup and make the debugs dump behave when
it is NULL.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>

844d87c3

21 Jan, 2014 4 commits

ceph: add open export target session helper · 5d72d13c

Committed by Yan, Zheng on Nov 24, 2013

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

5d72d13c

ceph: handle session flush message · 186e4f7a

Committed by Yan, Zheng on Nov 22, 2013

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

186e4f7a

ceph: handle -ESTALE reply · ca18bede

Committed by Yan, Zheng on Nov 22, 2013

Send requests that operate on path to directory's auth MDS if
mode == USE_AUTH_MDS. Always retry using the auth MDS if we got an
-ESTALE reply from a non-auth MDS. Also clean up the code that
handles auth MDS change.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

ca18bede

ceph: fix trim caps · 979abfdd

Committed by Yan, Zheng on Nov 22, 2013

- don't trim auth cap if there are flushing caps
- don't trim auth cap if any 'write' cap is wanted
- allow trimming non-auth cap even if the inode is dirty
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

979abfdd

01 Jan, 2014 1 commit

libceph: all features fields must be u64 · 12b4629a

Committed by Ilya Dryomov on Dec 24, 2013

In preparation for ceph_features.h update, change all features fields
from unsigned int/u32 to u64.  (ceph.git has ~40 feature bits at this
point.)
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

12b4629a
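
Why a 32-bit field is not enough once feature bits pass bit 31 can be shown with a short sketch. `DEMO_FEATURE_HIGH` is a hypothetical feature at bit 40, not a real Ceph feature flag, and the helper names are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bit above the 32-bit range. */
#define DEMO_FEATURE_HIGH (1ULL << 40)

/* Test a feature bit in a 64-bit feature mask. */
static inline int demo_has_feature(uint64_t features, uint64_t bit)
{
    return (features & bit) != 0;
}

/* Model storing the mask in a 32-bit field: the upper 32 bits are dropped. */
static inline uint32_t demo_store_u32(uint64_t features)
{
    return (uint32_t)features;
}
```

Passing a mask through `demo_store_u32()` silently drops `DEMO_FEATURE_HIGH`, which is the kind of loss that keeping every features field at 64 bits avoids.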

24 Nov, 2013 5 commits

ceph: wake up 'safe' waiters when unregistering request · fc55d2c9

Committed by Yan, Zheng on Oct 31, 2013

We also need to wake up 'safe' waiters if an error occurs or the request
is aborted. Otherwise sync(2)/fsync(2) may hang forever.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>

fc55d2c9

ceph: cleanup aborted requests when re-sending requests. · eb1b8af3

Committed by Yan, Zheng on Sep 26, 2013

Aborted requests usually get cleared when the reply is received.
If the MDS crashes, no reply will be received. So we need to clean up
aborted requests when re-sending requests.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>

eb1b8af3

ceph: handle race between cap reconnect and cap release · 99a9c273

Committed by Yan, Zheng on Sep 22, 2013

When a cap gets released while composing the cap reconnect message,
we should skip queuing the release message if the cap hasn't been
added to the cap reconnect message.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>

99a9c273

ceph: set caps count after composing cap reconnect message · 44c99757

Committed by Yan, Zheng on Sep 22, 2013

It's possible that some caps get released while composing the cap
reconnect message.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>

44c99757

ceph: queue cap release in __ceph_remove_cap() · a096b09a

Committed by Yan, Zheng on Sep 22, 2013

Call __queue_cap_release() in __ceph_remove_cap(); this avoids
acquiring s_cap_lock twice.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>

a096b09a

01 Oct, 2013 1 commit

ceph: handle frag mismatch between readdir request and reply · 81c6aea5

Committed by Yan, Zheng on Sep 18, 2013

If the client has outdated directory fragment information, it may request
readdir on a non-existent directory fragment. In this case, the MDS finds
an approximate directory fragment and sends its contents back to the
client. When receiving a reply with a fragment that is different from the
requested one, the client needs to reset the 'readdir offset'.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>

81c6aea5

07 Sep, 2013 1 commit

ceph: remove ceph_lookup_inode() · ed284c49

Committed by Yan, Zheng on Sep 02, 2013

commit 6f60f889 (ceph: fix freeing inode vs removing session caps race)
introduced ceph_lookup_inode(). But there is already a ceph_find_inode()
which provides a similar function. So remove ceph_lookup_inode() and use
ceph_find_inode() instead.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Alex Elder <alex.elder@linary.org>
Reviewed-by: Sage Weil <sage@inktank.com>

ed284c49

10 Aug, 2013 2 commits

ceph: fix freeing inode vs removing session caps race · 6f60f889

Committed by Yan, Zheng on Jul 24, 2013

remove_session_caps() uses iterate_session_caps() to remove caps,
but iterate_session_caps() skips inodes that are being deleted.
So session->s_nr_caps can be non-zero after iterate_session_caps()
returns.

We can fix the issue by waiting until deletions are complete.
__wait_on_freeing_inode() is designed for the job, but it is not
exported, so we use the inode lookup function to access it.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

6f60f889

ceph: fix null pointer dereference · c338c07c

Committed by Nathaniel Yazdani on Aug 04, 2013

When register_session() is given an out-of-range argument for mds,
ceph_mdsmap_get_addr() will return a null pointer, which would be given to
ceph_con_open() & be dereferenced, causing a kernel oops. This fixes bug #4685
in the Ceph bug tracker <http://tracker.ceph.com/issues/4685>.
Signed-off-by: Nathaniel Yazdani <n1ght.4nd.d4y@gmail.com>
Reviewed-by: Sage Weil <sage@inktank.com>

c338c07c

05 Jul, 2013 1 commit

helper for reading ->d_count · 84d08fa8

Committed by Al Viro on Jul 05, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

84d08fa8

04 Jul, 2013 3 commits

ceph: Free mdsc if alloc mdsc->mdsmap failed. · fb3101b6

Committed by majianpeng on Jun 25, 2013

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Reviewed-by: Sage Weil <sage@inktank.com>

fb3101b6

ceph: clear migrate seq when MDS restarts · 667ca05c

Committed by Yan, Zheng on May 31, 2013

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>

667ca05c

ceph: reset iov_len when discarding cap release messages · 3803da49

Committed by Yan, Zheng on May 31, 2013

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>

3803da49

29 Jun, 2013 1 commit

locks: protect most of the file_lock handling with i_lock · 1c8c601a

Committed by Jeff Layton on Jun 21, 2013

Having a global lock that protects all of this code is a clear
scalability problem. Instead of doing that, move most of the code to be
protected by the i_lock instead. The exceptions are the global lists
that the ->fl_link sits on, and the ->fl_block list.

->fl_link is what connects these structures to the
global lists, so we must ensure that we hold those locks when iterating
over or updating these lists.

Furthermore, sound deadlock detection requires that we hold the
blocked_list state steady while checking for loops. We also must ensure
that the search and update to the list are atomic.

For the checking and insertion side of the blocked_list, push the
acquisition of the global lock into __posix_lock_file and ensure that
checking and update of the blocked_list is done without dropping the
lock in between.

On the removal side, when waking up blocked lock waiters, take the
global lock before walking the blocked list and dequeue the waiters from
the global list prior to removal from the fl_block list.

With this, deadlock detection should be race free while we minimize
excessive file_lock_lock thrashing.

Finally, in order to avoid a lock inversion problem when handling
/proc/locks output we must ensure that manipulations of the fl_block
list are also protected by the file_lock_lock.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1c8c601a

18 May, 2013 2 commits

ceph: ceph_pagelist_append might sleep while atomic · 39be95e9

Committed by Jim Schutt on May 15, 2013

Ceph's encode_caps_cb() worked hard to not call __page_cache_alloc()
while holding a lock, but it's spoiled because ceph_pagelist_addpage()
always calls kmap(), which might sleep.  Here's the result:

[13439.295457] ceph: mds0 reconnect start
[13439.300572] BUG: sleeping function called from invalid context at include/linux/highmem.h:58
[13439.309243] in_atomic(): 1, irqs_disabled(): 0, pid: 12059, name: kworker/1:1
    . . .
[13439.376225] Call Trace:
[13439.378757]  [<ffffffff81076f4c>] __might_sleep+0xfc/0x110
[13439.384353]  [<ffffffffa03f4ce0>] ceph_pagelist_append+0x120/0x1b0 [libceph]
[13439.391491]  [<ffffffffa0448fe9>] ceph_encode_locks+0x89/0x190 [ceph]
[13439.398035]  [<ffffffff814ee849>] ? _raw_spin_lock+0x49/0x50
[13439.403775]  [<ffffffff811cadf5>] ? lock_flocks+0x15/0x20
[13439.409277]  [<ffffffffa045e2af>] encode_caps_cb+0x41f/0x4a0 [ceph]
[13439.415622]  [<ffffffff81196748>] ? igrab+0x28/0x70
[13439.420610]  [<ffffffffa045e9f8>] ? iterate_session_caps+0xe8/0x250 [ceph]
[13439.427584]  [<ffffffffa045ea25>] iterate_session_caps+0x115/0x250 [ceph]
[13439.434499]  [<ffffffffa045de90>] ? set_request_path_attr+0x2d0/0x2d0 [ceph]
[13439.441646]  [<ffffffffa0462888>] send_mds_reconnect+0x238/0x450 [ceph]
[13439.448363]  [<ffffffffa0464542>] ? ceph_mdsmap_decode+0x5e2/0x770 [ceph]
[13439.455250]  [<ffffffffa0462e42>] check_new_map+0x352/0x500 [ceph]
[13439.461534]  [<ffffffffa04631ad>] ceph_mdsc_handle_map+0x1bd/0x260 [ceph]
[13439.468432]  [<ffffffff814ebc7e>] ? mutex_unlock+0xe/0x10
[13439.473934]  [<ffffffffa043c612>] extra_mon_dispatch+0x22/0x30 [ceph]
[13439.480464]  [<ffffffffa03f6c2c>] dispatch+0xbc/0x110 [libceph]
[13439.486492]  [<ffffffffa03eec3d>] process_message+0x1ad/0x1d0 [libceph]
[13439.493190]  [<ffffffffa03f1498>] ? read_partial_message+0x3e8/0x520 [libceph]
    . . .
[13439.587132] ceph: mds0 reconnect success
[13490.720032] ceph: mds0 caps stale
[13501.235257] ceph: mds0 recovery completed
[13501.300419] ceph: mds0 caps renewed

Fix it up by encoding locks into a buffer first, and when the number
of encoded locks is stable, copy that into a ceph_pagelist.

[elder@inktank.com: abbreviated the stack info a bit.]

Cc: stable@vger.kernel.org # 3.4+
Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Reviewed-by: Alex Elder <elder@inktank.com>

39be95e9

ceph: add cpu_to_le32() calls when encoding a reconnect capability · c420276a

Committed by Jim Schutt on May 15, 2013

In his review, Alex Elder mentioned that he hadn't checked that
num_fcntl_locks and num_flock_locks were properly decoded on the
server side, from a le32 over-the-wire type to a cpu type.
I checked, and AFAICS it is done; those interested can consult
    Locker::_do_cap_update()
in src/mds/Locker.cc and src/include/encoding.h in the Ceph server
code (git://github.com/ceph/ceph).

I also checked the server side for flock_len decoding, and I believe
that also happens correctly, by virtue of having been declared
__le32 in struct ceph_mds_cap_reconnect, in src/include/ceph_fs.h.

Cc: stable@vger.kernel.org # 3.4+
Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Reviewed-by: Alex Elder <elder@inktank.com>

c420276a
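
For readers unfamiliar with what a `cpu_to_le32()` conversion achieves, the following user-space sketch writes a 32-bit value as little-endian bytes regardless of host byte order. The helpers `demo_put_le32()` and `demo_get_le32()` are hypothetical and only illustrate the wire layout; the kernel uses its own byte-order macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v into out[0..3] with the least significant byte first. */
void demo_put_le32(uint8_t *out, uint32_t v)
{
    assert(out != NULL);
    out[0] = (uint8_t)(v & 0xff);
    out[1] = (uint8_t)((v >> 8) & 0xff);
    out[2] = (uint8_t)((v >> 16) & 0xff);
    out[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Read a little-endian 32-bit value back into host representation. */
uint32_t demo_get_le32(const uint8_t *in)
{
    assert(in != NULL);
    return (uint32_t)in[0] |
           ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) |
           ((uint32_t)in[3] << 24);
}
```

Because both helpers work byte by byte, a count encoded on a big-endian client decodes to the same number on a little-endian server, which is the property the explicit conversion calls are meant to guarantee.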

02 May, 2013 4 commits

libceph: add, don't set data for a message · 90af3602

Committed by Alex Elder on Apr 05, 2013

Change the names of the functions that put data on a pagelist to
reflect that we're adding to whatever's already there rather than
just setting it to the one thing.  Currently only one data item is
ever added to a message, but that's about to change.

This resolves:
    http://tracker.ceph.com/issues/2770
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

90af3602

libceph: wrap auth ops in wrapper functions · 27859f97

Committed by Sage Weil on Mar 25, 2013

Use wrapper functions that check whether the auth op exists so that callers
do not need a bunch of conditional checks.  Simplifies the external
interface.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

27859f97
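
The wrapper approach described above can be sketched as follows. The ops table, the client struct and every `demo_*` name here are hypothetical and do not mirror the real libceph interface; the point is only that the NULL check lives in one place instead of at every call site.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table of optional operations; any pointer may be NULL. */
struct demo_auth_ops {
    int (*is_authenticated)(void *ctx);
    void (*reset)(void *ctx);
};

struct demo_auth_client {
    const struct demo_auth_ops *ops;
    void *ctx;
};

/* Wrapper: returns 0 when the op is not provided. */
int demo_auth_is_authenticated(const struct demo_auth_client *ac)
{
    assert(ac != NULL);
    if (ac->ops && ac->ops->is_authenticated)
        return ac->ops->is_authenticated(ac->ctx);
    return 0;
}

/* Wrapper: silently does nothing when the op is not provided. */
void demo_auth_reset(const struct demo_auth_client *ac)
{
    assert(ac != NULL);
    if (ac->ops && ac->ops->reset)
        ac->ops->reset(ac->ctx);
}

/* Sample op used only to exercise the wrappers. */
static int demo_always_yes(void *ctx)
{
    (void)ctx;
    return 1;
}
```

Callers then write `demo_auth_is_authenticated(ac)` unconditionally, and adding a new optional op only requires one more wrapper rather than new checks scattered across the callers.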

libceph: add update_authorizer auth method · 0bed9b5c

Committed by Sage Weil on Mar 25, 2013

Currently the messenger calls out to a get_authorizer con op, which will
create a new authorizer if it doesn't yet have one.  In the meantime, when
we rotate our service keys, the authorizer doesn't get updated.  Eventually
it will be rejected by the server on a new connection attempt and get
invalidated, and we will then rebuild a new authorizer, but this is not
ideal.

Instead, if we do have an authorizer, call a new update_authorizer op that
will verify that the current authorizer is using the latest secret.  If it
is not, we will build a new one that does.  This avoids the transient
failure.

This fixes one of the sorry sequence of events for bug

	http://tracker.ceph.com/issues/4282
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

0bed9b5c

ceph: use i_release_count to indicate dir's completeness · 2f276c51

Committed by Yan, Zheng on Mar 13, 2013

Current ceph code tracks directory's completeness in two places.
ceph_readdir() checks i_release_count to decide if it can set the
I_COMPLETE flag in i_ceph_flags. All other places check the I_COMPLETE
flag. This indirection introduces locking complexity.

This patch adds a new variable i_complete_count to ceph_inode_info.
Set i_release_count's value to it when marking a directory complete.
By comparing the two variables, we know if a directory is complete.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>

2f276c51
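
The two-counter scheme can be sketched in a few lines of C. This is an illustration under assumptions, not the kernel implementation: `struct demo_dir` and its helpers are hypothetical, plain `long` fields replace whatever counter type the kernel uses, and the initial values (1 and 0) are chosen here simply so that a fresh directory starts out incomplete.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-directory state with the two counters from the text. */
struct demo_dir {
    long release_count;   /* bumped whenever cached entries may be dropped */
    long complete_count;  /* release_count value seen when marked complete */
};

/* Assumed initial values: the counters differ, so the dir is incomplete. */
void demo_dir_init(struct demo_dir *d)
{
    assert(d != NULL);
    d->release_count = 1;
    d->complete_count = 0;
}

/* Something invalidated the cached listing. */
void demo_dir_invalidate(struct demo_dir *d)
{
    d->release_count++;
}

/* Mark complete using the release_count sampled when the listing started. */
void demo_dir_mark_complete(struct demo_dir *d, long release_seen)
{
    d->complete_count = release_seen;
}

/* The directory is complete only while both counters match. */
int demo_dir_is_complete(const struct demo_dir *d)
{
    return d->complete_count == d->release_count;
}
```

A listing that was invalidated midway stores a stale sample, so the counters stay unequal and no separate flag has to be cleared under a lock.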