Commits · 7c1332b8cb5b27656cf6ab1f5fe808a8eb8bb2c0 · openeuler / raspberrypi-kernel

18 Feb, 2010 2 commits

ceph: fix iterate_caps removal race · 7c1332b8

Committed by Sage Weil on Feb 16, 2010

We need to be able to iterate over all caps on a session with a
possibly slow callback on each cap.  To allow this, we used to
prevent cap reordering while we were iterating.  However, we were
not safe from races with removal: removing the 'next' cap would
make the next pointer from list_for_each_entry_safe be invalid,
and cause a lock up or similar badness.

Instead, we keep an iterator pointer in the session pointing to
the current cap.  As before, we avoid reordering.  For removal,
if the cap isn't the current cap we are iterating over, we are
fine.  If it is, we clear cap->ci (to mark the cap as pending
removal) but leave it in the session list.  In iterate_caps, we
can safely finish removal and get the next cap pointer.

While we're at it, clean up put_cap to not take a cap reservation
context, as it was never used.
Signed-off-by: Sage Weil <sage@newdream.net>

7c1332b8
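
The iterator-pointer scheme described in this commit message can be sketched in plain C. This is a minimal userspace illustration, not the kernel code: the struct layouts and the names `session_init`, `session_add`, `unlink_cap`, `remove_cap`, and `count_and_drop` are assumptions made for the example, and all locking is omitted. Only the idea comes from the message: a cap that is removed while it is the current cap gets its `ci` cleared and stays on the list until the iterator has picked up the next pointer.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct cap {
    struct cap *prev, *next;   /* session list linkage */
    void *ci;                  /* owning inode; NULL marks "pending removal" */
};

struct session {
    struct cap head;           /* sentinel of a circular list */
    struct cap *iterating;     /* cap currently handed to the callback */
};

typedef void (*cap_cb)(struct session *s, struct cap *c, void *arg);

static void session_init(struct session *s)
{
    s->head.prev = s->head.next = &s->head;
    s->iterating = NULL;
}

static void session_add(struct session *s, struct cap *c, void *ci)
{
    c->ci = ci;
    c->prev = s->head.prev;
    c->next = &s->head;
    s->head.prev->next = c;
    s->head.prev = c;
}

static void unlink_cap(struct cap *c)
{
    c->prev->next = c->next;
    c->next->prev = c->prev;
    c->prev = c->next = NULL;
}

/* Remove a cap. If the iterator is sitting on it, only mark it. */
static void remove_cap(struct session *s, struct cap *c)
{
    c->ci = NULL;
    if (s->iterating != c)
        unlink_cap(c);
}

static void iterate_caps(struct session *s, cap_cb cb, void *arg)
{
    struct cap *c = s->head.next;

    while (c != &s->head) {
        struct cap *next;

        s->iterating = c;
        cb(s, c, arg);         /* may call remove_cap() on any cap */
        s->iterating = NULL;

        next = c->next;        /* read only after the callback returns */
        if (c->ci == NULL)
            unlink_cap(c);     /* finish the deferred removal */
        c = next;
    }
}

/* Example callback: count the visit, then drop the cap being visited. */
static void count_and_drop(struct session *s, struct cap *c, void *arg)
{
    ++*(int *)arg;
    remove_cap(s, c);
}
```

Because the next pointer is read only after the callback has returned, removing any other cap during the callback (including the one that would have been next) is also safe in this sketch.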

ceph: clean up readdir caps reservation · 85ccce43

Committed by Sage Weil on Feb 17, 2010

Use a global counter for the minimum number of allocated caps instead of
hard coding a check against readdir_max.  This takes into account multiple
client instances, and avoids examining the superblock mount options when a
cap is dropped.
Signed-off-by: Sage Weil <sage@newdream.net>

85ccce43

17 Feb, 2010 6 commits

ceph: fix authentication races, auth_none oops · 5ce6e9db

Committed by Sage Weil on Feb 15, 2010

Call __validate_auth() under monc->mutex, and use helper for
initial hello so that the pending_auth flag is set.  This fixes
possible races in which we have an authentication request (hello
or otherwise) pending and send another one.  In particular, with
auth_none, we _never_ want to call ceph_build_auth() from
__validate_auth(), since the ->build_request() method is NULL.
Signed-off-by: Sage Weil <sage@newdream.net>

5ce6e9db

ceph: use rbtree for mon statfs requests · 85ff03f6

Committed by Sage Weil on Feb 15, 2010

An rbtree is lighter weight, particularly given we will generally have
very few in-flight statfs requests.
Signed-off-by: Sage Weil <sage@newdream.net>

85ff03f6

ceph: use rbtree for snap_realms · a105f00c

Committed by Sage Weil on Feb 15, 2010

Switch from radix tree to rbtree for snap realms.  This is much more
appropriate given that realm keys are few and far between.
Signed-off-by: Sage Weil <sage@newdream.net>

a105f00c

ceph: use rbtree for mds requests · 44ca18f2

Committed by Sage Weil on Feb 15, 2010

The rbtree is a more appropriate data structure than a radix_tree.  It
avoids extra memory usage and simplifies the code.

It also fixes a bug where the debugfs 'mdsc' file wasn't including the
most recent mds request.
Signed-off-by: Sage Weil <sage@newdream.net>

44ca18f2

ceph: cancel delayed work when closing connection · 91e45ce3

Committed by Sage Weil on Feb 15, 2010

This ensures that if/when we reopen the connection, we can requeue work on
the connection immediately, without waiting for an old timer to expire.
Queue new delayed work inside con->mutex to avoid any race.

This fixes problems with clients failing to reconnect to the MDS due to
the client_reconnect message arriving too late (due to waiting for an old
delayed work timeout to expire).
Signed-off-by: Sage Weil <sage@newdream.net>

91e45ce3

ceph: allow connection to be reopened by fault callback · e2663ab6

Committed by Sage Weil on Feb 16, 2010

Fix the messenger to allow a ceph_con_open() during the fault callback.
Previously the work wasn't getting queued on the connection because the
fault path avoids requeued work (normally spurious).  Loop on reopening by
checking for the OPENING state bit.

This fixes OSD reconnects when a TCP connection drops.
Signed-off-by: Sage Weil <sage@newdream.net>

e2663ab6

16 Feb, 2010 1 commit

ceph: reset osd connections after fault · 153a008b

Committed by Sage Weil on Feb 15, 2010

A single osd connection fault (e.g. tcp disconnect) wasn't
reopening the connection, which caused all current and future
requests for that osd to hang.
Signed-off-by: Sage Weil <sage@newdream.net>

153a008b

14 Feb, 2010 1 commit

ceph: fix msgr to keep sent messages until acked · 6c5d1a49

Committed by Sage Weil on Feb 13, 2010

The test was backwards from commit b3d1dbbd: keep the message if the
connection _isn't_ lossy.  This allows the client to continue when the
TCP connection drops for some reason (network glitch) but both ends
survive.
Signed-off-by: Sage Weil <sage@newdream.net>

6c5d1a49
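
The rule stated in this commit message (keep a sent message until it is acked, but only when the connection is not lossy) can be illustrated with a small queue model. This is a sketch under stated assumptions, not the messenger code: `struct conn`, `MAX_MSGS`, and the `conn_*` helpers are hypothetical names, and sequence numbers stand in for whole messages.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_MSGS 8

/* Hypothetical connection: sequence numbers stand in for messages. */
struct conn {
    bool lossy;                    /* lossy peers never get a resend */
    unsigned out_queue[MAX_MSGS];  /* not yet sent */
    size_t n_out;
    unsigned out_sent[MAX_MSGS];   /* sent, awaiting ack */
    size_t n_sent;
};

/* Queue a message; returns false when the connection is full. */
static bool conn_queue(struct conn *c, unsigned seq)
{
    if (c->n_out + c->n_sent >= MAX_MSGS)
        return false;
    c->out_queue[c->n_out++] = seq;
    return true;
}

/* Send the head of the queue; returns its seq, or 0 if nothing is queued. */
static unsigned conn_send_one(struct conn *c)
{
    unsigned seq;

    if (c->n_out == 0)
        return 0;
    seq = c->out_queue[0];
    c->n_out--;
    memmove(c->out_queue, c->out_queue + 1,
            c->n_out * sizeof(c->out_queue[0]));
    if (!c->lossy)                 /* keep only if the peer is NOT lossy */
        c->out_sent[c->n_sent++] = seq;
    return seq;
}

/* The peer acknowledged everything up to and including seq. */
static void conn_ack(struct conn *c, unsigned seq)
{
    size_t i = 0, j;

    while (i < c->n_sent && c->out_sent[i] <= seq)
        i++;
    for (j = i; j < c->n_sent; j++)
        c->out_sent[j - i] = c->out_sent[j];
    c->n_sent -= i;
}

/* The TCP connection dropped: requeue unacked messages at the front. */
static void conn_fault(struct conn *c)
{
    memmove(c->out_queue + c->n_sent, c->out_queue,
            c->n_out * sizeof(c->out_queue[0]));
    memcpy(c->out_queue, c->out_sent,
           c->n_sent * sizeof(c->out_sent[0]));
    c->n_out += c->n_sent;
    c->n_sent = 0;
}
```

With the test inverted (keeping messages only on lossy connections), `conn_fault()` would have nothing to requeue on a reliable connection, which is the failure mode the message describes.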

12 Feb, 2010 14 commits

ceph: remove bogus invalidate_mapping_pages · 80310491

Committed by Sage Weil on Feb 09, 2010

We were invalidating mapping pages when dropping FILE_CACHE in
__send_cap().  But ceph_check_caps attempts to invalidate already, and
also checks for success, so we should never get to this point.
Signed-off-by: Sage Weil <sage@newdream.net>

80310491

ceph: invalidate pages even if truncate is pending · 0840d8af

Committed by Sage Weil on Feb 09, 2010

There is no reason not to invalidate pages when a truncate is pending.
Both throw out page cache pages.
Signed-off-by: Sage Weil <sage@newdream.net>

0840d8af

ceph: cleanup async writeback, truncation, invalidate helpers · 3c6f6b79

Committed by Sage Weil on Feb 09, 2010

Grab inode ref in helper.  Make work functions static, with consistent
naming.
Signed-off-by: Sage Weil <sage@newdream.net>

3c6f6b79

ceph: fix sync read eof check deadlock · 6a026589

Committed by Sage Weil on Feb 09, 2010

If a sync read gets a short result from the OSD, it may need to do a
getattr to see if it is short due to reaching end-of-file. The getattr
was being done while holding a reference to FILE_RD, which can lead to
a deadlock if the MDS is revoking that capability bit and can't process
the getattr until it does.

We fix this by setting a flag if EOF size validation is needed, and doing
the getattr in ceph_aio_read, after the RD cap ref is dropped. If the
read needs to be continued, we loop and continue traversing the file.
Signed-off-by: Sage Weil <sage@newdream.net>

6a026589

ceph: do not retain caps that are being revoked · 68c28323

Committed by Sage Weil on Feb 09, 2010

Never retain caps in __send_cap() that are being revoked.
Signed-off-by: Sage Weil <sage@newdream.net>

68c28323

ceph: cap revocation fixes · cbd03635

Committed by Sage Weil on Feb 09, 2010

Try to invalidate pages in ceph_check_caps() if FILE_CACHE is being
revoked. If we fail, queue an immediate async invalidate if FILE_CACHE
is being revoked. (If it's not being revoked, we just queue the caps
for later evaluation, as per the old behavior.)
Signed-off-by: Sage Weil <sage@newdream.net>

cbd03635

ceph: sync read/write considers page cache · 29065a51

Committed by Yehuda Sadeh on Feb 09, 2010

In the cases where we either do a sync read or a write, we
need to make sure that everything in the page cache is flushed.
In the case of a sync write we invalidate the relevant pages,
so that subsequent read/write reflects the new data written.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

29065a51

ceph: fix truncation when not holding caps · 3d497d85

Committed by Yehuda Sadeh on Feb 09, 2010

A truncation should occur when either we have the
specified caps for the file, or (in cases where we are
not the only ones referencing the file) when it is mapped
or when it is opened. The latter two cases were not
handled.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

3d497d85

ceph: refactor ceph_write_begin, fix ceph_page_mkwrite · 4af6b225

Committed by Yehuda Sadeh on Feb 09, 2010

Originally ceph_page_mkwrite called ceph_write_begin, hoping that
the returned locked page would be the page that it was requested
to mkwrite. Factored out the relevant part of ceph_page_mkwrite, and
we now lock the right page anyway.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

4af6b225

ceph: fix short synchronous reads · 972f0d3a

Committed by Yehuda Sadeh on Feb 04, 2010

Zeroing of holes was not done correctly: page_off was miscalculated and
zeroing the tail did not adjust the 'read' value to include the zeroed
portion.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

972f0d3a
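
The second half of this fix, counting the zeroed tail as data that was read, can be shown with a small helper. This is a hedged sketch rather than the kernel routine: `finish_short_read` and its parameters are hypothetical, and it works on a flat buffer instead of a page vector, so the page_off calculation is not modeled.

```c
#include <stddef.h>
#include <string.h>

/*
 * buf holds `got` valid bytes of a `want`-byte read at file offset `off`.
 * If the file (of length `size`) extends past the returned data, the gap
 * is a hole: zero it and count it as read. Returns the adjusted count.
 */
static size_t finish_short_read(char *buf, size_t want, size_t got,
                                size_t off, size_t size)
{
    size_t avail = size > off ? size - off : 0;  /* bytes before EOF */
    size_t end = want < avail ? want : avail;    /* how far we may fill */

    if (got < end) {
        memset(buf + got, 0, end - got);         /* zero the hole */
        got = end;                               /* include it in 'read' */
    }
    return got;
}
```

For example, a request for 8 bytes at offset 0 of a 5-byte file that returns only 2 bytes ends up reporting 5 bytes read, with bytes 2 through 4 zeroed and the rest of the buffer untouched.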

ceph: add uid field to ceph_pg_pool · 02f90c61

Committed by Sage Weil on Feb 04, 2010

Also verify encoding version as we go.
Signed-off-by: Sage Weil <sage@newdream.net>

02f90c61

ceph: put unused osd connections on lru · f5a2041b

Committed by Yehuda Sadeh on Feb 03, 2010

Instead of removing an osd connection immediately when the
requests list is empty, put the osd connection on an lru.
It is removed only if that osd has not been used for longer
than a specified time.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

f5a2041b
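
The idle-expiry behaviour described here can be modeled in a few lines. This is a simplified sketch under stated assumptions: a flat array scan stands in for the kernel's LRU list, time is a plain integer, and `struct osd`, `struct osd_client`, `idle_ttl`, and the `osd_*` helpers are hypothetical names chosen for the example.

```c
#include <stddef.h>

#define MAX_OSDS 4

struct osd {
    int in_use;        /* slot holds an open connection */
    int nr_requests;   /* outstanding requests on this osd */
    long lru_ttl;      /* expiry time, meaningful while idle */
};

struct osd_client {
    struct osd osds[MAX_OSDS];
    long idle_ttl;     /* how long an idle connection is kept */
};

/* A request starts using the osd; an idle connection is simply reused. */
static void osd_request_start(struct osd_client *oc, int id)
{
    struct osd *o = &oc->osds[id];

    o->in_use = 1;
    o->nr_requests++;
}

/* Last request done: park the connection instead of closing it. */
static void osd_request_done(struct osd_client *oc, int id, long now)
{
    struct osd *o = &oc->osds[id];

    if (--o->nr_requests == 0)
        o->lru_ttl = now + oc->idle_ttl;
}

/* Periodic sweep: close connections that stayed idle past their expiry. */
static int osd_remove_old(struct osd_client *oc, long now)
{
    int i, closed = 0;

    for (i = 0; i < MAX_OSDS; i++) {
        struct osd *o = &oc->osds[i];

        if (o->in_use && o->nr_requests == 0 && now >= o->lru_ttl) {
            o->in_use = 0;
            closed++;
        }
    }
    return closed;
}
```

The point of the design is that a burst of requests to the same osd does not pay for a reconnect each time the request list briefly drains.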

ceph: remove unused variable · b056c876

Committed by Yehuda Sadeh on Feb 03, 2010

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

b056c876

ceph: add support for auth_x authentication protocol · ec0994e4

Committed by Sage Weil on Feb 02, 2010

The auth_x protocol implements support for a Kerberos-like mutual
authentication infrastructure used by Ceph.  We do not simply use vanilla
Kerberos because of scalability and performance issues when dealing with
a large cluster of nodes providing a single logical service.

Auth_x provides mutual authentication of client and server and protects
against replay and man-in-the-middle attacks.  It does not encrypt
the full session over the wire, however, so data payload may still be
snooped.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ec0994e4

11 Feb, 2010 4 commits

ceph: add struct version to auth encoding · 07c8739c

Committed by Sage Weil on Feb 04, 2010

Include struct version in encoding. This will streamline future protocol
changes.
Signed-off-by: Sage Weil <sage@newdream.net>

07c8739c
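
Why a leading struct version streamlines later protocol changes can be seen in a small encoder/decoder pair. This is a generic sketch, not Ceph's wire format: `struct record`, its fields, `REC_STRUCT_V`, and the byte layout are all invented for illustration. The decoder checks the version first, then reads only the fields that version defines.

```c
#include <stddef.h>
#include <stdint.h>

#define REC_STRUCT_V 2  /* hypothetical current encoding version */

struct record {
    uint32_t size;
    uint64_t uid;       /* pretend this field was added in version 2 */
};

/* Write the low n bytes of v in little-endian order. */
static void put_le(uint8_t *p, uint64_t v, int n)
{
    int i;

    for (i = 0; i < n; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Read n little-endian bytes. */
static uint64_t get_le(const uint8_t *p, int n)
{
    uint64_t v = 0;
    int i;

    for (i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Encode as: version byte, 32-bit size, 64-bit uid. Returns bytes used. */
static size_t record_encode(const struct record *r, uint8_t *buf)
{
    buf[0] = REC_STRUCT_V;
    put_le(buf + 1, r->size, 4);
    put_le(buf + 5, r->uid, 8);
    return 13;
}

/* Decode any known version; returns 0 on success, -1 on bad input. */
static int record_decode(struct record *r, const uint8_t *buf, size_t len)
{
    uint8_t v;
    size_t need;

    if (len < 1)
        return -1;
    v = buf[0];
    if (v < 1 || v > REC_STRUCT_V)
        return -1;                      /* unknown encoding version */
    need = 1 + 4 + (v >= 2 ? 8 : 0);
    if (len < need)
        return -1;                      /* truncated buffer */
    r->size = (uint32_t)get_le(buf + 1, 4);
    r->uid = v >= 2 ? get_le(buf + 5, 8) : 0;
    return 0;
}
```

A peer that still sends the older layout is decoded with a default for the missing field, while an unrecognized version is rejected up front instead of being misparsed.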

ceph: allow renewal of auth credentials · 9bd2e6f8

Committed by Sage Weil on Feb 02, 2010

Add infrastructure to allow the mon_client to periodically renew its auth
credentials.  Also add a messenger callback that will force such a renewal
if a peer rejects our authenticator.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

9bd2e6f8

ceph: aes crypto and base64 encode/decode helpers · 8b6e4f2d

Committed by Sage Weil on Feb 02, 2010

Helpers for AES encryption/decryption and base64 encoding/decoding.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

8b6e4f2d

ceph: buffer decoding helpers · c7e337d6

Committed by Sage Weil on Feb 02, 2010

Helper for decoding into a ceph_buffer, and other misc decoding helpers
we will need.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

c7e337d6

03 Feb, 2010 2 commits

ceph: release all pages after successful osd write response · 79788c69

Committed by Sage Weil on Feb 02, 2010

We release all the pages, even if the osd response was
different than the number of pages written. This could only
happen due to a truncation that arrives at the osd in a
different order, in which case we want the pages released anyway.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

79788c69

ceph: always send truncation info with read and write osd ops · 0c948992

Committed by Yehuda Sadeh on Feb 01, 2010

This fixes a bug where the read/write ops arrive at the osd after
a subsequent truncation request.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

0c948992

30 Jan, 2010 2 commits

ceph: remove unreachable code · 0f26c4b2

Committed by Yehuda Sadeh on Jan 29, 2010

We never truncate to a smaller size without contacting the MDS.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

0f26c4b2

ceph: include type in ceph_entity_addr, filepath · ac8839d7

Committed by Sage Weil on Jan 27, 2010

Include a type/version in ceph_entity_addr and filepath.  Include extra
byte in filepath encoding as necessary.
Signed-off-by: Sage Weil <sage@newdream.net>

ac8839d7

26 Jan, 2010 8 commits

ceph: precede encoded ceph_pg_pool struct with version · 361be860

Committed by Sage Weil on Jan 25, 2010

Signed-off-by: Sage Weil <sage@newdream.net>

361be860

ceph: keep reserved replies on the request structure · 0d59ab81

Committed by Yehuda Sadeh on Jan 13, 2010

This handles all the data preallocation and revocation
in the same place, without needing a special case for
the reserved pages.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

0d59ab81

ceph: alloc message data pages and check if tid exists · 0547a9b3

Committed by Yehuda Sadeh on Jan 11, 2010

This is now done in the same callback that is also responsible for
allocating the 'front' part of the message. If we get a message
that we haven't got a corresponding tid for, mark it for skipping.

Also move the mutex unlock/lock from the osd alloc_msg callback
to the calling function in the messenger.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

0547a9b3

ceph: refactor messages data section allocation · 9d7f0f13

Committed by Yehuda Sadeh on Jan 11, 2010

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

9d7f0f13

ceph: allocate middle of message before stating to read · 2450418c

Committed by Yehuda Sadeh on Jan 08, 2010

Both front and middle parts of the message are now
allocated in ceph_alloc_msg().
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>

2450418c

ceph: properly handle aborted mds requests · 5b1daecd

Committed by Sage Weil on Jan 25, 2010

Previously, if the MDS request was interrupted, we would unregister the
request and ignore any reply. This could cause the caps or other cache
state to become out of sync. (For instance, aborting dbench and doing
rm -r on clients would complain about a non-empty directory because the
client didn't realize its aborted file create request completed.)

Even if we don't unregister, we still can't process the reply normally
because we are no longer holding the caller's locks (like the dir i_mutex).

So, mark aborted operations with r_aborted, and in the reply handler, be
sure to process all the caps. Do not process the namespace changes,
though, since we no longer will hold the dir i_mutex. The dentry lease
state can also be ignored as it's more forgiving.
Signed-off-by: Sage Weil <sage@newdream.net>

5b1daecd
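
The decision made in the reply handler can be summarized as a tiny function. This is only a schematic of the rule stated in the message, not the handler itself: `struct mds_request` is reduced to the `r_aborted` flag, and `struct reply_work` and `plan_reply` are hypothetical names.

```c
#include <stdbool.h>

/* Reduced stand-in for the request: only the flag the message names. */
struct mds_request {
    bool r_aborted;
};

/* Which parts of an MDS reply the client should apply. */
struct reply_work {
    bool caps;               /* always applied, even after an abort */
    bool namespace_changes;  /* needs the caller's dir i_mutex */
    bool dentry_leases;      /* skipped on abort; more forgiving state */
};

static struct reply_work plan_reply(const struct mds_request *req)
{
    struct reply_work w = { .caps = true };

    if (!req->r_aborted) {
        /* The caller still holds its locks, so the rest is safe. */
        w.namespace_changes = true;
        w.dentry_leases = true;
    }
    return w;
}
```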

ceph: mark MDS CREATE as a write op · 3ea25f94

Committed by Sage Weil on Jan 25, 2010

CEPH_MDS_OP_CREATE was not correctly marked as a write operation.
Signed-off-by: Sage Weil <sage@newdream.net>

3ea25f94

ceph: remove duplicate variable initialization · ec7384ec

Committed by Julia Lawall on Jan 20, 2010

The variable client is initialized twice to the same (side effect-free)
expression.  Drop one initialization.

A simplified version of the semantic match that finds this problem is:
(http://coccinelle.lip6.fr/)

// <smpl>
@forall@
idexpression *x;
identifier f!=ERR_PTR;
@@

x = f(...)
... when != x
(
x = f(...,<+...x...+>,...)
|
* x = f(...)
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Sage Weil <sage@newdream.net>

ec7384ec
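
The pattern that the semantic match above looks for is easy to show in plain C. The snippet below is an invented illustration, not the code touched by the commit: `struct client`, `struct inode_stub`, `get_client`, and both `client_id_*` functions are hypothetical. The first version assigns the same side-effect-free expression twice with no use of the variable in between; the second drops the redundant assignment.

```c
#include <stddef.h>

struct client {
    int id;
};

struct inode_stub {
    struct client *client;
};

/* Side-effect-free accessor, so calling it twice changes nothing. */
static struct client *get_client(const struct inode_stub *in)
{
    return in->client;
}

/* Before: the shape the semantic match flags. */
static int client_id_before(const struct inode_stub *in)
{
    struct client *client = get_client(in);  /* first initialization */

    client = get_client(in);                 /* same value again, unused in between */
    return client->id;
}

/* After: one initialization is enough. */
static int client_id_after(const struct inode_stub *in)
{
    struct client *client = get_client(in);

    return client->id;
}
```

Both functions return the same value, which is why dropping one initialization is a pure cleanup.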