- 05 3月, 2010 1 次提交
-
-
由 Yehuda Sadeh 提交于
This simplifies the process of timing out messages. We keep lru of current messages that are in flight. If a timeout has passed, we reset the osd connection, so that messages will be retransmitted. This is a failsafe in case we hit some sort of problem sending out message to the OSD. Normally, we'll get notification via an updated osdmap if there are problems. If a request is older than the keepalive timeout, send a keepalive to ensure we detect any breaks in the TCP connection. Signed-off-by: NYehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: NSage Weil <sage@newdream.net>
-
- 02 3月, 2010 9 次提交
-
-
由 Sage Weil 提交于
The flush_dirty_caps() used to loop over the first entry of the cap_dirty dirty list on the assumption that after calling ceph_check_caps() it would be removed from the list. This isn't true for caps that are being migrated between MDSs, where we've received the EXPORT but not the IMPORT. Instead, do a safe list iteration, and pin the next inode on the list via the CEPH_I_NOFLUSH flag. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
We should include caps that are mid-migration (we've received the EXPORT, but not the IMPORT) in the issued caps set. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
Add missing pointer dereference (p is a void **). Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
Verify the file is actually open for the given caps when we are waiting for caps. This ensures we will wake up and return EBADF if another thread closes the file out from under us. Note that EBADF is also the correct return code from write(2) when called on a file handle opened for reading (although the vfs should catch that). Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
We didn't set the front length correctly. When messages used the message pool we ended up with the conservative max (4 KB), and the rest of the time the slightly less conservative estimate. Even though the OSD ignores the extra data, set it to the right value to avoid sending extra data over the network. Signed-off-by: NYehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
Reset msg front len when a message is returned to the pool: the caller may have changed it. BUG if we try to send a message with a hdr.front_len that doesn't match the front iov. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
This was simply broken. Apparently at some point we thought about putting the snaptrace in the middle section, but didn't. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
Use a single ceph_msg for the osd reply, even when we are getting multiple replies. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
Clear LOSSYTX bit, so that if/when we reconnect, said reconnect will retry on failure. Clear _PENDING bits too, to avoid polluting subsequent connection state. Drop unused REGISTERED bit. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 27 2月, 2010 2 次提交
-
-
由 Sage Weil 提交于
The must_resend flag is always true, not false. In any case, we can just ignore it anyway. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
We used to try to avoid freeing and then reallocating the osd struct. This is a bit fragile due to potential interactions with other references (beyond o_requests), and may be the cause of this crash: [120633.442358] BUG: unable to handle kernel NULL pointer dereference at (null) [120633.443292] IP: [<ffffffff812549b6>] rb_erase+0x11d/0x277 [120633.443292] PGD f7ff3067 PUD f7f53067 PMD 0 [120633.443292] Oops: 0000 [#1] PREEMPT SMP [120633.443292] last sysfs file: /sys/kernel/uevent_seqnum [120633.443292] CPU 1 [120633.443292] Modules linked in: ceph fan ac battery psmouse ehci_hcd ide_pci_generic ohci_hcd thermal processor button [120633.443292] Pid: 3023, comm: ceph-msgr/1 Not tainted 2.6.32-rc2 #12 H8SSL [120633.443292] RIP: 0010:[<ffffffff812549b6>] [<ffffffff812549b6>] rb_erase+0x11d/0x277 [120633.443292] RSP: 0018:ffff8800f7b13a50 EFLAGS: 00010246 [120633.443292] RAX: ffff880022907819 RBX: ffff880022907818 RCX: 0000000000000000 [120633.443292] RDX: ffff8800f7b13a80 RSI: ffff8800f587eb48 RDI: 0000000000000000 [120633.443292] RBP: ffff8800f7b13a60 R08: 0000000000000000 R09: 0000000000000004 [120633.443292] R10: 0000000000000000 R11: ffff8800c4441000 R12: ffff8800f587eb48 [120633.443292] R13: ffff8800f58eaa00 R14: ffff8800f413c000 R15: 0000000000000001 [120633.443292] FS: 00007fbef6e226e0(0000) GS:ffff880009200000(0000) knlGS:0000000000000000 [120633.443292] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b [120633.443292] CR2: 0000000000000000 CR3: 00000000f7c53000 CR4: 00000000000006e0 [120633.443292] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [120633.443292] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [120633.443292] Process ceph-msgr/1 (pid: 3023, threadinfo ffff8800f7b12000, task ffff8800f5858b40) [120633.443292] Stack: [120633.443292] ffff8800f413c000 ffff8800f587e9c0 ffff8800f7b13a80 ffffffffa0098a86 [120633.443292] <0> 00000000000006f1 0000000000000000 ffff8800f7b13af0 ffffffffa009959b [120633.443292] <0> ffff8800f413c000 ffff880022a68400 ffff880022a68400 ffff8800f587e9c0 [120633.443292] Call Trace: [120633.443292] [<ffffffffa0098a86>] __remove_osd+0x4d/0xbc [ceph] [120633.443292] [<ffffffffa009959b>] __map_osds+0x199/0x4fa [ceph] [120633.443292] [<ffffffffa00999f4>] ? __send_request+0xf8/0x186 [ceph] [120633.443292] [<ffffffffa0099beb>] kick_requests+0x169/0x3cb [ceph] [120633.443292] [<ffffffffa009a8c1>] ceph_osdc_handle_map+0x370/0x522 [ceph] Since we're probably screwed anyway if a small kmalloc is failing, don't bother with trying to be clever here. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 26 2月, 2010 2 次提交
-
-
由 Sage Weil 提交于
Move any out_sent messages to out_queue _before_ checking if out_queue is empty and going to STANDBY, or else we may drop something that was never acked. And clean up the code a bit (less goto). Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
This fixes lock ABBA inversion, as the ->invalidate_authorizer() op may need to take a lock (or even call back into the messenger). Signed-off-by: NSage Weil <sage@newdream.net>
-
- 24 2月, 2010 6 次提交
-
-
由 Yehuda Sadeh 提交于
Signed-off-by: NYehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
The tid is in the message header, not body. Broken since 6df058c0. No need to look at next mds session; just mark the request and be done. (The old error path was broken too, but now it's gone.) Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
Verify the mds session is currently registered before handling incoming messages. Clean up message handlers to pull mds out of session->s_mds instead of less trustworthy src field. Clean up con_{get,put} debug output. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
The destroy_inode path needs no inode locks since there are no inode references. Update __ceph_remove_cap comment to reflect that it is called without cap->session->s_mutex in this case. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Alexander Beregalov 提交于
Signed-off-by: NAlexander Beregalov <a.beregalov@gmail.com> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
Fix skipping of unexpected message types from osd, mon. Clean up pr_info and debug output. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 20 2月, 2010 4 次提交
-
-
由 Yehuda Sadeh 提交于
There is no state in local vars that requires us to loop after temporarily dropping i_lock. Signed-off-by: NYehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Yehuda Sadeh 提交于
Instead of truncating the whole range of pages, we skip those pages that are dirty or in the middle of writeback. Those pages will be cleared later when the writeback completes. Signed-off-by: NYehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Yehuda Sadeh 提交于
This page should have been removed earlier when the cache cap was revoked, but a writeback was in flight, so it was skipped. We truncate it here just as the writeback finishes, while it's still locked. Signed-off-by: NYehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
We need to know whether there was any page left behind, and not the return value (the total number of pages invalidated). Look at the mapping to see if we were successful or not. Move it all into a helper to simplify the two callers. Signed-off-by: NYehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: NSage Weil <sage@newdream.net>
-
- 18 2月, 2010 6 次提交
-
-
由 Sage Weil 提交于
Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
Since we can now create and destroy pg pools, the pool ids will be sparse, and an array no longer makes sense for looking up by pool id. Use an rbtree instead. The OSDMap encoding also no longer has a max pool count (previously used to allocate the array). There is a new pool_max, that is the largest pool id we've ever used, although we don't actually need it in the client. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
Also move _lookup_pg_mapping into a helper. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
We need to be able to iterate over all caps on a session with a possibly slow callback on each cap. To allow this, we used to prevent cap reordering while we were iterating. However, we were not safe from races with removal: removing the 'next' cap would make the next pointer from list_for_each_entry_safe be invalid, and cause a lock up or similar badness. Instead, we keep an iterator pointer in the session pointing to the current cap. As before, we avoid reordering. For removal, if the cap isn't the current cap we are iterating over, we are fine. If it is, we clear cap->ci (to mark the cap as pending removal) but leave it in the session list. In iterate_caps, we can safely finish removal and get the next cap pointer. While we're at it, clean up put_cap to not take a cap reservation context, as it was never used. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
Use a global counter for the minimum number of allocated caps instead of hard coding a check against readdir_max. This takes into account multiple client instances, and avoids examining the superblock mount options when a cap is dropped. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 17 2月, 2010 6 次提交
-
-
由 Sage Weil 提交于
Call __validate_auth() under monc->mutex, and use helper for initial hello so that the pending_auth flag is set. This fixes possible races in which we have an authentication request (hello or otherwise) pending and send another one. In particular, with auth_none, we _never_ want to call ceph_build_auth() from __validate_auth(), since the ->build_request() method is NULL. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
An rbtree is lighter weight, particularly given we will generally have very few in-flight statfs requests. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
Switch from radix tree to rbtree for snap realms. This is much more appropriate given that realm keys are few and far between. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
The rbtree is a more appropriate data structure than a radix_tree. It avoids extra memory usage and simplifies the code. It also fixes a bug where the debugfs 'mdsc' file wasn't including the most recent mds request. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
This ensures that if/when we reopen the connection, we can requeue work on the connection immediately, without waiting for an old timer to expire. Queue new delayed work inside con->mutex to avoid any race. This fixes problems with clients failing to reconnect to the MDS due to the client_reconnect message arriving too late (due to waiting for an old delayed work timeout to expire). Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
Fix the messenger to allow a ceph_con_open() during the fault callback. Previously the work wasn't getting queued on the connection because the fault path avoids requeued work (normally spurious). Loop on reopening by checking for the OPENING state bit. This fixes OSD reconnects when a TCP connection drops. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 16 2月, 2010 1 次提交
-
-
由 Sage Weil 提交于
A single osd connection fault (e.g. tcp disconnect) wasn't reopening the connection, which causes all current and future requests for that osd to hang. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 14 2月, 2010 1 次提交
-
-
由 Sage Weil 提交于
The test was backwards from commit b3d1dbbd: keep the message if the connection _isn't_ lossy. This allows the client to continue when the TCP connection drops for some reason (network glitch) but both ends survive. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 12 2月, 2010 2 次提交
-
-
由 Sage Weil 提交于
We were invalidating mapping pages when dropping FILE_CACHE in __send_cap(). But ceph_check_caps attempts to invalidate already, and also checks for success, so we should never get to this point. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
There is no reason not to invalidate pages when a truncate is pending. Both throw out page cache pages. Signed-off-by: NSage Weil <sage@newdream.net>
-