- 10 10月, 2012 1 次提交
-
-
由 Alex Elder 提交于
If ceph_fault() is unable to queue work after a delay, it sets the BACKOFF connection flag so con_work() will attempt to do so. In con_work(), when BACKOFF is set, if queue_delayed_work() doesn't result in newly-queued work, it simply ignores this condition and proceeds as if no backoff delay were desired. There are two problems with this--one of which is a bug. The first problem is simply that the intended behavior is to back off, and if we aren't able queue the work item to run after a delay we're not doing that. The only reason queue_delayed_work() won't queue work is if the provided work item is already queued. In the messenger, this means that con_work() is already scheduled to be run again. So if we simply set the BACKOFF flag again when this occurs, we know the next con_work() call will again attempt to hold off activity on the connection until after the delay. The second problem--the bug--is a leak of a reference count. If queue_delayed_work() returns 0 in con_work(), con->ops->put() drops the connection reference held on entry to con_work(). However, processing is (was) allowed to continue, and at the end of the function a second con->ops->put() is called. This patch fixes both problems. Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
- 02 10月, 2012 5 次提交
-
-
由 Sage Weil 提交于
If we are creating an osd request and get an invalid layout, return an EINVAL to the caller. We switch up the return to have an error code instead of NULL implying -ENOMEM. Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com>
-
由 Sage Weil 提交于
If we encounter an invalid (e.g., zeroed) mapping, return an error and avoid a divide by zero. Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com>
-
由 Wei Yongjun 提交于
Using list_move_tail() instead of list_del() + list_add_tail(). Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: NSage Weil <sage@inktank.com>
-
由 Iulius Curt 提交于
Make ceph_monc_do_poolop() static to remove the following sparse warning: * net/ceph/mon_client.c:616:5: warning: symbol 'ceph_monc_do_poolop' was not declared. Should it be static? Also drops the 'ceph_monc_' prefix, now being a private function. Signed-off-by: NIulius Curt <icurt@ixiacom.com> Signed-off-by: NSage Weil <sage@inktank.com>
-
由 Sage Weil 提交于
This is unused; use monc->client->have_fsid. Signed-off-by: NSage Weil <sage@inktank.com>
-
- 22 9月, 2012 1 次提交
-
-
由 Alex Elder 提交于
In write_partial_msg_pages(), pages need to be kmapped in order to perform a CRC-32c calculation on them. As an artifact of the way this code used to be structured, the kunmap() call was separated from the kmap() call and both were done conditionally. But the conditions under which the kmap() and kunmap() calls were made differed, so there was a chance a kunmap() call would be done on a page that had not been mapped. The symptom of this was tripping a BUG() in kunmap_high() when pkmap_count[nr] became 0. Reported-by: NBryan K. Wright <bryan@virginia.edu> Signed-off-by: NAlex Elder <elder@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
- 22 8月, 2012 1 次提交
-
-
由 Jim Schutt 提交于
Because the Ceph client messenger uses a non-blocking connect, it is possible for the sending of the client banner to race with the arrival of the banner sent by the peer. When ceph_sock_state_change() notices the connect has completed, it schedules work to process the socket via con_work(). During this time the peer is writing its banner, and arrival of the peer banner races with con_work(). If con_work() calls try_read() before the peer banner arrives, there is nothing for it to do, after which con_work() calls try_write() to send the client's banner. In this case Ceph's protocol negotiation can complete succesfully. The server-side messenger immediately sends its banner and addresses after accepting a connect request, *before* actually attempting to read or verify the banner from the client. As a result, it is possible for the banner from the server to arrive before con_work() calls try_read(). If that happens, try_read() will read the banner and prepare protocol negotiation info via prepare_write_connect(). prepare_write_connect() calls con_out_kvec_reset(), which discards the as-yet-unsent client banner. Next, con_work() calls try_write(), which sends the protocol negotiation info rather than the banner that the peer is expecting. The result is that the peer sees an invalid banner, and the client reports "negotiation failed". Fix this by moving con_out_kvec_reset() out of prepare_write_connect() to its callers at all locations except the one where the banner might still need to be sent. [elder@inktak.com: added note about server-side behavior] Signed-off-by: NJim Schutt <jaschut@sandia.gov> Reviewed-by: NAlex Elder <elder@inktank.com>
-
- 21 8月, 2012 1 次提交
-
-
由 Sage Weil 提交于
The debugfs directory includes the cluster fsid and our unique global_id. We need to delay the initialization of the debug entry until we have learned both the fsid and our global_id from the monitor or else the second client can't create its debugfs entry and will fail (and multiple client instances aren't properly reflected in debugfs). Reported by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NYehuda Sadeh <yehuda@inktank.com>
-
- 03 8月, 2012 1 次提交
-
-
由 Sylvain Munaut 提交于
Avoid crashing if the crypto key payload was NULL, as when it was not correctly allocated and initialized. Also, avoid leaking it. Signed-off-by: NSylvain Munaut <tnt@246tNt.com> Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com>
-
- 31 7月, 2012 29 次提交
-
-
由 Sage Weil 提交于
We drop the lock when calling the ->alloc_msg() con op, which means we need to (a) not clobber con->in_msg without the mutex held, and (b) we need to verify that we are still in the OPEN state when we retake it to avoid causing any mayhem. If the state does change, -EAGAIN will get us back to con_work() and loop. Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com>
-
由 Sage Weil 提交于
This function's calling convention is very limiting. In particular, we can't return any error other than ENOMEM (and only implicitly), which is a problem (see next patch). Instead, return an normal 0 or error code, and make the skip a pointer output parameter. Drop the useless in_hdr argument (we have the con pointer). Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com>
-
由 Sage Weil 提交于
The ceph_fault() function takes the con mutex, so we should avoid dropping it before calling it. This fixes a potential race with another thread calling ceph_con_close(), or _open(), or similar (we don't reverify con->state after retaking the lock). Add annotation so that lockdep realizes we will drop the mutex before returning. Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com>
-
由 Sage Weil 提交于
We drop the con mutex when delivering a message. When we retake the lock, we need to verify we are still in the OPEN state before preparing to read the next tag, or else we risk stepping on a connection that has been closed. Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com>
-
由 Sage Weil 提交于
Revoke all mon_client messages when we shut down the old connection. This is mostly moot since we are re-using the same ceph_connection, but it is cleaner. Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com>
-
由 Sage Weil 提交于
If the connect() call immediately fails such that sock == NULL, we still need con_close_socket() to reset our socket state to CLOSED. Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com>
-
由 Sage Weil 提交于
There are many (normal) conditions that can lead to us getting unexpected replies, include cluster topology changes, osd failures, and timeouts. There's no need to spam the console about it. Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com>
-
由 Sage Weil 提交于
Signed-off-by: NSage Weil <sage@inktank.com>
-
由 Sage Weil 提交于
Rename flags with CON_FLAG prefix, move the definitions into the c file, and (better) document their meaning. Signed-off-by: NSage Weil <sage@inktank.com>
-
由 Sage Weil 提交于
Use a simple set of 6 enumerated values for the socket states (CON_STATE_*) and use those instead of the state bits. All of the con->state checks are now under the protection of the con mutex, so this is safe. It also simplifies many of the state checks because we can check for anything other than the expected state instead of various bits for races we can think of. This appears to hold up well to stress testing both with and without socket failure injection on the server side. Signed-off-by: NSage Weil <sage@inktank.com>
-
由 Sage Weil 提交于
If we are CLOSED, the socket is closed and we won't get these. Signed-off-by: NSage Weil <sage@inktank.com>
-
由 Sage Weil 提交于
It is simpler to do this immediately, since we already hold the con mutex. It also avoids the need to deal with a not-quite-CLOSED socket in con_work. Signed-off-by: NSage Weil <sage@inktank.com>
-
由 Sage Weil 提交于
If the state is CLOSED or OPENING, we shouldn't have a socket. Signed-off-by: NSage Weil <sage@inktank.com>
-
由 Sage Weil 提交于
Take the con mutex before checking whether the connection is closed to avoid racing with someone else closing it. Signed-off-by: NSage Weil <sage@inktank.com>
-
由 Sage Weil 提交于
Avoid dropping and retaking con->mutex in the ceph_con_send() case by leaving locking up to the caller. Signed-off-by: NSage Weil <sage@inktank.com>
-
由 Sage Weil 提交于
If we fault on a lossy connection, we should still close the socket immediately, and do so under the con mutex. We should also take the con mutex before printing out the state bits in the debug output. Signed-off-by: NSage Weil <sage@inktank.com>
-
由 Jiaju Zhang 提交于
This is a trivial fix for the debug output, as it is inconsistent with the function name so may confuse people when debugging. [elder@inktank.com: switched to use __func__] Signed-off-by: NJiaju Zhang <jjzhang@suse.de> Reviewed-by: NAlex Elder <elder@inktank.com>
-
由 Sage Weil 提交于
We exponentially back off when we encounter connection errors. If several errors accumulate, we will eventually wait ages before even trying to reconnect. Fix this by resetting the backoff counter after a successful negotiation/ connection with the remote node. Fixes ceph issue #2802. Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NYehuda Sadeh <yehuda@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com>
-
由 Sage Weil 提交于
Take the con mutex while we are initiating a ceph open. This is necessary because the may have previously been in use and then closed, which could result in a racing workqueue running con_work(). Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NYehuda Sadeh <yehuda@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com>
-
由 Sage Weil 提交于
Previously, we were opportunistically initializing the bio_iter if it appeared to be uninitialized in the middle of the read path. The problem is that a sequence like: - start reading message - initialize bio_iter - read half a message - messenger fault, reconnect - restart reading message - ** bio_iter now non-NULL, not reinitialized ** - read past end of bio, crash Instead, initialize the bio_iter unconditionally when we allocate/claim the message for read. Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com> Reviewed-by: NYehuda Sadeh <yehuda@inktank.com>
-
由 Sage Weil 提交于
The linger op registration (i.e., watch) modifies the object state. As such, the OSD will reply with success if it has already applied without doing the associated side-effects (setting up the watch session state). If we lose the ACK and resubmit, we will see success but the watch will not be correctly registered and we won't get notifies. To fix this, always resubmit the linger op with a new tid. We accomplish this by re-registering as a linger (i.e., 'registered') if we are not yet registered. Then the second loop will treat this just like a normal case of re-registering. This mirrors a similar fix on the userland ceph.git, commit 5dd68b95, and ceph bug #2796. Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com> Reviewed-by: NYehuda Sadeh <yehuda@inktank.com>
-
由 Sage Weil 提交于
Hold the mutex while twiddling all of the state bits to avoid possible races. While we're here, make not of why we cannot close the socket directly. Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com> Reviewed-by: NYehuda Sadeh <yehuda@inktank.com>
-
由 Sage Weil 提交于
We need to set error_msg to something useful before calling ceph_fault(); do so here for try_{read,write}(). This is more informative than libceph: osd0 192.168.106.220:6801 (null) Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com> Reviewed-by: NYehuda Sadeh <yehuda@inktank.com>
-
由 Sage Weil 提交于
The server side recently added support for tuning some magic crush variables. Decode these variables if they are present, or use the default values if they are not present. Corresponds to ceph.git commit 89af369c25f274fe62ef730e5e8aad0c54f1e5a5. Signed-off-by: Ncaleb miles <caleb.miles@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com> Reviewed-by: NYehuda Sadeh <yehuda@inktank.com>
-
由 Sage Weil 提交于
This is simply cleanup that will keep things more closely synced with the userland code. Signed-off-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@inktank.com> Reviewed-by: NYehuda Sadeh <yehuda@inktank.com>
-
由 Guanjun He 提交于
Add an atomic variable 'stopping' as flag in struct ceph_messenger, set this flag to 1 in function ceph_destroy_client(), and add the condition code in function ceph_data_ready() to test the flag value, if true(1), just return. Signed-off-by: NGuanjun He <gjhe@suse.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Sage Weil 提交于
In ancient times, the messenger could both initiate and accept connections. An artifact if that was data structures to store/process an incoming ceph_msg_connect request and send an outgoing ceph_msg_connect_reply. Sadly, the negotiation code was referencing those structures and ignoring important information (like the peer's connect_seq) from the correct ones. Among other things, this fixes tight reconnect loops where the server sends RETRY_SESSION and we (the client) retries with the same connect_seq as last time. This bug pretty easily triggered by injecting socket failures on the MDS and running some fs workload like workunits/direct_io/test_sync_io. Signed-off-by: NSage Weil <sage@inktank.com>
-
由 Sage Weil 提交于
These don't strictly need to be initialized based on how they are used, but it is good practice to do so. Reported-by: NAlex Elder <elder@inktank.com> Signed-off-by: NSage Weil <sage@inktank.com>
-
由 Sage Weil 提交于
Initialize the type field for messages in a msgpool. The caller was doing this for osd ops, but not for the reply messages. Reported-by: NAlex Elder <elder@inktank.com> Signed-off-by: NSage Weil <sage@inktank.com>
-
- 18 7月, 2012 1 次提交
-
-
由 Sage Weil 提交于
In ancient times, the messenger could both initiate and accept connections. An artifact if that was data structures to store/process an incoming ceph_msg_connect request and send an outgoing ceph_msg_connect_reply. Sadly, the negotiation code was referencing those structures and ignoring important information (like the peer's connect_seq) from the correct ones. Among other things, this fixes tight reconnect loops where the server sends RETRY_SESSION and we (the client) retries with the same connect_seq as last time. This bug pretty easily triggered by injecting socket failures on the MDS and running some fs workload like workunits/direct_io/test_sync_io. Signed-off-by: NSage Weil <sage@inktank.com>
-