- 29 Sep, 2011 2 commits
-
-
Committed by Sage Weil
The incremental map updates have a record for each pg_temp mapping that is to be added or updated (len > 0) or removed (len == 0). The old code was written as if the updates were a complete enumeration; that was just wrong. Update the code to remove 0-length entries and drop the rbtree traversal. This avoids misdirected (and hung) requests that manifest as server errors like: [WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11 Signed-off-by: Sage Weil <sage@newdream.net>
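A minimal userspace sketch of the add/update/remove rule described in this commit. The fixed-size table, the capacity constants, and the function names are hypothetical stand-ins for the kernel's rbtree code, not the actual libceph implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PG_TEMP 16  /* hypothetical table capacity for this sketch */
#define MAX_OSDS    4   /* hypothetical upper bound on OSDs per mapping */

/* One pg_temp mapping: a pg id and the OSDs temporarily serving it. */
struct pg_temp_entry {
    uint64_t pgid;
    uint32_t len;            /* number of valid entries in osds[] */
    int32_t osds[MAX_OSDS];
};

/* Stand-in for the kernel's rbtree: a small unsorted table. */
struct pg_temp_map {
    size_t count;
    struct pg_temp_entry entries[MAX_PG_TEMP];
};

static struct pg_temp_entry *pg_temp_find(struct pg_temp_map *map, uint64_t pgid)
{
    for (size_t i = 0; i < map->count; i++)
        if (map->entries[i].pgid == pgid)
            return &map->entries[i];
    return NULL;
}

/*
 * Apply one incremental record. len > 0 adds or replaces the mapping,
 * len == 0 removes it. Mappings not named by a record are left untouched,
 * because an incremental is not a complete enumeration.
 * Returns 0 on success, -1 if the record does not fit in the table.
 */
static int pg_temp_apply(struct pg_temp_map *map, uint64_t pgid,
                         const int32_t *osds, uint32_t len)
{
    struct pg_temp_entry *e = pg_temp_find(map, pgid);

    if (len > MAX_OSDS)
        return -1;

    if (len == 0) {
        if (e)  /* remove by moving the last entry into the hole */
            *e = map->entries[--map->count];
        return 0;
    }

    if (!e) {
        if (map->count == MAX_PG_TEMP)
            return -1;
        e = &map->entries[map->count++];
        e->pgid = pgid;
    }
    e->len = len;
    memcpy(e->osds, osds, len * sizeof(*osds));
    return 0;
}
```

Applying a record with len == 0 for one pgid leaves every other mapping in place, which is the behavior the old "complete enumeration" assumption got wrong.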
-
Committed by Sage Weil
We need to apply the modulo pg_num calculation before looking up a pgid in the pg_temp mapping rbtree. This fixes pg_temp mappings, and fixes (some) misdirected requests that result in messages like [WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11 on the server and make the client block without getting a reply (at least until the pg_temp mapping goes away, but that can take a long, long time). Reorder calc_pg_raw() a bit to make more sense. Signed-off-by: Sage Weil <sage@newdream.net>
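A small sketch of why the ordering matters: the lookup key must be built from the folded seed, otherwise two raw seeds that name the same pg produce different keys. The struct, the key layout, and the use of plain `%` are assumptions made for illustration; the kernel may fold the seed differently.

```c
#include <stdint.h>

/* Hypothetical pg identifier: a pool id plus a placement seed. */
struct pgid {
    uint32_t pool;
    uint32_t seed;
};

/*
 * Fold a raw seed into the range [0, pg_num). Plain modulo is used here
 * for clarity only; pg_num must be nonzero.
 */
static struct pgid pgid_normalize(struct pgid raw, uint32_t pg_num)
{
    struct pgid pg = raw;

    pg.seed = raw.seed % pg_num;
    return pg;
}

/* Key used to index the temp-mapping table. Build it from a normalized pgid. */
static uint64_t pgid_key(struct pgid pg)
{
    return ((uint64_t)pg.pool << 32) | pg.seed;
}
```

With pg_num = 8, raw seeds 1 and 9 map to the same key only if `pgid_normalize()` runs before `pgid_key()`.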
-
- 17 Sep, 2011 3 commits
-
-
Committed by Sage Weil
The r_req_lru_item list node moves between several lists, and that cycle is not directly related to (and does not begin with) __register_request(). Initialize it in the request constructor, not __register_request(). This fixes later badness (below) when OSDs restart underneath an rbd mount. Crashes we've seen due to this include: [ 213.974288] kernel BUG at net/ceph/messenger.c:2193! and [ 144.035274] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 [ 144.035278] IP: [<ffffffffa036c053>] con_work+0x1463/0x2ce0 [libceph] Signed-off-by: Sage Weil <sage@newdream.net>
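A userspace sketch of the idea: initialize an intrusive list node when the object is constructed, so that unlinking it is safe no matter which queue (if any) it is on. The list helpers below are a hand-written imitation of the kernel's circular doubly linked lists, and all names are hypothetical.

```c
#include <stdlib.h>

/* Minimal circular doubly linked list, modeled on the kernel's list_head. */
struct sketch_list {
    struct sketch_list *next, *prev;
};

static void sketch_list_init(struct sketch_list *h)
{
    h->next = h;
    h->prev = h;
}

static int sketch_list_empty(const struct sketch_list *h)
{
    return h->next == h;
}

static void sketch_list_add_tail(struct sketch_list *n, struct sketch_list *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink and re-initialize; safe on a node that is not on any list. */
static void sketch_list_del_init(struct sketch_list *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    sketch_list_init(n);
}

/* Hypothetical request object whose node moves between several queues. */
struct sketch_request {
    struct sketch_list lru_item;
};

/* Constructor: the node is valid from the moment the request exists. */
static struct sketch_request *sketch_request_alloc(void)
{
    struct sketch_request *req = malloc(sizeof(*req));

    if (!req)
        return NULL;
    sketch_list_init(&req->lru_item);
    return req;
}
```

Because the node is self-linked from construction, code that unlinks it before any registration step has run does not chase uninitialized pointers.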
-
Committed by Noah Watkins
ceph_destroy_options does not free opt->mon_addr, which is allocated in ceph_parse_options. Signed-off-by: Noah Watkins <noahwatkins@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Jim Schutt
Commit 4cf9d544 recorded when an outgoing ceph message was ACKed, in order to avoid unnecessary connection resets when an OSD is busy. However, ack_stamp is uninitialized, so there is a window between when the message is sent and when it is ACKed in which handle_timeout() interprets the uninitialized value as an expired timeout, and resets the connection unnecessarily. Close the window by initializing ack_stamp. Signed-off-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: Sage Weil <sage@newdream.net>
-
- 01 Sep, 2011 1 commit
-
-
Committed by Sage Weil
We want to remove all OSDs, not just those on the idle LRU. Signed-off-by: Sage Weil <sage@newdream.net>
-
- 10 Aug, 2011 1 commit
-
-
Committed by Sage Weil
There were several problems here: (1) we weren't tagging allocations with the pool, so they were never returned to the pool; (2) msgpool_put didn't add back to the mempool, even if it were called; (3) msgpool_release didn't clear the pool pointer, so it would have looped had #1 not been broken. These may or may not have been responsible for #1136 or #1381 (BUG due to non-empty mempool on umount). I can't seem to trigger the crash now using the method I was using before. Signed-off-by: Sage Weil <sage@newdream.net>
-
- 27 Jul, 2011 1 commit
-
-
Committed by Sage Weil
Keep track of when an outgoing message is ACKed (i.e., the server fully received it and, presumably, queued it for processing). Time out OSD requests only if it's been too long since they've been received. This prevents timeouts and connection thrashing when the OSDs are simply busy and are throttling the requests they read off the network. Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
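A rough sketch of an ACK-based timeout check, assuming a monotonically increasing tick counter that never reads 0 and using 0 as a "not ACKed yet" sentinel. Both the sentinel and every name here are assumptions for illustration, not the kernel's actual fields or logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-message bookkeeping. */
struct sketch_msg {
    uint64_t sent_stamp;  /* tick when the message was handed to the socket */
    uint64_t ack_stamp;   /* tick when the peer ACKed it; 0 means not ACKed */
};

/* Initialize both stamps so no check ever reads garbage. */
static void sketch_msg_sent(struct sketch_msg *m, uint64_t now)
{
    m->sent_stamp = now;
    m->ack_stamp = 0;
}

static void sketch_msg_acked(struct sketch_msg *m, uint64_t now)
{
    m->ack_stamp = now;
}

/*
 * A request counts as timed out only if the server has received it
 * (ACKed) and at least `timeout` ticks have passed since then. A message
 * still waiting behind a busy, throttling server is never timed out here.
 */
static bool sketch_msg_timed_out(const struct sketch_msg *m, uint64_t now,
                                 uint64_t timeout)
{
    if (m->ack_stamp == 0)
        return false;
    return now - m->ack_stamp >= timeout;
}
```

The explicit initialization in `sketch_msg_sent()` matters: without it, the check could read a stale stamp in the window between sending and the ACK.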
-
- 20 Jul, 2011 1 commit
-
-
Committed by Sage Weil
open(2) must always include one of O_RDONLY, O_WRONLY, or O_RDWR. No need for any O_APPEND special case. Passing O_WRONLY|O_RDWR is undefined according to the man page, but the Linux VFS interprets this as O_RDWR, so we'll do the same. This fixes open(2) with flags O_RDWR|O_APPEND, which was incorrectly being translated to readonly. Reported-by: Fyodor Ustinov <ufm@ufm.su> Signed-off-by: Sage Weil <sage@newdream.net>
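A sketch of the translation rule this commit describes: only the access-mode bits decide the file mode, O_APPEND plays no part, and the undefined O_WRONLY|O_RDWR combination is treated as read-write. The enum and function name are hypothetical, not the ceph client's own identifiers.

```c
#include <fcntl.h>

/* Hypothetical file modes for this sketch. */
enum sketch_file_mode {
    SKETCH_MODE_RD = 1,
    SKETCH_MODE_WR = 2,
    SKETCH_MODE_RDWR = 3,
};

/* Map open(2) flags to a file mode using only the access-mode bits. */
static enum sketch_file_mode sketch_flags_to_mode(int flags)
{
    switch (flags & O_ACCMODE) {
    case O_RDONLY:
        return SKETCH_MODE_RD;
    case O_WRONLY:
        return SKETCH_MODE_WR;
    default:
        /* O_RDWR, or the undefined O_WRONLY|O_RDWR combination */
        return SKETCH_MODE_RDWR;
    }
}
```

For example, `sketch_flags_to_mode(O_RDWR | O_APPEND)` yields the read-write mode, since O_APPEND lies outside O_ACCMODE.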
-
- 14 Jun, 2011 1 commit
-
-
Committed by Sage Weil
Set the page count correctly for non-page-aligned IO. We were already doing this correctly for alignment, but not the page count. Fixes DIRECT_IO writes from unaligned pages. Signed-off-by: Sage Weil <sage@newdream.net>
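A sketch of the page-count arithmetic for a byte range that need not start on a page boundary. A 4096-byte page is assumed, and the names are hypothetical rather than the kernel's helper.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12  /* assumes 4096-byte pages */

/*
 * Number of pages touched by the byte range [off, off + len).
 * An unaligned start can push the range onto one more page than
 * len / PAGE_SIZE alone would suggest.
 */
static uint64_t sketch_pages_spanned(uint64_t off, uint64_t len)
{
    uint64_t first, last;

    if (len == 0)
        return 0;
    first = off >> SKETCH_PAGE_SHIFT;
    last = (off + len - 1) >> SKETCH_PAGE_SHIFT;
    return last - first + 1;
}
```

A 4096-byte write starting at offset 100 spans two pages, while the same write at offset 0 spans one.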
-
- 08 Jun, 2011 1 commit
-
-
Committed by Sage Weil
If we cancel a write, trigger the safe completions to prevent a sync from blocking indefinitely in ceph_osdc_sync(). Signed-off-by: Sage Weil <sage@newdream.net>
-
- 25 May, 2011 2 commits
-
-
Committed by Sage Weil
When the cluster is marked full, subscribe to subsequent map updates to ensure we find out promptly when it is no longer full. This will prevent us from spewing ENOSPC for (much) longer than necessary. Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
Old incrementals encode a 0 value (nearly always) when an osd goes down. Change that to allow any state bit(s) to be flipped. Special-case 0 to mean flip the CEPH_OSD_UP bit to mimic the old behavior. Signed-off-by: Sage Weil <sage@newdream.net>
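A sketch of the bit-flip rule: the incremental carries a mask that is XORed into the OSD state, and a mask of 0 is interpreted as "flip the up bit" for compatibility with old encodings. The bit values and names below are assumptions for this example, not copied from the ceph headers.

```c
#include <stdint.h>

/* Hypothetical state bits for this sketch. */
#define SKETCH_OSD_EXISTS (1u << 0)
#define SKETCH_OSD_UP     (1u << 1)

/* Apply one state delta from an incremental map to an OSD's state. */
static uint8_t sketch_osd_apply_delta(uint8_t state, uint8_t xor_bits)
{
    if (xor_bits == 0)
        xor_bits = SKETCH_OSD_UP;  /* legacy encoding: 0 means toggle "up" */
    return (uint8_t)(state ^ xor_bits);
}
```

An OSD that exists and is up, given a legacy delta of 0, ends up existing but down.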
-
- 20 May, 2011 8 commits
-
-
Committed by Sage Weil
Since we pass the nofail arg, we should never get an error; BUG if we do. (And fix the function to not return an error if __map_request fails.) Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
If we get a WAIT as a client, something went wrong; error out. And don't fall through to an unrelated case. Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
If there is no get_authorizer method we set the out_kvec to a bogus pointer. The length is also zero in that case, so it doesn't much matter, but it's better not to add the empty item in the first place. Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
If a connection is closed and/or reopened (ceph_con_close, ceph_con_open) it can race with a callback. con_work does various state checks for closed or reopened sockets at the beginning, but drops con->mutex before making callbacks. We need to check for state bit changes after retaking the lock to ensure we restart con_work and execute those CLOSED/OPENING tests, or else we may end up operating under stale assumptions. In Jim's case, this was causing 'bad tag' errors. There are four cases where we re-take the con->mutex inside con_work: catch them all and return EAGAIN from try_{read,write} so that we can restart con_work. Reported-by: Jim Schutt <jaschut@sandia.gov> Tested-by: Jim Schutt <jaschut@sandia.gov> Signed-off-by: Sage Weil <sage@newdream.net>
-
- 04 May, 2011 2 commits
-
-
Committed by Sage Weil
ceph_osdc_alloc_request returns NULL on failure. Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Henry C Chang
If memory allocation failed, calling ceph_msg_put() will cause a GPF since some of the ceph_msg variables are not initialized first. Fix Bug #970. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
- 07 Apr, 2011 1 commit
-
-
Committed by Sage Weil
Fix the request transition from linger -> normal request. The key is to preserve r_osd and requeue on the same OSD. Reregister as a normal request, add the request to the proper queues, then unregister the linger. Fix the unregister helper to avoid clearing r_osd (and also simplify the parallel check in __unregister_request()). Reported-by: Henry Chang <henry.cy.chang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
- 31 Mar, 2011 1 commit
-
-
Committed by Lucas De Marchi
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
-
- 30 Mar, 2011 4 commits
-
-
Committed by Tommi Virtanen
This allows us to use existence of the key type as a feature test, from userspace. Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Tommi Virtanen
Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Tommi Virtanen
This makes the base64 logic be contained in mount option parsing, and prepares us for replacing the homebrew key management with the kernel key retention service. Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
We should only clear r_osd if we are registered as neither a linger nor a regular request. We may unregister as a linger while still registered as a regular request (e.g., in reset_osd). Incorrectly clearing r_osd there leads to a null pointer dereference in __send_request. Also simplify the parallel check in __unregister_request() where we just removed r_osd_item and know it's empty. Signed-off-by: Sage Weil <sage@newdream.net>
-
- 29 Mar, 2011 1 commit
-
-
Committed by Dan Carpenter
There was a missing unlock on the error path if __map_request() failed. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
- 27 Mar, 2011 1 commit
-
-
Committed by Mariusz Kozlowski
This patch fixes a dereference of 'event_work' before it is checked for NULL. Signed-off-by: Mariusz Kozlowski <mk@lab.zgora.pl> Signed-off-by: Sage Weil <sage@newdream.net>
-
- 26 Mar, 2011 1 commit
-
-
Committed by Sage Weil
The release method for mds connections uses a backpointer to the mds_client, so we need to flush the workqueue of any pending work (and ceph_connection references) prior to freeing the mds_client. This fixes an oops easily triggered under UML by: while true ; do mount ... ; umount ... ; done Also fix an outdated comment: the flush in ceph_destroy_client only flushes OSD connections out. This bug is basically an artifact of the ceph -> ceph+libceph conversion. Signed-off-by: Sage Weil <sage@newdream.net>
-
- 23 Mar, 2011 1 commit
-
-
Committed by Yehuda Sadeh
Lingering requests are requests that are sent to the OSD normally but are also tracked after we get a successful request. This keeps the OSD connection open and resends the original request if the object moves to another OSD. The OSD can then send notification messages back to us if another client initiates a notify. This framework will be used by RBD so that the client gets a notification when a snapshot is created by another node or tool. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
-
- 22 Mar, 2011 1 commit
-
-
Committed by Sage Weil
If we send a request to osd A, and the request's pg remaps to osd B and then back to A in quick succession, we need to resend the request to A. The old code was only calling kick_requests after processing all incremental maps in a message, so it was very possible to not resend a request that needed to be resent. This would make the osd eventually time out (at least with the current default of osd timeouts enabled). The correct approach is to scan requests on every map incremental. This patch refactors the kick code in a few ways: (1) all requests are either on req_lru (in flight), req_unsent (ready to send), or req_notarget (currently mapped to no up osd); (2) mapping is always done by map_request (previously map_osds); (3) if the mapping changes, we requeue, and requests are resent only after all map incrementals are processed; (4) some osd reset code is moved out of kick_requests into a separate function; (5) the "kick this osd" functionality is moved to kick_osd_requests, as it is unrelated to scanning for request->pg->osd mapping changes. Signed-off-by: Sage Weil <sage@newdream.net>
-
- 16 Mar, 2011 1 commit
-
-
Committed by Tommi Virtanen
It used to return -EINVAL because it thought the end was not aligned to 4 bytes. Clean up the superfluous src < end test in the if; the while itself guarantees that. Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>
-
- 05 Mar, 2011 3 commits
-
-
Committed by Sage Weil
The standby logic used to be pretty dependent on the work requeueing behavior that changed when we switched to WQ_NON_REENTRANT. It was also very fragile. Restructure things so that: (1) we clear WRITE_PENDING when we set STANDBY, which ensures we will requeue work when we wake up later; (2) con_work backs off if STANDBY is set, since there is nothing to do if we are in standby; (3) a clear_standby() helper is called by both con_send() and con_keepalive(), the two actions that can wake us up again. Move the connect_seq++ logic here. Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
There was some broken keepalive code using a dead variable. Shift to using the proper bit flag. Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
With commit f363e45f we replaced a bunch of hacky workqueue mutual exclusion logic with the WQ_NON_REENTRANT flag. One piece of fallout is that the exponential backoff breaks in certain cases: (1) con_work attempts to connect; (2) we get an immediate failure, and the socket state change handler queues immediate work; (3) con_work calls con_fault, we decide to back off, but can't queue delayed work. In this case, we add a BACKOFF bit to make con_work reschedule delayed work next time it runs (which should be immediately). Signed-off-by: Sage Weil <sage@newdream.net>
-
- 04 Mar, 2011 2 commits
-
-
Committed by Sage Weil
If we mark the connection CLOSED we will give up trying to reconnect to this server instance. That is appropriate for things like a protocol version mismatch that won't change until the server is restarted, at which point we'll get a new addr and reconnect. An authorization failure like this is probably due to the server not properly rotating its secret keys, however, and should be treated as transient so that the normal backoff and retry behavior kicks in. Signed-off-by: Sage Weil <sage@newdream.net>
-
Committed by Sage Weil
get_user_pages() can return fewer pages than we ask for. We were returning a bogus pointer/error code in that case. Instead, loop until we get all the pages we want or get an error we can return to the caller. Signed-off-by: Sage Weil <sage@newdream.net>
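A userspace sketch of the retry loop described here: keep asking for the remaining pages until all are obtained or an error occurs. The callback type stands in for a page-pinning primitive that may return a short count; it is hypothetical and does not reproduce the real get_user_pages() signature. The `pin_at_most_two` source exists only to exercise the loop.

```c
#include <stddef.h>

/*
 * Hypothetical page source: pins up to `want` pages starting at page index
 * `start`, stores their handles in out[], and returns how many it pinned
 * (possibly fewer than asked) or a negative error code.
 */
typedef long (*sketch_pin_fn)(void *ctx, size_t start, size_t want, void **out);

/*
 * Loop until all nr_pages are pinned or an error occurs. *pinned always
 * reports how many entries of pages[] are valid, so the caller can release
 * them on failure.
 */
static long sketch_pin_all(sketch_pin_fn pin, void *ctx, size_t nr_pages,
                           void **pages, size_t *pinned)
{
    size_t got = 0;
    long rc = 0;

    while (got < nr_pages) {
        rc = pin(ctx, got, nr_pages - got, pages + got);
        if (rc < 0)
            break;      /* real error: hand it back to the caller */
        if (rc == 0) {
            rc = -1;    /* no progress: fail instead of spinning */
            break;
        }
        got += (size_t)rc;
    }
    *pinned = got;
    return rc < 0 ? rc : (long)got;
}

/* Illustrative page source that hands out at most two pages per call. */
static long pin_at_most_two(void *ctx, size_t start, size_t want, void **out)
{
    char *base = ctx;   /* one byte of ctx stands for one page */
    size_t n = want < 2 ? want : 2;

    for (size_t i = 0; i < n; i++)
        out[i] = base + start + i;
    return (long)n;
}
```

Reporting the partial count through `pinned` is a design choice of this sketch, so that a caller can undo a partially completed request.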
-