- 18 12月, 2014 3 次提交
-
-
由 Yan, Zheng 提交于
Add CEPH_OSD_OP_CREATE support. Also change libceph to not treat CEPH_OSD_OP_DELETE as an extent op and add an assert to that end. Signed-off-by: NYan, Zheng <zyan@redhat.com> Reviewed-by: NIlya Dryomov <idryomov@redhat.com>
-
由 Yan, Zheng 提交于
Signed-off-by: NYan, Zheng <zyan@redhat.com> Reviewed-by: NIlya Dryomov <idryomov@redhat.com>
-
由 Yan, Zheng 提交于
Signed-off-by: NYan, Zheng <zyan@redhat.com>
-
- 14 11月, 2014 3 次提交
-
-
由 Ilya Dryomov 提交于
No reason to use BUG_ON for osd request list assertions. Signed-off-by: NIlya Dryomov <idryomov@redhat.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
kick_requests() can put linger requests on the notarget list. This means we need to clear the much-overloaded req->r_req_lru_item in __unregister_linger_request() as well, or we get an assertion failure in ceph_osdc_release_request() - !list_empty(&req->r_req_lru_item). AFAICT the assumption was that registered linger requests cannot be on any of req->r_req_lru_item lists, but that's clearly not the case. Signed-off-by: NIlya Dryomov <idryomov@redhat.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
Requests have to be unlinked from both osd->o_requests (normal requests) and osd->o_linger_requests (linger requests) lists when clearing req->r_osd. Otherwise __unregister_linger_request() gets confused and we trip over a !list_empty(&osd->o_linger_requests) assert in __remove_osd(). MON=1 OSD=1: # cat remove-osd.sh #!/bin/bash rbd create --size 1 test DEV=$(rbd map test) ceph osd out 0 sleep 3 rbd map dne/dne # obtain a new osdmap as a side effect rbd unmap $DEV & # will block sleep 3 ceph osd in 0 Signed-off-by: NIlya Dryomov <idryomov@redhat.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
- 15 10月, 2014 5 次提交
-
-
由 Ilya Dryomov 提交于
Bring in missing osd ops and strings, use macros to eliminate multiple points of maintenance. Signed-off-by: NIlya Dryomov <idryomov@redhat.com> Reviewed-by: NSage Weil <sage@redhat.com>
-
由 Ilya Dryomov 提交于
queue_work() doesn't "fail to queue", it returns false if work was already on a queue, which can't happen here since we allocate event_work right before we queue it. So don't bother at all. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Joe Perches 提交于
Use the more common pr_warn. Other miscellanea: o Coalesce formats o Realign arguments Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
-
由 Ilya Dryomov 提交于
Both not yet registered (r_linger && list_empty(&r_linger_item)) and registered linger requests should use the new tid on resend to avoid the dup op detection logic on the OSDs, yet we were doing this only for "registered" case. Factor out and simplify the "registered" logic and use the new helper for "not registered" case as well. Fixes: http://tracker.ceph.com/issues/8806Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
Introduce __enqueue_request() and switch to it. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
- 08 7月, 2014 9 次提交
-
-
由 Ilya Dryomov 提交于
Remove now unused ceph_osdc_unregister_linger_request(). Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
Introduce ceph_osdc_cancel_request() intended for canceling requests from the higher layers (rbd and cephfs). Because higher layers are in charge and are supposed to know what and when they are canceling, the request is not completed, only unref'ed and removed from the libceph data structures. __cancel_request() is no longer called before __unregister_request(), because __unregister_request() unconditionally revokes r_request and there is no point in trying to do it twice. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
We should check if request is on the linger request list of any of the OSDs, not whether request is registered or not. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
Linger requests that have not yet been registered should not be unregistered by __unregister_linger_request(). This messes up ref count and leads to use-after-free. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
It is important that both regular and lingering requests lists are empty when the OSD is removed. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
Add some WARN_ONs to alert us when we try to destroy requests that are still registered. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
Add dout()s to ceph_osdc_request_{get,put}(). Also move them to .c and turn kref release callback into a static function. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
Abstract out __move_osd_to_lru() logic from __unregister_request() and __unregister_linger_request(). Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
So that: req->r_osd_item --> osd->o_requests list req->r_linger_osd_item --> osd->o_linger_requests list Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
- 12 6月, 2014 1 次提交
-
-
由 stephen hemminger 提交于
Sparse complained about this bogus extern on definition of a function. Signed-off-by: NStephen Hemminger <stephen@networkplumber.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 4月, 2014 2 次提交
-
-
由 Ilya Dryomov 提交于
In preparation for adding support for primary_temp, stop assuming primaryness: add a primary out parameter to ceph_calc_pg_acting() and change call sites accordingly. Primary is now specified separately from the order of osds in the set. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
Split osdmap allocation and initialization into a separate function, ceph_osdmap_decode(). Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
- 03 4月, 2014 2 次提交
-
-
由 Ilya Dryomov 提交于
This is primarily for rbd's benefit and is supposed to combat fragmentation: "... knowing that rbd images have a 4m size, librbd can pass a hint that will let the osd do the xfs allocation size ioctl on new files so that they are allocated in 1m or 4m chunks. We've seen cases where users with rbd workloads have very high levels of fragmentation in xfs and this would mitigate that and probably have a pretty nice performance benefit." SETALLOCHINT is considered advisory, so our backwards compatibility mechanism here is to set FAILOK flag for all SETALLOCHINT ops. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
Encode ceph_osd_op::flags field so that it gets sent over the wire. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
- 14 2月, 2014 1 次提交
-
-
由 stephen hemminger 提交于
One of my pet coding style peeves is the practice of adding extra return; at the end of function. Kill several instances of this in network code. I suppose some coccinelle wizardy could do this automatically. Signed-off-by: NStephen Hemminger <stephen@networkplumber.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 2月, 2014 2 次提交
-
-
由 Ilya Dryomov 提交于
Handling redirect replies requires both map_sem and request_mutex. Taking map_sem unconditionally near the top of handle_reply() avoids possible race conditions that arise from releasing request_mutex to be able to acquire map_sem in redirect reply case. (Lock ordering is: map_sem, request_mutex, crush_mutex.) Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Ilya Dryomov 提交于
Factor out logic from ceph_osdc_start_request() into a new helper, __ceph_osdc_start_request(). ceph_osdc_start_request() now amounts to taking locks and calling __ceph_osdc_start_request(). Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
- 04 2月, 2014 1 次提交
-
-
由 Ilya Dryomov 提交于
msgpool_op_reply message pool isn't destroyed if workqueue construction fails. Fix it. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
- 28 1月, 2014 7 次提交
-
-
由 Ilya Dryomov 提交于
Follow redirect replies from osds, for details see ceph.git commit fbbe3ad1220799b7bb00ea30fce581c5eadaf034. v1 (current) version of redirect reply consists of oloc and oid, which expands to pool, key, nspace, hash and oid. However, server-side code that would populate anything other than pool doesn't exist yet, and hence this commit adds support for pool redirects only. To make sure that future server-side updates don't break us, we decode all fields and, if any of key, nspace, hash or oid have a non-default value, error out with "corrupt osd_op_reply ..." message. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Ilya Dryomov 提交于
Rename ceph_osd_request::r_{oloc,oid} to r_base_{oloc,oid} before introducing r_target_{oloc,oid} needed for redirects. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Ilya Dryomov 提交于
Overwrite ceph_osd_request::r_oloc.pool with read_tier for read ops and write_tier for write and read+write ops (aka basic tiering support). {read,write}_tier are part of pg_pool_t since v9. This commit bumps our pg_pool_t decode compat version from v7 to v9, all new fields except for {read,write}_tier are ignored. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Ilya Dryomov 提交于
Switch ceph_calc_ceph_pg() to new oloc and oid abstractions and rename it to ceph_oloc_oid_to_pg() to make its purpose more clear. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Ilya Dryomov 提交于
In preparation for tiering support, which would require having two (base and target) object names for each osd request and also copying those names around, introduce struct ceph_object_id (oid) and a couple helpers to facilitate those copies and encapsulate the fact that object name is not necessarily a NUL-terminated string. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Ilya Dryomov 提交于
In preparation for adding oid abstraction, rename MAX_OBJ_NAME_SIZE to CEPH_MAX_OID_NAME_LEN. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Ilya Dryomov 提交于
Instead of relying on pool fields in ceph_file_layout (for mapping) and ceph_pg (for enconding), start using ceph_object_locator (oloc) abstraction. Note that userspace oloc currently consists of pool, key, nspace and hash fields, while this one contains only a pool. This is OK, because at this point we only send (i.e. encode) olocs and never have to receive (i.e. decode) them. This makes keeping a copy of ceph_file_layout in every osd request unnecessary, so ceph_osd_request::r_file_layout field is nuked. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
- 26 1月, 2014 1 次提交
-
-
由 Ilya Dryomov 提交于
Add a missing newline to a dout() in __reset_osd(). Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
-
- 14 1月, 2014 2 次提交
-
-
由 Ilya Dryomov 提交于
The check that makes sure that we have enough memory allocated to read in the entire header of the message in question is currently busted. It compares front_len of the incoming message with iov_len field of ceph_msg::front structure, which is used primarily to indicate the amount of data already read in, and not the size of the allocated buffer. Under certain conditions (e.g. a short read from a socket followed by that socket's shutdown and owning ceph_connection reset) this results in a warning similar to [85688.975866] libceph: get_reply front 198 > preallocated 122 (4#0) and, through another bug, leads to forever hung tasks and forced reboots. Fix this by comparing front_len with front_alloc_len field of struct ceph_msg, which stores the actual size of the buffer. Fixes: http://tracker.ceph.com/issues/5425Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Ilya Dryomov 提交于
Rename front local variable to front_len in get_reply() to make its purpose more clear. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
- 14 12月, 2013 1 次提交
-
-
由 Josh Durgin 提交于
With the current full handling, there is a race between osds and clients getting the first map marked full. If the osd wins, it will return -ENOSPC to any writes, but the client may already have writes in flight. This results in the client getting the error and propagating it up the stack. For rbd, the block layer turns this into EIO, which can cause corruption in filesystems above it. To avoid this race, osds are being changed to drop writes that came from clients with an osdmap older than the last osdmap marked full. In order for this to work, clients must resend all writes after they encounter a full -> not full transition in the osdmap. osds will wait for an updated map instead of processing a request from a client with a newer map, so resent writes will not be dropped by the osd unless there is another not full -> full transition. This approach requires both osds and clients to be fixed to avoid the race. Old clients talking to osds with this fix may hang instead of returning EIO and potentially corrupting an fs. New clients talking to old osds have the same behavior as before if they encounter this race. Fixes: http://tracker.ceph.com/issues/6938Reviewed-by: NSage Weil <sage@inktank.com> Signed-off-by: NJosh Durgin <josh.durgin@inktank.com>
-