- 03 11月, 2015 2 次提交
-
-
由 Ioana Ciornei 提交于
This patch changes the osd_req_op_data() macro to not evaluate arguments more than once in order to follow the kernel coding style. Signed-off-by: NIoana Ciornei <ciorneiioana@gmail.com> Reviewed-by: NAlex Elder <elder@linaro.org> [idryomov@gmail.com: changelog, formatting] Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Shraddha Barke 提交于
Since handle_reply() does not use its con argument, remove it. Signed-off-by: NShraddha Barke <shraddha.6596@gmail.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 16 10月, 2015 1 次提交
-
-
由 Ilya Dryomov 提交于
This covers only the simplest case - an object size sized write, but it's still useful in tiering setups when EC is used for the base tier as writefull op can be proxied, saving an object promotion. Even though updating ceph_osdc_new_request() to allow writefull should just be a matter of fixing an assert, I didn't do it because its only user is cephfs. All other sites were updated. Reflects ceph.git commit 7bfb7f9025a8ee0d2305f49bf0336d2424da5b5b. Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
- 09 9月, 2015 1 次提交
-
-
由 Ilya Dryomov 提交于
Only ->alloc_msg() should check data_len of the incoming message against the preallocated ceph_msg, doing it in the messenger is not right. The contract is that either ->alloc_msg() returns a ceph_msg which will fit all of the portions of the incoming message, or it returns NULL and possibly sets skip, signaling whether NULL is due to an -ENOMEM. ->alloc_msg() should be the only place where we make the skip/no-skip decision. I stumbled upon this while looking at con/osd ref counting. Right now, if we get a non-extent message with a larger data portion than we are prepared for, ->alloc_msg() returns a ceph_msg, and then, when we skip it in the messenger, we don't put the con/osd ref acquired in ceph_con_in_msg_alloc() (which is normally put in process_message()), so this also fixes a memory leak. An existing BUG_ON in ceph_msg_data_cursor_init() ensures we don't corrupt random memory should a buggy ->alloc_msg() return an unfit ceph_msg. While at it, I changed the "unknown tid" dout() to a pr_warn() to make sure all skips are seen and unified format strings. Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
- 25 6月, 2015 3 次提交
-
-
由 Ilya Dryomov 提交于
There are currently three libceph-level timeouts that the user can specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive. All of these are in seconds and no checking is done on user input: negative values are accepted, we multiply them all by HZ which may or may not overflow, arbitrarily large jiffies then get added together, etc. There is also a bug in the way mount_timeout=0 is handled. It's supposed to mean "infinite timeout", but that's not how wait.h APIs treat it and so __ceph_open_session() for example will busy loop without much chance of being interrupted if none of ceph-mons are there. Fix all this by verifying user input, storing timeouts capped by msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies() helper for all user-specified waits to handle infinite timeouts correctly. Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Yan, Zheng 提交于
Signed-off-by: NYan, Zheng <zyan@redhat.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Yan, Zheng 提交于
Signed-off-by: NYan, Zheng <zyan@redhat.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
- 21 5月, 2015 2 次提交
-
-
由 Ilya Dryomov 提交于
This reverts commit ba9d114e. .. which introduced a regression that prevented all lingering requests requeued in kick_requests() from ever being sent to the OSDs, resulting in a lot of missed notifies. In retrospect it's pretty obvious that r_req_lru_item item in the case of lingering requests can be used not only for notarget, but also for unsent linkage due to how tightly actual map and enqueue operations are coupled in __map_request(). The assertion that was being silenced is taken care of in the previous ("libceph: request a new osdmap if lingering request maps to no osd") commit: by always kicking homeless lingering requests we ensure that none of them ends up on the notarget list outside of the critical section guarded by request_mutex. Cc: stable@vger.kernel.org # 3.18+, needs b0494532 "libceph: request a new osdmap if lingering request maps to no osd" Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NSage Weil <sage@redhat.com>
-
由 Ilya Dryomov 提交于
This commit does two things. First, if there are any homeless lingering requests, we now request a new osdmap even if the osdmap that is being processed brought no changes, i.e. if a given lingering request turned homeless in one of the previous epochs and remained homeless in the current epoch. Not doing so leaves us with a stale osdmap and as a result we may miss our window for reestablishing the watch and lose notifies. MON=1 OSD=1: # cat linger-needmap.sh #!/bin/bash rbd create --size 1 test DEV=$(rbd map test) ceph osd out 0 rbd map dne/dne # obtain a new osdmap as a side effect (!) sleep 1 ceph osd in 0 rbd resize --size 2 test # rbd info test | grep size -> 2M # blockdev --getsize $DEV -> 1M N.B.: Not obtaining a new osdmap in between "osd out" and "osd in" above is enough to make it miss that resize notify, but that is a bug^Wlimitation of ceph watch/notify v1. Second, homeless lingering requests are now kicked just like those lingering requests whose mapping has changed. This is mainly to recognize that a homeless lingering request makes no sense and to preserve the invariant that a registered lingering request is not sitting on any of r_req_lru_item lists. This spares us a WARN_ON, which commit ba9d114e ("libceph: clear r_req_lru_item in __unregister_linger_request()") tried to fix the _wrong_ way. Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NSage Weil <sage@redhat.com>
-
- 19 2月, 2015 2 次提交
-
-
由 Ilya Dryomov 提交于
a255651d ("ceph: ensure auth ops are defined before use") made kfree() in put_osd() conditional on the authorizer. A mechanical mistake most likely - fix it. Cc: Alex Elder <elder@linaro.org> Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NSage Weil <sage@redhat.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
It turns out it's possible to get __remove_osd() called twice on the same OSD. That doesn't sit well with rb_erase() - depending on the shape of the tree we can get a NULL dereference, a soft lockup or a random crash at some point in the future as we end up touching freed memory. One scenario that I was able to reproduce is as follows: <osd3 is idle, on the osd lru list> <con reset - osd3> con_fault_finish() osd_reset() <osdmap - osd3 down> ceph_osdc_handle_map() <takes map_sem> kick_requests() <takes request_mutex> reset_changed_osds() __reset_osd() __remove_osd() <releases request_mutex> <releases map_sem> <takes map_sem> <takes request_mutex> __kick_osd_requests() __reset_osd() __remove_osd() <-- !!! A case can be made that osd refcounting is imperfect and reworking it would be a proper resolution, but for now Sage and I decided to fix this by adding a safe guard around __remove_osd(). Fixes: http://tracker.ceph.com/issues/8087 Cc: Sage Weil <sage@redhat.com> Cc: stable@vger.kernel.org # 3.9+: 7c6e6fc5: libceph: assert both regular and lingering lists in __remove_osd() Cc: stable@vger.kernel.org # 3.9+: cc9f1f51: libceph: change from BUG to WARN for __remove_osd() asserts Cc: stable@vger.kernel.org # 3.9+ Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NSage Weil <sage@redhat.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
- 18 12月, 2014 4 次提交
-
-
由 Yan, Zheng 提交于
allow specifying position of extent operation in multi-operations osd request. This is required for cephfs to convert inline data to normal data (compare xattr, then write object). Signed-off-by: NYan, Zheng <zyan@redhat.com> Reviewed-by: NIlya Dryomov <idryomov@redhat.com>
-
由 Yan, Zheng 提交于
Add CEPH_OSD_OP_CREATE support. Also change libceph to not treat CEPH_OSD_OP_DELETE as an extent op and add an assert to that end. Signed-off-by: NYan, Zheng <zyan@redhat.com> Reviewed-by: NIlya Dryomov <idryomov@redhat.com>
-
由 Yan, Zheng 提交于
Signed-off-by: NYan, Zheng <zyan@redhat.com> Reviewed-by: NIlya Dryomov <idryomov@redhat.com>
-
由 Yan, Zheng 提交于
Signed-off-by: NYan, Zheng <zyan@redhat.com>
-
- 14 11月, 2014 3 次提交
-
-
由 Ilya Dryomov 提交于
No reason to use BUG_ON for osd request list assertions. Signed-off-by: NIlya Dryomov <idryomov@redhat.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
kick_requests() can put linger requests on the notarget list. This means we need to clear the much-overloaded req->r_req_lru_item in __unregister_linger_request() as well, or we get an assertion failure in ceph_osdc_release_request() - !list_empty(&req->r_req_lru_item). AFAICT the assumption was that registered linger requests cannot be on any of req->r_req_lru_item lists, but that's clearly not the case. Signed-off-by: NIlya Dryomov <idryomov@redhat.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
Requests have to be unlinked from both osd->o_requests (normal requests) and osd->o_linger_requests (linger requests) lists when clearing req->r_osd. Otherwise __unregister_linger_request() gets confused and we trip over a !list_empty(&osd->o_linger_requests) assert in __remove_osd(). MON=1 OSD=1: # cat remove-osd.sh #!/bin/bash rbd create --size 1 test DEV=$(rbd map test) ceph osd out 0 sleep 3 rbd map dne/dne # obtain a new osdmap as a side effect rbd unmap $DEV & # will block sleep 3 ceph osd in 0 Signed-off-by: NIlya Dryomov <idryomov@redhat.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
- 15 10月, 2014 5 次提交
-
-
由 Ilya Dryomov 提交于
Bring in missing osd ops and strings, use macros to eliminate multiple points of maintenance. Signed-off-by: NIlya Dryomov <idryomov@redhat.com> Reviewed-by: NSage Weil <sage@redhat.com>
-
由 Ilya Dryomov 提交于
queue_work() doesn't "fail to queue", it returns false if work was already on a queue, which can't happen here since we allocate event_work right before we queue it. So don't bother at all. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Joe Perches 提交于
Use the more common pr_warn. Other miscellanea: o Coalesce formats o Realign arguments Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com>
-
由 Ilya Dryomov 提交于
Both not yet registered (r_linger && list_empty(&r_linger_item)) and registered linger requests should use the new tid on resend to avoid the dup op detection logic on the OSDs, yet we were doing this only for "registered" case. Factor out and simplify the "registered" logic and use the new helper for "not registered" case as well. Fixes: http://tracker.ceph.com/issues/8806Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
Introduce __enqueue_request() and switch to it. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
- 08 7月, 2014 9 次提交
-
-
由 Ilya Dryomov 提交于
Remove now unused ceph_osdc_unregister_linger_request(). Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
Introduce ceph_osdc_cancel_request() intended for canceling requests from the higher layers (rbd and cephfs). Because higher layers are in charge and are supposed to know what and when they are canceling, the request is not completed, only unref'ed and removed from the libceph data structures. __cancel_request() is no longer called before __unregister_request(), because __unregister_request() unconditionally revokes r_request and there is no point in trying to do it twice. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
We should check if request is on the linger request list of any of the OSDs, not whether request is registered or not. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
Linger requests that have not yet been registered should not be unregistered by __unregister_linger_request(). This messes up ref count and leads to use-after-free. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
It is important that both regular and lingering requests lists are empty when the OSD is removed. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
Add some WARN_ONs to alert us when we try to destroy requests that are still registered. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
Add dout()s to ceph_osdc_request_{get,put}(). Also move them to .c and turn kref release callback into a static function. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
Abstract out __move_osd_to_lru() logic from __unregister_request() and __unregister_linger_request(). Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
So that: req->r_osd_item --> osd->o_requests list req->r_linger_osd_item --> osd->o_linger_requests list Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
- 12 6月, 2014 1 次提交
-
-
由 stephen hemminger 提交于
Sparse complained about this bogus extern on definition of a function. Signed-off-by: NStephen Hemminger <stephen@networkplumber.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 4月, 2014 2 次提交
-
-
由 Ilya Dryomov 提交于
In preparation for adding support for primary_temp, stop assuming primaryness: add a primary out parameter to ceph_calc_pg_acting() and change call sites accordingly. Primary is now specified separately from the order of osds in the set. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
Split osdmap allocation and initialization into a separate function, ceph_osdmap_decode(). Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
- 03 4月, 2014 2 次提交
-
-
由 Ilya Dryomov 提交于
This is primarily for rbd's benefit and is supposed to combat fragmentation: "... knowing that rbd images have a 4m size, librbd can pass a hint that will let the osd do the xfs allocation size ioctl on new files so that they are allocated in 1m or 4m chunks. We've seen cases where users with rbd workloads have very high levels of fragmentation in xfs and this would mitigate that and probably have a pretty nice performance benefit." SETALLOCHINT is considered advisory, so our backwards compatibility mechanism here is to set FAILOK flag for all SETALLOCHINT ops. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
Encode ceph_osd_op::flags field so that it gets sent over the wire. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
- 14 2月, 2014 1 次提交
-
-
由 stephen hemminger 提交于
One of my pet coding style peeves is the practice of adding extra return; at the end of function. Kill several instances of this in network code. I suppose some coccinelle wizardy could do this automatically. Signed-off-by: NStephen Hemminger <stephen@networkplumber.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 2月, 2014 2 次提交
-
-
由 Ilya Dryomov 提交于
Handling redirect replies requires both map_sem and request_mutex. Taking map_sem unconditionally near the top of handle_reply() avoids possible race conditions that arise from releasing request_mutex to be able to acquire map_sem in redirect reply case. (Lock ordering is: map_sem, request_mutex, crush_mutex.) Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Ilya Dryomov 提交于
Factor out logic from ceph_osdc_start_request() into a new helper, __ceph_osdc_start_request(). ceph_osdc_start_request() now amounts to taking locks and calling __ceph_osdc_start_request(). Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NSage Weil <sage@inktank.com>
-