- 07 7月, 2017 3 次提交
-
-
由 Ilya Dryomov 提交于
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Store both raw pgid and actual spgid in ceph_osd_request_target. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 04 5月, 2017 4 次提交
-
-
由 Jeff Layton 提交于
Cephfs can get cap update requests that contain a new epoch barrier in them. When that happens we want to pause all OSD traffic until the right map epoch arrives. Add an epoch_barrier field to ceph_osd_client that is protected by the osdc->lock rwsem. When the barrier is set, and the current OSD map epoch is below that, pause the request target when submitting the request or when revisiting it. Add a way for upper layers (cephfs) to update the epoch_barrier as well. If we get a new map, compare the new epoch against the barrier before kicking requests and request another map if the map epoch is still lower than the one we want. If we get a map with a full pool, or at quota condition, then set the barrier to the current epoch value. Signed-off-by: NJeff Layton <jlayton@redhat.com> Reviewed-by: NIlya Dryomov <idryomov@gmail.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Jeff Layton 提交于
Usually, when the osd map is flagged as full or the pool is at quota, write requests just hang. This is not what we want for cephfs, where it would be better to simply report -ENOSPC back to userland instead of stalling. If the caller knows that it will want an immediate error return instead of blocking on a full or at-quota error condition then allow it to set a flag to request that behavior. Set that flag in ceph_osdc_new_request (since ceph.ko is the only caller), and on any other write request from ceph.ko. A later patch will deal with requests that were submitted before the new map showing the full condition came in. Signed-off-by: NJeff Layton <jlayton@redhat.com> Reviewed-by: NIlya Dryomov <idryomov@gmail.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Jeff Layton 提交于
Nothing uses this anymore with the removal of the ack vs. commit code. Remove the field and just encode zeroes into place in the request encoding. Signed-off-by: NJeff Layton <jlayton@redhat.com> Reviewed-by: NIlya Dryomov <idryomov@gmail.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Elena Reshetova 提交于
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: NElena Reshetova <elena.reshetova@intel.com> Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com> Signed-off-by: NKees Cook <keescook@chromium.org> Signed-off-by: NDavid Windsor <dwindsor@gmail.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 07 3月, 2017 1 次提交
-
-
由 Ilya Dryomov 提交于
osd_request_timeout specifies how many seconds to wait for a response from OSDs before returning -ETIMEDOUT from an OSD request. 0 (default) means no limit. osd_request_timeout is osdkeepalive-precise -- in-flight requests are swept through every osdkeepalive seconds. With ack vs commit behaviour gone, abort_request() is really simple. This is based on a patch from Artur Molchanov <artur.molchanov@synesis.ru>. Tested-by: NArtur Molchanov <artur.molchanov@synesis.ru> Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NSage Weil <sage@redhat.com>
-
- 25 2月, 2017 1 次提交
-
-
由 Ilya Dryomov 提交于
- CEPH_OSD_FLAG_ACK shouldn't be set anymore, so assert on it - remove support for handling ack replies (OSDs will send ack replies only if clients request them) - drop the "do lingering callbacks under osd->lock" logic from handle_reply() -- lreq->lock is sufficient in all three cases Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NJeff Layton <jlayton@redhat.com> Reviewed-by: NSage Weil <sage@redhat.com>
-
- 15 12月, 2016 1 次提交
-
-
由 Ilya Dryomov 提交于
r_safe_completion is currently, and has always been, signaled only if on-disk ack was requested. It's there for fsync and syncfs, which wait for in-flight writes to flush - all data write requests set ONDISK. However, the pool perm check code introduced in 4.2 sends a write request with only ACK set. An unfortunately timed syncfs can then hang forever: r_safe_completion won't be signaled because only an unsafe reply was requested. We could patch ceph_osdc_sync() to skip !ONDISK write requests, but that is somewhat incomplete and yet another special case. Instead, rename this completion to r_done_completion and always signal it when the OSD client is done with the request, whether unsafe, safe, or error. This is a bit cleaner and helps with the cancellation code. Reported-by: NYan, Zheng <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 11 11月, 2016 1 次提交
-
-
由 Ilya Dryomov 提交于
osdc->last_linger_id is a counter for lreq->linger_id, which is used for watch cookies. Starting with a large integer should ease the task of telling apart kernel and userspace clients. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 25 8月, 2016 2 次提交
-
-
由 Douglas Fuller 提交于
Add a convenience function to osd_client to send Ceph OSD 'class' ops. The interface assumes that the request and reply data each consist of single pages. Signed-off-by: NDouglas Fuller <dfuller@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NMike Christie <mchristi@redhat.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Douglas Fuller 提交于
Add support for this Ceph OSD op, needed to support the RBD exclusive lock feature. Signed-off-by: NDouglas Fuller <dfuller@redhat.com> [idryomov@gmail.com: refactor, misc fixes throughout] Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NMike Christie <mchristi@redhat.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
- 28 7月, 2016 1 次提交
-
-
由 Ilya Dryomov 提交于
- decode.h needs slab.h for kmalloc() - osd_client.h needs msgpool.h for struct ceph_msgpool - msgpool.h doesn't need messenger.h Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 31 5月, 2016 1 次提交
-
-
由 Ilya Dryomov 提交于
For the benefit of every single caller, take osdc instead of map. Also, now that osdc->osdmap can't ever be NULL, drop the check. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 26 5月, 2016 14 次提交
-
-
由 Ilya Dryomov 提交于
... with a wrapper around maybe_request_map() - no need for two osdmap-specific functions. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
This adds the "map check" infrastructure for sending osdmap version checks on CALC_TARGET_POOL_DNE and completing in-flight requests with -ENOENT if the target pool doesn't exist or has just been deleted. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Implement ceph_osdc_watch_check() to be able to check on status of watch. Note that the time it takes for a watch/notify event to get delivered through the notify_wq is taken into account. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Implement ceph_osdc_notify() for sending notifies. Due to the fact that the current messenger can't do read-in into pagelists (it can only do write-out from them), I had to go with a page vector for a NOTIFY_COMPLETE payload, for now. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
This adds support and switches rbd to a new, more reliable version of watch/notify protocol. As with the OSD client update, this is mostly about getting the right structures linked into the right places so that reconnects are properly sent when needed. watch/notify v2 also requires sending regular pings to the OSDs - send_linger_ping(). A major change from the old watch/notify implementation is the introduction of ceph_osd_linger_request - linger requests no longer piggy back on ceph_osd_request. ceph_osd_event has been merged into ceph_osd_linger_request. All the details are now hidden within libceph, the interface consists of a simple pair of watch/unwatch functions and ceph_osdc_notify_ack(). ceph_osdc_watch() does return ceph_osd_linger_request, but only to keep the lifetime management simple. ceph_osdc_notify_ack() accepts an optional data payload, which is relayed back to the notifier. Portions of this patch are loosely based on work by Douglas Fuller <dfuller@redhat.com> and Mike Christie <michaelc@cs.wisc.edu>. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
This is a major sync up, up to ~Jewel. The highlights are: - per-session request trees (vs a global per-client tree) - per-session locking (vs a global per-client rwlock) - homeless OSD session - no ad-hoc global per-client lists - support for pool quotas - foundation for watch/notify v2 support - foundation for map check (pool deletion detection) support The switchover is incomplete: lingering requests can be setup and teared down but aren't ever reestablished. This functionality is restored with the introduction of the new lingering infrastructure (ceph_osd_linger_request, linger_work, etc) in a later commit. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
OSD client is getting moved from the big per-client lock to a set of per-session locks. The big rwlock would only be held for read most of the time, so a global osdc->osd_lru needs additional protection. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
If you specify ACK | ONDISK and set ->r_unsafe_callback, both ->r_callback and ->r_unsafe_callback(true) are called on ack. This is very confusing. Redo this so that only one of them is called: ->r_unsafe_callback(true), on ack ->r_unsafe_callback(false), on commit or ->r_callback, on ack|commit Decode everything in decode_MOSDOpReply() to reduce clutter. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
finish_read(), its only user, uses it to get to hdr.data_len, which is what ->r_result is set to on success. This gains us the ability to safely call callbacks from contexts other than reply, e.g. map check. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
The crux of this is getting rid of ceph_osdc_build_request(), so that MOSDOp can be encoded not before but after calc_target() calculates the actual target. Encoding now happens within ceph_osdc_start_request(). Also nuked is the accompanying bunch of pointers into the encoded buffer that was used to update fields on each send - instead, the entire front is re-encoded. If we want to support target->name_len != base->name_len in the future, there is no other way, because oid is surrounded by other fields in the encoded buffer. Encoding OSD ops and adding data items to the request message were mixed together in osd_req_encode_op(). While we want to re-encode OSD ops, we don't want to add duplicate data items to the message when resending, so all call to ceph_osdc_msg_data_add() are factored out into a new setup_request_data(). Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Replace __calc_request_pg() and most of __map_request() with calc_target() and start using req->r_t. ceph_osdc_build_request() however still encodes base_oid, because it's called before calc_target() is and target_oid is empty at that point in time; a printf in osdc_show() also shows base_oid. This is fixed in "libceph: switch to calc_target(), part 2". Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Introduce ceph_osd_request_target, containing all mapping-related fields of ceph_osd_request and calc_target() for calculating mappings and populating it. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Either unused or useless: osdmap->mkfs_epoch osd->o_marked_for_keepalive monc->num_generic_requests osdc->map_waiters osdc->last_requested_map osdc->timeout_tid osd_req_op_cls_response_data() osdmap_apply_incremental() @msgr arg Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
The size of ->r_request and ->r_reply messages depends on the size of the object name (ceph_object_id), while the size of ceph_osd_request is fixed. Move message allocation into a separate function that would have to be called after ceph_object_id and ceph_object_locator (which is also going to become variable in size with RADOS namespaces) have been filled in: req = ceph_osdc_alloc_request(...); <fill in req->r_base_oid> <fill in req->r_base_oloc> ceph_osdc_alloc_messages(req); Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 26 4月, 2016 1 次提交
-
-
由 Ilya Dryomov 提交于
Starting the kernel client with cephx disabled and then enabling cephx and restarting userspace daemons can result in a crash: [262671.478162] BUG: unable to handle kernel paging request at ffffebe000000000 [262671.531460] IP: [<ffffffff811cd04a>] kfree+0x5a/0x130 [262671.584334] PGD 0 [262671.635847] Oops: 0000 [#1] SMP [262672.055841] CPU: 22 PID: 2961272 Comm: kworker/22:2 Not tainted 4.2.0-34-generic #39~14.04.1-Ubuntu [262672.162338] Hardware name: Dell Inc. PowerEdge R720/068CDY, BIOS 2.4.3 07/09/2014 [262672.268937] Workqueue: ceph-msgr con_work [libceph] [262672.322290] task: ffff88081c2d0dc0 ti: ffff880149ae8000 task.ti: ffff880149ae8000 [262672.428330] RIP: 0010:[<ffffffff811cd04a>] [<ffffffff811cd04a>] kfree+0x5a/0x130 [262672.535880] RSP: 0018:ffff880149aeba58 EFLAGS: 00010286 [262672.589486] RAX: 000001e000000000 RBX: 0000000000000012 RCX: ffff8807e7461018 [262672.695980] RDX: 000077ff80000000 RSI: ffff88081af2be04 RDI: 0000000000000012 [262672.803668] RBP: ffff880149aeba78 R08: 0000000000000000 R09: 0000000000000000 [262672.912299] R10: ffffebe000000000 R11: ffff880819a60e78 R12: ffff8800aec8df40 [262673.021769] R13: ffffffffc035f70f R14: ffff8807e5b138e0 R15: ffff880da9785840 [262673.131722] FS: 0000000000000000(0000) GS:ffff88081fac0000(0000) knlGS:0000000000000000 [262673.245377] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [262673.303281] CR2: ffffebe000000000 CR3: 0000000001c0d000 CR4: 00000000001406e0 [262673.417556] Stack: [262673.472943] ffff880149aeba88 ffff88081af2be04 ffff8800aec8df40 ffff88081af2be04 [262673.583767] ffff880149aeba98 ffffffffc035f70f ffff880149aebac8 ffff8800aec8df00 [262673.694546] ffff880149aebac8 ffffffffc035c89e ffff8807e5b138e0 ffff8805b047f800 [262673.805230] Call Trace: [262673.859116] [<ffffffffc035f70f>] ceph_x_destroy_authorizer+0x1f/0x50 [libceph] [262673.968705] [<ffffffffc035c89e>] ceph_auth_destroy_authorizer+0x3e/0x60 [libceph] [262674.078852] [<ffffffffc0352805>] put_osd+0x45/0x80 [libceph] [262674.134249] [<ffffffffc035290e>] remove_osd+0xae/0x140 [libceph] [262674.189124] [<ffffffffc0352aa3>] __reset_osd+0x103/0x150 [libceph] [262674.243749] [<ffffffffc0354703>] kick_requests+0x223/0x460 [libceph] [262674.297485] [<ffffffffc03559e2>] ceph_osdc_handle_map+0x282/0x5e0 [libceph] [262674.350813] [<ffffffffc035022e>] dispatch+0x4e/0x720 [libceph] [262674.403312] [<ffffffffc034bd91>] try_read+0x3d1/0x1090 [libceph] [262674.454712] [<ffffffff810ab7c2>] ? dequeue_entity+0x152/0x690 [262674.505096] [<ffffffffc034cb1b>] con_work+0xcb/0x1300 [libceph] [262674.555104] [<ffffffff8108fb3e>] process_one_work+0x14e/0x3d0 [262674.604072] [<ffffffff810901ea>] worker_thread+0x11a/0x470 [262674.652187] [<ffffffff810900d0>] ? rescuer_thread+0x310/0x310 [262674.699022] [<ffffffff810957a2>] kthread+0xd2/0xf0 [262674.744494] [<ffffffff810956d0>] ? kthread_create_on_node+0x1c0/0x1c0 [262674.789543] [<ffffffff817bd81f>] ret_from_fork+0x3f/0x70 [262674.834094] [<ffffffff810956d0>] ? kthread_create_on_node+0x1c0/0x1c0 What happens is the following: (1) new MON session is established (2) old "none" ac is destroyed (3) new "cephx" ac is constructed ... (4) old OSD session (w/ "none" authorizer) is put ceph_auth_destroy_authorizer(ac, osd->o_auth.authorizer) osd->o_auth.authorizer in the "none" case is just a bare pointer into ac, which contains a single static copy for all services. By the time we get to (4), "none" ac, freed in (2), is long gone. On top of that, a new vtable installed in (3) points us at ceph_x_destroy_authorizer(), so we end up trying to destroy a "none" authorizer with a "cephx" destructor operating on invalid memory! To fix this, decouple authorizer destruction from ac and do away with a single static "none" authorizer by making a copy for each OSD or MDS session. Authorizers themselves are independent of ac and so there is no reason for destroy_authorizer() to be an ac op. Make it an op on the authorizer itself by turning ceph_authorizer into a real struct. Fixes: http://tracker.ceph.com/issues/15447Reported-by: NAlan Zhang <alan.zhang@linux.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NSage Weil <sage@redhat.com>
-
- 26 3月, 2016 4 次提交
-
-
由 Yan, Zheng 提交于
This helper duplicates last extent operation in OSD request, then adjusts the new extent operation's offset and length. The helper is for scatterd page writeback, which adds nonconsecutive dirty pages to single OSD request. Signed-off-by: NYan, Zheng <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Turn r_ops into a flexible array member to enable large, consisting of up to 16 ops, OSD requests. The use case is scattered writeback in cephfs and, as far as the kernel client is concerned, 16 is just a made up number. r_ops had size 3 for copyup+hint+write, but copyup is really a special case - it can only happen once. ceph_osd_request_cache is therefore stuffed with num_ops=2 requests, anything bigger than that is allocated with kmalloc(). req_mempool is backed by ceph_osd_request_cache, which means either num_ops=1 or num_ops=2 for use_mempool=true - all existing users (ceph_writepages_start(), ceph_osdc_writepages()) are fine with that. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Yan, Zheng 提交于
This avoids defining large array of r_reply_op_{len,result} in in struct ceph_osd_request. Signed-off-by: NYan, Zheng <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Follow userspace nomenclature on this - the next commit adds outdata_len. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 25 6月, 2015 1 次提交
-
-
由 Yan, Zheng 提交于
Signed-off-by: NYan, Zheng <zyan@redhat.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
- 09 1月, 2015 1 次提交
-
-
由 Ilya Dryomov 提交于
The only real issue is the one in auth_x.c and it came with 3.19-rc1 merge. Signed-off-by: NIlya Dryomov <idryomov@redhat.com>
-
- 18 12月, 2014 2 次提交
-
-
由 Yan, Zheng 提交于
allow specifying position of extent operation in multi-operations osd request. This is required for cephfs to convert inline data to normal data (compare xattr, then write object). Signed-off-by: NYan, Zheng <zyan@redhat.com> Reviewed-by: NIlya Dryomov <idryomov@redhat.com>
-
由 Yan, Zheng 提交于
Signed-off-by: NYan, Zheng <zyan@redhat.com> Reviewed-by: NIlya Dryomov <idryomov@redhat.com>
-
- 08 7月, 2014 2 次提交
-
-
由 Ilya Dryomov 提交于
Remove now unused ceph_osdc_unregister_linger_request(). Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Ilya Dryomov 提交于
Introduce ceph_osdc_cancel_request() intended for canceling requests from the higher layers (rbd and cephfs). Because higher layers are in charge and are supposed to know what and when they are canceling, the request is not completed, only unref'ed and removed from the libceph data structures. __cancel_request() is no longer called before __unregister_request(), because __unregister_request() unconditionally revokes r_request and there is no point in trying to do it twice. Signed-off-by: NIlya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-