- 01 8月, 2017 1 次提交
-
-
由 Ilya Dryomov 提交于
Messages allocated out of ceph_msgpool have a fixed front length (pool->front_len). Asserting that the entire front has been filled while encoding is thus wrong. Fixes: 8cb441c0 ("libceph: MOSDOp v8 encoding (actual spgid + full hash)") Reported-by: N"Yan, Zheng" <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
-
- 17 7月, 2017 1 次提交
-
-
由 Ilya Dryomov 提交于
encode_request_finish() is for MOSDOp messages. Calling it on MOSDBackoff ack-block messages corrupts them. Fixes: a02a946d ("libceph: respect RADOS_BACKOFF backoffs") Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 07 7月, 2017 12 次提交
-
-
由 Ilya Dryomov 提交于
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
For luminous and beyond we are encoding the actual spgid, which requires operating with the correct pg_num, i.e. that of the target pool. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
need_check_tiering logic doesn't make a whole lot of sense. Drop it and apply tiering unconditionally on every calc_target() call instead. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Otherwise we may miss events like PG splits, pool deletions, etc when we get multiple incremental maps at once. Because check_pool_dne() can now be fed an unlinked request, finish_request() needed to be taught to handle unlinked requests. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
When processing a map update consisting of multiple incrementals, we may end up running check_linger_pool_dne() on a lingering request that was previously added to need_resend_linger list. If it is concluded that the target pool doesn't exist, the request is killed off while still on need_resend_linger list, which leads to a crash on a NULL lreq->osd in kick_requests(): libceph: linger_id 18446462598732840961 pool does not exist BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: ceph_osdc_handle_map+0x4ae/0x870 Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Note that ceph_osd_request_target fields are updated regardless of RESEND_ON_SPLIT. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Replace it with more fine-grained bools to separate updating ceph_osd_request_target fields and the decision to resend. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Factor out encode_{pgid,oloc}() and use ceph_encode_string() for oid. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Store both raw pgid and actual spgid in ceph_osd_request_target. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 09 5月, 2017 1 次提交
-
-
由 Deepa Dinamani 提交于
CURRENT_TIME is not y2038 safe. The macro will be deleted and all the references to it will be replaced by ktime_get_* apis. struct timespec is also not y2038 safe. Retain timespec for timestamp representation here as ceph uses it internally everywhere. These references will be changed to use struct timespec64 in a separate patch. The current_fs_time() api is being changed to use vfs struct inode* as an argument instead of struct super_block*. Set the new mds client request r_stamp field using ktime_get_real_ts() instead of using current_fs_time(). Also, since r_stamp is used as mtime on the server, use timespec_trunc() to truncate the timestamp, using the right granularity from the superblock. This api will be transitioned to be y2038 safe along with vfs. Link: http://lkml.kernel.org/r/1491613030-11599-5-git-send-email-deepa.kernel@gmail.comSigned-off-by: NDeepa Dinamani <deepa.kernel@gmail.com> Reviewed-by: NArnd Bergmann <arnd@arndb.de> M: Ilya Dryomov <idryomov@gmail.com> M: "Yan, Zheng" <zyan@redhat.com> M: Sage Weil <sage@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 5月, 2017 5 次提交
-
-
由 Jeff Layton 提交于
Cephfs can get cap update requests that contain a new epoch barrier in them. When that happens we want to pause all OSD traffic until the right map epoch arrives. Add an epoch_barrier field to ceph_osd_client that is protected by the osdc->lock rwsem. When the barrier is set, and the current OSD map epoch is below that, pause the request target when submitting the request or when revisiting it. Add a way for upper layers (cephfs) to update the epoch_barrier as well. If we get a new map, compare the new epoch against the barrier before kicking requests and request another map if the map epoch is still lower than the one we want. If we get a map with a full pool, or at quota condition, then set the barrier to the current epoch value. Signed-off-by: NJeff Layton <jlayton@redhat.com> Reviewed-by: NIlya Dryomov <idryomov@gmail.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Jeff Layton 提交于
When a Ceph volume hits capacity, a flag is set in the OSD map to indicate that, and a new map is sprayed around the cluster. With cephfs we want it to shut down any abortable requests that are in progress with an -ENOSPC error as they'd just hang otherwise. Add a new ceph_osdc_abort_on_full helper function to handle this. It will first check whether there is an out-of-space condition in the cluster and then walk the tree and abort any request that has r_abort_on_full set with a -ENOSPC error. Call this new function directly whenever we get a new OSD map. Signed-off-by: NJeff Layton <jlayton@redhat.com> Reviewed-by: NIlya Dryomov <idryomov@gmail.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Jeff Layton 提交于
Usually, when the osd map is flagged as full or the pool is at quota, write requests just hang. This is not what we want for cephfs, where it would be better to simply report -ENOSPC back to userland instead of stalling. If the caller knows that it will want an immediate error return instead of blocking on a full or at-quota error condition then allow it to set a flag to request that behavior. Set that flag in ceph_osdc_new_request (since ceph.ko is the only caller), and on any other write request from ceph.ko. A later patch will deal with requests that were submitted before the new map showing the full condition came in. Signed-off-by: NJeff Layton <jlayton@redhat.com> Reviewed-by: NIlya Dryomov <idryomov@gmail.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Jeff Layton 提交于
Nothing uses this anymore with the removal of the ack vs. commit code. Remove the field and just encode zeroes into place in the request encoding. Signed-off-by: NJeff Layton <jlayton@redhat.com> Reviewed-by: NIlya Dryomov <idryomov@gmail.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Elena Reshetova 提交于
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: NElena Reshetova <elena.reshetova@intel.com> Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com> Signed-off-by: NKees Cook <keescook@chromium.org> Signed-off-by: NDavid Windsor <dwindsor@gmail.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 07 3月, 2017 1 次提交
-
-
由 Ilya Dryomov 提交于
osd_request_timeout specifies how many seconds to wait for a response from OSDs before returning -ETIMEDOUT from an OSD request. 0 (default) means no limit. osd_request_timeout is osdkeepalive-precise -- in-flight requests are swept through every osdkeepalive seconds. With ack vs commit behaviour gone, abort_request() is really simple. This is based on a patch from Artur Molchanov <artur.molchanov@synesis.ru>. Tested-by: NArtur Molchanov <artur.molchanov@synesis.ru> Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NSage Weil <sage@redhat.com>
-
- 25 2月, 2017 2 次提交
-
-
由 Ilya Dryomov 提交于
CEPH_OSD_FLAG_ONDISK is set in account_request(). Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NJeff Layton <jlayton@redhat.com> Reviewed-by: NSage Weil <sage@redhat.com>
-
由 Ilya Dryomov 提交于
- CEPH_OSD_FLAG_ACK shouldn't be set anymore, so assert on it - remove support for handling ack replies (OSDs will send ack replies only if clients request them) - drop the "do lingering callbacks under osd->lock" logic from handle_reply() -- lreq->lock is sufficient in all three cases Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NJeff Layton <jlayton@redhat.com> Reviewed-by: NSage Weil <sage@redhat.com>
-
- 20 2月, 2017 2 次提交
-
-
由 Ilya Dryomov 提交于
To spare checking for "this reply fits into a page, but does it fit into my buffer?" in some callers, osd_req_op_cls_response_data_pages() needs to know how big it is. Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NJason Dillaman <dillaman@redhat.com>
-
由 Yan, Zheng 提交于
add_to_page_cache_lru() can fails, so the actual pages to read can be smaller than the initial size of osd request. We need to update osd request size in that case. Signed-off-by: NYan, Zheng <zyan@redhat.com> Reviewed-by: NJeff Layton <jlayton@redhat.com>
-
- 14 1月, 2017 1 次提交
-
-
由 Peter Zijlstra 提交于
Since we need to change the implementation, stop exposing internals. Provide kref_read() to read the current reference count; typically used for debug messages. Kills two anti-patterns: atomic_read(&kref->refcount) kref->refcount.counter Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 15 12月, 2016 2 次提交
-
-
由 Ilya Dryomov 提交于
Kill the wrapper and rename __finish_request() to finish_request(). Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
r_safe_completion is currently, and has always been, signaled only if on-disk ack was requested. It's there for fsync and syncfs, which wait for in-flight writes to flush - all data write requests set ONDISK. However, the pool perm check code introduced in 4.2 sends a write request with only ACK set. An unfortunately timed syncfs can then hang forever: r_safe_completion won't be signaled because only an unsafe reply was requested. We could patch ceph_osdc_sync() to skip !ONDISK write requests, but that is somewhat incomplete and yet another special case. Instead, rename this completion to r_done_completion and always signal it when the OSD client is done with the request, whether unsafe, safe, or error. This is a bit cleaner and helps with the cancellation code. Reported-by: NYan, Zheng <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 13 12月, 2016 1 次提交
-
-
由 Ilya Dryomov 提交于
The length of the reply is protocol-dependent - for cephx it's ceph_x_authorize_reply. Nothing sensible can be passed from the messenger layer anyway. Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NSage Weil <sage@redhat.com>
-
- 11 11月, 2016 1 次提交
-
-
由 Ilya Dryomov 提交于
osdc->last_linger_id is a counter for lreq->linger_id, which is used for watch cookies. Starting with a large integer should ease the task of telling apart kernel and userspace clients. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 25 8月, 2016 3 次提交
-
-
由 Ilya Dryomov 提交于
Revamp watch code to support retrying watch re-registration: - add rbd_dev->watch_state for more robust errcb handling - store watch cookie separately to avoid dereferencing watch_handle which is set to NULL on unwatch - move re-register code into a delayed work and retry re-registration every second, unless the client is blacklisted Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NMike Christie <mchristi@redhat.com> Tested-by: NMike Christie <mchristi@redhat.com>
-
由 Douglas Fuller 提交于
Add a convenience function to osd_client to send Ceph OSD 'class' ops. The interface assumes that the request and reply data each consist of single pages. Signed-off-by: NDouglas Fuller <dfuller@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NMike Christie <mchristi@redhat.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Douglas Fuller 提交于
Add support for this Ceph OSD op, needed to support the RBD exclusive lock feature. Signed-off-by: NDouglas Fuller <dfuller@redhat.com> [idryomov@gmail.com: refactor, misc fixes throughout] Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NMike Christie <mchristi@redhat.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
- 09 8月, 2016 1 次提交
-
-
由 Wei Yongjun 提交于
In case of error, the function ceph_alloc_page_vector() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR(). Fixes: 19079203 ('libceph: support for sending notifies') Signed-off-by: NWei Yongjun <weiyj.lk@gmail.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 28 7月, 2016 3 次提交
-
-
由 Yan, Zheng 提交于
Signed-off-by: NYan, Zheng <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Yan, Zheng 提交于
Add pool namesapce pointer to struct ceph_file_layout and struct ceph_object_locator. Pool namespace is used by when mapping object to PG, it's also used when composing OSD request. The namespace pointer in struct ceph_file_layout is RCU protected. So libceph can read namespace without taking lock. Signed-off-by: NYan, Zheng <zyan@redhat.com> [idryomov@gmail.com: ceph_oloc_destroy(), misc minor changes] Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Yan, Zheng 提交于
Define new ceph_file_layout structure and rename old ceph_file_layout to ceph_file_layout_legacy. This is preparation for adding namespace to ceph_file_layout structure. Signed-off-by: NYan, Zheng <zyan@redhat.com>
-
- 31 5月, 2016 3 次提交
-
-
由 Ilya Dryomov 提交于
Commit d30291b9 ("libceph: variable-sized ceph_object_id") changed dout()s in what is now encode_request() and ceph_object_locator_to_pg() to use %pE, mostly to document that, although all rbd and cephfs object names are NULL-terminated strings, ceph_object_id will handle any RADOS object name, including the one containing NULs, just fine. However, it turns out that vbin_printf() can't handle anything but ints and %s - all %p suffixes are ignored. The buffer %p** points to isn't recorded, resulting in trash in the messages if the buffer had been reused by the time bstr_printf() got to it. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
handle_reply() may be called twice on the same request: on ack and then on commit. This occurs on btrfs-formatted OSDs or if cephfs sync write path is triggered - CEPH_OSD_FLAG_ACK | CEPH_OSD_FLAG_ONDISK. handle_reply() handles this with the help of done_request(). Fixes: 5aea3dcd ("libceph: a major OSD client update") Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
For the benefit of every single caller, take osdc instead of map. Also, now that osdc->osdmap can't ever be NULL, drop the check. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-