- 28 Nov 2019 (1 commit)
Committed by David Howells
Convert the ceph filesystem to the new internal mount API, as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem. See Documentation/filesystems/mount_api.txt for more information.

[ Numerous string handling, leak and regression fixes; rbd conversion was particularly broken and had to be redone almost from scratch. ]

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
- 25 Nov 2019 (10 commits)
Committed by Ilya Dryomov
For a read-only mapping, ask for a set of features that makes the image only unwritable, rather than both unreadable and unwritable, by a client that doesn't understand them. As of today, the difference between the two sets for krbd is journaling (JOURNALING) and live migration (MIGRATING). The get_features method has supported the read_only parameter since hammer, ceph.git commit 6176ec5fde2a ("librbd: differentiate between R/O vs R/W RBD features").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
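The minimal C sketch below illustrates the difference between "unreadable and unwritable" and "only unwritable". The feature bit values, the FEATURES_WRITE_ONLY_INCOMPAT set and both functions are assumptions made for this example. They are not krbd or librbd code. The sketch only models how asking with read_only set can shrink the incompatible set, so that a client without JOURNALING or MIGRATING support can still map the image read-only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bit values assumed for illustration. */
#define FEATURE_LAYERING       (UINT64_C(1) << 0)
#define FEATURE_EXCLUSIVE_LOCK (UINT64_C(1) << 2)
#define FEATURE_OBJECT_MAP     (UINT64_C(1) << 3)
#define FEATURE_JOURNALING     (UINT64_C(1) << 6)
#define FEATURE_MIGRATING      (UINT64_C(1) << 9)

/* Hypothetical set: features that only stop an unaware client from writing. */
#define FEATURES_WRITE_ONLY_INCOMPAT (FEATURE_JOURNALING | FEATURE_MIGRATING)

/*
 * Hypothetical model of what get_features(read_only) reports as
 * incompatible: every enabled feature, minus the write-only ones when
 * read_only is set.
 */
static uint64_t incompat_features(uint64_t image_features, bool read_only)
{
    uint64_t incompat = image_features;

    if (read_only)
        incompat &= ~FEATURES_WRITE_ONLY_INCOMPAT;
    return incompat;
}

/* Client-side check: refuse to map if any incompatible feature is unknown. */
static bool can_map(uint64_t image_features, uint64_t supported, bool read_only)
{
    return (incompat_features(image_features, read_only) & ~supported) == 0;
}
```

With supported = FEATURE_LAYERING | FEATURE_EXCLUSIVE_LOCK | FEATURE_OBJECT_MAP, an image with journaling enabled passes can_map() only when read_only is true.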
Committed by Ilya Dryomov
Since infernalis, ceph.git commit 281f87f9ee52 ("cls_rbd: get_features on snapshots returns HEAD image features"), querying and checking snapshot features is pointless. Userspace support for manipulating image features after image creation also came in infernalis, so a snapshot with a different set of features was never possible.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Committed by Ilya Dryomov
The RBD_DEV_FLAG_EXISTS check in rbd_queue_workfn() is racy and leads to inconsistent behaviour. If the object (or its snapshot) isn't there, the OSD returns ENOENT. A read submitted before the snapshot removal notification is processed would be zero-filled and completed with status OK, while future reads would fail with IOERR. It also doesn't handle the case where an image that is mapped read-only is removed.

On top of this, because a watch is no longer established for read-only mappings, we no longer get notifications, so rbd_exists_validate() is effectively dead code.

While failing requests rather than returning zeros is a good thing, RBD_DEV_FLAG_EXISTS is not the way to do it.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Committed by Ilya Dryomov
With exclusive lock out of the way, the watch is the only thing left that prevents a read-only mapping from being used with read-only OSD caps.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Committed by Ilya Dryomov
A read-only mapping should be usable with read-only OSD caps, so neither the header lock nor the object map lock can be acquired. Unfortunately, this means that images mapped read-only lose the advantage of the object map.

Snapshots, however, can take advantage of the object map without any exclusionary locks. If the object map is desired, snapshot the image and map the snapshot instead of the image.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
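Here is a minimal sketch of the resulting policy. It assumes a simplified device description: struct model_dev and use_object_map() are hypothetical stand-ins, not the driver's definitions. A read-only HEAD mapping skips the object map because no lock protects it, while a snapshot can use it freely.

```c
#include <stdbool.h>

/* Hypothetical, simplified description of a mapping. */
struct model_dev {
    bool is_snap;            /* a snapshot is mapped, not HEAD */
    bool read_only;          /* mapped read-only (snapshots always are) */
    bool has_object_map;     /* object-map feature enabled */
    bool object_map_invalid; /* object map flagged invalid */
};

static bool use_object_map(const struct model_dev *d)
{
    /*
     * Read-only HEAD: no lock is taken, so another client may update
     * the image and its object map behind our back.
     */
    if (!d->is_snap && d->read_only)
        return false;

    /* Snapshots can't change; writable HEAD is assumed to hold the lock. */
    return d->has_object_map && !d->object_map_invalid;
}
```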
Committed by Ilya Dryomov
If an image is mapped read-only, don't allow setting its partition(s) to read-write via BLKROSET: with the previous patch, all writes to such images fail anyway.

If an image is mapped read-write, its partition(s) can be set to read-only (and back to read-write) as before. Note that at the rbd level the image will remain writable: anything sent down by the block layer will be executed, including any write from internal kernel users.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Committed by Ilya Dryomov
Even though the -o ro/-o read_only/--read-only options are very old, we have never really treated them seriously (on par with snapshots). As a first step, fail writes to images mapped read-only, just like we do for snapshots.

We need this check in rbd because the block layer basically ignores the read-only setting; see commit a32e236e ("Partially revert "block: fail op_is_write() requests to read-only partitions"").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
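The check amounts to a simple gate in front of request processing. In the sketch below, the op_type enum, check_request() and the choice of -EIO are assumptions for illustration, not the driver's code. The point is that an -o ro mapping now rejects everything except reads, exactly as a snapshot mapping does.

```c
#include <errno.h>
#include <stdbool.h>

enum op_type { OP_READ, OP_WRITE, OP_DISCARD, OP_ZEROOUT };

/*
 * Hypothetical request gate: only reads are allowed on snapshots and,
 * with this change, on images mapped read-only. -EIO is an assumed
 * error code for the sketch.
 */
static int check_request(enum op_type op, bool mapped_ro, bool is_snap)
{
    if (op != OP_READ && (mapped_ro || is_snap))
        return -EIO;
    return 0;
}
```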
Committed by Ilya Dryomov
rbd_dev->opts is not available for parent images. This makes checking rbd_dev->opts->read_only in various places (rbd_dev_image_probe(), need_exclusive_lock() and use_object_map() in the following patches) harder than it needs to be.

Keeping rbd_dev_image_probe() in mind, move the initialization in do_rbd_add() up. snap_id isn't filled in at that point, so replace rbd_is_snap() with a snap_name comparison.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Committed by Ilya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Committed by Colin Ian King
There is a spelling mistake in a debug message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
- 15 Nov 2019 (1 commit)
Committed by Ilya Dryomov
Some versions of gcc (so far 6.3 and 7.4) throw a warning:

  drivers/block/rbd.c: In function 'rbd_object_map_callback':
  drivers/block/rbd.c:2124:21: warning: 'current_state' may be used uninitialized in this function [-Wmaybe-uninitialized]
       (current_state == OBJECT_EXISTS && state == OBJECT_EXISTS_CLEAN))
  drivers/block/rbd.c:2092:23: note: 'current_state' was declared here
    u8 state, new_state, current_state;
                         ^~~~~~~~~~~~~

The warning is bogus because all current_state accesses are guarded by has_current_state.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
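The warning pattern is easy to reproduce in isolation. The sketch below is a hypothetical model: the state values, apply_update() and update_from_request() are assumptions made for this example. The update condition mirrors the expression quoted in the warning. Because of short-circuit evaluation, current_state is only compared when has_current_state is true. The commit text does not say how the warning was actually silenced; initializing the variable, as done here, is one common workaround.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Object map states; values assumed for this sketch. */
enum { OBJECT_NONEXISTENT = 0, OBJECT_EXISTS = 1, OBJECT_PENDING = 2,
       OBJECT_EXISTS_CLEAN = 3 };

/*
 * Hypothetical conditional update: apply new_state unless an expected
 * current state was supplied and the stored state doesn't match it
 * (EXISTS_CLEAN is accepted where EXISTS is expected).
 */
static uint8_t apply_update(uint8_t state, uint8_t new_state,
                            bool has_current_state, uint8_t current_state)
{
    if (!has_current_state || state == current_state ||
        (current_state == OBJECT_EXISTS && state == OBJECT_EXISTS_CLEAN))
        return new_state;
    return state;
}

static uint8_t update_from_request(uint8_t state, uint8_t new_state,
                                   const uint8_t *expected)
{
    bool has_current_state = expected != NULL;
    /*
     * Meaningful only when has_current_state is true; apply_update()
     * never compares it otherwise. The initializer keeps the value
     * defined when passed along and keeps -Wmaybe-uninitialized quiet.
     */
    uint8_t current_state = 0;

    if (has_current_state)
        current_state = *expected;
    return apply_update(state, new_state, has_current_state, current_state);
}
```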
- 15 Oct 2019 (1 commit)
Committed by Dongsheng Yang
There is a warning message in my test with the steps below:

  # rbd bench --io-type write --io-size 4K --io-threads 1 --io-pattern rand test &
  # sleep 5
  # pkill -9 rbd
  # rbd map test &
  # sleep 5
  # pkill rbd

The reason is that rbd_add_acquire_lock() is interruptible: when we kill the waiting on ->acquire_wait, the lock_dwork could still be running.

  1. do_rbd_add()                                 2. lock_dwork
  rbd_add_acquire_lock()
    - queue_delayed_work()
                                                  lock_dwork queued
    - wait_for_completion_killable_timeout()      <-- kill happens
  rbd_dev_image_unlock()     <-- UNLOCKED now, nothing to do.
  rbd_dev_device_release()
  rbd_dev_image_release()
    - ...
                                                  lock succeeded here
    - cancel_delayed_work_sync(&rbd_dev->lock_dwork)

Then, when we reach rbd_dev_free(), WARN_ON is triggered because lock_state is not RBD_LOCK_STATE_UNLOCKED.

To fix it, this commit makes sure lock_dwork has finished before calling rbd_dev_image_unlock().

On the other hand, this would not happen in do_rbd_remove(). After rbd is mapped, lock_dwork will only be queued for I/O requests, and a request will not continue until lock_dwork has finished. By the time we call rbd_dev_image_unlock() in do_rbd_remove(), all requests are done. That means lock_state should not be locked again after rbd_dev_image_unlock().

[ Cancel lock_dwork in rbd_add_acquire_lock(), only if the wait is interrupted. ]

Fixes: 637cd060 ("rbd: new exclusive lock wait/wake code")
Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
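A single-threaded model of the ordering problem might look like the following. The struct, the enum and all functions are hypothetical. cancel_lock_dwork_sync() merely stands in for cancel_delayed_work_sync(). The "concurrent" work run is simulated by calling run_lock_dwork() at the point where it fires in the timeline of the commit message.

```c
#include <stdbool.h>

enum lock_state { LOCK_UNLOCKED, LOCK_LOCKED };

/* Hypothetical device: a delayed "acquire lock" work item may be pending. */
struct model_dev {
    enum lock_state lock_state;
    bool lock_dwork_pending;
};

/* The work item: if still pending, it runs and takes the lock. */
static void run_lock_dwork(struct model_dev *d)
{
    if (d->lock_dwork_pending) {
        d->lock_dwork_pending = false;
        d->lock_state = LOCK_LOCKED;   /* "lock succeeded here" */
    }
}

/* Stand-in for cancel_delayed_work_sync(): afterwards it can't run. */
static void cancel_lock_dwork_sync(struct model_dev *d)
{
    d->lock_dwork_pending = false;
}

static void image_unlock(struct model_dev *d)
{
    d->lock_state = LOCK_UNLOCKED;
}

/* Old order: unlock, the queued work fires, cancel comes too late. */
static enum lock_state teardown_old(struct model_dev *d)
{
    image_unlock(d);
    run_lock_dwork(d);
    cancel_lock_dwork_sync(d);
    return d->lock_state;
}

/* New order: on interruption, cancel the work first, then unlock. */
static enum lock_state teardown_new(struct model_dev *d)
{
    cancel_lock_dwork_sync(d);
    image_unlock(d);
    return d->lock_state;
}
```

teardown_old() ends in LOCK_LOCKED, which corresponds to the WARN_ON in rbd_dev_free(). teardown_new() always ends in LOCK_UNLOCKED.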
- 16 Sep 2019 (2 commits)
Committed by Ilya Dryomov
Make it more informative: log op_type, offset and length for block layer requests, and the initiating obj_req for child requests.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Committed by Dongsheng Yang
rbd_dev_image_id() allocates space for the length but passes a smaller value to rbd_obj_method_sync(). rbd_dev_v2_object_prefix() doesn't allocate space for the length at all. Fix both to be consistent.

Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
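Ceph encodes strings as a 32-bit little-endian length followed by the bytes, so a reply buffer for an image ID has to cover both. The sketch below assumes that encoding. MODEL_ID_LEN_MAX, MODEL_REPLY_SIZE and decode_id() are hypothetical names, not the driver's. The sketch shows why the allocation size and the reply length passed to the call should be the same value: a maximum-length ID only decodes when the full MODEL_REPLY_SIZE is available.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_ID_LEN_MAX 64   /* hypothetical maximum ID length */

/* Room for the 4-byte length prefix *and* the payload. */
#define MODEL_REPLY_SIZE (sizeof(uint32_t) + MODEL_ID_LEN_MAX)

/*
 * Decode a length-prefixed string from the first reply_len bytes of
 * reply into out. Returns the string length, or -1 if the reply is
 * truncated or the string doesn't fit in out.
 */
static long decode_id(const uint8_t *reply, size_t reply_len,
                      char *out, size_t out_size)
{
    uint32_t len;

    if (reply_len < sizeof(uint32_t))
        return -1;
    len = (uint32_t)reply[0] | (uint32_t)reply[1] << 8 |
          (uint32_t)reply[2] << 16 | (uint32_t)reply[3] << 24;
    if (len > reply_len - sizeof(uint32_t) || (size_t)len + 1 > out_size)
        return -1;
    memcpy(out, reply + sizeof(uint32_t), len);
    out[len] = '\0';
    return (long)len;
}
```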
- 28 Aug 2019 (1 commit)
Committed by Ilya Dryomov
The parent image is read only up to the overlap point; the rest of the buffer should be zeroed. This snuck in because, as it turns out, the overlap test case has not been triggering this code path for a while now.

Fixes: a9b67e69 ("rbd: replace obj_req->tried_parent with obj_req->read_state")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
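Here is a minimal sketch of the intended behaviour, assuming a flat in-memory parent image. read_from_parent() is a hypothetical helper, not the driver's function. Bytes below the overlap come from the parent, and anything past it is zero-filled.

```c
#include <stdint.h>
#include <string.h>

/*
 * Fill buf with the range [off, off + len) of a child image whose parent
 * data is valid only below `overlap`: copy what lies under the overlap
 * from the parent and zero the remainder of the buffer.
 */
static void read_from_parent(const uint8_t *parent_data, uint64_t overlap,
                             uint64_t off, uint8_t *buf, uint64_t len)
{
    uint64_t n = 0;

    if (off < overlap)
        n = (overlap - off < len) ? overlap - off : len;
    if (n)
        memcpy(buf, parent_data + off, (size_t)n);
    memset(buf + n, 0, (size_t)(len - n));   /* the part the fix zeroes */
}
```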
- 08 Jul 2019 (18 commits)
Committed by Ilya Dryomov
setallochint is really only useful on object creation. Continue hinting unconditionally if the object map cannot be used.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Committed by Ilya Dryomov
Speed up reads, discards and zeroouts through RBD_OBJ_FLAG_MAY_EXIST and RBD_OBJ_FLAG_NOOP_FOR_NONEXISTENT, based on the object map. Invalid object maps are not trusted, but are still updated. Note that we never iterate, resize or invalidate object maps. If the object-map feature is enabled but the object map fails to load, we just fail the requester (either "rbd map" or I/O, by way of the post-acquire action).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
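The sketch below shows how the two flags can short-circuit requests. It is a simplified model: the state values, the flag names and both functions are assumptions for this example, and parent overlap is ignored when deciding whether a zeroout is a no-op. A nonexistent object needs no OSD round trip for a read, which is handled as ENOENT locally, or for a discard. A write still has to create the object.

```c
#include <stdbool.h>
#include <stdint.h>

/* Object map states; values assumed for this sketch. */
enum { OBJECT_NONEXISTENT = 0, OBJECT_EXISTS = 1, OBJECT_PENDING = 2,
       OBJECT_EXISTS_CLEAN = 3 };

enum op_type { OP_READ, OP_WRITE, OP_DISCARD, OP_ZEROOUT };

/* Hypothetical flags modelled on the ones named in the commit. */
#define FLAG_MAY_EXIST            (1u << 0)
#define FLAG_NOOP_FOR_NONEXISTENT (1u << 1)

/* An invalid object map is not trusted: assume the object may exist. */
static bool may_exist(const uint8_t *object_map, bool map_valid, uint64_t objno)
{
    return !map_valid || object_map[objno] != OBJECT_NONEXISTENT;
}

/* Does the object request need an OSD round trip at all? */
static bool needs_osd_request(enum op_type op, unsigned int flags)
{
    if (flags & FLAG_MAY_EXIST)
        return true;
    if (op == OP_READ)
        return false;   /* handle locally as ENOENT: parent or zero-fill */
    if (flags & FLAG_NOOP_FOR_NONEXISTENT)
        return false;   /* e.g. discard of an object that isn't there */
    return true;        /* writes still have to create the object */
}
```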
Committed by Ilya Dryomov
The snapshot object map will be loaded in rbd_dev_image_probe(), so we need to know the snapshot's size (as opposed to HEAD's size) sooner.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Committed by Ilya Dryomov
This will be used for loading the object map. rbd_obj_read_sync() isn't suitable because the object map must be accessed through class methods.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Committed by Ilya Dryomov
rbd_wait_state_locked() is built around rbd_dev->lock_waitq and blocks rbd worker threads while waiting for the lock, potentially impacting other rbd devices. There is no good way to pass an error code into image request state machines when acquisition fails, hence the use of RBD_DEV_FLAG_BLACKLISTED for everything and various other issues.

Introduce rbd_dev->acquiring_list and move acquisition into the image request state machine. Use rbd_img_schedule() for kicking and passing error codes. No blocking occurs while waiting for the lock, but rbd_dev->lock_rwsem is still held across lock, unlock and set_cookie calls.

Always acquire the lock on "rbd map" to avoid associating the latency of acquiring the lock with the first I/O request.

A slight regression is that lock_timeout is now respected only if lock acquisition is triggered by "rbd map" and not by I/O. This is somewhat compensated by the fact that we no longer block if the peer refuses to release the lock: I/O is failed with EROFS right away.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Committed by Ilya Dryomov
Syncing OSD requests doesn't really work. A single image request may consist of multiple object requests, each of which can go through a series of OSD requests (original, copyups, etc). On top of that, the OSD client may be shared with other rbd devices.

What we want is to ensure that all in-flight image requests complete. Introduce rbd_dev->running_list and block in RBD_LOCK_STATE_RELEASING until that happens. New OSD requests may be started during this time.

Note that __rbd_img_handle_request() acquires rbd_dev->lock_rwsem only if need_exclusive_lock() returns true. This avoids a deadlock, similar to the one outlined in the previous commit, between unlock and I/O that doesn't require the lock, such as a read with the object-map feature disabled.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
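The quiescing idea can be sketched with a counter and a condition variable. This is a userspace analogy built on POSIX threads, not the driver's implementation, which tracks requests on rbd_dev->running_list. struct running_set and its functions are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/*
 * Hypothetical analogy: a count of in-flight image requests and a
 * condition variable that the lock-release path waits on.
 */
struct running_set {
    pthread_mutex_t mu;
    pthread_cond_t idle;
    int running;
};

static void running_init(struct running_set *s)
{
    pthread_mutex_init(&s->mu, NULL);
    pthread_cond_init(&s->idle, NULL);
    s->running = 0;
}

/* Counts image requests; the OSD requests they issue are not counted. */
static void img_request_start(struct running_set *s)
{
    pthread_mutex_lock(&s->mu);
    s->running++;
    pthread_mutex_unlock(&s->mu);
}

static void img_request_finish(struct running_set *s)
{
    pthread_mutex_lock(&s->mu);
    if (--s->running == 0)
        pthread_cond_broadcast(&s->idle);
    pthread_mutex_unlock(&s->mu);
}

/* RELEASING: block until every in-flight image request has completed. */
static void quiesce(struct running_set *s)
{
    pthread_mutex_lock(&s->mu);
    while (s->running > 0)
        pthread_cond_wait(&s->idle, &s->mu);
    pthread_mutex_unlock(&s->mu);
}
```

Counting image requests rather than OSD requests is the point of the change. An image request may keep issuing new OSD requests (copyups and so on) while quiesce() waits.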
Committed by Ilya Dryomov
Quiesce exclusive lock at the top of rbd_reacquire_lock() instead of only when ceph_cls_set_cookie() fails. This avoids a deadlock on rbd_dev->lock_rwsem.

If rbd_dev->lock_rwsem is needed for I/O completion, set_cookie can hang the ceph-msgr worker thread if the set_cookie reply ends up behind an I/O reply. This is because, like lock and unlock requests, set_cookie is sent and waited upon with rbd_dev->lock_rwsem held for write.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Committed by Ilya Dryomov
Both the write and copyup paths will get more complex with object map. Factor the copyup code out into a separate state machine.

While at it, take advantage of the obj_req->osd_reqs list and issue the empty and current snapc OSD requests together, one after another.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Committed by Ilya Dryomov
These functions don't allocate and set up OSD requests anymore.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Committed by Ilya Dryomov
Following submission, move initial OSD request allocation into the object request state machines. Everything that has to do with OSD requests is now handled inside the state machine; all that __rbd_img_fill_request() has left is initialization.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Committed by Ilya Dryomov
With obj_req->xferred removed, obj_req->ex.oe_off and obj_req->ex.oe_len can be updated if required for alignment. Previously the new offset and length weren't stored anywhere beyond rbd_obj_setup_discard().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Committed by Ilya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Committed by Ilya Dryomov
Since the dawn of time it has been assumed that a single object request spawns a single OSD request. This is already impacting copyup: instead of sending the empty and current snapc copyups together, we wait for the empty snapc OSD request to complete in order to reassign obj_req->osd_req with the current snapc OSD request. Looking further, updating potentially hundreds of snapshot object maps serially is a non-starter.

Replace the obj_req->osd_req pointer with the obj_req->osd_reqs list. Use osd_req->r_private_item for linkage.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Committed by Ilya Dryomov
Make it possible to schedule image requests on a workqueue. This fixes the parent chain recursion added in the previous commit and lays the groundwork for exclusive lock wait/wake improvements.

The "wait for pending subrequests and report first nonzero result" code is generalized so that it can be used by the object request state machine.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
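The "wait for pending subrequests and report first nonzero result" logic can be expressed as a small helper. The names below (struct pending_result, pending_result_dec()) are used for illustration only and are not presented as the driver's exact code.

```c
#include <stdbool.h>

/* Aggregates completions of a known number of subrequests. */
struct pending_result {
    int result;       /* first nonzero result, 0 if none so far */
    int num_pending;  /* subrequests not yet completed */
};

/*
 * Record one completion with result *result. Returns true when it was
 * the last outstanding subrequest; in that case *result is replaced by
 * the aggregated result (the first error seen, or 0).
 */
static bool pending_result_dec(struct pending_result *pending, int *result)
{
    if (*result && !pending->result)
        pending->result = *result;
    if (--pending->num_pending)
        return false;

    *result = pending->result;
    return true;
}
```

Only the caller that sees true continues the parent state machine, so a request with three subrequests advances exactly once, carrying the first error if any subrequest failed.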
Committed by Ilya Dryomov
Start eliminating the asymmetry where the initial OSD request is allocated and submitted from outside the state machine, which makes error handling and restarts harder than they could be. This commit deals with submission; a commit that deals with allocation will follow.

Note that this commit adds parent chain recursion on the submission side:

  rbd_img_request_submit
    rbd_obj_handle_request
      __rbd_obj_handle_request
        rbd_obj_handle_read
        rbd_obj_handle_write_guard
          rbd_obj_read_from_parent
            rbd_img_request_submit

This will be fixed in the next commit.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
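A toy parent chain shows why scheduling removes the recursion. struct model_image and both functions are hypothetical. The workqueue is reduced to a single pending slot because each step schedules at most one parent request. The recursive version uses one stack frame per image in the chain, while the scheduled version returns to a loop after each step, like a trampoline.

```c
#include <stddef.h>

/* Hypothetical image chain: each image may have to read from its parent. */
struct model_image {
    const struct model_image *parent;
};

/* Recursive submission: one stack frame per image in the chain. */
static int submit_recursive(const struct model_image *img)
{
    int visited = 1;                    /* "submit" this image's request */

    if (img->parent)
        visited += submit_recursive(img->parent);
    return visited;
}

/*
 * Scheduled submission: handling a request only queues the parent's
 * request and returns; a loop standing in for the workqueue drains it,
 * so stack depth stays constant regardless of chain length.
 */
static int submit_scheduled(const struct model_image *img)
{
    const struct model_image *pending = img;   /* one-slot "workqueue" */
    int visited = 0;

    while (pending) {
        const struct model_image *cur = pending;

        pending = NULL;
        visited++;                      /* "submit" this image's request */
        if (cur->parent)
            pending = cur->parent;      /* schedule instead of recursing */
    }
    return visited;
}
```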
Committed by Ilya Dryomov
In preparation for moving OSD request allocation and submission into the object request state machines, get rid of RBD_OBJ_WRITE_{FLAT,GUARD}. We would need to start in a new state whether or not the request is guarded. Unify them into RBD_OBJ_WRITE_OBJECT and pass the guard info through obj_req->flags.

While at it, make our ENOENT handling a little more precise: only hide ENOENT when it is actually expected, that is, on delete.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Committed by Ilya Dryomov
Make rbd_obj_handle_read() look like a state machine and get rid of the need to patch the result in rbd_obj_handle_request(), completing the removal of obj_req->xferred and img_req->xferred.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Committed by Ilya Dryomov
obj_req->xferred and img_req->xferred don't bring any value. The former is used for short reads and has to be set to obj_req->ex.oe_len after that and elsewhere. The latter is just an aggregate.

Use the result for short reads (>= 0: number of bytes read, < 0: error) and pass it around explicitly. There is no need to store it in obj_req.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
- 08 May 2019 (4 commits)
Committed by Ilya Dryomov
The check added in commit 721c7fc7 ("block: fail op_is_write() requests to read-only partitions") was lifted in commit a32e236e ("Partially revert "block: fail op_is_write() requests to read-only partitions""). Basic things like user-triggered writes and discards are still caught, but internal kernel users can submit anything. In particular, ext4 will attempt to write to the superblock if it detects errors in the filesystem, even if the filesystem is mounted read-only on a read-only partition.

The assert is overkill regardless.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Committed by Ilya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Committed by Arnd Bergmann
rbd_assert(0) has caused different issues depending on the compiler version in the past, so it seems better to avoid it completely. Replace the remaining instances.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Committed by Arnd Bergmann
clang fails to see that rbd_assert(0) ends in an unreachable code path, and warns about a subsequent use of an uninitialized variable when CONFIG_PROFILE_ANNOTATED_BRANCHES is set:

  drivers/block/rbd.c:2402:4: error: variable 'ret' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
                          rbd_assert(0);
                          ^~~~~~~~~~~~~
  drivers/block/rbd.c:563:7: note: expanded from macro 'rbd_assert'
                  if (unlikely(!(expr))) {                        \
                      ^~~~~~~~~~~~~~~~~
  include/linux/compiler.h:48:23: note: expanded from macro 'unlikely'
  # define unlikely(x)    (__branch_check__(x, 0, __builtin_constant_p(x)))
                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  drivers/block/rbd.c:2410:6: note: uninitialized use occurs here
          if (ret) {
              ^~~
  drivers/block/rbd.c:2402:4: note: remove the 'if' if its condition is always true
                          rbd_assert(0);
                          ^
  drivers/block/rbd.c:563:3: note: expanded from macro 'rbd_assert'
                  if (unlikely(!(expr))) {                        \
                  ^
  drivers/block/rbd.c:2376:9: note: initialize the variable 'ret' to silence this warning
          int ret;
                 ^
                  = 0
  1 error generated.

This seems to be a bug in clang, but it is easy to work around by using an unconditional BUG().

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
- 20 Mar 2019 (1 commit)
Committed by Ilya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
- 19 Mar 2019 (1 commit)
Committed by Ilya Dryomov
Now that we have alloc_size controlling our discard behavior, it doesn't make sense to have these set to the object (set) size. alloc_size defaults to 64k, but because discard_granularity is likely 4M, only ranges that are equal to or bigger than 4M can be considered during fstrim. A smaller io_min is also more likely to be met, resulting in fewer deferred writes on bluestore OSDs.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
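To see why the granularity matters, consider a simplified model in which a free extent is discarded only in whole, aligned granules. discardable_bytes() is a hypothetical helper. Real filesystems apply their own rules during fstrim, so treat the numbers as illustrative.

```c
#include <stdint.h>

/*
 * Hypothetical model: bytes of the free extent [off, off + len) that can
 * be discarded if only whole, aligned granules are trimmed. gran must be
 * a power of two.
 */
static uint64_t discardable_bytes(uint64_t off, uint64_t len, uint64_t gran)
{
    uint64_t start = (off + gran - 1) & ~(gran - 1);  /* round up */
    uint64_t end = (off + len) & ~(gran - 1);         /* round down */

    return end > start ? end - start : 0;
}
```

For a 6 MiB free extent starting at 1 MiB, the model gives 0 bytes with a 4 MiB granularity but the full 6 MiB with 64 KiB. Being at least 4M long is necessary but not sufficient at the larger granularity.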