- 16 September 2019, 5 commits
-
Committed by Erqi Chen
If a client's MDS session is evicted while in CEPH_MDS_SESSION_OPENING state, the MDS won't send a session message to the client, and delayed_work skips sessions in CEPH_MDS_SESSION_OPENING state, so the session hangs forever. Allow ceph_con_keepalive() to reconnect a session in OPENING state to avoid the hang. Also ensure that we skip sessions in RESTARTING and REJECTED states, since those states can't be resurrected by issuing a keepalive.

Link: https://tracker.ceph.com/issues/41551
Signed-off-by: Erqi Chen <chenerqi@gmail.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
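A minimal sketch of the resulting delayed_work state handling, assuming the real CEPH_MDS_SESSION_* states and ceph_con_keepalive(); the helper name maybe_send_keepalive() is hypothetical:

  static void maybe_send_keepalive(struct ceph_mds_session *s)
  {
      switch (s->s_state) {
      case CEPH_MDS_SESSION_OPENING:  /* now kept alive as well */
      case CEPH_MDS_SESSION_OPEN:
      case CEPH_MDS_SESSION_HUNG:
          ceph_con_keepalive(&s->s_con);
          break;
      case CEPH_MDS_SESSION_RESTARTING:
      case CEPH_MDS_SESSION_REJECTED:
      default:
          /* a keepalive can't resurrect these sessions */
          break;
      }
  }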
-
Committed by Jeff Layton
It's only used to keep count of caps being trimmed, but that requires holding session->s_mutex to prevent multiple trimming operations from running concurrently. We can achieve the same effect with an integer on the stack, which allows us to (eventually) not need the s_mutex.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
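A sketch of the on-stack counter pattern; the per-cap iteration helper follows the MDS client's iterate-session-caps idea, but names and signatures here are simplified assumptions:

  static int trim_caps_cb(struct inode *inode, struct ceph_cap *cap,
                          void *arg)
  {
      int *remaining = arg;   /* lives on the caller's stack */

      if (*remaining <= 0)
          return 0;
      /* ... decide whether this cap may be dropped ... */
      (*remaining)--;
      return 0;
  }

  static void trim_session_caps(struct ceph_mds_session *session,
                                int max_caps)
  {
      int remaining = session->s_nr_caps - max_caps;

      if (remaining > 0)
          ceph_iterate_session_caps(session, trim_caps_cb, &remaining);
      /* no shared session->s_trim_caps, so no s_mutex needed for the count */
  }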
-
Committed by Yan, Zheng
Make the client use OSD replies and session messages to infer whether it has been blacklisted. A blacklisted client reconnects to the cluster using a new entity addr. Auto reconnect is limited to once every 30 minutes and is disabled by default; it can be enabled/disabled with the recover_session=<no|clean> mount option. In 'clean' mode, the client drops any dirty data/metadata, invalidates page caches and invalidates all writable file handles. After reconnect, file locks become stale because the MDS loses track of them. If an inode contains any stale file locks, reads/writes on the inode are not allowed until applications release all stale file locks.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Yan, Zheng
It closes MDS sessions, drops all caps and invalidates page caches, then uses a new entity address to reconnect to the cluster. After reconnect, all dirty data/metadata are dropped and file locks are silently lost. Open files continue to work because the client will try renewing caps on a later read/write.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Yan, Zheng
Use errseq_t to track and report errors from async metadata operations, similar to how the kernel handles errors during writeback. If any dirty caps or any unsafe request gets dropped during session eviction, record -EIO in the corresponding inode's i_meta_err. The error will be reported by a subsequent fsync.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
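A minimal sketch of the errseq_t flow, assuming an i_meta_err field in ceph_inode_info and a cursor sampled at open time; errseq_set() and errseq_check_and_advance() are the stock <linux/errseq.h> helpers:

  /* on session eviction, latch the error on the inode */
  static void mark_meta_error(struct ceph_inode_info *ci)
  {
      errseq_set(&ci->i_meta_err, -EIO);
  }

  /* on fsync, report it once per cursor (e.g. per open file) */
  static int check_meta_error(struct ceph_inode_info *ci,
                              errseq_t *cursor)
  {
      return errseq_check_and_advance(&ci->i_meta_err, cursor);
  }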
-
- 08 July 2019, 7 commits
-
Committed by Jeff Layton
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Jeff Layton
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Yan, Zheng
Link: https://tracker.ceph.com/issues/40339
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Yan, Zheng
handle_cap_export() may add placeholder caps to a session that is in the opening state. These caps' session pointers become wild after the session gets unregistered. The fix is to not unregister sessions in the opening state during MDS failover; just let the client reconnect later once the MDS has recovered.

Link: https://tracker.ceph.com/issues/40190
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Yan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Yan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by David Disseldorp
MDS InodeStat v3 wire structures include a trailing snapshot creation time member. Unmarshal it and retain it for a future vxattr.

Signed-off-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
- 28 June 2019, 1 commit
-
Committed by Jeff Layton
When ceph_mdsc_build_path is handed a positive dentry, it returns a zero-length path string with the base set to that dentry. This is not what we want: always include at least one path component in the string. ceph_mdsc_build_path has behaved this way for a long time, but it didn't matter until the recent d_name handling rework.

Fixes: 964fff74 ("ceph: use ceph_mdsc_build_path instead of clone_dentry_name")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
- 06 June 2019, 1 commit
-
Committed by Yan, Zheng
iput_final() may wait for readahead pages. The wait can cause a deadlock. For example:

  Workqueue: ceph-msgr ceph_con_workfn [libceph]
  Call Trace:
   schedule+0x36/0x80
   io_schedule+0x16/0x40
   __lock_page+0x101/0x140
   truncate_inode_pages_range+0x556/0x9f0
   truncate_inode_pages_final+0x4d/0x60
   evict+0x182/0x1a0
   iput+0x1d2/0x220
   iterate_session_caps+0x82/0x230 [ceph]
   dispatch+0x678/0xa80 [ceph]
   ceph_con_workfn+0x95b/0x1560 [libceph]
   process_one_work+0x14d/0x410
   worker_thread+0x4b/0x460
   kthread+0x105/0x140
   ret_from_fork+0x22/0x40

  Workqueue: ceph-msgr ceph_con_workfn [libceph]
  Call Trace:
   __schedule+0x3d6/0x8b0
   schedule+0x36/0x80
   schedule_preempt_disabled+0xe/0x10
   mutex_lock+0x2f/0x40
   ceph_check_caps+0x505/0xa80 [ceph]
   ceph_put_wrbuffer_cap_refs+0x1e5/0x2c0 [ceph]
   writepages_finish+0x2d3/0x410 [ceph]
   __complete_request+0x26/0x60 [libceph]
   handle_reply+0x6c8/0xa10 [libceph]
   dispatch+0x29a/0xbb0 [libceph]
   ceph_con_workfn+0x95b/0x1560 [libceph]
   process_one_work+0x14d/0x410
   worker_thread+0x4b/0x460
   kthread+0x105/0x140
   ret_from_fork+0x22/0x40

In the example above, truncate_inode_pages_range() waits for readahead pages while holding s_mutex, and ceph_check_caps() waits for s_mutex while blocking the OSD dispatch thread, so later OSD replies (for the readahead) can't be handled. ceph_check_caps() may also take snap_rwsem for read, so a similar deadlock can happen if iput_final() is called while holding snap_rwsem. In general, it's not good to call iput_final() inside MDS/OSD dispatch threads or while holding any mutex. The fix is to introduce ceph_async_iput(), which calls iput_final() from a workqueue.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
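A sketch of the deferred-iput pattern, assuming a per-client workqueue (inode_wq) and a work item (i_work) embedded in the ceph inode; the surrounding code is simplified from the description above:

  /* runs in workqueue context, where a blocking iput_final() is safe */
  static void ceph_inode_work(struct work_struct *work)
  {
      struct ceph_inode_info *ci =
          container_of(work, struct ceph_inode_info, i_work);

      iput(&ci->vfs_inode);
  }

  void ceph_async_iput(struct inode *inode)
  {
      struct ceph_inode_info *ci;

      if (!inode)
          return;
      ci = ceph_inode(inode);
      for (;;) {
          /* fast path: not the last reference, drop it inline */
          if (atomic_add_unless(&inode->i_count, -1, 1))
              break;
          /* last reference: punt the final iput to the workqueue */
          if (queue_work(ceph_inode_to_client(inode)->inode_wq,
                         &ci->i_work))
              break;
          /* work was already queued and pins a reference; retry */
      }
  }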
-
- 08 May 2019, 11 commits
-
Committed by Jeff Layton
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Jeff Layton
I originally thought there was a potential race here, but the fact that this is called with the mdsc->mutex held ensures that the last reference to the session can't be put here. Still, it's clearer to just return the value from get_session here, and it may prevent a bug later if we ever rework this code to be less reliant on mutexes.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Jeff Layton
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Jeff Layton
Nothing calls ceph_mdsc_submit_request today, but in later patches we'll need to be able to call this separately. Have the helper return an int so we can check r_err under the mutex, and have the caller just check the error code from the submit. Also move the acquisition of CEPH_CAP_PIN references into the same function.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
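Under that split, do_request reduces to submit-then-wait. A hedged sketch, with ceph_mdsc_wait_request() assumed as the companion helper and signatures simplified:

  int ceph_mdsc_do_request(struct ceph_mds_client *mdsc,
                           struct ceph_mds_request *req)
  {
      /* takes CEPH_CAP_PIN refs and returns req->r_err under the mutex */
      int err = ceph_mdsc_submit_request(mdsc, req);

      if (!err)
          err = ceph_mdsc_wait_request(mdsc, req);
      return err;
  }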
-
Committed by Jeff Layton
No MDS requests use r_callback today, but that will change in the future. The OSD client always does r_callback and then completes r_completion. Let's have the MDS client do the same.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Jeff Layton
We make copies of the dentry name in set_request_path_attr, but then create_request_message re-fetches the lengths out of the dentry. While we don't currently set the *_drop fields unless the parents are locked, it's still better not to rely on that sort of implicit assumption. Use the pathlen values that set_request_path_attr returned instead, as they will always be correct for the returned paths themselves.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Jeff Layton
Al suggested we get rid of the kmalloc here and just use __getname and __putname to get a full PATH_MAX pathname buffer. Since we build the path in reverse, we continue to return a pointer to the beginning of the string and the length, and add a new helper to free the thing at the end.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
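A sketch of the reverse build in a __getname() buffer; the dentry walk is elided, and the freeing helper mirrors the one described above (names here are illustrative):

  /* build the path right-to-left inside a full PATH_MAX buffer */
  static char *build_path_reversed(struct dentry *dentry, int *plen)
  {
      char *path = __getname();
      int pos = PATH_MAX - 1;

      if (!path)
          return ERR_PTR(-ENOMEM);
      path[pos] = '\0';
      /* walk towards the root, prepending each component (elided):
       *   pos -= len; memcpy(path + pos, name, len); path[--pos] = '/';
       */
      *plen = PATH_MAX - 1 - pos;
      return path + pos;      /* caller sees only the built tail */
  }

  /* free from the start of the allocation, not the returned pointer */
  static inline void free_built_path(char *path, int len)
  {
      if (!IS_ERR_OR_NULL(path))
          __putname(path - (PATH_MAX - 1 - len));
  }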
-
Committed by Jeff Layton
While it may be slightly more efficient, it's probably not worthwhile to optimize for the case that clone_dentry_name handles. We can get the same result by just calling ceph_mdsc_build_path when the parent isn't locked, with less code duplication.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Jeff Layton
temp is not defined outside of the RCU critical section here. Ensure we grab that value before we drop the rcu_read_lock.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
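The rule the fix enforces, as a sketch: copy anything you need out of the RCU-protected walk before rcu_read_unlock(), because the pointers may go away the moment the critical section ends (the function name here is illustrative):

  static u64 example_walk(struct dentry *dentry, unsigned *plen)
  {
      struct dentry *temp;
      unsigned len = 0;
      u64 base;

      rcu_read_lock();
      for (temp = dentry; !IS_ROOT(temp); temp = temp->d_parent)
          len += temp->d_name.len + 1;
      base = ceph_ino(d_inode(temp)); /* copy while still protected */
      rcu_read_unlock();
      /* 'temp' must not be touched past this point */
      *plen = len;
      return base;
  }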
-
Committed by Jeff Layton
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Luis Henriques
The CephFS kernel client does not enforce quotas set in a directory that isn't visible from the mount point. For example, given the path '/dir1/dir2', if quotas are set in 'dir1' and the filesystem is mounted with

  mount -t ceph <server>:<port>:/dir1/ /mnt

then the client won't be able to access the 'dir1' inode, even if 'dir2' belongs to a quota realm that points to it. This patch fixes the issue by simply doing an MDS LOOKUPINO operation for unknown inodes. Any inode reference obtained this way is added to a list in ceph_mds_client and is only released when the filesystem is unmounted.

Link: https://tracker.ceph.com/issues/38482
Reported-by: Hendrik Peyerl <hpeyerl@plusline.net>
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
- 24 April 2019, 2 commits
-
Committed by Yan, Zheng
We missed two places where i_wrbuffer_ref_head, i_wr_ref, i_dirty_caps and i_flushing_caps may change. When they are all zero, we should free i_head_snapc.

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/38224
Reported-and-tested-by: Luis Henriques <lhenriques@suse.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Jeff Layton
Ben reported tripping the BUG_ON in create_request_message during some performance testing. Analysis of the vmcore showed that the length of the r_dentry->d_name string changed after we allocated the buffer, but before we encoded it. build_dentry_path returns pointers to d_name in the common case of non-snapped dentries, but this optimization isn't safe unless the parent directory is locked. When it isn't, have the code make a copy of the d_name while holding the d_lock.

Cc: stable@vger.kernel.org
Reported-by: Ben England <bengland@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
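A sketch of the locked-copy approach, modeled on the clone_dentry_name() idea: allocate outside d_lock (no allocation may happen under a spinlock), copy under it, and retry if the name length changed underneath us:

  static int clone_dentry_name(struct dentry *dentry, const char **ppath,
                               int *ppathlen)
  {
      int len;
      char *name;

  retry:
      len = READ_ONCE(dentry->d_name.len);
      name = kmalloc(len + 1, GFP_NOFS);
      if (!name)
          return -ENOMEM;

      spin_lock(&dentry->d_lock);
      if (dentry->d_name.len != len) {    /* raced with a rename */
          spin_unlock(&dentry->d_lock);
          kfree(name);
          goto retry;
      }
      memcpy(name, dentry->d_name.name, len);
      spin_unlock(&dentry->d_lock);

      name[len] = '\0';
      *ppath = name;
      *ppathlen = len;
      return 0;
  }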
-
- 06 March 2019, 9 commits
-
Committed by Yan, Zheng
If the number of caps exceeds the limit, ceph_trim_dentries() also trims dentries with valid leases. Trimming a dentry releases references to the associated inode, which may evict the inode and release caps. By default, there is no limit on the caps count.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Yan, Zheng
A previous commit made the VFS delete a stale dentry when its last reference is dropped. But a lease can also become invalid while the corresponding dentry holds no reference. This patch makes cephfs periodically scan the lease lists and delete a dentry if its lease is invalid.

There are two types of lease: dentry lease and dir lease. A dentry lease has a lifetime and applies to a single dentry; it is added to the tail of a list when it's updated, so leases at the front of the list expire first. A dir lease is CEPH_CAP_FILE_SHARED on a directory inode and applies to all dentries in the directory. Dentries that have dir leases are added to another list, which is periodically checked in a round-robin manner.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Yan, Zheng
Introduce ceph_d_delete(), which checks whether a dentry has a valid lease.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Yan, Zheng
When pending cap releases fill up one message, start a work item to send the cap release message. (The old way was to send cap releases every 5 seconds.)

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Yan, Zheng
Link: http://tracker.ceph.com/issues/37576
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Yan, Zheng
In a versioned reply, inodestat, dirstat and lease are encoded with version, compat_version and struct_len. Based on a patch from Jos Collin <jcollin@redhat.com>.

Link: http://tracker.ceph.com/issues/26936
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
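A sketch of decoding such a versioned struct with the stock <linux/ceph/decode.h> helpers; field decoding is elided and the function name is illustrative:

  static int decode_versioned(void **p, void *end)
  {
      u8 struct_v, struct_compat;
      u32 struct_len;
      void *struct_end;

      ceph_decode_need(p, end, 2 * sizeof(u8) + sizeof(u32), bad);
      struct_v = ceph_decode_8(p);
      struct_compat = ceph_decode_8(p);
      struct_len = ceph_decode_32(p);
      ceph_decode_need(p, end, struct_len, bad);
      struct_end = *p + struct_len;

      /* ... decode the fields this client understands ... */

      *p = struct_end;    /* skip any fields newer than us */
      return 0;
  bad:
      return -EIO;
  }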
-
Committed by Yan, Zheng
ceph_getattr() returns a zero dev ID for head inodes and sets the dev ID directly to the snapid for snapshot inodes. This is not good because userspace utilities may consider a device ID of 0 invalid, and a snapid may conflict with another device's ID. This patch introduces a "snapid to anonymous bdev ID" map: we create a new mapping when we see a snapid for the first time, and we trim an unused mapping after it has been idle for 5 minutes.

Link: http://tracker.ceph.com/issues/22353
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
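A sketch of such a map, assuming a per-client list under a spinlock (the field names on mdsc are assumptions); the real code uses an rbtree and rechecks for a racing insert before adding:

  struct snapid_map {
      struct list_head node;
      u64 snap;
      dev_t dev;
      unsigned long last_used;    /* for the 5-minute idle trim */
  };

  static dev_t snapid_to_dev(struct ceph_mds_client *mdsc, u64 snap)
  {
      struct snapid_map *sm;

      spin_lock(&mdsc->snapid_map_lock);
      list_for_each_entry(sm, &mdsc->snapid_map_list, node) {
          if (sm->snap == snap) {
              sm->last_used = jiffies;
              spin_unlock(&mdsc->snapid_map_lock);
              return sm->dev;
          }
      }
      spin_unlock(&mdsc->snapid_map_lock);

      sm = kzalloc(sizeof(*sm), GFP_NOFS);
      if (!sm || get_anon_bdev(&sm->dev)) {   /* allocate an anonymous dev_t */
          kfree(sm);
          return 0;   /* fall back to the old zero dev ID */
      }
      sm->snap = snap;
      sm->last_used = jiffies;

      spin_lock(&mdsc->snapid_map_lock);
      list_add(&sm->node, &mdsc->snapid_map_list);
      spin_unlock(&mdsc->snapid_map_lock);
      return sm->dev;
  }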
-
Committed by Yan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Yan, Zheng
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
- 26 December 2018, 2 commits
-
Committed by Yan, Zheng
The MDS hasn't used inode paths since the introduction of inode backtraces.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
Committed by Yan, Zheng
The MDS contains an optimization: it does not re-issue stale caps if the client does not want any caps. A special case of this optimization is a client that wants some caps but skipped updating 'wanted'. For this case, the client needs to update 'wanted' when a stale session gets renewed.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-
- 09 November 2018, 1 commit
-
Committed by Ilya Dryomov
No one is running pre-argonaut. In addition, one of the argonaut features (NOSRCADDR) has been required since day one (and a half, 2.6.34 vs 2.6.35) of the kernel client. Allow for the possibility of reusing these feature bits later.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
-
- 22 October 2018, 1 commit
-
Committed by Ilya Dryomov
Currently message data items are allocated with ceph_msg_data_create() in setup_request_data() inside send_request(). send_request() has never been allowed to fail, so each allocation is followed by a BUG_ON:

  data = ceph_msg_data_create(...);
  BUG_ON(!data);

It's been this way since support for multiple message data items was added in commit 6644ed7b ("libceph: make message data be a pointer") in 3.10. There is no reason to delay the allocation of message data items until the last possible moment, and we certainly don't need a linked list of them, as they are only ever appended to the end and never erased. Make ceph_msg_new2() take max_data_items and adapt the rest of the code.

Reported-by: Jerry Lee <leisurelysw24@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
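A sketch of the array-based bookkeeping, with the struct layout simplified (ceph_msg_sketch and msg_alloc_data are illustrative names); the point is that the allocation moves to construction time, where failure is allowed:

  struct ceph_msg_data;   /* one data item (pages, pagelist, bio, ...) */

  struct ceph_msg_sketch {
      struct ceph_msg_data *data;     /* array, not a linked list */
      int num_data_items;             /* items are only appended */
      int max_data_items;             /* fixed at ceph_msg_new2() time */
  };

  static int msg_alloc_data(struct ceph_msg_sketch *m, int max_data_items,
                            gfp_t flags)
  {
      m->data = kmalloc_array(max_data_items, sizeof(*m->data), flags);
      if (!m->data)
          return -ENOMEM; /* fail early, at a point that is allowed to fail */
      m->num_data_items = 0;
      m->max_data_items = max_data_items;
      return 0;
  }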
-