- 23 3月, 2022 1 次提交
-
-
由 NeilBrown 提交于
The bdi congestion tracking in not widely used and will be removed. CEPHfs is one of a small number of filesystems that uses it, setting just the async (write) congestion flags at what it determines are appropriate times. The only remaining effect of the async flag is to cause (some) WB_SYNC_NONE writes to be skipped. So instead of setting the flag, set an internal flag and change: - .writepages to do nothing if WB_SYNC_NONE and the flag is set - .writepage to return AOP_WRITEPAGE_ACTIVATE if WB_SYNC_NONE and the flag is set. The writepages change causes a behavioural change in that pageout() can now return PAGE_ACTIVATE instead of PAGE_KEEP, so SetPageActive() will be called on the page which (I think) wil further delay the next attempt at writeout. This might be a good thing. Link: https://lkml.kernel.org/r/164549983739.9187.14895675781408171186.stgit@noble.brownSigned-off-by: NNeilBrown <neilb@suse.de> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Paolo Valente <paolo.valente@linaro.org> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 3月, 2022 1 次提交
-
-
由 Xiubo Li 提交于
There could be huge number of capsnaps around at any given time. On x86_64 the structure is 248 bytes, which will be rounded up to 256 bytes by kzalloc. Move this to a dedicated slabcache to save 8 bytes for each. [ jlayton: use kmem_cache_zalloc ] Signed-off-by: NXiubo Li <xiubli@redhat.com> Signed-off-by: NJeff Layton <jlayton@kernel.org> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 13 1月, 2022 6 次提交
-
-
由 Jeff Layton 提交于
The uapi headers are missing the ceph definition. Move it there so userland apps can ID cephfs. Signed-off-by: NJeff Layton <jlayton@kernel.org> Reviewed-by: NIlya Dryomov <idryomov@gmail.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Jeff Layton 提交于
CephFS is a bit unlike most other filesystems in that it only conditionally does buffered I/O based on the caps that it gets from the MDS. In most cases, unless there is contended access for an inode the MDS does give Fbc caps to the client, so the unbuffered codepaths are only infrequently traveled and are difficult to test. At one time, the "-o sync" mount option would give you this behavior, but that was removed in commit 7ab9b380 ("ceph: Don't use ceph-sync-mode for synchronous-fs."). Add a new mount option to tell the client to ignore Fbc caps when doing I/O, and to use the synchronous codepaths exclusively, even on non-O_DIRECT file descriptors. We already have an ioctl that forces this behavior on a per-file basis, so we can just always set the CEPH_F_SYNC flag in the file description on such mounts. Additionally, this patch also changes the client to not request Fbc when doing direct I/O. We aren't using the cache with O_DIRECT so we don't have any need for those caps. Signed-off-by: NJeff Layton <jlayton@kernel.org> Acked-by: NGreg Farnum <gfarnum@redhat.com> Reviewed-by: NVenky Shankar <vshankar@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Venky Shankar 提交于
Add read-only module parameters for supported mount syntaxes. Primary user is the user-space mount helper for catching v2 syntax bugs during testing by cross verifying if the kernel supports v2 syntax on mount failure. Signed-off-by: NVenky Shankar <vshankar@redhat.com> Reviewed-by: NJeff Layton <jlayton@kernel.org> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Venky Shankar 提交于
Note that the new monitors are just shown in /proc/mounts. Ceph does not (re)connect to new monitors yet. [ jlayton: s/printk\(KERN_NOTICE/pr_notice(/ s/strcmp/strcmp_null/ ] Signed-off-by: NVenky Shankar <vshankar@redhat.com> Signed-off-by: NJeff Layton <jlayton@kernel.org> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Venky Shankar 提交于
Old mount device syntax (source) has the following problems: - mounts to the same cluster but with different fsnames and/or creds have identical device string which can confuse xfstests. - Userspace mount helper tool resolves monitor addresses and fill in mon addrs automatically, but that means the device shown in /proc/mounts is different than what was used for mounting. New device syntax is as follows: cephuser@fsid.mycephfs2=/path Note, there is no "monitor address" in the device string. That gets passed in as mount option. This keeps the device string same when monitor addresses change (on remounts). Also note that the userspace mount helper tool is backward compatible. I.e., the mount helper will fallback to using old syntax after trying to mount with the new syntax. [ idryomov: drop CEPH_MON_ADDR_MNTOPT_DELIM ] Signed-off-by: NVenky Shankar <vshankar@redhat.com> Reviewed-by: NJeff Layton <jlayton@kernel.org> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Venky Shankar 提交于
... and remove hardcoded function name in ceph_parse_ips(). [ idryomov: delim parameter, drop CEPH_ADDR_PARSE_DEFAULT_DELIM ] Signed-off-by: NVenky Shankar <vshankar@redhat.com> Reviewed-by: NJeff Layton <jlayton@kernel.org> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 12 1月, 2022 1 次提交
-
-
由 Jeff Layton 提交于
Now that the fscache API has been reworked and simplified, change ceph over to use it. With the old API, we would only instantiate a cookie when the file was open for reads. Change it to instantiate the cookie when the inode is instantiated and call use/unuse when the file is opened/closed. Also, ensure we resize the cached data on truncates, and invalidate the cache in response to the appropriate events. This will allow us to plumb in write support later. Signed-off-by: NJeff Layton <jlayton@kernel.org> Signed-off-by: NDavid Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20211129162907.149445-2-jlayton@kernel.org/ # v1 Link: https://lore.kernel.org/r/20211207134451.66296-2-jlayton@kernel.org/ # v2 Link: https://lore.kernel.org/r/163906984277.143852.14697110691303589000.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967188351.1823006.5065634844099079351.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021581427.640689.14128682147127509264.stgit@warthog.procyon.org.uk/ # v4
-
- 08 11月, 2021 3 次提交
-
-
由 Jeff Layton 提交于
ceph_statfs currently stuffs the cluster fsid into the f_fsid field. This was fine when we only had a single filesystem per cluster, but now that we have multiples we need to use something that will vary between them. Change ceph_statfs to xor each 32-bit chunk of the fsid (aka cluster id) into the lower bits of the statfs->f_fsid. Change the lower bits to hold the fscid (filesystem ID within the cluster). That should give us a value that is guaranteed to be unique between filesystems within a cluster, and should minimize the chance of collisions between mounts of different clusters. URL: https://tracker.ceph.com/issues/52812Reported-by: NSachin Prabhu <sprabhu@redhat.com> Signed-off-by: NJeff Layton <jlayton@kernel.org> Reviewed-by: NXiubo Li <xiubli@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Jeff Layton 提交于
As Greg pointed out, if we get a mangled mdsmap or fsmap, then something has gone very wrong, and we should avoid doing any activity on the filesystem. When this occurs, shut down the mount the same way we would with a forced umount by calling ceph_umount_begin when decoding fails on either map. This causes most operations done against the filesystem to return an error. Any dirty data or caps in the cache will be dropped as well. The effect is not reversible, so the only remedy is to umount. [ idryomov: print fsmap decoding error ] URL: https://tracker.ceph.com/issues/52303Signed-off-by: NJeff Layton <jlayton@kernel.org> Acked-by: NGreg Farnum <gfarnum@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Jeff Layton 提交于
Async dirops have been supported in mainline kernels for quite some time now, and we've recently (as of June) started doing regular testing in teuthology with '-o nowsync'. There were a few issues, but we've sorted those out now. Enable async dirops by default, and change /proc/mounts to show "wsync" when they are disabled rather than "nowsync" when they are enabled. Signed-off-by: NJeff Layton <jlayton@kernel.org> Reviewed-by: NIlya Dryomov <idryomov@gmail.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 19 10月, 2021 1 次提交
-
-
由 Jeff Layton 提交于
Currently when mounting, we may end up finding an existing superblock that corresponds to a blocklisted MDS client. This means that the new mount ends up being unusable. If we've found an existing superblock with a client that is already blocklisted, and the client is not configured to recover on its own, fail the match. Ditto if the superblock has been forcibly unmounted. While we're in here, also rename "other" to the more conventional "fsc". Cc: stable@vger.kernel.org URL: https://bugzilla.redhat.com/show_bug.cgi?id=1901499Signed-off-by: NJeff Layton <jlayton@kernel.org> Reviewed-by: NXiubo Li <xiubli@redhat.com> Reviewed-by: NIlya Dryomov <idryomov@gmail.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 15 12月, 2020 1 次提交
-
-
由 Jeff Layton 提交于
When recovering a session (a'la recover_session=clean), we want to do all of the operations that we do on a forced umount, but changing the mount state to SHUTDOWN is can cause queued MDS requests to fail when the session comes back. Most of those can idle until the session is recovered in this situation. Reserve SHUTDOWN state for forced umount, and make a new RECOVER state for the forced reconnect situation. Change several tests for equality with SHUTDOWN to test for that or RECOVER. Signed-off-by: NJeff Layton <jlayton@kernel.org> Reviewed-by: NXiubo Li <xiubli@redhat.com> Reviewed-by: N"Yan, Zheng" <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 12 10月, 2020 2 次提交
-
-
由 Ilya Dryomov 提交于
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Jeff Layton 提交于
ceph open-codes this around some other activity and the rationale for it isn't clear. There is no need to delay free_anon_bdev until the end of kill_sb. Signed-off-by: NJeff Layton <jlayton@kernel.org> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 19 9月, 2020 1 次提交
-
-
由 Al Viro 提交于
Get rid of boilerplate in most of ->statfs() instances... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 05 8月, 2020 1 次提交
-
-
由 Jeff Layton 提交于
When doing some testing recently, I hit some page allocation failures on mount, when creating the wb_pagevec_pool for the mount. That requires 128k (32 contiguous pages), and after thrashing the memory during an xfstests run, sometimes that would fail. 128k for each mount seems like a lot to hold in reserve for a rainy day, so let's change this to a global mempool that gets allocated when the module is plugged in. Signed-off-by: NJeff Layton <jlayton@kernel.org> Reviewed-by: NIlya Dryomov <idryomov@gmail.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 03 8月, 2020 2 次提交
-
-
由 Randy Dunlap 提交于
Drop duplicated words "down" and "the" in fs/ceph/. [ idryomov: merge into a single patch ] Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Reviewed-by: NJeff Layton <jlayton@kernel.org> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Xiubo Li 提交于
This will send the caps/read/write/metadata metrics to any available MDS once per second, which will be the same as the userland client. It will skip the MDS sessions which don't support the metric collection, as the MDSs will close socket connections when they get an unknown type message. We can disable the metric sending via the disable_send_metrics module parameter. [ jlayton: fix up endianness bug in ceph_mdsc_send_metrics() ] URL: https://tracker.ceph.com/issues/43215Signed-off-by: NXiubo Li <xiubli@redhat.com> Signed-off-by: NJeff Layton <jlayton@kernel.org> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 30 3月, 2020 2 次提交
-
-
由 Jeff Layton 提交于
The MDS is getting a new lock-caching facility that will allow it to cache the necessary locks to allow asynchronous directory operations. Since the CEPH_CAP_FILE_* caps are currently unused on directories, we can repurpose those bits for this purpose. When performing an unlink, if we have Fx on the parent directory, and CEPH_CAP_DIR_UNLINK (aka Fr), and we know that the dentry being removed is the primary link, then then we can fire off an unlink request immediately and don't need to wait on reply before returning. In that situation, just fix up the dcache and link count and return immediately after issuing the call to the MDS. This does mean that we need to hold an extra reference to the inode being unlinked, and extra references to the caps to avoid races. Those references are put and error handling is done in the r_callback routine. If the operation ends up failing, then set a writeback error on the directory inode, and the inode itself that can be fetched later by an fsync on the dir. The behavior of dir caps is slightly different from caps on normal files. Because these are just considered an optimization, if the session is reconnected, we will not automatically reclaim them. They are instead considered lost until we do another synchronous op in the parent directory. Async dirops are enabled via the "nowsync" mount option, which is patterned after the xfs "wsync" mount option. For now, the default is "wsync", but eventually we may flip that. Signed-off-by: NJeff Layton <jlayton@kernel.org> Reviewed-by: N"Yan, Zheng" <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Jeff Layton 提交于
On my machine (x86_64) this struct is 952 bytes, which gets rounded up to 1024 by kmalloc. Move this to a dedicated slabcache, so we can allocate them without the extra 72 bytes of overhead per. Signed-off-by: NJeff Layton <jlayton@kernel.org> Reviewed-by: NIlya Dryomov <idryomov@gmail.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 12 2月, 2020 2 次提交
-
-
由 Xiubo Li 提交于
For the old mount API, the module parameters parseing function will be called in ceph_mount() and also just after the default posix acl flag set, so we can control to enable/disable it via the mount option. But for the new mount API, it will call the module parameters parseing function before ceph_get_tree(), so the posix acl will always be enabled. Fixes: 82995cc6 ("libceph, rbd, ceph: convert to use the new mount API") Signed-off-by: NXiubo Li <xiubli@redhat.com> Reviewed-by: NIlya Dryomov <idryomov@gmail.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
syzbot reported that 4fbc0c71 ("ceph: remove the extra slashes in the server path") had caused a regression where an allocation could be done under a spinlock -- compare_mount_options() is called by sget_fc() with sb_lock held. We don't really need the supplied server path, so canonicalize it in place and compare it directly. To make this work, the leading slash is kept around and the logic in ceph_real_mount() to skip it is restored. CEPH_MSG_CLIENT_SESSION now reports the same (i.e. canonicalized) path, with the leading slash of course. Fixes: 4fbc0c71 ("ceph: remove the extra slashes in the server path") Reported-by: syzbot+98704a51af8e3d9425a9@syzkaller.appspotmail.com Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NJeff Layton <jlayton@kernel.org>
-
- 08 2月, 2020 6 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Don't bother with "mixed" options that would allow both the form with and without argument (i.e. both -o foo and -o foo=bar). Rather than trying to shove both into a single fs_parameter_spec, allow having with-argument and no-argument specs with the same name and teach fs_parse to handle that. There are very few options of that sort, and they are actually easier to handle that way - callers end up with less postprocessing. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
The former contains nothing but a pointer to an array of the latter... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Eric Sandeen 提交于
Unused now. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Acked-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
... turning it into struct p_log embedded into fs_context. Initialize the prefix with fs_type->name, turning fs_parse() into a trivial inline wrapper for __fs_parse(). This makes fs_parameter_description->name completely unused. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
... and now errorf() et.al. are never called with NULL fs_context, so we can get rid of conditional in those. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 07 2月, 2020 2 次提交
-
-
由 Al Viro 提交于
no real difference now Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Don't do a single array; attach them to fsparam_enum() entry instead. And don't bother trying to embed the names into those - it actually loses memory, with no real speedup worth mentioning. Simplifies validation as well. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 27 1月, 2020 3 次提交
-
-
由 Luis Henriques 提交于
Instead of using the copy-from operation, switch copy_file_range to the new copy-from2 operation, which allows to send the truncate_seq and truncate_size parameters. If an OSD does not support the copy-from2 operation it will return -EOPNOTSUPP. In that case, the kernel client will stop trying to do remote object copies for this fs client and will always use the generic VFS copy_file_range. Signed-off-by: NLuis Henriques <lhenriques@suse.com> Reviewed-by: NJeff Layton <jlayton@kernel.org> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Xiubo Li 提交于
It's possible to pass the mount helper a server path that has more than one contiguous slash character. For example: $ mount -t ceph 192.168.195.165:40176:/// /mnt/cephfs/ In the MDS server side the extra slashes of the server path will be treated as snap dir, and then we can get the following debug logs: ceph: mount opening path // ceph: open_root_inode opening '//' ceph: fill_trace 0000000059b8a3bc is_dentry 0 is_target 1 ceph: alloc_inode 00000000dc4ca00b ceph: get_inode created new inode 00000000dc4ca00b 1.ffffffffffffffff ino 1 ceph: get_inode on 1=1.ffffffffffffffff got 00000000dc4ca00b And then when creating any new file or directory under the mount point, we can hit the following BUG_ON in ceph_fill_trace(): BUG_ON(ceph_snap(dir) != dvino.snap); Have the client ignore the extra slashes in the server path when mounting. This will also canonicalize the path, so that identical mounts can be consilidated. 1) "//mydir1///mydir//" 2) "/mydir1/mydir" 3) "/mydir1/mydir/" Regardless of the internal treatment of these paths, the kernel still stores the original string including the leading '/' for presentation to userland. URL: https://tracker.ceph.com/issues/42771Signed-off-by: NXiubo Li <xiubli@redhat.com> Reviewed-by: NJeff Layton <jlayton@kernel.org> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Xiubo Li 提交于
If all the MDS daemons are down for some reason, then the first mount attempt will fail with EIO after the mount request times out. A mount attempt will also fail with EIO if all of the MDS's are laggy. This patch changes the code to return -EHOSTUNREACH in these situations and adds a pr_info error message to help the admin determine the cause. URL: https://tracker.ceph.com/issues/4386Signed-off-by: NXiubo Li <xiubli@redhat.com> Reviewed-by: NJeff Layton <jlayton@kernel.org> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 10 12月, 2019 1 次提交
-
-
由 Jeff Layton 提交于
Most of these values should never be negative, so convert them to unsigned values. Add some sanity checking to the parsed values, and clean up some unneeded casts. Note that while caps_max should never be negative, this patch leaves it signed, since this value ends up later being compared to a signed counter. Just ensure that userland never passes in a negative value for caps_max. Signed-off-by: NJeff Layton <jlayton@kernel.org> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 28 11月, 2019 1 次提交
-
-
由 David Howells 提交于
Convert the ceph filesystem to the new internal mount API as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem. See Documentation/filesystems/mount_api.txt for more information. [ Numerous string handling, leak and regression fixes; rbd conversion was particularly broken and had to be redone almost from scratch. ] Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NJeff Layton <jlayton@kernel.org> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 08 11月, 2019 1 次提交
-
-
由 Jeff Layton 提交于
If someone requests fscache on the mount, and the kernel doesn't support it, it should fail the mount. [ Drop ceph prefix -- it's provided by pr_err. ] Signed-off-by: NJeff Layton <jlayton@kernel.org> Reviewed-by: NIlya Dryomov <idryomov@gmail.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 16 9月, 2019 2 次提交
-
-
由 Jeff Layton 提交于
They're always called in succession. Signed-off-by: NJeff Layton <jlayton@kernel.org> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Yan, Zheng 提交于
Make client use osd reply and session message to infer if itself is blacklisted. Client reconnect to cluster using new entity addr if it is blacklisted. Auto reconnect is limited to once every 30 minutes. Auto reconnect is disabled by default. It can be enabled/disabled by recover_session=<no|clean> mount option. In 'clean' mode, client drops any dirty data/metadata, invalidates page caches and invalidates all writable file handles. After reconnect, file locks become stale because MDS loses track of them. If an inode contains any stale file locks, read/write on the indoe are not allowed until applications release all stale file locks. Signed-off-by: N"Yan, Zheng" <zyan@redhat.com> Reviewed-by: NJeff Layton <jlayton@kernel.org> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-