- 10 Sep 2019, 12 commits
-
-
By Miklos Szeredi

Create a helper named fuse_simple_background() that is similar to fuse_simple_request(). Unlike the latter, it returns immediately and calls the supplied 'end' callback when the reply is received. The supplied 'args' pointer is stored in 'fuse_req', which allows the callback to interpret the output arguments decoded from the reply.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
By Miklos Szeredi

fuse_req_pages_alloc() is moved to file.c, since its internal use by the device code will eventually be removed. Rename it to fuse_pages_alloc() to signify that it's not only usable for the fuse_req page array.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
By Miklos Szeredi

Derive fuse_args_pages from fuse_args. This is used to handle requests which use pages for input or output. The related flags are added to fuse_args. A new FR_ALLOC_PAGES flag is added to indicate whether the page arrays in fuse_req need to be freed by fuse_put_request() or not.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
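For orientation, a minimal sketch of the "derive by embedding" layout this describes; the struct and field names below are invented for illustration, and the real fuse_args/fuse_args_pages carry more fields.

```c
/* Illustrative only: not the actual fuse_args/fuse_args_pages definitions. */
struct page;

struct example_args {
    unsigned int in_numargs;
    unsigned int out_numargs;
    /* ... in/out argument descriptors ... */
};

struct example_args_pages {
    struct example_args args;   /* base struct placed first, so code taking
                                 * a struct example_args * keeps working */
    struct page **pages;        /* page array used for input or output */
    unsigned int num_pages;
};
```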
-
By Miklos Szeredi

We can use the "force" flag to make sure the DESTROY request is always sent to userspace, so there is no need to keep it allocated during the lifetime of the filesystem.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
By Miklos Szeredi

In some cases it makes no sense to set the pid/uid/gid fields in the request header. Allow fuse_simple_background() to omit these. This is only required in the "force" case, so for now just WARN if set otherwise. Fold fuse_get_req_nofail_nopages() into its only caller; the comment there is obsolete anyway.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
By Miklos Szeredi

Move this function to readdir.c, where its only caller resides.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
By Miklos Szeredi

This will be used by fuse_force_forget(). We can expand fuse_request_send() into fuse_simple_request(). The FR_WAITING bit has already been set, so there is no need to check it.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
By Miklos Szeredi

Add 'force' to fuse_args and use fuse_get_req_nofail_nopages() to allocate the request in that case.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
By Miklos Szeredi

Instead of complex games with a reserved request, just use __GFP_NOFAIL. Both callers (flush, readdir) guarantee that the connection was already initialized, so there is no need to wait for fc->initialized. Also remove the unneeded clearing of the FR_BACKGROUND flag.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
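For illustration, a minimal sketch of the __GFP_NOFAIL pattern being adopted here, with invented names rather than the actual fuse request allocator: the allocation may sleep, but it never returns NULL, which is what lets the reserved-request machinery go away.

```c
/* Sketch only, not the fuse code. */
#include <linux/slab.h>

struct example_req {
    unsigned long flags;
    /* ... */
};

static struct example_req *example_get_req_nofail(void)
{
    /* __GFP_NOFAIL: the allocator retries (and may sleep) until it
     * succeeds, so callers never have to handle a NULL return. */
    return kzalloc(sizeof(struct example_req), GFP_KERNEL | __GFP_NOFAIL);
}
```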
-
By Miklos Szeredi

This makes the structure better packed.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
By Miklos Szeredi

...to make future expansion simpler. The hierarchical structure is a historical thing that does not serve any practical purpose. The generated code is exactly the same before and after the patch.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
By Eric Biggers

When IOCB_CMD_POLL is used on the FUSE device, aio_poll() disables IRQs and takes kioctx::ctx_lock, then fuse_iqueue::waitq.lock. This may have to wait for fuse_iqueue::waitq.lock to be released by one of the many places that take it with IRQs enabled. Since the IRQ handler may take kioctx::ctx_lock, lockdep reports that a deadlock is possible.

Fix it by protecting the state of struct fuse_iqueue with a separate spinlock, and only accessing fuse_iqueue::waitq using the versions of the waitqueue functions which do IRQ-safe locking internally.

Reproducer:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/aio_abi.h>

int main()
{
    char opts[128];
    int fd = open("/dev/fuse", O_RDWR);
    aio_context_t ctx = 0;
    struct iocb cb = { .aio_lio_opcode = IOCB_CMD_POLL, .aio_fildes = fd };
    struct iocb *cbp = &cb;

    sprintf(opts, "fd=%d,rootmode=040000,user_id=0,group_id=0", fd);
    mkdir("mnt", 0700);
    mount("foo", "mnt", "fuse", 0, opts);
    syscall(__NR_io_setup, 1, &ctx);
    syscall(__NR_io_submit, ctx, 1, &cbp);
}
```

Beginning of lockdep output:

```
=====================================================
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
5.3.0-rc5 #9 Not tainted
-----------------------------------------------------
syz_fuse/135 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
000000003590ceda (&fiq->waitq){+.+.}, at: spin_lock include/linux/spinlock.h:338 [inline]
000000003590ceda (&fiq->waitq){+.+.}, at: aio_poll fs/aio.c:1751 [inline]
000000003590ceda (&fiq->waitq){+.+.}, at: __io_submit_one.constprop.0+0x203/0x5b0 fs/aio.c:1825

and this task is already holding:
0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: spin_lock_irq include/linux/spinlock.h:363 [inline]
0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: aio_poll fs/aio.c:1749 [inline]
0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: __io_submit_one.constprop.0+0x1f4/0x5b0 fs/aio.c:1825

which would create a new lock dependency:
 (&(&ctx->ctx_lock)->rlock){..-.} -> (&fiq->waitq){+.+.}

but this new dependency connects a SOFTIRQ-irq-safe lock:
 (&(&ctx->ctx_lock)->rlock){..-.}
[...]
```

Reported-by: syzbot+af05535bb79520f95431@syzkaller.appspotmail.com
Reported-by: syzbot+d86c4426a01f60feddc7@syzkaller.appspotmail.com
Fixes: bfe4037e ("aio: implement IOCB_CMD_POLL")
Cc: <stable@vger.kernel.org> # v4.19+
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
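As a rough illustration of the locking split the fix describes (names invented, not the actual fuse_iqueue code): queue state gets its own spinlock, and waitq.lock is only ever taken inside the IRQ-safe waitqueue helpers such as wake_up().

```c
/* Sketch only; initialization of the lock/waitq/list is omitted. */
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/wait.h>

struct example_iqueue {
    spinlock_t lock;            /* protects 'pending' */
    wait_queue_head_t waitq;    /* touched only via waitqueue helpers */
    struct list_head pending;
};

static void example_queue_request(struct example_iqueue *iq,
                                  struct list_head *req_entry)
{
    spin_lock(&iq->lock);
    list_add_tail(req_entry, &iq->pending);
    spin_unlock(&iq->lock);
    wake_up(&iq->waitq);        /* does IRQ-safe locking internally */
}
```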
-
- 24 Apr 2019, 2 commits
-
-
By Kirill Smelkov

On networked filesystems file data can be changed externally. FUSE provides notification messages for the filesystem to inform the kernel that the metadata or a data region of a file needs to be invalidated in the local page cache. That provides the basis for filesystem implementations to invalidate the kernel cache explicitly based on observed filesystem-specific events.

FUSE also has an "automatic" invalidation mode(*) in which the kernel automatically invalidates the data cache of a file if it sees an mtime change. It also automatically invalidates the whole data cache of a file if it sees the file size change. The automatic mode has a corresponding capability - FUSE_AUTO_INVAL_DATA. However, probably for historical reasons, that capability only controls whether an mtime change results in automatic invalidation or not. A change in file size always results in invalidating the whole data cache of a file, regardless of whether FUSE_AUTO_INVAL_DATA was negotiated(+).

The filesystem I write[1] represents data arrays stored in a networked database as local files suitable for mmap. It is a read-only filesystem - changes to data are committed externally via database interfaces, and the filesystem only glues data into contiguous file streams suitable for mmap and traditional array processing. The files are big - starting from hundreds of gigabytes and more. The files change regularly, and frequently by data being appended to their end. The size of the files thus changes frequently.

If a file was accessed locally and some part of its data got into the page cache, we want that data to stay cached unless there is memory pressure, or unless the corresponding part of the file was actually changed. However the current FUSE behaviour - when it sees a file size change - is to invalidate the whole file. The data cache of the file is thus completely lost even on a small size change, despite the filesystem server being careful to accurately translate database changes into FUSE invalidation messages to the kernel.

Let's fix it: if a filesystem, through the new FUSE_EXPLICIT_INVAL_DATA capability, indicates to the kernel that it is fully responsible for data cache invalidation, then the kernel won't invalidate the file's data cache on size change, and will only truncate that cache to the new size in case the size decreased.

(*) see 72d0d248 "fuse: add FUSE_AUTO_INVAL_DATA init flag", eed2179e "fuse: invalidate inode mapping if mtime changes"
(+) in writeback mode the kernel does not invalidate the data cache on file size change, but neither does it allow the filesystem to set the size due to an external event (see 8373200b "fuse: Trust kernel i_size only")

[1] https://lab.nexedi.com/kirr/wendelin.core/blob/a50f1d9f/wcfs/wcfs.go#L20

Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
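For context, a sketch of what the userspace side of explicit invalidation might look like with libfuse's low-level API. The capability constant and notify helper exist in recent libfuse releases, but exact names and signatures vary between versions, so treat this as an outline rather than a drop-in server.

```c
#define FUSE_USE_VERSION 35
#include <fuse_lowlevel.h>

/* Low-level init callback: take full responsibility for data cache
 * invalidation, so the kernel keeps a file's cache across size changes. */
static void example_init(void *userdata, struct fuse_conn_info *conn)
{
    conn->want |= FUSE_CAP_EXPLICIT_INVAL_DATA;
}

/* Called by the server when it observes that a region of 'ino' changed
 * externally (e.g. a database commit appended data). */
static void example_region_changed(struct fuse_session *se, fuse_ino_t ino,
                                   off_t offset, off_t length)
{
    fuse_lowlevel_notify_inval_inode(se, ino, offset, length);
}
```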
-
By Kirill Smelkov

Functions like pr_err() are a more modern variant of printing compared to printk(). They can be used to denoise the sources by encoding the needed level in the print function's name, and by automatically inserting a per-driver / function / ... print prefix as defined by the pr_fmt macro. The pr_* functions are also the style suggested by Documentation/process/coding-style.rst, and more recent code - for example overlayfs - uses them instead of printk.

Convert CUSE and FUSE to use the new pr_* functions. CUSE output stays completely unchanged, while FUSE output is amended a bit for the "trying to steal weird page" warning - the second line now also comes with the "fuse:" prefix. I hope that is ok.

Suggested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
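A minimal, hypothetical illustration of the pr_* style being adopted (example code, not a snippet from fs/fuse): pr_fmt() defines the prefix once, and every pr_err()/pr_warn() call in the file picks it up automatically.

```c
/* pr_fmt must be defined before printk.h is pulled in. */
#define pr_fmt(fmt) "fuse: " fmt

#include <linux/printk.h>

static void example_report_failure(int err)
{
    /* With err == -5 this prints: "fuse: request failed: -5" */
    pr_err("request failed: %d\n", err);
}
```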
-
- 13 Feb 2019, 7 commits
-
-
By Chad Austin

Allow filesystems to return ENOSYS from opendir, preventing the kernel from sending opendir and releasedir messages in the future. This avoids userspace transitions when filesystems don't need to keep track of state per directory handle. A new capability flag, FUSE_NO_OPENDIR_SUPPORT, parallels FUSE_NO_OPEN_SUPPORT, indicating the new semantics for returning ENOSYS from opendir.

Signed-off-by: Chad Austin <chadaustin@fb.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
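A sketch of how a low-level libfuse server might opt out, assuming the libfuse 3 low-level API: replying ENOSYS from opendir lets a kernel that negotiated FUSE_NO_OPENDIR_SUPPORT stop sending OPENDIR/RELEASEDIR altogether.

```c
#define FUSE_USE_VERSION 31
#include <errno.h>
#include <fuse_lowlevel.h>

static void example_opendir(fuse_req_t req, fuse_ino_t ino,
                            struct fuse_file_info *fi)
{
    (void)ino;
    (void)fi;
    /* No per-directory-handle state to track, so decline the operation. */
    fuse_reply_err(req, ENOSYS);
}
```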
-
By Miklos Szeredi

...to get rid of one more fc->lock use.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
By Miklos Szeredi

The only caller that needs fc->aborted set is fuse_conn_abort_write(). Setting fc->aborted is now racy (fuse_abort_conn() may already be in progress or finished), but there's no reason to care.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
By Kirill Tkhai

This is a rather natural step after the previous patches, and it just decreases the load on fc->lock.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
By Kirill Tkhai

To minimize contention on fc->lock, this patch introduces a new spinlock protecting the following fuse_inode metadata:

fuse_inode:
    writectr
    writepages
    write_files
    queued_writes
    attr_version

inode:
    i_size
    i_nlink
    i_mtime
    i_ctime

It also protects the fields changed in fuse_change_attributes_common() (too many to list).

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
By Kirill Tkhai

This patch makes fc->attr_version of atomic64_t type, so fc->lock won't be needed to read or modify it anymore.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
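The pattern, reduced to a minimal sketch with invented names: an atomic64_t counter can be bumped and read without holding any spinlock.

```c
#include <linux/atomic.h>
#include <linux/types.h>

struct example_conn {
    atomic64_t attr_version;
};

static u64 example_bump_attr_version(struct example_conn *c)
{
    /* Atomic read-modify-write: no lock needed around the increment. */
    return atomic64_inc_return(&c->attr_version);
}
```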
-
By Kirill Tkhai

This is preparation for the next patches, which introduce a new fi->lock for protection of ff->write_entry linked into fi->write_files. This patch just passes the new argument to the function.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
- 12 Dec 2018, 1 commit
-
-
By Chad Austin

When FUSE_OPEN returns ENOSYS, the no_open bit is set on the connection. Because the FUSE_RELEASE and FUSE_RELEASEDIR paths share code, this incorrectly caused the FUSE_RELEASEDIR request to be dropped and never sent to userspace. Pass an isdir bool to distinguish between FUSE_RELEASE and FUSE_RELEASEDIR inside fuse_file_put().

Fixes: 7678ac50 ("fuse: support clients that don't implement 'open'")
Cc: <stable@vger.kernel.org> # v3.14
Signed-off-by: Chad Austin <chadaustin@fb.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
- 03 Dec 2018, 1 commit
-
-
By Miklos Szeredi

Commit ab2257e9 ("fuse: reduce size of struct fuse_inode") moved the fields related to writeback on regular files and those related to directory caching into a union. However fuse_fsync_common(), called from fuse_dir_fsync(), touches some writeback related fields, resulting in a crash. Move the writeback related parts from fuse_fsync_common() to fuse_fsync().

Reported-by: Brett Girton <btgirton@gmail.com>
Tested-by: Brett Girton <btgirton@gmail.com>
Fixes: ab2257e9 ("fuse: reduce size of struct fuse_inode")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
- 15 Oct 2018, 2 commits
-
-
By Dan Schatzberg

FUSE file reads are cached in the page cache, but symlink reads are not. This patch enables FUSE READLINK operations to be cached, which can improve the performance of some FUSE workloads. In particular, I'm working on a FUSE filesystem for access to source code and discovered that about a 10% improvement to build times is achieved with this patch (there are a lot of symlinks in the source tree).

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
By Miklos Szeredi

This patch adds the infrastructure for more fine grained attribute invalidation. Currently only 'atime' is invalidated separately. The use of this infrastructure is extended to the statx(2) interface, which for now means that if only 'atime' is invalid and STATX_ATIME is not specified in the mask argument, then no GETATTR request will be generated.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
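From userspace, the optimization is visible through the statx(2) mask. A small runnable example, assuming Linux >= 4.11 and a glibc that provides the statx() wrapper; "somefile" is a placeholder path:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    struct statx stx;

    /* STATX_ATIME is deliberately not in the mask, so a FUSE GETATTR can
     * be skipped when only atime is marked invalid. */
    if (statx(AT_FDCWD, "somefile", 0, STATX_SIZE, &stx) != 0) {
        perror("statx");
        return 1;
    }
    printf("size=%llu\n", (unsigned long long)stx.stx_size);
    return 0;
}
```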
-
- 01 Oct 2018, 8 commits
-
-
By Miklos Szeredi

Writeback caching currently allocates requests with the maximum number of possible pages, while the actual number of pages per request depends on a couple of factors that cannot be determined when the request is allocated (whether the page is already under writeback, whether the page is contiguous with previous pages already added to a request). This patch allows such requests to start with no page allocation (all pages inline) and grow the page array on demand. If the max_pages tunable remains at the default value, then this will mean just one allocation that is the same size as before. If the tunable is larger, then this adds at most 3 additional memory allocations (which is generously compensated by the improved performance from the larger request).

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
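A rough sketch of the "start with inline pages, grow on demand" idea; the real fuse_req uses different fields and also tracks per-page descriptors, so this is only illustrative.

```c
#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/string.h>

#define EXAMPLE_INLINE_PAGES 8

struct example_req {
    struct page **pages;        /* points to inline_pages until grown */
    unsigned int max_pages;
    struct page *inline_pages[EXAMPLE_INLINE_PAGES];
};

static int example_grow_pages(struct example_req *req, unsigned int want)
{
    struct page **bigger;

    if (want <= req->max_pages)
        return 0;

    bigger = kmalloc_array(want, sizeof(*bigger), GFP_KERNEL);
    if (!bigger)
        return -ENOMEM;

    /* Carry over the pages collected so far, then switch arrays. */
    memcpy(bigger, req->pages, req->max_pages * sizeof(*bigger));
    if (req->pages != req->inline_pages)
        kfree(req->pages);
    req->pages = bigger;
    req->max_pages = want;
    return 0;
}
```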
-
By Constantine Shulyupin

Replace FUSE_MAX_PAGES_PER_REQ with the configurable parameter max_pages to improve performance. An old RFC with a detailed description of the problem and many fixes by Mitsuo Hayasaka (mitsuo.hayasaka.hu@hitachi.com) is at https://lkml.org/lkml/2012/7/5/136.

We encountered this performance degradation and fixed it on a big and complex virtual environment. Environment to reproduce the degradation and the improvement:

1. Add lag to user mode FUSE: add nanosleep(&(struct timespec){ 0, 1000 }, NULL); to xmp_write_buf in passthrough_fh.c
2. Patch user-mode fuse with a configurable max_pages parameter. The patch will be provided later.
3. Run the test script and perform the test on tmpfs:

```sh
fuse_test()
{
    cd /tmp
    mkdir -p fusemnt
    passthrough_fh -o max_pages=$1 /tmp/fusemnt
    grep fuse /proc/self/mounts
    dd conv=fdatasync oflag=dsync if=/dev/zero of=fusemnt/tmp/tmp \
        count=1K bs=1M 2>&1 | grep -v records
    rm fusemnt/tmp/tmp
    killall passthrough_fh
}
```

Test results:

```
passthrough_fh /tmp/fusemnt fuse.passthrough_fh \
    rw,nosuid,nodev,relatime,user_id=0,group_id=0 0 0
1073741824 bytes (1.1 GB) copied, 1.73867 s, 618 MB/s

passthrough_fh /tmp/fusemnt fuse.passthrough_fh \
    rw,nosuid,nodev,relatime,user_id=0,group_id=0,max_pages=256 0 0
1073741824 bytes (1.1 GB) copied, 1.15643 s, 928 MB/s
```

Obviously, with a bigger lag the difference between 'before' and 'after' will be more significant. Mitsuo Hayasaka, in 2012 (https://lkml.org/lkml/2012/7/5/136), observed an improvement from 400-550 to 520-740.

Signed-off-by: Constantine Shulyupin <const@MakeLinux.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
By Miklos Szeredi

Do this by grouping the fields used for cached writes and putting them into a union with the fields used for cached readdir (with obviously no overlap, since we don't have hybrid objects).

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
By Miklos Szeredi

Use the internal iversion counter to make sure modifications of the directory through this filesystem are not missed by the mtime check (due to mtime granularity).

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
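A minimal illustration of the iversion check using the helpers from <linux/iversion.h>; the surrounding function names are invented.

```c
#include <linux/fs.h>
#include <linux/iversion.h>

static void example_dir_changed_locally(struct inode *dir)
{
    /* Bump the version so a cache filled before this change is detected
     * even when mtime granularity hides the modification. */
    inode_inc_iversion(dir);
}

static bool example_cache_version_matches(struct inode *dir, u64 cached)
{
    return inode_peek_iversion(dir) == cached;
}
```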
-
By Miklos Szeredi

Store the modification time of the directory in the cache, obtained before starting to fill the cache. When reading the cache, verify that the directory hasn't changed by checking whether the current modification time is the same as the one stored in the cache. This only needs to be done when the current file position is at the beginning of the directory, as mandated by POSIX.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
By Miklos Szeredi

Allow the cache to be invalidated when page(s) have gone missing. In this case, increment the version of the cache and reset it to an empty state. Add a version number to the directory stream in struct fuse_file as well, indicating the version of the cache it's supposed to be reading. If the cache version doesn't match the stream's version, then reset the stream to the beginning of the cache.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
By Miklos Szeredi

The cache is only used once it's complete, not while it's still being filled; this constraint could be lifted later if it turns out to be useful. Introduce state in struct fuse_file that indicates the position within the cache. After a seek, reset the position to the beginning of the cache and search the cache for the current position. If the current position is not found in the cache, then fall back to uncached readdir. It can also happen that page(s) disappear from the cache, in which case we must also fall back to uncached readdir.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
By Miklos Szeredi

This patch just adds the cache filling functions, which are invoked if the FOPEN_CACHE_DIR flag is set in the OPENDIR reply. Cache reading and cache invalidation are added by subsequent patches. The directory cache uses the page cache. Directory entries are packed into a page in the same format as in the READDIR reply. A page only contains whole entries; the space at the end of the page is cleared. The page is locked while being modified. Multiple parallel readdirs on the same directory can fill the cache; the only constraint is that continuity must be maintained (d_off of the last entry points to the position of the current entry).

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
- 28 Sep 2018, 6 commits
-
-
By Miklos Szeredi

The directory reading code is about to grow larger, so split it out from dir.c into a new source file.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
By Kirill Tkhai

We noticed a performance bottleneck in FUSE running our Virtuozzo storage over rdma. On some types of workload we observe 20% of the time spent in request_find() in the profiler. This function iterates over a long list of requests, and it scales badly. The patch introduces a hash table to reduce the number of iterations we do in this function. The hash generating algorithm is taken from the hash_add() function, while a 256-line table is used to store pending requests. This fixes the problem and improves the performance.

Reported-by: Alexey Kuznetsov <kuznet@virtuozzo.com>
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
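A hedged sketch of the hashed-lookup idea using the generic helpers from <linux/hashtable.h>; the real fuse code rolls its own table, so names and details here are illustrative only.

```c
#include <linux/hashtable.h>
#include <linux/types.h>

#define EXAMPLE_HASH_BITS 8     /* 256 buckets, as in the patch description */

struct example_req {
    u64 unique;                 /* request ID */
    struct hlist_node hash_entry;
};

static DEFINE_HASHTABLE(example_pending, EXAMPLE_HASH_BITS);

static void example_add_pending(struct example_req *req)
{
    hash_add(example_pending, &req->hash_entry, req->unique);
}

static struct example_req *example_find_pending(u64 unique)
{
    struct example_req *req;

    /* Walk only the bucket the ID hashes to, not one long global list. */
    hash_for_each_possible(example_pending, req, hash_entry, unique)
        if (req->unique == unique)
            return req;
    return NULL;
}
```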
-
By Kirill Tkhai

This field is not needed after the previous patch, since we can easily convert a request ID to an interrupt request ID and vice versa.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
By Kirill Tkhai

Currently, we take fc->lock there only to check fc->connected. But this flag is changed only on connection abort, which is a very rare operation. So allow checking fc->connected under just fc->bg_lock, and use this lock (as well as fc->lock) when resetting fc->connected.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
By Kirill Tkhai

To reduce contention on fc->lock, this patch introduces bg_lock for protection of the fields related to the background queue. These are: max_background, congestion_threshold, num_background, active_background, bg_queue and blocked. This allows the next patch to make async reads not require fc->lock, so async reads and writes will have better performance when executed in parallel.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
By Niels de Vos

There are several FUSE filesystems that can implement server-side copy or other efficient copy/duplication/clone methods. The copy_file_range() syscall is the standard interface that users have access to, while not depending on external libraries that bypass FUSE.

Signed-off-by: Niels de Vos <ndevos@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
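From the user's point of view this is just the ordinary syscall; a small runnable example, assuming Linux >= 4.5 and a glibc that provides the copy_file_range() wrapper:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
        return 1;
    }

    int in = open(argv[1], O_RDONLY);
    int out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
    struct stat st;

    if (in < 0 || out < 0 || fstat(in, &st) != 0) {
        perror("open/fstat");
        return 1;
    }

    /* On a filesystem with server-side copy support, this single call can
     * be served without shuttling the data through the client. */
    ssize_t copied = copy_file_range(in, NULL, out, NULL, st.st_size, 0);
    if (copied < 0) {
        perror("copy_file_range");
        return 1;
    }
    printf("copied %zd bytes\n", copied);
    return 0;
}
```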
-
- 26 Jul 2018, 1 commit
-
-
By Miklos Szeredi

If parallel dirops are enabled in the FUSE_INIT reply, then the first operation may leave fi->mutex held.

Reported-by: syzbot <syzbot+3f7b29af1baa9d0a55be@syzkaller.appspotmail.com>
Fixes: 5c672ab3 ("fuse: serialize dirops by default")
Cc: <stable@vger.kernel.org> # v4.7
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-