- 06 9月, 2021 2 次提交
-
-
由 Miklos Szeredi 提交于
The struct fuse_conn argument is not used and can be removed. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Miklos Szeredi 提交于
In case of fuse the MM subsystem doesn't guarantee that page writeback completes by the time ->sync_fs() is called. This is because fuse completes page writeback immediately to prevent DoS of memory reclaim by the userspace file server. This means that fuse itself must ensure that writes are synced before sending the SYNCFS request to the server. Introduce sync buckets, that hold a counter for the number of outstanding write requests. On syncfs replace the current bucket with a new one and wait until the old bucket's counter goes down to zero. It is possible to have multiple syncfs calls in parallel, in which case there could be more than one waited-on buckets. Descendant buckets must not complete until the parent completes. Add a count to the child (new) bucket until the (parent) old bucket completes. Use RCU protection to dereference the current bucket and to wake up an emptied bucket. Use fc->lock to protect against parallel assignments to the current bucket. This leaves just the counter to be a possible scalability issue. The fc->num_waiting counter has a similar issue, so both should be addressed at the same time. Reported-by: NAmir Goldstein <amir73il@gmail.com> Fixes: 2d82ab25 ("virtiofs: propagate sync() to file server") Cc: <stable@vger.kernel.org> # v5.14 Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 31 8月, 2021 1 次提交
-
-
由 Miklos Szeredi 提交于
Callers of fuse_writeback_range() assume that the file is ready for modification by the server in the supplied byte range after the call returns. If there's a write that extends the file beyond the end of the supplied range, then the file needs to be extended to at least the end of the range, but currently that's not done. There are at least two cases where this can cause problems: - copy_file_range() will return short count if the file is not extended up to end of the source range. - FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE will not extend the file, hence the region may not be fully allocated. Fix by flushing writes from the start of the range up to the end of the file. This could be optimized if the writes are non-extending, etc, but it's probably not worth the trouble. Fixes: a2bc9236 ("fuse: fix copy_file_range() in the writeback case") Fixes: 6b1bdb56 ("fuse: allow fallocate(FALLOC_FL_ZERO_RANGE)") Cc: <stable@vger.kernel.org> # v5.2 Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 19 8月, 2021 1 次提交
-
-
由 Miklos Szeredi 提交于
Add a rcu argument to the ->get_acl() callback to allow get_cached_acl_rcu() to call the ->get_acl() method in the next patch. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 18 8月, 2021 1 次提交
-
-
由 Miklos Szeredi 提交于
fuse_finish_open() will be called with FUSE_NOWRITE in case of atomic O_TRUNC. This can deadlock with fuse_wait_on_page_writeback() in fuse_launder_page() triggered by invalidate_inode_pages2(). Fix by replacing invalidate_inode_pages2() in fuse_finish_open() with a truncate_pagecache() call. This makes sense regardless of FOPEN_KEEP_CACHE or fc->writeback cache, so do it unconditionally. Reported-by: NXie Yongji <xieyongji@bytedance.com> Reported-and-tested-by: syzbot+bea44a5189836d956894@syzkaller.appspotmail.com Fixes: e4648309 ("fuse: truncate pending writes on O_TRUNC") Cc: <stable@vger.kernel.org> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 05 8月, 2021 2 次提交
-
-
由 Miklos Szeredi 提交于
Make it possible to create a new mount from a already working server. Here's a detailed description of the problem from Jakob: "The background for this question is occasional problems we see with our fuse filesystem [1] and mount namespaces. On a usual client, we have system-wide, autofs managed mountpoints. When a new mount namespace is created (which can be done unprivileged in combination with user namespaces), it can happen that a mountpoint is used inside the new namespace but idle in the root mount namespace. So autofs unmounts the parent, system-wide mountpoint. But the fuse module stays active and still serves mountpoint in the child mount namespace. Because the fuse daemon also blocks other system wide resources corresponding to the mountpoint, this situation effectively prevents new mounts until the child mount namespaces closes. [1] https://github.com/cvmfs/cvmfs" Reported-by: NJakob Blomer <jblomer@cern.ch> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Miklos Szeredi 提交于
Affected call chains: fuse_get_tree -> get_tree_(bdev|nodev) -> fuse_fill_super Needed for following patch. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 04 8月, 2021 3 次提交
-
-
由 Miklos Szeredi 提交于
Checking whether the "fd=", "rootmode=", "user_id=" and "group_id=" mount options are present can be moved from fuse_get_tree() into fuse_fill_super() where the value of the options are consumed. This relaxes semantics of reusing a fuse blockdev mount using the device name. Before this patch presence of these options were enforced but values ignored, after this patch these options are completely ignored in this case. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Miklos Szeredi 提交于
Naming convention under fs/fuse/: struct fuse_conn *fc; struct fs_context *fsc; Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Miklos Szeredi 提交于
There is a potential race between fuse_read_interrupt() and fuse_request_end(). TASK1 in fuse_read_interrupt(): delete req->intr_entry (while holding fiq->lock) TASK2 in fuse_request_end(): req->intr_entry is empty -> skip fiq->lock wake up TASK3 TASK3 request is freed TASK1 in fuse_read_interrupt(): dereference req->in.h.unique ***BAM*** Fix by always grabbing fiq->lock if the request was ever interrupted (FR_INTERRUPTED set) thereby serializing with concurrent fuse_read_interrupt() calls. FR_INTERRUPTED is set before the request is queued on fiq->interrupts. Dequeing the request is done with list_del_init() but FR_INTERRUPTED is not cleared in this case. Reported-by: Nlijiazi <lijiazi@xiaomi.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 13 7月, 2021 1 次提交
-
-
由 Jan Kara 提交于
Use invalidate_lock instead of fuse's private i_mmap_sem. The intended purpose is exactly the same. By this conversion we fix a long standing race between hole punching and read(2) / readahead(2) paths that can lead to stale page cache contents. CC: Miklos Szeredi <miklos@szeredi.hu> Reviewed-by: NMiklos Szeredi <mszeredi@redhat.com> Signed-off-by: NJan Kara <jack@suse.cz>
-
- 08 7月, 2021 1 次提交
-
-
由 Ira Weiny 提交于
fuse_dax_mem_range_init() does not need the address or the pfn of the memory requested in dax_direct_access(). It is only calling direct access to get the number of pages. Remove the unused variables and stop requesting the kaddr and pfn from dax_direct_access(). Reviewed-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NIra Weiny <ira.weiny@intel.com> Reviewed-by: NVivek Goyal <vgoyal@redhat.com> Link: https://lore.kernel.org/r/20210525172428.3634316-2-ira.weiny@intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 30 6月, 2021 2 次提交
-
-
由 Matthew Wilcox (Oracle) 提交于
These functions implement the address_space ->set_page_dirty operation and should live in pagemap.h, not mm.h so that the rest of the kernel doesn't get funny ideas about calling them directly. Link: https://lkml.kernel.org/r/20210615162342.1669332-7-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
Use __set_page_dirty_no_writeback() instead. This will set the dirty bit on the page, which will be used to avoid calling set_page_dirty() in the future. It will have no effect on actually writing the page back, as the pages are not on any LRU lists. [akpm@linux-foundation.org: export __set_page_dirty_no_writeback() to modules] Link: https://lkml.kernel.org/r/20210615162342.1669332-6-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 6月, 2021 11 次提交
-
-
由 Zheng Yongjun 提交于
Fix some spelling mistakes in comments: refernce ==> reference happnes ==> happens threhold ==> threshold splitted ==> split mached ==> matched Signed-off-by: NZheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Wu Bo 提交于
Replace open coded divisor calculations with the DIV_ROUND_UP kernel macro for better readability. Signed-off-by: NWu Bo <wubo40@huawei.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Amir Goldstein 提交于
Server responds to LOOKUP and other ops (READDIRPLUS/CREATE/MKNOD/...) with ourarg containing nodeid and generation. If a fuse inode is found in inode cache with the same nodeid but different generation, the existing fuse inode should be unhashed and marked "bad" and a new inode with the new generation should be hashed instead. This can happen, for example, with passhrough fuse filesystem that returns the real filesystem ino/generation on lookup and where real inode numbers can get recycled due to real files being unlinked not via the fuse passthrough filesystem. With current code, this situation will not be detected and an old fuse dentry that used to point to an older generation real inode, can be used to access a completely new inode, which should be accessed only via the new dentry. Note that because the FORGET message carries the nodeid w/o generation, the server should wait to get FORGET counts for the nlookup counts of the old and reused inodes combined, before it can free the resources associated to that nodeid. Signed-off-by: NAmir Goldstein <amir73il@gmail.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Richard W.M. Jones 提交于
The current fuse module filters out fallocate(FALLOC_FL_ZERO_RANGE) returning -EOPNOTSUPP. libnbd's nbdfuse would like to translate FALLOC_FL_ZERO_RANGE requests into the NBD command NBD_CMD_WRITE_ZEROES which allows NBD servers that support it to do zeroing efficiently. This commit treats this flag exactly like FALLOC_FL_PUNCH_HOLE. A way to test this, requiring fuse >= 3, nbdkit >= 1.8 and the latest nbdfuse from https://gitlab.com/nbdkit/libnbd/-/tree/master/fuse is to create a file containing some data and "mirror" it to a fuse file: $ dd if=/dev/urandom of=disk.img bs=1M count=1 $ nbdkit file disk.img $ touch mirror.img $ nbdfuse mirror.img nbd://localhost & (mirror.img -> nbdfuse -> NBD over loopback -> nbdkit -> disk.img) You can then run commands such as: $ fallocate -z -o 1024 -l 1024 mirror.img and check that the content of the original file ("disk.img") stays synchronized. To show NBD commands, export LIBNBD_DEBUG=1 before running nbdfuse. To clean up: $ fusermount3 -u mirror.img $ killall nbdkit Signed-off-by: NRichard W.M. Jones <rjones@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Greg Kurz 提交于
This function used to be called from fuse_dentry_automount(). This code was moved to fuse_get_tree_submount() in the same file since then. It is unlikely there will ever be another user. No need to be extern in this case. Signed-off-by: NGreg Kurz <groug@kaod.org> Reviewed-by: NMax Reitz <mreitz@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Greg Kurz 提交于
fc_mount() already handles the vfs_get_tree(), sb->s_umount unlocking and vfs_create_mount() sequence. Using it greatly simplifies fuse_dentry_automount(). Signed-off-by: NGreg Kurz <groug@kaod.org> Reviewed-by: NMax Reitz <mreitz@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Greg Kurz 提交于
We recently fixed an infinite loop by setting the SB_BORN flag on submounts along with the write barrier needed by super_cache_count(). This is the job of vfs_get_tree() and FUSE shouldn't have to care about the barrier at all. Split out some code from fuse_dentry_automount() to the dedicated fuse_get_tree_submount() handler for submounts and call vfs_get_tree(). Signed-off-by: NGreg Kurz <groug@kaod.org> Reviewed-by: NMax Reitz <mreitz@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Greg Kurz 提交于
The creation of a submount is open-coded in fuse_dentry_automount(). This brings a lot of complexity and we recently had to fix bugs because we weren't setting SB_BORN or because we were unlocking sb->s_umount before sb was fully configured. Most of these could have been avoided by using the mount API instead of open-coding. Basically, this means coming up with a proper ->get_tree() implementation for submounts and call vfs_get_tree(), or better fc_mount(). The creation of the superblock for submounts is quite different from the root mount. Especially, it doesn't require to allocate a FUSE filesystem context, nor to parse parameters. Introduce a dedicated context ops for submounts to make this clear. This is just a placeholder for now, fuse_get_tree_submount() will be populated in a subsequent patch. Only visible change is that we stop allocating/freeing a useless FUSE filesystem context with submounts. Signed-off-by: NGreg Kurz <groug@kaod.org> Reviewed-by: NMax Reitz <mreitz@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Greg Kurz 提交于
Even if POSIX doesn't mandate it, linux users legitimately expect sync() to flush all data and metadata to physical storage when it is located on the same system. This isn't happening with virtiofs though: sync() inside the guest returns right away even though data still needs to be flushed from the host page cache. This is easily demonstrated by doing the following in the guest: $ dd if=/dev/zero of=/mnt/foo bs=1M count=5K ; strace -T -e sync sync 5120+0 records in 5120+0 records out 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 5.22224 s, 1.0 GB/s sync() = 0 <0.024068> and start the following in the host when the 'dd' command completes in the guest: $ strace -T -e fsync /usr/bin/sync virtiofs/foo fsync(3) = 0 <10.371640> There are no good reasons not to honor the expected behavior of sync() actually: it gives an unrealistic impression that virtiofs is super fast and that data has safely landed on HW, which isn't the case obviously. Implement a ->sync_fs() superblock operation that sends a new FUSE_SYNCFS request type for this purpose. Provision a 64-bit placeholder for possible future extensions. Since the file server cannot handle the wait == 0 case, we skip it to avoid a gratuitous roundtrip. Note that this is per-superblock: a FUSE_SYNCFS is send for the root mount and for each submount. Like with FUSE_FSYNC and FUSE_FSYNCDIR, lack of support for FUSE_SYNCFS in the file server is treated as permanent success. This ensures compatibility with older file servers: the client will get the current behavior of sync() not being propagated to the file server. Note that such an operation allows the file server to DoS sync(). Since a typical FUSE file server is an untrusted piece of software running in userspace, this is disabled by default. Only enable it with virtiofs for now since virtiofsd is supposedly trusted by the guest kernel. Reported-by: NRobert Krawitz <rlk@redhat.com> Signed-off-by: NGreg Kurz <groug@kaod.org> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Miklos Szeredi 提交于
Don't allow userspace to report errors that could be kernel-internal. Reported-by: NAnatoly Trosinenko <anatoly.trosinenko@gmail.com> Fixes: 334f485d ("[PATCH] FUSE - device functions") Cc: <stable@vger.kernel.org> # v2.6.14 Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Miklos Szeredi 提交于
A request could end up on the fpq->io list after fuse_abort_conn() has reset fpq->connected and aborted requests on that list: Thread-1 Thread-2 ======== ======== ->fuse_simple_request() ->shutdown ->__fuse_request_send() ->queue_request() ->fuse_abort_conn() ->fuse_dev_do_read() ->acquire(fpq->lock) ->wait_for(fpq->lock) ->set err to all req's in fpq->io ->release(fpq->lock) ->acquire(fpq->lock) ->add req to fpq->io After the userspace copy is done the request will be ended, but req->out.h.error will remain uninitialized. Also the copy might block despite being already aborted. Fix both issues by not allowing the request to be queued on the fpq->io list after fuse_abort_conn() has processed this list. Reported-by: NPradeep P V K <pragalla@codeaurora.org> Fixes: fd22d62e ("fuse: no fc->lock for iqueue parts") Cc: <stable@vger.kernel.org> # v4.2 Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 19 6月, 2021 1 次提交
-
-
由 Miklos Szeredi 提交于
Fix the "fuse: trying to steal weird page" warning. Description from Johannes Weiner: "Think of it as similar to PG_active. It's just another usage/heat indicator of file and anon pages on the reclaim LRU that, unlike PG_active, persists across deactivation and even reclaim (we store it in the page cache / swapper cache tree until the page refaults). So if fuse accepts pages that can legally have PG_active set, PG_workingset is fine too." Reported-by: NThomas Lindroth <thomas.lindroth@gmail.com> Fixes: 1899ad18 ("mm: workingset: tell cache transitions from workingset thrashing") Cc: <stable@vger.kernel.org> # v4.20 Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 10 6月, 2021 1 次提交
-
-
由 Al Viro 提交于
Replacement is called copy_page_from_iter_atomic(); unlike the old primitive the callers do *not* need to do iov_iter_advance() after it. In case when they end up consuming less than they'd been given they need to do iov_iter_revert() on everything they had not consumed. That, however, needs to be done only on slow paths. All in-tree callers converted. And that kills the last user of iterate_all_kinds() Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 09 6月, 2021 3 次提交
-
-
由 Greg Kurz 提交于
We don't set the SB_BORN flag on submounts. This is wrong as these superblocks are then considered as partially constructed or dying in the rest of the code and can break some assumptions. One such case is when you have a virtiofs filesystem with submounts and you try to mount it again : virtio_fs_get_tree() tries to obtain a superblock with sget_fc(). The logic in sget_fc() is to loop until it has either found an existing matching superblock with SB_BORN set or to create a brand new one. It is assumed that a superblock without SB_BORN is transient and the loop is restarted. Forgetting to set SB_BORN on submounts hence causes sget_fc() to retry forever. Setting SB_BORN requires special care, i.e. a write barrier for super_cache_count() which can check SB_BORN without taking any lock. We should call vfs_get_tree() to deal with that but this requires to have a proper ->get_tree() implementation for submounts, which is a bigger piece of work. Go for a simple bug fix in the meatime. Fixes: bf109c64 ("fuse: implement crossmounts") Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: NGreg Kurz <groug@kaod.org> Reviewed-by: NMax Reitz <mreitz@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Greg Kurz 提交于
As soon as fuse_dentry_automount() does up_write(&sb->s_umount), the superblock can theoretically be killed. If this happens before the submount was added to the &fc->mounts list, fuse_mount_remove() later crashes in list_del_init() because it assumes the submount to be already there. Add the submount before dropping sb->s_umount to fix the inconsistency. It is okay to nest fc->killsb under sb->s_umount, we already do this on the ->kill_sb() path. Signed-off-by: NGreg Kurz <groug@kaod.org> Fixes: bf109c64 ("fuse: implement crossmounts") Cc: stable@vger.kernel.org # v5.10+ Reviewed-by: NMax Reitz <mreitz@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Greg Kurz 提交于
If fuse_fill_super_submount() returns an error, the error path triggers a crash: [ 26.206673] BUG: kernel NULL pointer dereference, address: 0000000000000000 [...] [ 26.226362] RIP: 0010:__list_del_entry_valid+0x25/0x90 [...] [ 26.247938] Call Trace: [ 26.248300] fuse_mount_remove+0x2c/0x70 [fuse] [ 26.248892] virtio_kill_sb+0x22/0x160 [virtiofs] [ 26.249487] deactivate_locked_super+0x36/0xa0 [ 26.250077] fuse_dentry_automount+0x178/0x1a0 [fuse] The crash happens because fuse_mount_remove() assumes that the FUSE mount was already added to list under the FUSE connection, but this only done after fuse_fill_super_submount() has returned success. This means that until fuse_fill_super_submount() has returned success, the FUSE mount isn't actually owned by the superblock. We should thus reclaim ownership by clearing sb->s_fs_info, which will skip the call to fuse_mount_remove(), and perform rollback, like virtio_fs_get_tree() already does for the root sb. Fixes: bf109c64 ("fuse: implement crossmounts") Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: NGreg Kurz <groug@kaod.org> Reviewed-by: NMax Reitz <mreitz@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 03 6月, 2021 1 次提交
-
-
由 Al Viro 提交于
another rudiment of fault-in originally having been limited to the first segment, same as in generic_perform_write() and friends. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 16 4月, 2021 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 14 4月, 2021 8 次提交
-
-
由 Miklos Szeredi 提交于
Put extra reference early in cuse_channel_open(). Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Miklos Szeredi 提交于
For cloned connections cuse_channel_release() will be called more than once, resulting in use after free. Prevent device cloning for CUSE, which does not make sense at this point, and highly unlikely to be used in real life. Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Miklos Szeredi 提交于
get_user_ns() is done twice (once in virtio_fs_get_tree() and once in fuse_conn_init()), resulting in a reference leak. Also looks better to use fsc->user_ns (which *should* be the current_user_ns() at this point). Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Jiapeng Chong 提交于
Fix the following clang warning: fs/fuse/virtio_fs.c:130:35: warning: unused function 'vq_to_fpq' [-Wunused-function]. Reported-by: NAbaci Robot <abaci@linux.alibaba.com> Signed-off-by: NJiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Connor Kuehl 提交于
If an incoming FUSE request can't fit on the virtqueue, the request is placed onto a workqueue so a worker can try to resubmit it later where there will (hopefully) be space for it next time. This is fine for requests that aren't larger than a virtqueue's maximum capacity. However, if a request's size exceeds the maximum capacity of the virtqueue (even if the virtqueue is empty), it will be doomed to a life of being placed on the workqueue, removed, discovered it won't fit, and placed on the workqueue yet again. Furthermore, from section 2.6.5.3.1 (Driver Requirements: Indirect Descriptors) of the virtio spec: "A driver MUST NOT create a descriptor chain longer than the Queue Size of the device." To fix this, limit the number of pages FUSE will use for an overall request. This way, each request can realistically fit on the virtqueue when it is decomposed into a scattergather list and avoid violating section 2.6.5.3.1 of the virtio spec. Signed-off-by: NConnor Kuehl <ckuehl@redhat.com> Reviewed-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Luis Henriques 提交于
When accidentally passing twice the same tag to qemu, kmemleak ended up reporting a memory leak in virtiofs. Also, looking at the log I saw the following error (that's when I realised the duplicated tag): virtiofs: probe of virtio5 failed with error -17 Here's the kmemleak log for reference: unreferenced object 0xffff888103d47800 (size 1024): comm "systemd-udevd", pid 118, jiffies 4294893780 (age 18.340s) hex dump (first 32 bytes): 00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N.......... ff ff ff ff ff ff ff ff 80 90 02 a0 ff ff ff ff ................ backtrace: [<000000000ebb87c1>] virtio_fs_probe+0x171/0x7ae [virtiofs] [<00000000f8aca419>] virtio_dev_probe+0x15f/0x210 [<000000004d6baf3c>] really_probe+0xea/0x430 [<00000000a6ceeac8>] device_driver_attach+0xa8/0xb0 [<00000000196f47a7>] __driver_attach+0x98/0x140 [<000000000b20601d>] bus_for_each_dev+0x7b/0xc0 [<00000000399c7b7f>] bus_add_driver+0x11b/0x1f0 [<0000000032b09ba7>] driver_register+0x8f/0xe0 [<00000000cdd55998>] 0xffffffffa002c013 [<000000000ea196a2>] do_one_initcall+0x64/0x2e0 [<0000000008f727ce>] do_init_module+0x5c/0x260 [<000000003cdedab6>] __do_sys_finit_module+0xb5/0x120 [<00000000ad2f48c6>] do_syscall_64+0x33/0x40 [<00000000809526b5>] entry_SYSCALL_64_after_hwframe+0x44/0xae Cc: stable@vger.kernel.org Signed-off-by: NLuis Henriques <lhenriques@suse.de> Fixes: a62a8ef9 ("virtio-fs: add virtiofs filesystem") Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Reviewed-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Vivek Goyal 提交于
In fuse when a direct/write-through write happens we invalidate attrs because that might have updated mtime/ctime on server and cached mtime/ctime will be stale. What about page writeback path. Looks like we don't invalidate attrs there. To be consistent, invalidate attrs in writeback path as well. Only exception is when writeback_cache is enabled. In that case we strust local mtime/ctime and there is no need to invalidate attrs. Recently users started experiencing failure of xfstests generic/080, geneirc/215 and generic/614 on virtiofs. This happened only newer "stat" utility and not older one. This patch fixes the issue. So what's the root cause of the issue. Here is detailed explanation. generic/080 test does mmap write to a file, closes the file and then checks if mtime has been updated or not. When file is closed, it leads to flushing of dirty pages (and that should update mtime/ctime on server). But we did not explicitly invalidate attrs after writeback finished. Still generic/080 passed so far and reason being that we invalidated atime in fuse_readpages_end(). This is called in fuse_readahead() path and always seems to trigger before mmaped write. So after mmaped write when lstat() is called, it sees that atleast one of the fields being asked for is invalid (atime) and that results in generating GETATTR to server and mtime/ctime also get updated and test passes. But newer /usr/bin/stat seems to have moved to using statx() syscall now (instead of using lstat()). And statx() allows it to query only ctime or mtime (and not rest of the basic stat fields). That means when querying for mtime, fuse_update_get_attr() sees that mtime is not invalid (only atime is invalid). So it does not generate a new GETATTR and fill stat with cached mtime/ctime. And that means updated mtime is not seen by xfstest and tests start failing. Invalidating attrs after writeback completion should solve this problem in a generic manner. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
由 Vivek Goyal 提交于
When posix access ACL is set, it can have an effect on file mode and it can also need to clear SGID if. - None of caller's group/supplementary groups match file owner group. AND - Caller is not priviliged (No CAP_FSETID). As of now fuser server is responsible for changing the file mode as well. But it does not know whether to clear SGID or not. So add a flag FUSE_SETXATTR_ACL_KILL_SGID and send this info with SETXATTR to let file server know that sgid needs to be cleared as well. Reported-by: NLuis Henriques <lhenriques@suse.de> Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-