Commits · 75b96f0ec5faf730128c32187e3e28441c27a094 · openeuler / Kernel

06 Sep, 2021 2 commits

fuse: remove unused arg in fuse_write_file_get() · a9667ac8

Committed by Miklos Szeredi on Sep 01, 2021

The struct fuse_conn argument is not used and can be removed.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

a9667ac8

fuse: wait for writepages in syncfs · 660585b5

Committed by Miklos Szeredi on Sep 01, 2021

In case of fuse the MM subsystem doesn't guarantee that page writeback
completes by the time ->sync_fs() is called.  This is because fuse
completes page writeback immediately to prevent DoS of memory reclaim by
the userspace file server.

This means that fuse itself must ensure that writes are synced before
sending the SYNCFS request to the server.

Introduce sync buckets, that hold a counter for the number of outstanding
write requests.  On syncfs replace the current bucket with a new one and
wait until the old bucket's counter goes down to zero.

It is possible to have multiple syncfs calls in parallel, in which case
there could be more than one waited-on buckets.  Descendant buckets must
not complete until the parent completes.  Add a count to the child (new)
bucket until the (parent) old bucket completes.

Use RCU protection to dereference the current bucket and to wake up an
emptied bucket.  Use fc->lock to protect against parallel assignments to
the current bucket.

This leaves just the counter to be a possible scalability issue.  The
fc->num_waiting counter has a similar issue, so both should be addressed at
the same time.

Reported-by: Amir Goldstein <amir73il@gmail.com>
Fixes: 2d82ab25 ("virtiofs: propagate sync() to file server")
Cc: <stable@vger.kernel.org> # v5.14
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

660585b5
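The sync-bucket scheme described in the commit above can be modelled in a few lines of userspace C. This is a single-threaded sketch under stated assumptions: the `struct bucket`, `struct conn` and function names are hypothetical, and the RCU and `fc->lock` protection the commit mentions is deliberately left out. It only shows how the counters move when a syncfs swaps buckets.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace model of a sync bucket, not the kernel structure. */
struct bucket {
    int count;             /* 1 initial reference + 1 per outstanding write */
    struct bucket *child;  /* newer bucket that must not drain before this one */
};

struct conn {
    struct bucket *curr;   /* bucket that new writes are counted in */
};

static struct bucket *bucket_alloc(void)
{
    struct bucket *b = calloc(1, sizeof(*b));

    if (!b)
        abort();
    b->count = 1;
    return b;
}

static void conn_init(struct conn *c)
{
    c->curr = bucket_alloc();
}

/* A write request starts: count it in the current bucket. */
static struct bucket *write_start(struct conn *c)
{
    c->curr->count++;
    return c->curr;
}

/* A write request completes: uncount it from the bucket it started in. */
static void write_end(struct bucket *b)
{
    b->count--;
}

/*
 * syncfs begins: install a new bucket and return the old one to wait on,
 * or NULL when no writes are outstanding.
 */
static struct bucket *syncfs_begin(struct conn *c)
{
    struct bucket *old = c->curr;
    struct bucket *fresh;

    assert(old->count >= 1);
    if (old->count == 1)
        return NULL;

    fresh = bucket_alloc();
    fresh->count++;        /* the child must not complete before its parent */
    old->child = fresh;
    c->curr = fresh;
    old->count--;          /* drop the old bucket's initial reference */
    return old;
}

/* The waiter may proceed once every write counted in the old bucket is done. */
static int syncfs_may_finish(const struct bucket *old)
{
    return old->count == 0;
}

/* The old bucket has drained: release the extra count held on the child. */
static void syncfs_end(struct bucket *old)
{
    old->child->count--;
    free(old);
}
```

In the real code the waiter sleeps until the old bucket's counter reaches zero; here `syncfs_may_finish()` stands in for that wait condition.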

31 Aug, 2021 1 commit

fuse: flush extending writes · 59bda8ec

Committed by Miklos Szeredi on Aug 31, 2021

Callers of fuse_writeback_range() assume that the file is ready for
modification by the server in the supplied byte range after the call
returns.

If there's a write that extends the file beyond the end of the supplied
range, then the file needs to be extended to at least the end of the range,
but currently that's not done.

There are at least two cases where this can cause problems:

 - copy_file_range() will return short count if the file is not extended
   up to end of the source range.

 - FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE will not extend the file,
   hence the region may not be fully allocated.

Fix by flushing writes from the start of the range up to the end of the
file.  This could be optimized if the writes are non-extending, etc, but
it's probably not worth the trouble.

Fixes: a2bc9236 ("fuse: fix copy_file_range() in the writeback case")
Fixes: 6b1bdb56 ("fuse: allow fallocate(FALLOC_FL_ZERO_RANGE)")
Cc: <stable@vger.kernel.org>  # v5.2
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

59bda8ec
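To see why flushing only the supplied range is not enough, here is a small userspace model of the situation in the commit above. The `struct dirty_write` type and `flush_range()` helper are hypothetical; the model assumes that flushing a dirty write is what extends the file on the server.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical model of one cached write that has not reached the server. */
struct dirty_write {
    long long off;   /* first byte written */
    long long len;   /* number of bytes written */
    int flushed;     /* non-zero once sent to the server */
};

/*
 * Flush every dirty write that overlaps [start, end] (inclusive) and
 * return the resulting file size as seen by the server.
 */
static long long flush_range(struct dirty_write *w, size_t n,
                             long long server_size,
                             long long start, long long end)
{
    for (size_t i = 0; i < n; i++) {
        long long last = w[i].off + w[i].len - 1;

        if (w[i].flushed || last < start || w[i].off > end)
            continue;
        w[i].flushed = 1;
        if (last + 1 > server_size)
            server_size = last + 1;
    }
    return server_size;
}
```

With a single one-byte write at offset 1 MiB and a requested range of bytes 0 to 4095, flushing only the requested range leaves the server-side size at 0, while flushing from the start of the range to `LLONG_MAX` (the end of the file) extends it past the range.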

19 Aug, 2021 1 commit

vfs: add rcu argument to ->get_acl() callback · 0cad6246

Committed by Miklos Szeredi on Aug 18, 2021

Add an rcu argument to the ->get_acl() callback to allow
get_cached_acl_rcu() to call the ->get_acl() method in the next patch.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

0cad6246

18 Aug, 2021 1 commit

fuse: truncate pagecache on atomic_o_trunc · 76224355

Committed by Miklos Szeredi on Aug 17, 2021

fuse_finish_open() will be called with FUSE_NOWRITE in case of atomic
O_TRUNC.  This can deadlock with fuse_wait_on_page_writeback() in
fuse_launder_page() triggered by invalidate_inode_pages2().

Fix by replacing invalidate_inode_pages2() in fuse_finish_open() with a
truncate_pagecache() call.  This makes sense regardless of FOPEN_KEEP_CACHE
or fc->writeback cache, so do it unconditionally.

Reported-by: Xie Yongji <xieyongji@bytedance.com>
Reported-and-tested-by: syzbot+bea44a5189836d956894@syzkaller.appspotmail.com
Fixes: e4648309 ("fuse: truncate pending writes on O_TRUNC")
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

76224355

05 Aug, 2021 2 commits

fuse: allow sharing existing sb · 5d5b74aa

Committed by Miklos Szeredi on Aug 05, 2021

Make it possible to create a new mount from an already working server.

Here's a detailed description of the problem from Jakob:

  "The background for this question is occasional problems we see with our
   fuse filesystem [1] and mount namespaces. On a usual client, we have
   system-wide, autofs managed mountpoints. When a new mount namespace is
   created (which can be done unprivileged in combination with user
   namespaces), it can happen that a mountpoint is used inside the new
   namespace but idle in the root mount namespace. So autofs unmounts the
   parent, system-wide mountpoint. But the fuse module stays active and
   still serves mountpoint in the child mount namespace. Because the fuse
   daemon also blocks other system wide resources corresponding to the
   mountpoint, this situation effectively prevents new mounts until the
   child mount namespaces closes.

   [1] https://github.com/cvmfs/cvmfs"

Reported-by: Jakob Blomer <jblomer@cern.ch>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

5d5b74aa

fuse: move fget() to fuse_get_tree() · 62dd1fc8

Committed by Miklos Szeredi on Aug 05, 2021

Affected call chains:

fuse_get_tree
   -> get_tree_(bdev|nodev)
      -> fuse_fill_super

Needed for the following patch.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

62dd1fc8

04 Aug, 2021 3 commits

fuse: move option checking into fuse_fill_super() · badc7414

Committed by Miklos Szeredi on Aug 04, 2021

Checking whether the "fd=", "rootmode=", "user_id=" and "group_id=" mount
options are present can be moved from fuse_get_tree() into
fuse_fill_super() where the values of the options are consumed.

This relaxes the semantics of reusing a fuse blockdev mount using the device
name.  Before this patch the presence of these options was enforced but the
values were ignored; after this patch these options are completely ignored in
this case.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

badc7414

fuse: name fs_context consistently · 84c21507

Committed by Miklos Szeredi on Aug 04, 2021

Naming convention under fs/fuse/:

	struct fuse_conn *fc;
	struct fs_context *fsc;

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

84c21507

fuse: fix use after free in fuse_read_interrupt() · e1e71c16

Committed by Miklos Szeredi on Aug 04, 2021

There is a potential race between fuse_read_interrupt() and
fuse_request_end().

TASK1
  in fuse_read_interrupt(): delete req->intr_entry (while holding
  fiq->lock)

TASK2
  in fuse_request_end(): req->intr_entry is empty -> skip fiq->lock
  wake up TASK3

TASK3
  request is freed

TASK1
  in fuse_read_interrupt(): dereference req->in.h.unique ***BAM***


Fix by always grabbing fiq->lock if the request was ever interrupted
(FR_INTERRUPTED set) thereby serializing with concurrent
fuse_read_interrupt() calls.

FR_INTERRUPTED is set before the request is queued on fiq->interrupts.
Dequeuing the request is done with list_del_init() but FR_INTERRUPTED is not
cleared in this case.

Reported-by: lijiazi <lijiazi@xiaomi.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

e1e71c16
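The change in locking condition from the commit above can be reduced to two predicates. This is a minimal sketch with a hypothetical `struct req_state`; it models only the decision of whether the request-end path takes `fiq->lock`, not the locking itself.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the request state seen by the request-end path. */
struct req_state {
    bool interrupted;   /* models FR_INTERRUPTED: set once, never cleared */
    bool on_intr_list;  /* models !list_empty(&req->intr_entry) */
};

/* Before the fix: the lock is taken only if the entry is still queued. */
static bool end_takes_lock_old(const struct req_state *r)
{
    return r->on_intr_list;
}

/* After the fix: the lock is taken if the request was ever interrupted. */
static bool end_takes_lock_new(const struct req_state *r)
{
    return r->interrupted;
}
```

In the racy window, TASK1 has already removed the entry from the list but is still using the request. The old predicate then skips the lock and lets the request be freed, while the new one forces the request-end path to wait for TASK1 to drop `fiq->lock`.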

13 Jul, 2021 1 commit

fuse: Convert to using invalidate_lock · 8bcbbe9c

Committed by Jan Kara on Apr 21, 2021

Use invalidate_lock instead of fuse's private i_mmap_sem. The intended
purpose is exactly the same. By this conversion we fix a long standing
race between hole punching and read(2) / readahead(2) paths that can
lead to stale page cache contents.

CC: Miklos Szeredi <miklos@szeredi.hu>
Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>

8bcbbe9c

08 Jul, 2021 1 commit

fs/fuse: Remove unneeded kaddr parameter · 2e29be2e

Committed by Ira Weiny on May 25, 2021

fuse_dax_mem_range_init() does not need the address or the pfn of the
memory requested in dax_direct_access().  It is only calling direct
access to get the number of pages.

Remove the unused variables and stop requesting the kaddr and pfn from
dax_direct_access().

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Link: https://lore.kernel.org/r/20210525172428.3634316-2-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

2e29be2e

30 Jun, 2021 2 commits

mm: move page dirtying prototypes from mm.h · 3a6b2162

Committed by Matthew Wilcox (Oracle) on Jun 28, 2021

These functions implement the address_space ->set_page_dirty operation and
should live in pagemap.h, not mm.h so that the rest of the kernel doesn't
get funny ideas about calling them directly.

Link: https://lkml.kernel.org/r/20210615162342.1669332-7-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3a6b2162

fs: remove noop_set_page_dirty() · b82a96c9

Committed by Matthew Wilcox (Oracle) on Jun 28, 2021

Use __set_page_dirty_no_writeback() instead.  This will set the dirty bit
on the page, which will be used to avoid calling set_page_dirty() in the
future.  It will have no effect on actually writing the page back, as the
pages are not on any LRU lists.

[akpm@linux-foundation.org: export __set_page_dirty_no_writeback() to modules]

Link: https://lkml.kernel.org/r/20210615162342.1669332-6-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b82a96c9

22 Jun, 2021 11 commits

virtiofs: Fix spelling mistakes · c4e0cd4e

Committed by Zheng Yongjun on Jun 04, 2021

Fix some spelling mistakes in comments:
refernce  ==> reference
happnes  ==> happens
threhold  ==> threshold
splitted  ==> split
mached  ==> matched

Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

c4e0cd4e

fuse: use DIV_ROUND_UP helper macro for calculations · 6c88632b

Committed by Wu Bo on May 25, 2021

Replace open coded divisor calculations with the DIV_ROUND_UP kernel macro
for better readability.

Signed-off-by: Wu Bo <wubo40@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

6c88632b
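For reference, the kind of replacement the commit above describes looks like this in isolation. The macro body matches the kernel's usual definition; `SKETCH_PAGE_SIZE` and the two helper functions are hypothetical names used only for this illustration.

```c
/* Round-up integer division, as defined by the kernel helper macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Assumed page size for this sketch only. */
#define SKETCH_PAGE_SIZE 4096u

/* Open-coded form: how many pages are needed to hold `bytes` bytes. */
static unsigned int pages_open_coded(unsigned int bytes)
{
    return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Same calculation expressed through the helper macro. */
static unsigned int pages_with_macro(unsigned int bytes)
{
    return DIV_ROUND_UP(bytes, SKETCH_PAGE_SIZE);
}
```

Both functions compute the same value; the macro form simply states the intent directly.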

fuse: fix illegal access to inode with reused nodeid · 15db1683

Committed by Amir Goldstein on Jun 21, 2021

Server responds to LOOKUP and other ops (READDIRPLUS/CREATE/MKNOD/...)
with outarg containing nodeid and generation.

If a fuse inode is found in inode cache with the same nodeid but different
generation, the existing fuse inode should be unhashed and marked "bad" and
a new inode with the new generation should be hashed instead.

This can happen, for example, with a passthrough fuse filesystem that returns
the real filesystem ino/generation on lookup and where real inode numbers
can get recycled due to real files being unlinked not via the fuse
passthrough filesystem.

With current code, this situation will not be detected and an old fuse
dentry that used to point to an older generation real inode, can be used to
access a completely new inode, which should be accessed only via the new
dentry.

Note that because the FORGET message carries the nodeid w/o generation, the
server should wait to get FORGET counts for the nlookup counts of the old
and reused inodes combined, before it can free the resources associated to
that nodeid.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

15db1683
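The lookup rule described in the commit above can be sketched as a single check. The `struct cached_inode` type and `reuse_cached_inode()` function are hypothetical stand-ins for the inode cache logic; the sketch assumes one cached entry per nodeid.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a fuse inode held in the inode cache. */
struct cached_inode {
    uint64_t nodeid;
    uint64_t generation;
    bool bad;      /* marked "bad": must no longer be used */
    bool hashed;   /* still reachable through the inode cache */
};

/*
 * Decide whether a cached inode may serve a reply carrying
 * (nodeid, generation). On a generation mismatch the stale inode is
 * unhashed and marked bad, and the caller must hash a new inode.
 */
static bool reuse_cached_inode(struct cached_inode *ci,
                               uint64_t nodeid, uint64_t generation)
{
    if (!ci->hashed || ci->bad || ci->nodeid != nodeid)
        return false;

    if (ci->generation != generation) {
        ci->bad = true;
        ci->hashed = false;
        return false;
    }
    return true;
}
```

Without the generation comparison, the second call in a sequence such as `(42, 1)` followed by `(42, 2)` would hand back the old inode for what is really a different file.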

fuse: allow fallocate(FALLOC_FL_ZERO_RANGE) · 6b1bdb56

Committed by Richard W.M. Jones on May 12, 2021

The current fuse module filters out fallocate(FALLOC_FL_ZERO_RANGE)
returning -EOPNOTSUPP.  libnbd's nbdfuse would like to translate
FALLOC_FL_ZERO_RANGE requests into the NBD command
NBD_CMD_WRITE_ZEROES which allows NBD servers that support it to do
zeroing efficiently.

This commit treats this flag exactly like FALLOC_FL_PUNCH_HOLE.

A way to test this, requiring fuse >= 3, nbdkit >= 1.8 and the latest
nbdfuse from https://gitlab.com/nbdkit/libnbd/-/tree/master/fuse is to
create a file containing some data and "mirror" it to a fuse file:

  $ dd if=/dev/urandom of=disk.img bs=1M count=1
  $ nbdkit file disk.img
  $ touch mirror.img
  $ nbdfuse mirror.img nbd://localhost &

(mirror.img -> nbdfuse -> NBD over loopback -> nbdkit -> disk.img)

You can then run commands such as:

  $ fallocate -z -o 1024 -l 1024 mirror.img

and check that the content of the original file ("disk.img") stays
synchronized.  To show NBD commands, export LIBNBD_DEBUG=1 before
running nbdfuse.  To clean up:

  $ fusermount3 -u mirror.img
  $ killall nbdkit

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

6b1bdb56
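The filtering change from the commit above amounts to widening a mode mask. The sketch below assumes that the check is a simple bitmask test; the `SK_`-prefixed constants are local copies of the `linux/falloc.h` flag values so the example builds without kernel headers, and both function names are hypothetical.

```c
#include <errno.h>

/* Local copies of the linux/falloc.h flag values, for this sketch only. */
#define SK_FALLOC_FL_KEEP_SIZE   0x01
#define SK_FALLOC_FL_PUNCH_HOLE  0x02
#define SK_FALLOC_FL_ZERO_RANGE  0x10

/* Before: anything other than KEEP_SIZE and PUNCH_HOLE is rejected. */
static int check_mode_old(int mode)
{
    if (mode & ~(SK_FALLOC_FL_KEEP_SIZE | SK_FALLOC_FL_PUNCH_HOLE))
        return -EOPNOTSUPP;
    return 0;
}

/* After: ZERO_RANGE is let through and forwarded to the server as well. */
static int check_mode_new(int mode)
{
    if (mode & ~(SK_FALLOC_FL_KEEP_SIZE | SK_FALLOC_FL_PUNCH_HOLE |
                 SK_FALLOC_FL_ZERO_RANGE))
        return -EOPNOTSUPP;
    return 0;
}
```

Other flags are still filtered out, so a server only sees the modes the client has chosen to forward.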

fuse: Make fuse_fill_super_submount() static · 1b539917

Committed by Greg Kurz on Jun 04, 2021

This function used to be called from fuse_dentry_automount(). This code
was moved to fuse_get_tree_submount() in the same file since then. It
is unlikely there will ever be another user. No need to be extern in
this case.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

1b539917

fuse: Switch to fc_mount() for submounts · 29e0e4df

Committed by Greg Kurz on Jun 04, 2021

fc_mount() already handles the vfs_get_tree(), sb->s_umount
unlocking and vfs_create_mount() sequence. Using it greatly
simplifies fuse_dentry_automount().

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

29e0e4df

fuse: Call vfs_get_tree() for submounts · 266eb3f2

Committed by Greg Kurz on Jun 04, 2021

We recently fixed an infinite loop by setting the SB_BORN flag on
submounts along with the write barrier needed by super_cache_count().
This is the job of vfs_get_tree() and FUSE shouldn't have to care
about the barrier at all.

Split out some code from fuse_dentry_automount() to the dedicated
fuse_get_tree_submount() handler for submounts and call vfs_get_tree().

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

266eb3f2

fuse: add dedicated filesystem context ops for submounts · fe0a7bd8

Committed by Greg Kurz on Jun 04, 2021

The creation of a submount is open-coded in fuse_dentry_automount().
This brings a lot of complexity and we recently had to fix bugs
because we weren't setting SB_BORN or because we were unlocking
sb->s_umount before sb was fully configured. Most of these could
have been avoided by using the mount API instead of open-coding.

Basically, this means coming up with a proper ->get_tree()
implementation for submounts and call vfs_get_tree(), or better
fc_mount().

The creation of the superblock for submounts is quite different from
the root mount. Especially, it doesn't require to allocate a FUSE
filesystem context, nor to parse parameters.

Introduce a dedicated context ops for submounts to make this clear.
This is just a placeholder for now, fuse_get_tree_submount() will
be populated in a subsequent patch.

Only visible change is that we stop allocating/freeing a useless FUSE
filesystem context with submounts.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

fe0a7bd8

virtiofs: propagate sync() to file server · 2d82ab25

Committed by Greg Kurz on May 20, 2021

Even if POSIX doesn't mandate it, Linux users legitimately expect sync() to
flush all data and metadata to physical storage when it is located on the
same system.  This isn't happening with virtiofs though: sync() inside the
guest returns right away even though data still needs to be flushed from
the host page cache.

This is easily demonstrated by doing the following in the guest:

$ dd if=/dev/zero of=/mnt/foo bs=1M count=5K ; strace -T -e sync sync
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB, 5.0 GiB) copied, 5.22224 s, 1.0 GB/s
sync()                                  = 0 <0.024068>

and start the following in the host when the 'dd' command completes
in the guest:

$ strace -T -e fsync /usr/bin/sync virtiofs/foo
fsync(3)                                = 0 <10.371640>

There are no good reasons not to honor the expected behavior of sync()
actually: it gives an unrealistic impression that virtiofs is super fast
and that data has safely landed on HW, which isn't the case obviously.

Implement a ->sync_fs() superblock operation that sends a new FUSE_SYNCFS
request type for this purpose.  Provision a 64-bit placeholder for possible
future extensions.  Since the file server cannot handle the wait == 0 case,
we skip it to avoid a gratuitous roundtrip.  Note that this is
per-superblock: a FUSE_SYNCFS is sent for the root mount and for each
submount.

Like with FUSE_FSYNC and FUSE_FSYNCDIR, lack of support for FUSE_SYNCFS in
the file server is treated as permanent success.  This ensures
compatibility with older file servers: the client will get the current
behavior of sync() not being propagated to the file server.

Note that such an operation allows the file server to DoS sync().  Since a
typical FUSE file server is an untrusted piece of software running in
userspace, this is disabled by default.  Only enable it with virtiofs for
now since virtiofsd is supposedly trusted by the guest kernel.

Reported-by: Robert Krawitz <rlk@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

2d82ab25
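The request-handling rules in the commit above (skip `wait == 0`, only send when enabled, treat a missing server implementation as permanent success) can be modelled as follows. The `struct conn_model` type, the `send_fn` callback and both functions are hypothetical; the sketch assumes the server signals lack of support with `-ENOSYS`.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-connection state for this sketch. */
struct conn_model {
    bool sync_fs;        /* feature enabled (virtiofs only, per the commit) */
    int requests_sent;   /* number of SYNCFS round trips performed */
};

/* Hypothetical transport hook: returns 0 or a negative errno. */
typedef int (*send_fn)(void);

static int sync_fs_model(struct conn_model *c, int wait, send_fn send)
{
    int err;

    if (!wait)           /* the server cannot handle wait == 0: skip it */
        return 0;
    if (!c->sync_fs)     /* disabled, or server known not to support it */
        return 0;

    c->requests_sent++;
    err = send();
    if (err == -ENOSYS) {
        c->sync_fs = false;   /* remember: never ask this server again */
        err = 0;              /* and report success, as for FSYNC */
    }
    return err;
}

/* Example server that does not implement the request. */
static int server_without_syncfs(void)
{
    return -ENOSYS;
}
```

Against `server_without_syncfs`, the first waiting call costs one round trip and every later call returns immediately, which matches the "permanent success" behaviour described for older file servers.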

fuse: reject internal errno · 49221cf8

Committed by Miklos Szeredi on Jun 22, 2021

Don't allow userspace to report errors that could be kernel-internal.

Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Fixes: 334f485d ("[PATCH] FUSE - device functions")
Cc: <stable@vger.kernel.org> # v2.6.14
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

49221cf8
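A validity check of the kind the commit above calls for might look like the sketch below. It assumes that kernel-internal error codes start at 512 (the value of `ERESTARTSYS` in the kernel) and that a reply error must be zero or a negative errno; the constant and function names are hypothetical, and the exact bound used by the patch is not stated in the commit message.

```c
#include <stdbool.h>

/* Assumed first kernel-internal errno value (ERESTARTSYS in the kernel). */
#define SK_FIRST_INTERNAL_ERRNO 512

/*
 * A reply from the userspace server is acceptable only if its error
 * field is 0 or a negative errno below the kernel-internal range.
 */
static bool reply_error_is_valid(int error)
{
    return error <= 0 && error > -SK_FIRST_INTERNAL_ERRNO;
}
```

Rejecting such replies keeps a server from injecting values such as a restart code that only kernel code is supposed to produce.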

fuse: check connected before queueing on fpq->io · 80ef0867

Committed by Miklos Szeredi on Jun 22, 2021

A request could end up on the fpq->io list after fuse_abort_conn() has
reset fpq->connected and aborted requests on that list:

Thread-1                          Thread-2
========                          ========
->fuse_simple_request()           ->shutdown
  ->__fuse_request_send()
    ->queue_request()             ->fuse_abort_conn()
->fuse_dev_do_read()                ->acquire(fpq->lock)
  ->wait_for(fpq->lock)             ->set err to all req's in fpq->io
                                    ->release(fpq->lock)
  ->acquire(fpq->lock)
  ->add req to fpq->io

After the userspace copy is done the request will be ended, but
req->out.h.error will remain uninitialized.  Also the copy might block
despite being already aborted.

Fix both issues by not allowing the request to be queued on the fpq->io
list after fuse_abort_conn() has processed this list.

Reported-by: Pradeep P V K <pragalla@codeaurora.org>
Fixes: fd22d62e ("fuse: no fc->lock for iqueue parts")
Cc: <stable@vger.kernel.org> # v4.2
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

80ef0867
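The fix in the commit above boils down to re-checking the connected flag under the lock before queueing. Here is a single-threaded model, with a hypothetical `struct pqueue_model` and with `-ECONNABORTED` assumed as the error reported for a request that arrives too late.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the processing queue (fpq) state. */
struct pqueue_model {
    bool connected;   /* cleared by the abort path */
    int io_count;     /* number of requests on the io list */
};

/* Abort path: mark the queue dead and end everything on the io list. */
static void abort_queue(struct pqueue_model *q)
{
    q->connected = false;
    q->io_count = 0;
}

/*
 * Read path, called with the queue lock held in the real code:
 * refuse to queue once the abort path has processed the list.
 */
static int queue_on_io(struct pqueue_model *q)
{
    if (!q->connected)
        return -ECONNABORTED;
    q->io_count++;
    return 0;
}
```

Without the `connected` test, a request queued after `abort_queue()` would sit on a list that nobody will ever process again.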

19 Jun, 2021 1 commit

fuse: ignore PG_workingset after stealing · b89ecd60

Committed by Miklos Szeredi on Jun 18, 2021

Fix the "fuse: trying to steal weird page" warning.

Description from Johannes Weiner:

  "Think of it as similar to PG_active. It's just another usage/heat
   indicator of file and anon pages on the reclaim LRU that, unlike
   PG_active, persists across deactivation and even reclaim (we store it in
   the page cache / swapper cache tree until the page refaults).

   So if fuse accepts pages that can legally have PG_active set,
   PG_workingset is fine too."

Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com>
Fixes: 1899ad18 ("mm: workingset: tell cache transitions from workingset thrashing")
Cc: <stable@vger.kernel.org> # v4.20
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

b89ecd60

10 Jun, 2021 1 commit

iov_iter: replace iov_iter_copy_from_user_atomic() with iterator-advancing variant · f0b65f39

Committed by Al Viro on Apr 30, 2021

Replacement is called copy_page_from_iter_atomic(); unlike the old primitive the
callers do *not* need to do iov_iter_advance() after it. In case when they end
up consuming less than they'd been given they need to do iov_iter_revert() on
everything they had not consumed. That, however, needs to be done only on slow
paths.

All in-tree callers converted. And that kills the last user of iterate_all_kinds().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f0b65f39

09 Jun, 2021 3 commits

fuse: Fix infinite loop in sget_fc() · e4a9ccdd

Committed by Greg Kurz on Jun 04, 2021

We don't set the SB_BORN flag on submounts. This is wrong as these
superblocks are then considered as partially constructed or dying
in the rest of the code and can break some assumptions.

One such case is when you have a virtiofs filesystem with submounts
and you try to mount it again: virtio_fs_get_tree() tries to obtain
a superblock with sget_fc(). The logic in sget_fc() is to loop until
it has either found an existing matching superblock with SB_BORN set
or to create a brand new one. It is assumed that a superblock without
SB_BORN is transient and the loop is restarted. Forgetting to set
SB_BORN on submounts hence causes sget_fc() to retry forever.

Setting SB_BORN requires special care, i.e. a write barrier for
super_cache_count() which can check SB_BORN without taking any lock.
We should call vfs_get_tree() to deal with that but this requires
to have a proper ->get_tree() implementation for submounts, which
is a bigger piece of work. Go for a simple bug fix in the meantime.

Fixes: bf109c64 ("fuse: implement crossmounts")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

e4a9ccdd

fuse: Fix crash if superblock of submount gets killed early · e3a43f2a

Committed by Greg Kurz on Jun 04, 2021

As soon as fuse_dentry_automount() does up_write(&sb->s_umount), the
superblock can theoretically be killed. If this happens before the
submount was added to the &fc->mounts list, fuse_mount_remove() later
crashes in list_del_init() because it assumes the submount to be
already there.

Add the submount before dropping sb->s_umount to fix the inconsistency.
It is okay to nest fc->killsb under sb->s_umount, we already do this
on the ->kill_sb() path.

Signed-off-by: Greg Kurz <groug@kaod.org>
Fixes: bf109c64 ("fuse: implement crossmounts")
Cc: stable@vger.kernel.org # v5.10+
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

e3a43f2a

fuse: Fix crash in fuse_dentry_automount() error path · d92d88f0

Committed by Greg Kurz on Jun 04, 2021

If fuse_fill_super_submount() returns an error, the error path
triggers a crash:

[   26.206673] BUG: kernel NULL pointer dereference, address: 0000000000000000
[...]
[   26.226362] RIP: 0010:__list_del_entry_valid+0x25/0x90
[...]
[   26.247938] Call Trace:
[   26.248300]  fuse_mount_remove+0x2c/0x70 [fuse]
[   26.248892]  virtio_kill_sb+0x22/0x160 [virtiofs]
[   26.249487]  deactivate_locked_super+0x36/0xa0
[   26.250077]  fuse_dentry_automount+0x178/0x1a0 [fuse]

The crash happens because fuse_mount_remove() assumes that the FUSE
mount was already added to the list under the FUSE connection, but this is
only done after fuse_fill_super_submount() has returned success.

This means that until fuse_fill_super_submount() has returned success,
the FUSE mount isn't actually owned by the superblock. We should thus
reclaim ownership by clearing sb->s_fs_info, which will skip the call
to fuse_mount_remove(), and perform rollback, like virtio_fs_get_tree()
already does for the root sb.

Fixes: bf109c64 ("fuse: implement crossmounts")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

d92d88f0

03 Jun, 2021 1 commit

fuse_fill_write_pages(): don't bother with iov_iter_single_seg_count() · 8959a239

Committed by Al Viro on Jun 03, 2021

Another rudiment of fault-in originally having been limited to the
first segment, same as in generic_perform_write() and friends.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

8959a239

16 Apr, 2021 1 commit

useful constants: struct qstr for ".." · 80e5d1ff

Committed by Al Viro on Apr 15, 2021

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

80e5d1ff

14 Apr, 2021 8 commits

cuse: simplify refcount · 3c9c1433

Committed by Miklos Szeredi on Apr 14, 2021

Put extra reference early in cuse_channel_open().

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

3c9c1433

cuse: prevent clone · 8217673d

Committed by Miklos Szeredi on Apr 14, 2021

For cloned connections cuse_channel_release() will be called more than
once, resulting in use after free.

Prevent device cloning for CUSE, which does not make sense at this point,
and is highly unlikely to be used in real life.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

8217673d

virtiofs: fix userns · 0a7419c6

Committed by Miklos Szeredi on Apr 14, 2021

get_user_ns() is done twice (once in virtio_fs_get_tree() and once in
fuse_conn_init()), resulting in a reference leak.

Also looks better to use fsc->user_ns (which *should* be the
current_user_ns() at this point).

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

0a7419c6

virtiofs: remove useless function · 07595bfa

Committed by Jiapeng Chong on Apr 13, 2021

Fix the following clang warning:

fs/fuse/virtio_fs.c:130:35: warning: unused function 'vq_to_fpq'
[-Wunused-function].

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

07595bfa

virtiofs: split requests that exceed virtqueue size · a7f0d7aa

Committed by Connor Kuehl on Mar 18, 2021

If an incoming FUSE request can't fit on the virtqueue, the request is
placed onto a workqueue so a worker can try to resubmit it later where
there will (hopefully) be space for it next time.

This is fine for requests that aren't larger than a virtqueue's maximum
capacity.  However, if a request's size exceeds the maximum capacity of the
virtqueue (even if the virtqueue is empty), it will be doomed to a life of
being placed on the workqueue, removed, discovered it won't fit, and placed
on the workqueue yet again.

Furthermore, from section 2.6.5.3.1 (Driver Requirements: Indirect
Descriptors) of the virtio spec:

  "A driver MUST NOT create a descriptor chain longer than the Queue
  Size of the device."

To fix this, limit the number of pages FUSE will use for an overall
request.  This way, each request can realistically fit on the virtqueue
when it is decomposed into a scatter-gather list and avoid violating section
2.6.5.3.1 of the virtio spec.

Signed-off-by: Connor Kuehl <ckuehl@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

a7f0d7aa
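The limit described in the commit above can be illustrated with a small clamp function. This is a sketch under assumptions: `clamp_max_pages()` and its parameters are hypothetical names, and `header_overhead` stands for however many descriptors the non-page parts of a request are assumed to need, a number the commit message does not give.

```c
/*
 * Clamp the per-request page limit so that a request, once turned into
 * one descriptor per page plus `header_overhead` descriptors for the
 * rest, never exceeds the virtqueue size.
 */
static unsigned int clamp_max_pages(unsigned int max_pages_limit,
                                    unsigned int virtqueue_size,
                                    unsigned int header_overhead)
{
    unsigned int room;

    if (virtqueue_size <= header_overhead)
        return 0;   /* no descriptors left for data pages */

    room = virtqueue_size - header_overhead;
    return max_pages_limit < room ? max_pages_limit : room;
}
```

For example, with a 128-entry virtqueue and an assumed overhead of 4 descriptors, a 256-page limit is reduced to 124, so such a request can fit once the queue drains instead of being requeued forever.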

virtiofs: fix memory leak in virtio_fs_probe() · c79c5e01

Committed by Luis Henriques on Mar 17, 2021

When accidentally passing twice the same tag to qemu, kmemleak ended up
reporting a memory leak in virtiofs.  Also, looking at the log I saw the
following error (that's when I realised the duplicated tag):

  virtiofs: probe of virtio5 failed with error -17

Here's the kmemleak log for reference:

unreferenced object 0xffff888103d47800 (size 1024):
  comm "systemd-udevd", pid 118, jiffies 4294893780 (age 18.340s)
  hex dump (first 32 bytes):
    00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
    ff ff ff ff ff ff ff ff 80 90 02 a0 ff ff ff ff  ................
  backtrace:
    [<000000000ebb87c1>] virtio_fs_probe+0x171/0x7ae [virtiofs]
    [<00000000f8aca419>] virtio_dev_probe+0x15f/0x210
    [<000000004d6baf3c>] really_probe+0xea/0x430
    [<00000000a6ceeac8>] device_driver_attach+0xa8/0xb0
    [<00000000196f47a7>] __driver_attach+0x98/0x140
    [<000000000b20601d>] bus_for_each_dev+0x7b/0xc0
    [<00000000399c7b7f>] bus_add_driver+0x11b/0x1f0
    [<0000000032b09ba7>] driver_register+0x8f/0xe0
    [<00000000cdd55998>] 0xffffffffa002c013
    [<000000000ea196a2>] do_one_initcall+0x64/0x2e0
    [<0000000008f727ce>] do_init_module+0x5c/0x260
    [<000000003cdedab6>] __do_sys_finit_module+0xb5/0x120
    [<00000000ad2f48c6>] do_syscall_64+0x33/0x40
    [<00000000809526b5>] entry_SYSCALL_64_after_hwframe+0x44/0xae

Cc: stable@vger.kernel.org
Signed-off-by: Luis Henriques <lhenriques@suse.de>
Fixes: a62a8ef9 ("virtio-fs: add virtiofs filesystem")
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

c79c5e01

fuse: invalidate attrs when page writeback completes · 3466958b

Committed by Vivek Goyal on Apr 06, 2021

In fuse when a direct/write-through write happens we invalidate attrs
because that might have updated mtime/ctime on server and cached
mtime/ctime will be stale.

What about the page writeback path? It looks like we don't invalidate attrs
there. To be consistent, invalidate attrs in the writeback path as well. The
only exception is when writeback_cache is enabled. In that case we trust the
local mtime/ctime and there is no need to invalidate attrs.

Recently users started experiencing failures of xfstests generic/080,
generic/215 and generic/614 on virtiofs. This happened only with the newer
"stat" utility and not with the older one. This patch fixes the issue.

So what's the root cause of the issue? Here is a detailed explanation.

generic/080 test does mmap write to a file, closes the file and then checks
if mtime has been updated or not. When file is closed, it leads to
flushing of dirty pages (and that should update mtime/ctime on server).
But we did not explicitly invalidate attrs after writeback finished. Still
generic/080 passed so far and reason being that we invalidated atime in
fuse_readpages_end(). This is called in fuse_readahead() path and always
seems to trigger before mmaped write.

So after mmaped write when lstat() is called, it sees that at least one of
the fields being asked for is invalid (atime) and that results in
generating GETATTR to server and mtime/ctime also get updated and test
passes.

But newer /usr/bin/stat seems to have moved to using statx() syscall now
(instead of using lstat()). And statx() allows it to query only ctime or
mtime (and not rest of the basic stat fields). That means when querying
for mtime, fuse_update_get_attr() sees that mtime is not invalid (only
atime is invalid). So it does not generate a new GETATTR and fills stat
with cached mtime/ctime. And that means updated mtime is not seen by
xfstest and tests start failing.

Invalidating attrs after writeback completion should solve this problem in
a generic manner.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

3466958b
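The lstat() versus statx() difference explained in the commit above comes down to a mask intersection. The sketch below uses hypothetical `SK_`-prefixed bits and a hypothetical `needs_getattr()` helper; it assumes a GETATTR is issued exactly when a requested field is marked invalid.

```c
#include <stdbool.h>

/* Hypothetical attribute bits for this sketch only. */
enum {
    SK_ATIME = 1u << 0,
    SK_MTIME = 1u << 1,
    SK_CTIME = 1u << 2,
};

/*
 * A fresh GETATTR is needed only if at least one of the fields the
 * caller asked for is currently marked invalid in the cache.
 */
static bool needs_getattr(unsigned int request_mask, unsigned int inval_mask)
{
    return (request_mask & inval_mask) != 0;
}
```

An lstat()-style request asks for every field, so an invalid atime alone triggers a refresh. A statx() request for mtime only does not, unless mtime and ctime themselves are invalidated when writeback completes, which is what the patch does.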

fuse: add a flag FUSE_SETXATTR_ACL_KILL_SGID to kill SGID · 550a7d3b

Committed by Vivek Goyal on Mar 25, 2021

When a POSIX access ACL is set, it can have an effect on the file mode, and
the SGID bit may also need to be cleared if:

- None of the caller's group/supplementary groups match the file owner group.
AND
- The caller is not privileged (no CAP_FSETID).

As of now the fuse server is responsible for changing the file mode as
well. But it does not know whether to clear SGID or not.

So add a flag FUSE_SETXATTR_ACL_KILL_SGID and send this info with SETXATTR
to let file server know that sgid needs to be cleared as well.

Reported-by: Luis Henriques <lhenriques@suse.de>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

550a7d3b
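The two conditions listed in the commit above can be written as one predicate. In this sketch the flag's bit value and the `setxattr_flags()` helper with its boolean parameters are assumptions for illustration; only the flag name comes from the commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit value for the new flag; only the name is from the commit. */
#define SK_SETXATTR_ACL_KILL_SGID (1u << 0)

/*
 * Compute the flags to send along with SETXATTR: ask the server to
 * clear SGID when a POSIX access ACL is being set by a caller that is
 * neither in the file's owning group nor holds CAP_FSETID.
 */
static uint32_t setxattr_flags(bool setting_access_acl,
                               bool caller_in_owning_group,
                               bool has_cap_fsetid)
{
    if (setting_access_acl && !caller_in_owning_group && !has_cap_fsetid)
        return SK_SETXATTR_ACL_KILL_SGID;
    return 0;
}
```

The client makes this decision because only it knows the caller's credentials; the server just acts on the flag.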
