Commits · 356b720e56133cd41478fc09b0de26b403fb5145 · openeuler / Kernel

19 Oct, 2021 1 commit

new helper: inode_wrong_type() · 356b720e

Committed by Al Viro on Oct 19, 2021

stable inclusion
from stable-5.10.63
commit 40ba433a85dbbf5b2e58f2ac6b161ce37ac872fc
bugzilla: 182231 https://gitee.com/openeuler/kernel/issues/I4EFS1

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=40ba433a85dbbf5b2e58f2ac6b161ce37ac872fc

--------------------------------

commit 6e3e2c43 upstream.

inode_wrong_type(inode, mode) returns true if setting inode->i_mode
to given value would've changed the inode type.  We have enough of
those checks open-coded to make a helper worthwhile.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

356b720e

27 Jan, 2021 1 commit

fuse: fix bad inode · ebbe4b9f

Committed by Miklos Szeredi on Jan 18, 2021

stable inclusion
from stable-5.10.6
commit 36cf9ae54b0ead0daab7701a994de3dcd9ef605d
bugzilla: 47418

--------------------------------

[ Upstream commit 5d069dbe ]

Jan Kara's analysis of the syzbot report (edited):

  The reproducer opens a directory on FUSE filesystem, it then attaches
  dnotify mark to the open directory.  After that a fuse_do_getattr() call
  finds that attributes returned by the server are inconsistent, and calls
  make_bad_inode() which, among other things does:

          inode->i_mode = S_IFREG;

  This then confuses dnotify which doesn't tear down its structures
  properly and eventually crashes.

Avoid calling make_bad_inode() on a live inode: switch to a private flag on
the fuse inode.  Also add the test to ops which the bad_inode_ops would
have caught.

This bug goes back to the initial merge of fuse in 2.6.14...

Reported-by: syzbot+f427adf9324b92652ccc@syzkaller.appspotmail.com
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

ebbe4b9f

12 Oct, 2020 1 commit

fuse: connection remove fix · 413daa1a

Committed by Miklos Szeredi on Oct 09, 2020

Re-add lost removal of fc from fuse_conn_list and the control filesystem.
Reported-by: kernel test robot <rong.a.chen@intel.com>
Fixes: fcee216b ("fuse: split fuse_mount off of fuse_conn")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

413daa1a

09 Oct, 2020 1 commit

fuse: implement crossmounts · bf109c64

Committed by Max Reitz on Apr 21, 2020

FUSE servers can indicate crossmount points by setting FUSE_ATTR_SUBMOUNT
in fuse_attr.flags.  The inode will then be marked as S_AUTOMOUNT, and the
.d_automount implementation creates a new submount at that location, so
that the submount gets a distinct st_dev value.

Note that all submounts get a distinct superblock and a distinct st_dev
value, so for virtio-fs, even if the same filesystem is mounted more than
once on the host, none of its mount points will have the same st_dev.  We
need distinct superblocks because the superblock points to the root node,
but the different host mounts may show different trees (e.g. due to
submounts in some of them, but not in others).

Right now, this behavior is only enabled when fuse_conn.auto_submounts is
set, which is the case only for virtio-fs.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

bf109c64

25 Sep, 2020 2 commits

bdi: invert BDI_CAP_NO_ACCT_WB · 823423ef

Committed by Christoph Hellwig on Sep 24, 2020

Replace BDI_CAP_NO_ACCT_WB with a positive BDI_CAP_WRITEBACK_ACCT to
make the checks more obvious.  Also remove the pointless
bdi_cap_account_writeback wrapper that just obfuscates the check.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

823423ef

bdi: initialize ->ra_pages and ->io_pages in bdi_init · 55b2598e

Committed by Christoph Hellwig on Sep 24, 2020

Set up a readahead size by default, as very few users have a good
reason to change it.  This means coda, ecryptfs, and orangefs now
set up the values while they were previously missing it, while ubifs,
mtd and vboxsf manually set it to 0 to avoid readahead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Acked-by: Richard Weinberger <richard@nod.at> [ubifs, mtd]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

55b2598e

18 Sep, 2020 2 commits

fuse: Allow fuse_fill_super_common() for submounts · 1866d779

Committed by Max Reitz on Sep 09, 2020

Submounts have their own superblock, which needs to be initialized.
However, they do not have a fuse_fs_context associated with them, and
the root node's attributes should be taken from the mountpoint's node.

Extend fuse_fill_super_common() to work for submounts by making the @ctx
parameter optional, and by adding a @submount_finode parameter.

(There is a plain "unsigned" in an existing code block that is being
indented by this commit.  Extend it to "unsigned int" so checkpatch does
not complain.)
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

1866d779

fuse: split fuse_mount off of fuse_conn · fcee216b

Committed by Max Reitz on May 06, 2020

We want to allow submounts for the same fuse_conn, but with different
superblocks so that each of the submounts has its own device ID.  To do
so, we need to split all mount-specific information off of fuse_conn
into a new fuse_mount structure, so that multiple mounts can share a
single fuse_conn.

We need to take care only to perform connection-level actions once (i.e.
when the fuse_conn and thus the first fuse_mount are established, or
when the last fuse_mount and thus the fuse_conn are destroyed).  For
example, fuse_sb_destroy() must not invoke fuse_send_destroy() until the
last superblock is released.

To do so, we keep track of which fuse_mount is the root mount and
perform all fuse_conn-level actions only when this fuse_mount is
involved.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
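
The root-mount bookkeeping described above might look roughly like the following sketch. The types and function names are hypothetical, the counters merely stand in for connection-level setup and for sending DESTROY, and the sketch assumes submounts are released before the root mount.

```c
#include <stdbool.h>
#include <stddef.h>

struct mount_sketch;

/* Hypothetical, simplified connection shared by several mounts. */
struct conn_sketch {
	struct mount_sketch *root;  /* the mount that owns connection-level actions */
	unsigned int inits;         /* times connection-level setup ran */
	unsigned int destroys;      /* times DESTROY was "sent" */
};

/* Hypothetical per-superblock mount object pointing at the shared connection. */
struct mount_sketch {
	struct conn_sketch *fc;
};

/* The first mount attached becomes the root mount and triggers setup once. */
static void mount_attach(struct mount_sketch *fm, struct conn_sketch *fc)
{
	fm->fc = fc;
	if (fc->root == NULL) {
		fc->root = fm;
		fc->inits++;
	}
}

/* Returns true if connection-level teardown ran, i.e. fm was the root mount. */
static bool mount_detach(struct mount_sketch *fm)
{
	struct conn_sketch *fc = fm->fc;

	fm->fc = NULL;
	if (fc->root != fm)
		return false;   /* submount: nothing connection-wide to do */
	fc->root = NULL;
	fc->destroys++;     /* stands in for fuse_send_destroy() */
	return true;
}
```

Releasing a submount leaves the connection alone; only releasing the root mount performs the one-time teardown.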

fcee216b

10 Sep, 2020 5 commits

virtiofs: serialize truncate/punch_hole and dax fault path · 6ae330ca

Committed by Vivek Goyal on Aug 19, 2020

Currently in fuse we don't seem to have any lock which can serialize the fault
path with the truncate/punch_hole path. With dax support I need one for the
following reasons.

1. Dax requirement

  DAX fault code relies on inode size being stable for the duration of the
  fault and wants to serialize with truncate/punch_hole, and it explicitly
  mentions this.

  static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
                               const struct iomap_ops *ops)
        /*
         * Check whether offset isn't beyond end of file now. Caller is
         * supposed to hold locks serializing us with truncate / punch hole so
         * this is a reliable test.
         */
        max_pgoff = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);

2. Make sure there are no users of pages being truncated/punch_hole

  get_user_pages() might take references to page and then do some DMA
  to said pages. Filesystem might truncate those pages without knowing
  that a DMA is in progress or some I/O is in progress. So use
  dax_layout_busy_page() to make sure there are no such references
  and I/O is not in progress on said pages before moving ahead with
  truncation.

3. Limitation of kvm page fault error reporting

  If we are truncating file on host first and then removing mappings in
  guest later (truncate page cache etc), then this could lead to a
  problem with KVM. Say a mapping is in place in guest and truncation
  happens on host. Now if guest accesses that mapping, then host will
  take a fault and kvm will either exit to qemu or spin infinitely.

  IOW, before we do truncation on host, we need to make sure that guest
  inode does not have any mapping in that region or whole file.

4. virtiofs memory range reclaim

 Soon I will introduce the notion of being able to reclaim dax memory
 ranges from a fuse dax inode. There also I need to make sure that
 no I/O or fault is going on in the reclaimed range and nobody is using
 it so that range can be reclaimed without issues.

Currently if we take inode lock, that serializes read/write. But it does
not do anything for faults. So I add another semaphore fuse_inode->i_mmap_sem
for this purpose.  It can be used to serialize with faults.

As of now, I am taking this semaphore only in the dax fault path and
not the regular fault path because existing code does not have one. Maybe
existing code can benefit from it as well to take care of some
races, but that we can fix later if need be. For now, I am focusing
only on the DAX path, which is a new path.

Also added logic to take fuse_inode->i_mmap_sem in
truncate/punch_hole/open(O_TRUNC) path to make sure file truncation and
fuse dax fault are mutually exclusive and avoid all the above problems.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

6ae330ca

virtiofs: implement dax read/write operations · c2d0ad00

Committed by Vivek Goyal on Aug 19, 2020

This patch implements basic DAX support. mmap() is not implemented
yet and will come in later patches. This patch looks into implementing
read/write.

We make use of interval tree to keep track of per inode dax mappings.

Do not use dax for file extending writes, instead just send WRITE message
to daemon (like we do for direct I/O path). This will keep write and
i_size change atomic w.r.t crash.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: Peng Tao <tao.peng@linux.alibaba.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
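
The rule about file-extending writes can be illustrated with a small decision helper. This is only a sketch under stated assumptions: the names are hypothetical, sizes are plain byte counts, and arithmetic overflow of `pos + len` is not handled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Which path a write takes in this sketch. */
enum sketch_write_path {
	SKETCH_WRITE_DAX,      /* write directly through the DAX mapping */
	SKETCH_WRITE_MESSAGE   /* send a WRITE message to the daemon */
};

/* True if writing len bytes at pos would grow the file past its current size. */
static bool sketch_write_extends_file(uint64_t i_size, uint64_t pos, uint64_t len)
{
	return pos + len > i_size;
}

/*
 * File-extending writes go through a WRITE message so that the data and the
 * i_size change reach the daemon together; everything else may use DAX.
 */
static enum sketch_write_path sketch_pick_write_path(uint64_t i_size,
						     uint64_t pos, uint64_t len)
{
	return sketch_write_extends_file(i_size, pos, len)
		? SKETCH_WRITE_MESSAGE
		: SKETCH_WRITE_DAX;
}
```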

c2d0ad00

virtiofs: implement FUSE_INIT map_alignment field · fd1a1dc6

Committed by Stefan Hajnoczi on Aug 19, 2020

The device communicates FUSE_SETUPMAPPING/FUSE_REMOVEMAPPING alignment
constraints via the FUSE_INIT map_alignment field.  Parse this field and
ensure our DAX mappings meet the alignment constraints.

We don't actually align anything differently since our mappings are
already 2MB aligned.  Just check the value when the connection is
established.  If it becomes necessary to honor arbitrary alignments in
the future we'll have to adjust how mappings are sized.

The upshot of this commit is that we can be confident that mappings will
work even when emulating x86 on Power and similar combinations where the
host page sizes are different.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
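
A check of this kind could be sketched as below. It assumes that `map_alignment` is expressed as the log2 of the required byte alignment, which the message itself does not state; the function and macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* DAX mappings in this sketch are 2 MB, i.e. 1 << 21 bytes. */
#define SKETCH_DAX_SHIFT 21

/*
 * Assumption: map_alignment is log2 of the byte alignment the device needs.
 * A 2 MB-aligned mapping satisfies every power-of-two alignment up to 2 MB,
 * so nothing has to be realigned; larger requirements are rejected.
 */
static bool sketch_check_alignment(uint16_t map_alignment)
{
	return map_alignment <= SKETCH_DAX_SHIFT;
}
```

With this check, a device asking for 4 KB (12) or 64 KB (16) alignment is accepted unchanged, while one asking for 4 MB (22) alignment is refused when the connection is established.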

fd1a1dc6

virtiofs: add a mount option to enable dax · 1dd53957

Committed by Vivek Goyal on Aug 19, 2020

Add a mount option to allow using dax with virtio_fs.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

1dd53957

virtiofs: get rid of no_mount_options · f4fd4ae3

Committed by Vivek Goyal on Aug 19, 2020

This option was introduced so that for virtio_fs we don't show any mount
options in fuse_show_options(), because we don't offer any of these options
to be controlled by the mounter.

Very soon we are planning to introduce the option "dax", which the mounter
should be able to specify, so no_mount_options does not work anymore.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

f4fd4ae3

14 Jul, 2020 3 commits

fuse: reject options on reconfigure via fsconfig(2) · b330966f

Committed by Miklos Szeredi on Jul 14, 2020

Previous patch changed handling of remount/reconfigure to ignore all
options, including those that are unknown to the fuse kernel fs.  This was
done for backward compatibility, but this likely only affects the old
mount(2) API.

The new fsconfig(2) based reconfiguration could possibly be improved.  This
would make the new API less of a drop in replacement for the old, OTOH this
is a good chance to get rid of some weirdnesses in the old API.

Several other behaviors might make sense:

 1) unknown options are rejected, known options are ignored

 2) unknown options are rejected, known options are rejected if the value
 is changed, allowed otherwise

 3) all options are rejected

Prior to the backward compatibility fix to ignore all options all known
options were accepted (1), even if they change the value of a mount
parameter; fuse_reconfigure() does not look at the config values set by
fuse_parse_param().

To fix that we'd need to verify that the value provided is the same as set
in the initial configuration (2).  The major drawback is that this is much
more complex than just rejecting all attempts at changing options (3);
i.e. all options signify initial configuration values and don't make sense
on reconfigure.

This patch opts for (3) with the rationale that no mount options are
reconfigurable in fuse.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
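
Option (3), combined with the backward-compatible handling of the old mount(2) API mentioned at the top of this message, can be sketched as a parameter hook. The context structure, its `oldapi` field, and the function name are assumptions made for this illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Why the option parser is being called, in this sketch. */
enum sketch_purpose {
	SKETCH_FOR_MOUNT,
	SKETCH_FOR_RECONFIGURE
};

/* Hypothetical, simplified filesystem context. */
struct sketch_fs_context {
	enum sketch_purpose purpose;
	bool oldapi;  /* true when the request arrived via mount(2) */
};

/* Called once per option; returns 0 to accept or a negative errno to reject. */
static int sketch_parse_param(const struct sketch_fs_context *fc, const char *key)
{
	(void)key;  /* on reconfigure the decision does not depend on the option */

	if (fc->purpose == SKETCH_FOR_RECONFIGURE) {
		if (fc->oldapi)
			return 0;        /* mount(MS_REMOUNT): silently ignore */
		return -EINVAL;      /* fsconfig(2): no option may be changed */
	}

	/* Initial mount: real option parsing would happen here. */
	return 0;
}
```

The hook never looks at the option name on reconfigure, which is what makes (3) so much simpler than comparing each value against the initial configuration as (2) would require.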

b330966f

fuse: ignore 'data' argument of mount(..., MS_REMOUNT) · e8b20a47

Committed by Miklos Szeredi on Jul 14, 2020

The command

  mount -o remount -o unknownoption /mnt/fuse

succeeds on kernel versions prior to v5.4 and fails on kernel version at or
after.  This is because fuse_parse_param() rejects any unrecognised options
in case of FS_CONTEXT_FOR_RECONFIGURE, just as for FS_CONTEXT_FOR_MOUNT.

This causes a regression in case the fuse filesystem is in fstab, since
remount sends all options found there to the kernel; even ones that are
meant for the initial mount and are consumed by the userspace fuse server.

Fix this by ignoring mount options, just as fuse_remount_fs() did prior to
the conversion to the new API.
Reported-by: Stefan Priebe <s.priebe@profihost.ag>
Fixes: c30da2e9 ("fuse: convert to use the new mount API")
Cc: <stable@vger.kernel.org> # v5.4
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

e8b20a47

fuse: use ->reconfigure() instead of ->remount_fs() · 0189a2d3

Committed by Miklos Szeredi on Jul 14, 2020

s_op->remount_fs() is only called from legacy_reconfigure(), which is not
used after being converted to the new API.

Convert to using ->reconfigure().  This restores the previous behavior of
syncing the filesystem and rejecting MS_MANDLOCK on remount.

Fixes: c30da2e9 ("fuse: convert to use the new mount API")
Cc: <stable@vger.kernel.org> # v5.4
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

0189a2d3

19 May, 2020 2 commits

fuse: update attr_version counter on fuse_notify_inval_inode() · 5ddd9ced

Committed by Miklos Szeredi on May 19, 2020

A GETATTR request can race with FUSE_NOTIFY_INVAL_INODE, resulting in the
attribute cache being updated with stale information after the
invalidation.

Fix this by bumping the attribute version in fuse_reverse_inval_inode().
Reported-by: Krzysztof Rusek <rusek@9livesdata.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
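
The race and its fix can be modelled with a version counter, as in the sketch below. The structures and helper names are hypothetical and single-threaded; the real code would need atomic updates and locking.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical connection holding a global attribute version counter. */
struct sketch_conn {
	uint64_t attr_version;
};

/* Hypothetical inode caching one attribute plus the version it was set at. */
struct sketch_inode {
	uint64_t attr_version;
	uint64_t size;
};

/* Sampled by the caller before a GETATTR request is sent. */
static uint64_t sketch_get_attr_version(const struct sketch_conn *fc)
{
	return fc->attr_version;
}

/* Invalidation bumps the version so that in-flight replies become stale. */
static void sketch_inval_inode(struct sketch_conn *fc, struct sketch_inode *inode)
{
	inode->attr_version = ++fc->attr_version;
}

/* Apply a GETATTR reply only if no invalidation happened since it was sent. */
static bool sketch_apply_getattr(struct sketch_inode *inode, uint64_t size,
				 uint64_t sent_version)
{
	if (inode->attr_version > sent_version)
		return false;  /* reply predates the invalidation: drop it */
	inode->size = size;
	return true;
}
```

Without the bump in the invalidation path, the stale reply would pass the version comparison and overwrite the cache.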

5ddd9ced

virtiofs: do not use fuse_fill_super_common() for device installation · 7fd3abfa

Committed by Vivek Goyal on May 04, 2020

fuse_fill_super_common() allocates and installs one fuse_device.  Hence
virtiofs allocates and installs all fuse devices by itself except one.

This makes the logic a little twisted.  There does not seem to be any real
reason why virtiofs can't allocate and install all fuse devices itself.

So opt out of fuse device allocation and installation while calling
fuse_fill_super_common().

Regular fuse still wants fuse_fill_super_common() to install fuse_device.
It needs to protect against races where two mounters are trying to mount
fuse using the same fd.  In that case one will succeed while the other will get
-EINVAL.

virtiofs does not have this issue because sget_fc() resolves the race
w.r.t multiple mounters and only one instance of virtio_fs_fill_super()
should be in progress for same filesystem.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

7fd3abfa

08 Feb, 2020 3 commits

fuse: switch to use errorfc() et.al. · 2e28c49e

Committed by Al Viro on Dec 21, 2019

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

2e28c49e

fs_parse: fold fs_parameter_desc/fs_parameter_spec · d7167b14

Committed by Al Viro on Sep 07, 2019

The former contains nothing but a pointer to an array of the latter...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d7167b14

fs_parser: remove fs_parameter_description name field · 96cafb9c

Committed by Eric Sandeen on Dec 06, 2019

Unused now.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

96cafb9c

06 Feb, 2020 1 commit

fuse: use true,false for bool variable · cabdb4fa

Committed by zhengbin on Jan 14, 2020

Fixes coccicheck warning:

fs/fuse/readdir.c:335:1-19: WARNING: Assignment of 0/1 to bool variable
fs/fuse/file.c:1398:2-19: WARNING: Assignment of 0/1 to bool variable
fs/fuse/file.c:1400:2-20: WARNING: Assignment of 0/1 to bool variable
fs/fuse/cuse.c:454:1-20: WARNING: Assignment of 0/1 to bool variable
fs/fuse/cuse.c:455:1-19: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:497:2-17: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:504:2-23: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:511:2-22: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:518:2-23: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:522:2-26: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:526:2-18: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:1000:1-20: WARNING: Assignment of 0/1 to bool variable
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

cabdb4fa

15 Oct, 2019 1 commit

virtio-fs: don't show mount options · 3f22c746

Committed by Miklos Szeredi on Oct 15, 2019

Virtio-fs does not accept any mount options, so it's confusing and wrong to
show any in /proc/mounts.

Reported-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

3f22c746

24 Sep, 2019 1 commit

fuse: kmemcg account fs data · dc69e98c

Committed by Khazhismel Kumykov on Sep 17, 2019

account per-file, dentry, and inode data

blockdev/superblock and temporary per-request data was left alone, as
this usually isn't accounted
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

dc69e98c

19 Sep, 2019 1 commit

virtio-fs: add virtiofs filesystem · a62a8ef9

Committed by Stefan Hajnoczi on Jun 12, 2018

Add a basic file system module for virtio-fs.  This does not yet contain
shared data support between host and guest or metadata coherency speedups.
However it is already significantly faster than virtio-9p.

Design Overview
===============

With the goal of designing something with better performance and local file
system semantics, a bunch of ideas were proposed.

 - Use fuse protocol (instead of 9p) for communication between guest and
   host.  Guest kernel will be fuse client and a fuse server will run on
   host to serve the requests.

 - For data access inside guest, mmap portion of file in QEMU address space
   and guest accesses this memory using dax.  That way guest page cache is
   bypassed and there is only one copy of data (on host).  This will also
   enable mmap(MAP_SHARED) between guests.

 - For metadata coherency, there is a shared memory region which contains
   version number associated with metadata and any guest changing metadata
   updates version number and other guests refresh metadata on next access.
   This is yet to be implemented.

How virtio-fs differs from existing approaches
==============================================

The unique idea behind virtio-fs is to take advantage of the co-location of
the virtual machine and hypervisor to avoid communication (vmexits).

DAX allows file contents to be accessed without communication with the
hypervisor.  The shared memory region for metadata avoids communication in
the common case where metadata is unchanged.

By replacing expensive communication with cheaper shared memory accesses,
we expect to achieve better performance than approaches based on network
file system protocols.  In addition, this also makes it easier to achieve
local file system semantics (coherency).

These techniques are not applicable to network file system protocols since
the communications channel is bypassed by taking advantage of shared memory
on a local machine.  This is why we decided to build virtio-fs rather than
focus on 9P or NFS.

Caching Modes
=============

Like virtio-9p, different caching modes are supported which determine the
coherency level as well.  The “cache=FOO” and “writeback” options control
the level of coherence between the guest and host filesystems.

 - cache=none
   metadata, data and pathname lookup are not cached in guest.  They are
   always fetched from host and any changes are immediately pushed to host.

 - cache=always
   metadata, data and pathname lookup are cached in guest and never expire.

 - cache=auto
   metadata and pathname lookup cache expires after a configured amount of
   time (default is 1 second).  Data is cached while the file is open
   (close to open consistency).

 - writeback/no_writeback
   These options control the writeback strategy.  If writeback is disabled,
   then normal writes will immediately be synchronized with the host fs.
   If writeback is enabled, then writes may be cached in the guest until
   the file is closed or an fsync(2) performed.  This option has no effect
   on mmap-ed writes or writes going through the DAX mechanism.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

a62a8ef9

12 Sep, 2019 7 commits

fuse: allow skipping control interface and forced unmount · 15c8e72e

Committed by Vivek Goyal on May 06, 2019

virtio-fs does not support aborting requests which are being
processed, that is, requests which have been sent to the fuse daemon on the host.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

15c8e72e

fuse: dissociate DESTROY from fuseblk · 783863d6

Committed by Miklos Szeredi on Aug 29, 2019

Allow virtio-fs to also send DESTROY request.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

783863d6

fuse: separate fuse device allocation and installation in fuse_conn · 0cd1eb9a

Committed by Vivek Goyal on Mar 06, 2019

As of now fuse_dev_alloc() both allocates a fuse device and installs it in
fuse_conn list. fuse_dev_alloc() can fail if fuse_device allocation fails.

virtio-fs needs to initialize multiple fuse devices (one per virtio queue).
It initializes one fuse device as part of call to fuse_fill_super_common()
and rest of the devices are allocated and installed after that.

But, we can't afford to fail after calling fuse_fill_super_common() as we
don't have a way to undo all the actions done by fuse_fill_super_common().
So to avoid failures after the call to fuse_fill_super_common(),
pre-allocate all fuse devices early and install them into fuse connection
later.

This patch provides two separate helpers for fuse device allocation and
fuse device installation in fuse_conn.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

0cd1eb9a

fuse: add fuse_iqueue_ops callbacks · ae3aad77

Committed by Stefan Hajnoczi on Jun 18, 2018

The /dev/fuse device uses fiq->waitq and fasync to signal that requests are
available.  These mechanisms do not apply to virtio-fs.  This patch
introduces callbacks so alternative behavior can be used.

Note that queue_interrupt() changes along these lines:

  spin_lock(&fiq->waitq.lock);
  wake_up_locked(&fiq->waitq);
+ kill_fasync(&fiq->fasync, SIGIO, POLL_IN);
  spin_unlock(&fiq->waitq.lock);
- kill_fasync(&fiq->fasync, SIGIO, POLL_IN);

Since queue_request() and queue_forget() also call kill_fasync() inside
the spinlock this should be safe.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
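
The callback indirection can be pictured as an ops table attached to the input queue, as in this sketch. The structure layout, the single `wake_pending` callback, and the counters are hypothetical simplifications; the message does not list the real callbacks.

```c
#include <stddef.h>

struct sketch_iqueue;

/* Callbacks invoked after a request has been placed on the input queue. */
struct sketch_iqueue_ops {
	void (*wake_pending)(struct sketch_iqueue *fiq);
};

/* Hypothetical, simplified input queue. */
struct sketch_iqueue {
	const struct sketch_iqueue_ops *ops;
	unsigned int pending;      /* queued requests */
	unsigned int dev_wakeups;  /* stands in for waitq wake-up + fasync signal */
	unsigned int virtq_kicks;  /* stands in for notifying a virtqueue */
};

/* /dev/fuse flavour: wake readers sleeping on the device. */
static void sketch_dev_wake(struct sketch_iqueue *fiq)
{
	fiq->dev_wakeups++;
}

/* virtio-fs flavour: no waitq or fasync, hand the request to the transport. */
static void sketch_virtio_wake(struct sketch_iqueue *fiq)
{
	fiq->virtq_kicks++;
}

static const struct sketch_iqueue_ops sketch_dev_ops = {
	.wake_pending = sketch_dev_wake,
};

static const struct sketch_iqueue_ops sketch_virtio_ops = {
	.wake_pending = sketch_virtio_wake,
};

/* Common queueing code no longer needs to know which transport is in use. */
static void sketch_queue_request(struct sketch_iqueue *fiq)
{
	fiq->pending++;
	fiq->ops->wake_pending(fiq);
}
```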

ae3aad77

fuse: extract fuse_fill_super_common() · 0cc2656c

Committed by Stefan Hajnoczi on Jun 13, 2018

fuse_fill_super() includes code to process the fd= option and link the
struct fuse_dev to the fd's struct file.  In virtio-fs there is no file
descriptor because /dev/fuse is not used.

This patch extracts fuse_fill_super_common() so that both classic fuse and
virtio-fs can share the code to initialize a mount.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

0cc2656c

fuse: export fuse_send_init_request() · 95a84cdb

Committed by Vivek Goyal on Mar 06, 2019

This will be used by virtio-fs to send init request to fuse server after
initialization of virt queues.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

95a84cdb

fuse: fix request limit · f22f812d

Committed by Miklos Szeredi on Sep 12, 2019

The size of struct fuse_req was reduced from 392B to 144B on a non-debug
config, thus the sanitize_global_limit() helper was setting a larger
default limit.  This doesn't really reflect reduction in the memory used by
requests, since the fields removed from fuse_req were added to fuse_args
derived structs; e.g. sizeof(struct fuse_writepages_args) is 248B, thus
resulting in slightly more memory being used for writepage requests
overall (due to using 256B slabs).

Make the calculation ignore the size of fuse_req and use the old 392B
value.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
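
The effect of pinning the divisor can be seen in a small sketch of the default-limit calculation. Only the 392-byte figure comes from the message; the memory budget of 1/2^13 of total RAM and the 65535 cap are assumptions of this sketch, as are the names.

```c
#include <stdint.h>

/* Old size of struct fuse_req, kept as a fixed per-request cost estimate. */
#define SKETCH_REQ_SIZE 392u

/*
 * A limit of 0 means "compute the default". Assumptions of this sketch:
 * the default budget is total RAM / 2^13, and limits are capped at 65535.
 */
static unsigned int sketch_sanitize_limit(unsigned int limit,
					  uint64_t total_ram_bytes)
{
	uint64_t v = limit;

	if (v == 0)
		v = (total_ram_bytes >> 13) / SKETCH_REQ_SIZE;
	if (v >= (1u << 16))
		v = (1u << 16) - 1;
	return (unsigned int)v;
}
```

Under these assumptions a machine with 1 GiB of RAM gets a default of 334 requests; dividing by the new 144-byte `sizeof` instead would have raised that to 910 without the per-request memory use actually shrinking.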

f22f812d

10 Sep, 2019 5 commits

fuse: convert init to simple api · 615047ef

Committed by Miklos Szeredi on Sep 10, 2019

Bypass the fc->initialized check by setting the force flag.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

615047ef

fuse: convert destroy to simple api · 1ccd1ea2

Committed by Miklos Szeredi on Sep 10, 2019

We can use the "force" flag to make sure the DESTROY request is always sent
to userspace.  So no need to keep it allocated during the lifetime of the
filesystem.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

1ccd1ea2

fuse: simplify 'nofail' request · 40ac7ab2

Committed by Miklos Szeredi on Sep 10, 2019

Instead of complex games with a reserved request, just use __GFP_NOFAIL.

Both callers (flush, readdir) guarantee that connection was already
initialized, so no need to wait for fc->initialized.

Also remove unneeded clearing of FR_BACKGROUND flag.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

40ac7ab2

fuse: flatten 'struct fuse_args' · d5b48543

Committed by Miklos Szeredi on Sep 10, 2019

...to make future expansion simpler.  The hierarchical structure is a
historical thing that does not serve any practical purpose.

The generated code is exactly the same before and after the patch.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

d5b48543

fuse: fix deadlock with aio poll and fuse_iqueue::waitq.lock · 76e43c8c

Committed by Eric Biggers on Sep 08, 2019

When IOCB_CMD_POLL is used on the FUSE device, aio_poll() disables IRQs
and takes kioctx::ctx_lock, then fuse_iqueue::waitq.lock.

This may have to wait for fuse_iqueue::waitq.lock to be released by one
of many places that take it with IRQs enabled.  Since the IRQ handler
may take kioctx::ctx_lock, lockdep reports that a deadlock is possible.

Fix it by protecting the state of struct fuse_iqueue with a separate
spinlock, and only accessing fuse_iqueue::waitq using the versions of
the waitqueue functions which do IRQ-safe locking internally.

Reproducer:

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/mount.h>
	#include <sys/stat.h>
	#include <sys/syscall.h>
	#include <unistd.h>
	#include <linux/aio_abi.h>

	int main()
	{
		char opts[128];
		int fd = open("/dev/fuse", O_RDWR);
		aio_context_t ctx = 0;
		struct iocb cb = { .aio_lio_opcode = IOCB_CMD_POLL, .aio_fildes = fd };
		struct iocb *cbp = &cb;

		sprintf(opts, "fd=%d,rootmode=040000,user_id=0,group_id=0", fd);
		mkdir("mnt", 0700);
		mount("foo",  "mnt", "fuse", 0, opts);
		syscall(__NR_io_setup, 1, &ctx);
		syscall(__NR_io_submit, ctx, 1, &cbp);
	}

Beginning of lockdep output:

	=====================================================
	WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
	5.3.0-rc5 #9 Not tainted
	-----------------------------------------------------
	syz_fuse/135 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
	000000003590ceda (&fiq->waitq){+.+.}, at: spin_lock include/linux/spinlock.h:338 [inline]
	000000003590ceda (&fiq->waitq){+.+.}, at: aio_poll fs/aio.c:1751 [inline]
	000000003590ceda (&fiq->waitq){+.+.}, at: __io_submit_one.constprop.0+0x203/0x5b0 fs/aio.c:1825

	and this task is already holding:
	0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: spin_lock_irq include/linux/spinlock.h:363 [inline]
	0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: aio_poll fs/aio.c:1749 [inline]
	0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: __io_submit_one.constprop.0+0x1f4/0x5b0 fs/aio.c:1825
	which would create a new lock dependency:
	 (&(&ctx->ctx_lock)->rlock){..-.} -> (&fiq->waitq){+.+.}

	but this new dependency connects a SOFTIRQ-irq-safe lock:
	 (&(&ctx->ctx_lock)->rlock){..-.}

	[...]

Reported-by: syzbot+af05535bb79520f95431@syzkaller.appspotmail.com
Reported-by: syzbot+d86c4426a01f60feddc7@syzkaller.appspotmail.com
Fixes: bfe4037e ("aio: implement IOCB_CMD_POLL")
Cc: <stable@vger.kernel.org> # v4.19+
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

76e43c8c

07 Sep, 2019 2 commits

vfs: subtype handling moved to fuse · c7eb6869

Committed by David Howells on Mar 25, 2019

The unused vfs code can be removed.  Don't pass empty subtype (same as if
->parse callback isn't called).

The bits that are left involve determining whether it's permitted to split the
filesystem type string passed in to mount(2).  Consequently, this means that we
cannot get rid of the FS_HAS_SUBTYPE flag unless we define that a type string
with a dot in it always indicates a subtype specification.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

c7eb6869

fuse: convert to use the new mount API · c30da2e9

Committed by David Howells on Mar 25, 2019

Convert the fuse filesystem to the new internal mount API as the old
one will be obsoleted and removed.  This allows greater flexibility in
communication of mount parameters between userspace, the VFS and the
filesystem.

See Documentation/filesystems/mount_api.txt for more information.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

c30da2e9

08 May, 2019 1 commit

fuse: clean up fuse_alloc_inode · 9031a69c

Committed by zhangliguang on May 06, 2019

This patch cleans up the fuse_alloc_inode function; it just simplifies the
code, with no logic change.
Signed-off-by: zhangliguang <zhangliguang@linux.alibaba.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

9031a69c
