1. 14 Jul, 2020 (3 commits)
    • fuse: reject options on reconfigure via fsconfig(2) · b330966f
      Miklos Szeredi committed
      The previous patch changed the handling of remount/reconfigure to ignore all
      options, including those that are unknown to the fuse kernel fs.  This was
      done for backward compatibility, but this likely only affects the old
      mount(2) API.
      
      The new fsconfig(2)-based reconfiguration could possibly be improved.  This
      would make the new API less of a drop-in replacement for the old one; OTOH
      this is a good chance to get rid of some weirdnesses in the old API.
      
      Several other behaviors might make sense:
      
       1) unknown options are rejected, known options are ignored
      
       2) unknown options are rejected, known options are rejected if the value
       is changed, allowed otherwise
      
       3) all options are rejected
      
      Prior to the backward compatibility fix to ignore all options, all known
      options were accepted (1), even if they change the value of a mount
      parameter; fuse_reconfigure() does not look at the config values set by
      fuse_parse_param().
      
      To fix that we'd need to verify that the value provided is the same as set
      in the initial configuration (2).  The major drawback is that this is much
      more complex than just rejecting all attempts at changing options (3);
      i.e. all options signify initial configuration values and don't make sense
      on reconfigure.
      
      This patch opts for (3) with the rationale that no mount options are
      reconfigurable in fuse.
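To make the trade-off between the three behaviours concrete, here is a small userspace C model (not kernel code). The option list, the `reconfigure_param()` helper and the policy enum are hypothetical; only the three policies themselves come from the text above. Note that policy (2) is the only one that needs the initial value to be carried around.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the three reconfigure policies discussed above. */
enum reconf_policy {
	IGNORE_KNOWN = 1,   /* (1) unknown rejected, known ignored */
	REJECT_CHANGED = 2, /* (2) unknown rejected, known rejected if changed */
	REJECT_ALL = 3,     /* (3) every option rejected (the patch's choice) */
};

/* Assumed subset of option names the fuse kernel fs recognises. */
static int is_known_option(const char *key)
{
	static const char *const known[] = {
		"fd", "rootmode", "user_id", "group_id",
		"default_permissions", "allow_other", "max_read", "blksize",
	};

	for (size_t i = 0; i < sizeof(known) / sizeof(known[0]); i++)
		if (strcmp(key, known[i]) == 0)
			return 1;
	return 0;
}

/* Returns 0 if the option is accepted on reconfigure, -EINVAL otherwise.
 * 'initial' is the value from the original mount (NULL if it was unset). */
static int reconfigure_param(enum reconf_policy p, const char *key,
			     const char *value, const char *initial)
{
	if (p == REJECT_ALL)
		return -EINVAL;
	if (!is_known_option(key))
		return -EINVAL;
	if (p == IGNORE_KNOWN)
		return 0;
	/* REJECT_CHANGED: only possible if every initial value is remembered. */
	if (value == NULL || initial == NULL)
		return value == initial ? 0 : -EINVAL;
	return strcmp(value, initial) == 0 ? 0 : -EINVAL;
}
```

Policy (3) needs no per-mount state at all. That is the simplicity argument the commit makes.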
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: ignore 'data' argument of mount(..., MS_REMOUNT) · e8b20a47
      Miklos Szeredi committed
      The command
      
        mount -o remount -o unknownoption /mnt/fuse
      
      succeeds on kernel versions prior to v5.4 and fails on v5.4 and later.
      This is because fuse_parse_param() rejects any unrecognised options in
      case of FS_CONTEXT_FOR_RECONFIGURE, just as for FS_CONTEXT_FOR_MOUNT.
      
      This causes a regression if the fuse filesystem is listed in fstab, since
      remount sends all options found there to the kernel, even those that are
      meant for the initial mount and are consumed by the userspace fuse server.
      
      Fix this by ignoring mount options, just as fuse_remount_fs() did prior to
      the conversion to the new API.
      Reported-by: Stefan Priebe <s.priebe@profihost.ag>
      Fixes: c30da2e9 ("fuse: convert to use the new mount API")
      Cc: <stable@vger.kernel.org> # v5.4
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: use ->reconfigure() instead of ->remount_fs() · 0189a2d3
      Miklos Szeredi committed
      s_op->remount_fs() is only called from legacy_reconfigure(), which is no
      longer used once the filesystem has been converted to the new API.
      
      Convert to using ->reconfigure().  This restores the previous behavior of
      syncing the filesystem and rejecting MS_MANDLOCK on remount.
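Below is a userspace model of what the converted handler is described as doing: sync the filesystem, then refuse a remount that asks for mandatory locking. The structures, the flag value and all `model_` names are hypothetical stand-ins. In the kernel, ->reconfigure() is reached through the fs_context operations rather than through a table like this one, and the order of the two steps here simply follows the commit text.

```c
#include <errno.h>

/* Hypothetical stand-ins; the flag value is illustrative only. */
#define MODEL_SB_MANDLOCK 0x40u

struct model_sb {
	int syncs;                 /* how many times the sync stand-in ran */
};

struct model_fs_context {
	struct model_sb *sb;       /* superblock being remounted */
	unsigned int sb_flags;     /* flags requested by the remount */
};

struct model_context_ops {
	int (*reconfigure)(struct model_fs_context *fc);
};

static void model_sync_filesystem(struct model_sb *sb)
{
	sb->syncs++;               /* real code would write back dirty data */
}

/* ->reconfigure()-style handler: sync, then reject mandatory locking. */
static int model_fuse_reconfigure(struct model_fs_context *fc)
{
	model_sync_filesystem(fc->sb);
	if (fc->sb_flags & MODEL_SB_MANDLOCK)
		return -EINVAL;
	return 0;
}

static const struct model_context_ops model_fuse_context_ops = {
	.reconfigure = model_fuse_reconfigure,
};
```

Even a rejected remount still gets the sync in this model, because the sync runs before the flag check.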
      
      Fixes: c30da2e9 ("fuse: convert to use the new mount API")
      Cc: <stable@vger.kernel.org> # v5.4
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  2. 19 May, 2020 (2 commits)
    • fuse: update attr_version counter on fuse_notify_inval_inode() · 5ddd9ced
      Miklos Szeredi committed
      A GETATTR request can race with FUSE_NOTIFY_INVAL_INODE, resulting in the
      attribute cache being updated with stale information after the
      invalidation.
      
      Fix this by bumping the attribute version in fuse_reverse_inval_inode().
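The race and its fix can be modelled with a connection-wide counter. A GETATTR reply is applied only if the inode's recorded version is not newer than the counter value sampled when the request was sent. This is a single-threaded userspace sketch with hypothetical names; it follows the idea described above, not the exact kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; names are illustrative, not the kernel's. */
struct model_conn {
	uint64_t attr_version;     /* connection-wide, only ever increases */
};

struct model_inode {
	uint64_t attr_version;     /* version of the last attribute change */
	uint64_t size;             /* one cached attribute, for brevity */
	bool valid;                /* attribute cache currently usable */
};

/* Sampled just before a GETATTR request is sent. */
static uint64_t model_getattr_begin(const struct model_conn *fc)
{
	return fc->attr_version;
}

/* Apply a GETATTR reply unless the inode changed after the request was sent. */
static bool model_getattr_reply(struct model_conn *fc, struct model_inode *fi,
				uint64_t sent_version, uint64_t size)
{
	if (fi->attr_version > sent_version)
		return false;          /* reply is stale: drop it */
	fi->size = size;
	fi->valid = true;
	fi->attr_version = ++fc->attr_version;
	return true;
}

/* FUSE_NOTIFY_INVAL_INODE: invalidate and, as in the fix, bump the version. */
static void model_notify_inval_inode(struct model_conn *fc,
				     struct model_inode *fi)
{
	fi->valid = false;
	fi->attr_version = ++fc->attr_version;
}
```

Without the version bump in model_notify_inval_inode(), the stale reply in the race would pass the version check and repopulate the cache.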
      Reported-by: Krzysztof Rusek <rusek@9livesdata.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • virtiofs: do not use fuse_fill_super_common() for device installation · 7fd3abfa
      Vivek Goyal committed
      fuse_fill_super_common() allocates and installs one fuse_device.  Hence
      virtiofs allocates and installs all fuse devices by itself except one.
      
      This makes the logic a little twisted.  There does not seem to be any real
      reason why virtiofs can't allocate and install all fuse devices itself.
      
      So opt out of fuse device allocation and installation while calling
      fuse_fill_super_common().
      
      Regular fuse still wants fuse_fill_super_common() to install fuse_device.
      It needs to protect against races where two mounters are trying to mount
      fuse using the same fd.  In that case one will succeed while the other
      will get -EINVAL.
      
      virtiofs does not have this issue because sget_fc() resolves the race
      w.r.t. multiple mounters and only one instance of virtio_fs_fill_super()
      should be in progress for the same filesystem.
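The "one mounter wins, the other gets -EINVAL" rule for regular fuse, together with the opt-out used by virtiofs, can be sketched in userspace as below. All names are hypothetical. The kernel serialises this check under a mutex; the sketch swaps in a C11 compare-and-swap, which gives the same exactly-one-winner property. A NULL slot stands in for the opt-out, which is an assumption about how a caller might signal it.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for a fuse device and the fd's private slot. */
struct model_dev {
	int id;
};

struct model_file {
	_Atomic(struct model_dev *) private_data;   /* NULL until installed */
};

/* Install 'dev' into the fd's slot; exactly one caller can succeed. */
static int model_install_dev(struct model_file *file, struct model_dev *dev)
{
	struct model_dev *expected = NULL;

	if (!atomic_compare_exchange_strong(&file->private_data, &expected, dev))
		return -EINVAL;        /* another mounter already used this fd */
	return 0;
}

/* Common fill-super step: a NULL slot means the caller installs devices
 * itself (virtiofs-style opt-out). */
static int model_fill_super_common(struct model_file *slot,
				   struct model_dev *dev)
{
	if (slot == NULL)
		return 0;
	return model_install_dev(slot, dev);
}
```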
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  3. 08 Feb, 2020 (3 commits)
  4. 06 Feb, 2020 (1 commit)
    • fuse: use true,false for bool variable · cabdb4fa
      zhengbin committed
      Fixes coccicheck warnings:
      
      fs/fuse/readdir.c:335:1-19: WARNING: Assignment of 0/1 to bool variable
      fs/fuse/file.c:1398:2-19: WARNING: Assignment of 0/1 to bool variable
      fs/fuse/file.c:1400:2-20: WARNING: Assignment of 0/1 to bool variable
      fs/fuse/cuse.c:454:1-20: WARNING: Assignment of 0/1 to bool variable
      fs/fuse/cuse.c:455:1-19: WARNING: Assignment of 0/1 to bool variable
      fs/fuse/inode.c:497:2-17: WARNING: Assignment of 0/1 to bool variable
      fs/fuse/inode.c:504:2-23: WARNING: Assignment of 0/1 to bool variable
      fs/fuse/inode.c:511:2-22: WARNING: Assignment of 0/1 to bool variable
      fs/fuse/inode.c:518:2-23: WARNING: Assignment of 0/1 to bool variable
      fs/fuse/inode.c:522:2-26: WARNING: Assignment of 0/1 to bool variable
      fs/fuse/inode.c:526:2-18: WARNING: Assignment of 0/1 to bool variable
      fs/fuse/inode.c:1000:1-20: WARNING: Assignment of 0/1 to bool variable
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: zhengbin <zhengbin13@huawei.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  5. 15 Oct, 2019 (1 commit)
  6. 24 Sep, 2019 (1 commit)
  7. 19 Sep, 2019 (1 commit)
    • virtio-fs: add virtiofs filesystem · a62a8ef9
      Stefan Hajnoczi committed
      Add a basic file system module for virtio-fs.  This does not yet contain
      shared data support between host and guest or metadata coherency speedups.
      However, it is already significantly faster than virtio-9p.
      
      Design Overview
      ===============
      
      With the goal of designing something with better performance and local file
      system semantics, several ideas were proposed.
      
       - Use the fuse protocol (instead of 9p) for communication between guest
         and host.  The guest kernel will be the fuse client and a fuse server
         will run on the host to serve the requests.
      
       - For data access inside the guest, mmap a portion of the file in the
         QEMU address space and let the guest access this memory using DAX.
         That way the guest page cache is bypassed and there is only one copy
         of the data (on the host).  This will also enable mmap(MAP_SHARED)
         between guests.
      
       - For metadata coherency, there is a shared memory region which contains
         a version number associated with the metadata; any guest changing the
         metadata updates the version number, and other guests refresh their
         metadata on the next access.  This is yet to be implemented.
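The proposed metadata-coherency scheme can be illustrated with a single-threaded model. The text marks this scheme as not yet implemented. In the model, the shared region holds one version number per inode, a writer bumps it, and a reader only goes to the host when the version it saw last differs. All names, the fixed inode count and the "mode" attribute are hypothetical. A real shared-memory implementation would also need proper memory ordering, which this sketch ignores.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NINODES 4        /* hypothetical, tiny inode table */

/* Stand-in for the shared memory region: one version number per inode. */
struct shared_region {
	uint64_t version[MODEL_NINODES];
};

/* Authoritative metadata kept on the host (here: just a file mode). */
struct host_fs {
	uint32_t mode[MODEL_NINODES];
};

/* Per-guest metadata cache. */
struct guest_cache {
	bool valid[MODEL_NINODES];
	uint64_t seen_version[MODEL_NINODES];
	uint32_t mode[MODEL_NINODES];
	unsigned host_requests;    /* counts simulated guest->host round trips */
};

/* Read metadata: talk to the host only if the shared version moved. */
static uint32_t guest_get_mode(struct guest_cache *g, const struct host_fs *h,
			       const struct shared_region *s, int ino)
{
	if (!g->valid[ino] || g->seen_version[ino] != s->version[ino]) {
		g->mode[ino] = h->mode[ino];
		g->seen_version[ino] = s->version[ino];
		g->valid[ino] = true;
		g->host_requests++;
	}
	return g->mode[ino];
}

/* Change metadata (through the host) and publish the new version. */
static void guest_set_mode(struct host_fs *h, struct shared_region *s,
			   int ino, uint32_t mode)
{
	h->mode[ino] = mode;
	s->version[ino]++;
}
```

Repeated reads of unchanged metadata cost no round trips. This is the "common case" saving the text describes.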
      
      How virtio-fs differs from existing approaches
      ==============================================
      
      The unique idea behind virtio-fs is to take advantage of the co-location of
      the virtual machine and hypervisor to avoid communication (vmexits).
      
      DAX allows file contents to be accessed without communication with the
      hypervisor.  The shared memory region for metadata avoids communication in
      the common case where metadata is unchanged.
      
      By replacing expensive communication with cheaper shared memory accesses,
      we expect to achieve better performance than approaches based on network
      file system protocols.  In addition, this also makes it easier to achieve
      local file system semantics (coherency).
      
      These techniques are not applicable to network file system protocols since
      the communications channel is bypassed by taking advantage of shared memory
      on a local machine.  This is why we decided to build virtio-fs rather than
      focus on 9P or NFS.
      
      Caching Modes
      =============
      
      Like virtio-9p, different caching modes are supported which determine the
      coherency level as well.  The “cache=FOO” and “writeback” options control
      the level of coherence between the guest and host filesystems.
      
       - cache=none
         metadata, data and pathname lookup are not cached in the guest.  They
         are always fetched from the host and any changes are immediately pushed
         to the host.
      
       - cache=always
         metadata, data and pathname lookup are cached in the guest and never
         expire.
      
       - cache=auto
         the metadata and pathname lookup cache expires after a configured
         amount of time (default is 1 second).  Data is cached while the file is
         open (close-to-open consistency).
      
       - writeback/no_writeback
         These options control the writeback strategy.  If writeback is disabled,
         normal writes are immediately synchronized with the host fs.  If
         writeback is enabled, writes may be cached in the guest until the file
         is closed or an fsync(2) is performed.  This option has no effect on
         mmap-ed writes or writes going through the DAX mechanism.
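The caching rules above can be summarised as a few predicates. This is a hypothetical userspace sketch of the semantics as described, not the kernel's implementation. In particular, "reopened since cached" is a simplification of close-to-open consistency, and the timeout is passed in explicitly (the text gives 1 second as the default).

```c
#include <stdbool.h>

enum cache_mode { CACHE_NONE, CACHE_AUTO, CACHE_ALWAYS };

/* May metadata / a pathname lookup cached 'age_ms' ago still be used? */
static bool metadata_cache_usable(enum cache_mode mode, long age_ms,
				  long timeout_ms)
{
	switch (mode) {
	case CACHE_NONE:
		return false;                 /* always fetched from the host */
	case CACHE_ALWAYS:
		return true;                  /* never expires */
	case CACHE_AUTO:
		return age_ms < timeout_ms;   /* expires after the timeout */
	}
	return false;
}

/* May cached file data be reused?  cache=auto keeps it only while the file
 * has not been reopened since the data was cached (close-to-open). */
static bool data_cache_usable(enum cache_mode mode, bool reopened_since_cached)
{
	switch (mode) {
	case CACHE_NONE:
		return false;
	case CACHE_ALWAYS:
		return true;
	case CACHE_AUTO:
		return !reopened_since_cached;
	}
	return false;
}

/* Must a plain (non-mmap, non-DAX) write reach the host now? */
static bool write_must_reach_host(bool writeback, bool close_or_fsync)
{
	return !writeback || close_or_fsync;
}
```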
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  8. 12 Sep, 2019 (7 commits)
  9. 10 Sep, 2019 (5 commits)
    • fuse: convert init to simple api · 615047ef
      Miklos Szeredi committed
      Bypass the fc->initialized check by setting the force flag.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: convert destroy to simple api · 1ccd1ea2
      Miklos Szeredi committed
      We can use the "force" flag to make sure the DESTROY request is always sent
      to userspace.  So no need to keep it allocated during the lifetime of the
      filesystem.
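Both this commit and the INIT conversion rely on the force flag to get past the fc->initialized check. Below is a minimal userspace sketch of that gate, with hypothetical names. It models only the check itself, not how the kernel allocates or queues the request. Where the real code would block until initialization completes, the sketch returns -EAGAIN so the outcome is observable.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical opcodes and structures; names and values are illustrative. */
enum model_opcode { MODEL_OP_GETATTR, MODEL_OP_INIT, MODEL_OP_DESTROY };

struct model_conn {
	bool initialized;          /* set once the INIT reply has been handled */
	int sent;                  /* requests handed to the userspace server */
};

struct model_args {
	enum model_opcode opcode;
	bool force;                /* bypass the initialized check */
};

/* Returns 0 when the request is queued, or -EAGAIN where the real code
 * would instead sleep until the connection is initialized. */
static int model_simple_request(struct model_conn *fc,
				const struct model_args *args)
{
	if (!args->force && !fc->initialized)
		return -EAGAIN;
	fc->sent++;
	return 0;
}
```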
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: simplify 'nofail' request · 40ac7ab2
      Miklos Szeredi committed
      Instead of complex games with a reserved request, just use __GFP_NOFAIL.
      
      Both callers (flush, readdir) guarantee that the connection was already
      initialized, so there is no need to wait for fc->initialized.
      
      Also remove unneeded clearing of FR_BACKGROUND flag.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: flatten 'struct fuse_args' · d5b48543
      Miklos Szeredi committed
      ...to make future expansion simpler.  The hierarchical structure is a
      historical thing that does not serve any practical purpose.
      
      The generated code is exactly the same before and after the patch.
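The shape of the change can be illustrated with two simplified structures. The field names are modelled on fuse request arguments, but the structures are a hypothetical, trimmed-down version (the real one has more members), and the opcode value is made up. Filling in the same lookup-style request shows that each access only changes spelling, e.g. `in.h.opcode` becomes `opcode` and `in.args[0]` becomes `in_args[0]`.

```c
#include <stdint.h>
#include <string.h>

#define MODEL_OP_LOOKUP 1u     /* hypothetical opcode value */

struct model_arg {
	unsigned size;
	const void *value;
};

/* Before: request arguments nested in "in" and "out" sub-structures. */
struct model_args_nested {
	struct {
		struct {
			uint32_t opcode;
			uint64_t nodeid;
		} h;
		unsigned numargs;
		struct model_arg args[3];
	} in;
	struct {
		unsigned numargs;
		struct model_arg args[2];
	} out;
};

/* After: one flat level, with in_/out_ prefixes instead of nesting. */
struct model_args_flat {
	uint32_t opcode;
	uint64_t nodeid;
	unsigned in_numargs;
	unsigned out_numargs;
	struct model_arg in_args[3];
	struct model_arg out_args[2];
};

static void lookup_nested(struct model_args_nested *a, uint64_t dir,
			  const char *name)
{
	memset(a, 0, sizeof(*a));
	a->in.h.opcode = MODEL_OP_LOOKUP;
	a->in.h.nodeid = dir;
	a->in.numargs = 1;
	a->in.args[0].size = (unsigned)strlen(name) + 1;   /* include NUL */
	a->in.args[0].value = name;
	a->out.numargs = 1;
}

static void lookup_flat(struct model_args_flat *a, uint64_t dir,
			const char *name)
{
	memset(a, 0, sizeof(*a));
	a->opcode = MODEL_OP_LOOKUP;
	a->nodeid = dir;
	a->in_numargs = 1;
	a->in_args[0].size = (unsigned)strlen(name) + 1;   /* include NUL */
	a->in_args[0].value = name;
	a->out_numargs = 1;
}
```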
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: fix deadlock with aio poll and fuse_iqueue::waitq.lock · 76e43c8c
      Eric Biggers committed
      When IOCB_CMD_POLL is used on the FUSE device, aio_poll() disables IRQs
      and takes kioctx::ctx_lock, then fuse_iqueue::waitq.lock.
      
      This may have to wait for fuse_iqueue::waitq.lock to be released by one
      of many places that take it with IRQs enabled.  Since the IRQ handler
      may take kioctx::ctx_lock, lockdep reports that a deadlock is possible.
      
      Fix it by protecting the state of struct fuse_iqueue with a separate
      spinlock, and only accessing fuse_iqueue::waitq using the versions of
      the waitqueue functions which do IRQ-safe locking internally.
      
      Reproducer:
      
      	#include <fcntl.h>
      	#include <stdio.h>
      	#include <sys/mount.h>
      	#include <sys/stat.h>
      	#include <sys/syscall.h>
      	#include <unistd.h>
      	#include <linux/aio_abi.h>
      
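      	int main(void)
      	{
      		char opts[128];
      		/* The fuse device fd is also the target of IOCB_CMD_POLL. */
      		int fd = open("/dev/fuse", O_RDWR);
      		aio_context_t ctx = 0;
      		struct iocb cb = { .aio_lio_opcode = IOCB_CMD_POLL, .aio_fildes = fd };
      		struct iocb *cbp = &cb;
      
      		/* Mount a fuse filesystem that uses fd as its device. */
      		snprintf(opts, sizeof(opts), "fd=%d,rootmode=040000,user_id=0,group_id=0", fd);
      		mkdir("mnt", 0700);
      		mount("foo", "mnt", "fuse", 0, opts);
      		/* aio_poll() takes ctx_lock with IRQs off, then fiq->waitq.lock. */
      		syscall(__NR_io_setup, 1, &ctx);
      		syscall(__NR_io_submit, ctx, 1, &cbp);
      		return 0;
      	}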
      
      Beginning of lockdep output:
      
      	=====================================================
      	WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
      	5.3.0-rc5 #9 Not tainted
      	-----------------------------------------------------
      	syz_fuse/135 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
      	000000003590ceda (&fiq->waitq){+.+.}, at: spin_lock include/linux/spinlock.h:338 [inline]
      	000000003590ceda (&fiq->waitq){+.+.}, at: aio_poll fs/aio.c:1751 [inline]
      	000000003590ceda (&fiq->waitq){+.+.}, at: __io_submit_one.constprop.0+0x203/0x5b0 fs/aio.c:1825
      
      	and this task is already holding:
      	0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: spin_lock_irq include/linux/spinlock.h:363 [inline]
      	0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: aio_poll fs/aio.c:1749 [inline]
      	0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: __io_submit_one.constprop.0+0x1f4/0x5b0 fs/aio.c:1825
      	which would create a new lock dependency:
      	 (&(&ctx->ctx_lock)->rlock){..-.} -> (&fiq->waitq){+.+.}
      
      	but this new dependency connects a SOFTIRQ-irq-safe lock:
      	 (&(&ctx->ctx_lock)->rlock){..-.}
      
      	[...]
      
      Reported-by: syzbot+af05535bb79520f95431@syzkaller.appspotmail.com
      Reported-by: syzbot+d86c4426a01f60feddc7@syzkaller.appspotmail.com
      Fixes: bfe4037e ("aio: implement IOCB_CMD_POLL")
      Cc: <stable@vger.kernel.org> # v4.19+
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  10. 07 Sep, 2019 (2 commits)
  11. 08 May, 2019 (1 commit)
  12. 02 May, 2019 (1 commit)
    • fuse: switch to ->free_inode() · 9baf28bb
      Al Viro committed
      fuse_destroy_inode() is gone - sanity checks that need the stack
      trace of the caller get moved into ->evict_inode(), the rest joins
      the RCU-delayed part which becomes ->free_inode().
      
      While we are at it, don't just pass the address of what happens
      to be the first member of the structure to kmem_cache_free();
      get_fuse_inode() is there for that purpose and it gives the proper
      container_of() use.  No behaviour change, but verifying correctness
      is easier that way.
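The container_of() idiom that get_fuse_inode() relies on can be reproduced in userspace with offsetof(). In this sketch the structures are simplified stand-ins for the kernel ones, `model_free_inode()` is hypothetical, and free() replaces kmem_cache_free(). The second structure shows why the idiom matters: once the embedded member is not first, a plain pointer cast would point at the wrong address.

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace re-creation of the kernel's container_of() idiom. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins; the real structures have many more fields. */
struct inode {
	unsigned long i_ino;
};

struct fuse_inode {
	struct inode inode;        /* happens to be the first member */
	unsigned long nodeid;
};

static struct fuse_inode *get_fuse_inode(struct inode *inode)
{
	return container_of(inode, struct fuse_inode, inode);
}

/* Free the containing object explicitly instead of relying on
 * &fi->inode and fi having the same address. */
static void model_free_inode(struct inode *inode)
{
	free(get_fuse_inode(inode));
}

/* Same idiom with the embedded member NOT first: a plain cast would break. */
struct wrapped_inode {
	unsigned long state;
	struct inode inode;
};
```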
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  13. 24 Apr, 2019 (2 commits)
    • fuse: allow filesystems to have precise control over data cache · ad2ba64d
      Kirill Smelkov committed
      On networked filesystems file data can be changed externally.  FUSE
      provides notification messages for filesystem to inform kernel that
      metadata or data region of a file needs to be invalidated in local page
      cache. That provides the basis for filesystem implementations to invalidate
      kernel cache explicitly based on observed filesystem-specific events.
      
      FUSE also has an "automatic" invalidation mode (*) in which the kernel
      automatically invalidates the data cache of a file if it sees an mtime
      change.  It also automatically invalidates the whole data cache of a file
      if it sees the file size change.
      
      The automatic mode has corresponding capability - FUSE_AUTO_INVAL_DATA.
      However, probably for historical reasons, that capability controls only
      whether an mtime change should result in automatic invalidation or not.
      A change in file size always results in invalidating the whole data cache
      of a file, regardless of whether FUSE_AUTO_INVAL_DATA was negotiated (+).
      
      The filesystem I am writing [1] represents data arrays stored in a
      networked database as local files suitable for mmap.  It is a read-only
      filesystem: changes to data are committed externally via database
      interfaces and the filesystem only glues data into contiguous file
      streams suitable for mmap and traditional array processing.  The files
      are big, starting from hundreds of gigabytes.  The files change
      regularly, frequently by data being appended to their end, so their size
      changes frequently.
      
      If a file was accessed locally and some part of its data got into the page
      cache, we want that data to stay cached unless there is memory pressure,
      or unless the corresponding part of the file was actually changed.
      However, the current FUSE behaviour when it sees a file size change is to
      invalidate the whole file.  The data cache of the file is thus completely
      lost even on a small size change, even though the filesystem server is
      careful to accurately translate database changes into FUSE invalidation
      messages to the kernel.
      
      Let's fix it: if a filesystem, through the new FUSE_EXPLICIT_INVAL_DATA
      capability, indicates to the kernel that it is fully responsible for data
      cache invalidation, then the kernel won't invalidate a file's data cache
      on size change and will only truncate that cache to the new size in case
      the size decreased.
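A hypothetical decision function can capture the behaviour described above for the non-writeback case. The enum and function names are illustrative, and the sketch only reports what should happen to the cache rather than touching any pages. Leaving the FUSE_AUTO_INVAL_DATA mtime rule unchanged when the size is the same is an assumption consistent with the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* What happens to a file's page cache when new attributes arrive. */
enum cache_action {
	CACHE_KEEP,
	CACHE_TRUNCATE_TO_NEW_SIZE,   /* drop only pages past the new EOF */
	CACHE_INVALIDATE_ALL,
};

/* Capabilities negotiated with the server (names follow the text). */
struct inval_caps {
	bool auto_inval_data;      /* FUSE_AUTO_INVAL_DATA */
	bool explicit_inval_data;  /* FUSE_EXPLICIT_INVAL_DATA */
};

/* Non-writeback case, as described in the commit message. */
static enum cache_action on_attr_update(struct inval_caps caps,
					uint64_t old_size, uint64_t new_size,
					bool mtime_changed)
{
	if (old_size != new_size) {
		if (!caps.explicit_inval_data)
			return CACHE_INVALIDATE_ALL;  /* previous behaviour */
		return new_size < old_size ? CACHE_TRUNCATE_TO_NEW_SIZE
					   : CACHE_KEEP;
	}
	if (caps.auto_inval_data && mtime_changed)
		return CACHE_INVALIDATE_ALL;
	return CACHE_KEEP;
}
```

An appending server that negotiates FUSE_EXPLICIT_INVAL_DATA keeps its cached pages when the file grows, which is the case the commit message is about.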
      
      (*) see 72d0d248 "fuse: add FUSE_AUTO_INVAL_DATA init flag",
      eed2179e "fuse: invalidate inode mapping if mtime changes"
      
      (+) in writeback mode the kernel does not invalidate the data cache on file
      size change, but neither does it allow the filesystem to set the size due
      to an external event (see 8373200b "fuse: Trust kernel i_size only")
      
      [1] https://lab.nexedi.com/kirr/wendelin.core/blob/a50f1d9f/wcfs/wcfs.go#L20
      Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: convert printk -> pr_* · f2294482
      Kirill Smelkov committed
      Functions like pr_err are a more modern variant of printing compared to
      printk.  They can be used to reduce noise in the sources by putting the
      needed level in the print function name, and by automatically inserting a
      per-driver / function / ... print prefix as defined by the pr_fmt macro.
      pr_* are also recommended by Documentation/process/coding-style.rst, and
      more recent code, for example overlayfs, uses them instead of printk.
      
      Convert CUSE and FUSE to use the new pr_* functions.
      
      CUSE output stays completely unchanged, while FUSE output is amended a
      bit for the "trying to steal weird page" warning: the second line now also
      comes with the "fuse:" prefix.  I hope it is ok.
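The pr_fmt mechanism is plain preprocessor work, so it can be imitated in userspace. In this sketch the pr_err()/pr_warn() stand-ins print to stderr instead of calling printk() with a log level. The `format_with_prefix()` macro and `has_fuse_prefix()` helper are hypothetical, and `##__VA_ARGS__` is the GNU extension the kernel itself relies on. Because every pr_* call expands pr_fmt() separately, each line of a two-line warning gets the prefix.

```c
#include <stdio.h>
#include <string.h>

/* Must be defined before the pr_* macros are used. */
#define pr_fmt(fmt) "fuse: " fmt

/* Userspace stand-ins for the kernel macros (no log levels here). */
#define pr_err(fmt, ...)  fprintf(stderr, pr_fmt(fmt), ##__VA_ARGS__)
#define pr_warn(fmt, ...) fprintf(stderr, pr_fmt(fmt), ##__VA_ARGS__)

/* Same expansion into a buffer, so the effect of pr_fmt() can be checked. */
#define format_with_prefix(buf, size, fmt, ...) \
	snprintf((buf), (size), pr_fmt(fmt), ##__VA_ARGS__)

static int has_fuse_prefix(const char *s)
{
	return strncmp(s, "fuse: ", 6) == 0;
}
```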
      Suggested-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
      Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  14. 13 Mar, 2019 (1 commit)
  15. 13 Feb, 2019 (6 commits)
  16. 16 Jan, 2019 (1 commit)
  17. 29 Dec, 2018 (1 commit)
  18. 10 Dec, 2018 (1 commit)
    • fuse: Fix memory leak in fuse_dev_free() · d72f70da
      Takeshi Misawa committed
      When ntfs is unmounted, the following leak is
      reported by kmemleak.
      
      kmemleak report:
      
      unreferenced object 0xffff880052bf4400 (size 4096):
        comm "mount.ntfs", pid 16530, jiffies 4294861127 (age 3215.836s)
        hex dump (first 32 bytes):
          00 44 bf 52 00 88 ff ff 00 44 bf 52 00 88 ff ff  .D.R.....D.R....
          10 44 bf 52 00 88 ff ff 10 44 bf 52 00 88 ff ff  .D.R.....D.R....
        backtrace:
          [<00000000bf4a2f8d>] fuse_fill_super+0xb22/0x1da0 [fuse]
          [<000000004dde0f0c>] mount_bdev+0x263/0x320
          [<0000000025aebc66>] mount_fs+0x82/0x2bf
          [<0000000042c5a6be>] vfs_kern_mount.part.33+0xbf/0x480
          [<00000000ed10cd5b>] do_mount+0x3de/0x2ad0
          [<00000000d59ff068>] ksys_mount+0xba/0xd0
          [<000000001bda1bcc>] __x64_sys_mount+0xba/0x150
          [<00000000ebe26304>] do_syscall_64+0x151/0x490
          [<00000000d25f2b42>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
          [<000000002e0abd2c>] 0xffffffffffffffff
      
      fuse_dev_alloc() allocates fud->pq.processing,
      but this hash table is never freed.
      
      Fix this by freeing fud->pq.processing.
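The bug is an unbalanced alloc/free pair, and a counting allocator makes that visible. This is a simplified userspace model: the structure names, the hash-table size and the `tracked_*` helpers are hypothetical, and calloc()/free() stand in for the kernel allocators. The free routine releases the separately allocated processing table before the device itself, which is the step the fix adds.

```c
#include <stdlib.h>

#define MODEL_HASH_SIZE 256    /* hypothetical table size */

struct model_list_head {
	void *next, *prev;
};

struct model_pqueue {
	struct model_list_head *processing;   /* allocated separately */
};

struct model_dev {
	struct model_pqueue pq;
};

static int live_allocations;   /* crude leak counter for the demo */

static void *tracked_calloc(size_t n, size_t size)
{
	void *p = calloc(n, size);

	if (p)
		live_allocations++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		live_allocations--;
		free(p);
	}
}

static struct model_dev *model_dev_alloc(void)
{
	struct model_dev *fud = tracked_calloc(1, sizeof(*fud));

	if (!fud)
		return NULL;
	fud->pq.processing = tracked_calloc(MODEL_HASH_SIZE,
					    sizeof(*fud->pq.processing));
	if (!fud->pq.processing) {
		tracked_free(fud);
		return NULL;
	}
	return fud;
}

static void model_dev_free(struct model_dev *fud)
{
	if (!fud)
		return;
	tracked_free(fud->pq.processing);   /* the previously missing free */
	tracked_free(fud);
}
```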
      Signed-off-by: Takeshi Misawa <jeliantsurux@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Fixes: be2ff42c ("fuse: Use hash table to link processing request")