1. 06 Sep, 2021 1 commit
    • fuse: wait for writepages in syncfs · 660585b5
      Miklos Szeredi committed
      In case of fuse the MM subsystem doesn't guarantee that page writeback
      completes by the time ->sync_fs() is called.  This is because fuse
      completes page writeback immediately to prevent DoS of memory reclaim by
      the userspace file server.
      
      This means that fuse itself must ensure that writes are synced before
      sending the SYNCFS request to the server.
      
      Introduce sync buckets, that hold a counter for the number of outstanding
      write requests.  On syncfs replace the current bucket with a new one and
      wait until the old bucket's counter goes down to zero.
      
      It is possible to have multiple syncfs calls in parallel, in which case
      there could be more than one waited-on buckets.  Descendant buckets must
      not complete until the parent completes.  Add a count to the child (new)
      bucket until the (parent) old bucket completes.
      
      Use RCU protection to dereference the current bucket and to wake up an
      emptied bucket.  Use fc->lock to protect against parallel assignments to
      the current bucket.
      
      This leaves just the counter to be a possible scalability issue.  The
      fc->num_waiting counter has a similar issue, so both should be addressed at
      the same time.
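
      The bucket scheme described above can be modelled in userspace C.  This
      is a single-threaded sketch under stated assumptions, not the kernel
      code: all names (bucket, conn_model, write_start, syncfs_begin, ...) are
      hypothetical, RCU and fc->lock are left out, and waiting is reduced to
      polling bucket_drained().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a sync bucket; buckets are never freed here. */
struct bucket {
    int count;             /* outstanding writes plus holds from a parent */
    struct bucket *child;  /* newer bucket that must not complete first */
};

struct conn_model {
    struct bucket *curr;   /* bucket that new writes are charged to */
};

struct bucket *bucket_new(void)
{
    struct bucket *b = calloc(1, sizeof(*b));
    assert(b != NULL);
    return b;
}

/* Drop one count; an emptied bucket releases its hold on its child. */
void bucket_put(struct bucket *b)
{
    while (b) {
        assert(b->count > 0);
        if (--b->count > 0)
            return;
        b = b->child;
    }
}

/* A write is charged to whichever bucket is current when it starts. */
struct bucket *write_start(struct conn_model *c)
{
    c->curr->count++;
    return c->curr;
}

void write_end(struct bucket *b)
{
    bucket_put(b);
}

/*
 * syncfs: swap in a fresh bucket and return the old one to wait on.
 * Returns NULL when nothing is outstanding.
 */
struct bucket *syncfs_begin(struct conn_model *c)
{
    struct bucket *old = c->curr;
    struct bucket *next;

    if (old->count == 0)
        return NULL;
    next = bucket_new();
    next->count = 1;       /* hold: child waits for the parent */
    old->child = next;
    c->curr = next;
    return old;
}

bool bucket_drained(const struct bucket *b)
{
    return b->count == 0;
}
```

      In this model a second syncfs issued while the first is still waiting
      gets a bucket that stays non-empty until the first bucket drains, which
      is the parent/child ordering the commit message asks for.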
      Reported-by: Amir Goldstein <amir73il@gmail.com>
      Fixes: 2d82ab25 ("virtiofs: propagate sync() to file server")
      Cc: <stable@vger.kernel.org> # v5.14
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  2. 05 Aug, 2021 2 commits
    • fuse: allow sharing existing sb · 5d5b74aa
      Miklos Szeredi committed
      Make it possible to create a new mount from an already working server.
      
      Here's a detailed description of the problem from Jakob:
      
        "The background for this question is occasional problems we see with our
         fuse filesystem [1] and mount namespaces. On a usual client, we have
         system-wide, autofs managed mountpoints. When a new mount namespace is
         created (which can be done unprivileged in combination with user
         namespaces), it can happen that a mountpoint is used inside the new
         namespace but idle in the root mount namespace. So autofs unmounts the
         parent, system-wide mountpoint. But the fuse module stays active and
         still serves mountpoint in the child mount namespace. Because the fuse
         daemon also blocks other system wide resources corresponding to the
         mountpoint, this situation effectively prevents new mounts until the
         child mount namespaces closes.
      
         [1] https://github.com/cvmfs/cvmfs"
      Reported-by: Jakob Blomer <jblomer@cern.ch>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: move fget() to fuse_get_tree() · 62dd1fc8
      Miklos Szeredi committed
      Affected call chains:
      
      fuse_get_tree
         -> get_tree_(bdev|nodev)
            -> fuse_fill_super
      
      Needed for following patch.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  3. 04 Aug, 2021 2 commits
  4. 13 Jul, 2021 1 commit
  5. 22 Jun, 2021 5 commits
    • fuse: fix illegal access to inode with reused nodeid · 15db1683
      Amir Goldstein committed
      Server responds to LOOKUP and other ops (READDIRPLUS/CREATE/MKNOD/...)
      with outarg containing nodeid and generation.
      
      If a fuse inode is found in inode cache with the same nodeid but different
      generation, the existing fuse inode should be unhashed and marked "bad" and
      a new inode with the new generation should be hashed instead.
      
      This can happen, for example, with a passthrough fuse filesystem that returns
      the real filesystem ino/generation on lookup and where real inode numbers
      can get recycled due to real files being unlinked not via the fuse
      passthrough filesystem.
      
      With current code, this situation will not be detected and an old fuse
      dentry that used to point to an older generation real inode, can be used to
      access a completely new inode, which should be accessed only via the new
      dentry.
      
      Note that because the FORGET message carries the nodeid w/o generation, the
      server should wait to get FORGET counts for the nlookup counts of the old
      and reused inodes combined, before it can free the resources associated to
      that nodeid.
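
      The generation check and the FORGET accounting can be sketched as
      follows.  This is an illustrative userspace model, not the kernel
      implementation: struct finode_model, handle_reply() and
      forgets_before_free() are hypothetical names, and the inode cache is
      reduced to a single slot.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cached fuse inode. */
struct finode_model {
    uint64_t nodeid;
    uint64_t generation;
    uint64_t nlookup;   /* lookups the server must see FORGET for */
    bool hashed;        /* reachable through the inode cache */
    bool bad;           /* marked bad, must no longer be used */
};

/*
 * Handle a server reply carrying (nodeid, generation).  Reuse the cached
 * inode only if both match; on a generation mismatch, unhash the stale
 * inode, mark it bad, and hash the caller-provided fresh inode instead.
 */
struct finode_model *handle_reply(struct finode_model *cached,
                                  struct finode_model *fresh,
                                  uint64_t nodeid, uint64_t generation)
{
    if (cached && cached->hashed && cached->nodeid == nodeid) {
        if (cached->generation == generation) {
            cached->nlookup++;
            return cached;
        }
        cached->hashed = false;
        cached->bad = true;
    }
    fresh->nodeid = nodeid;
    fresh->generation = generation;
    fresh->nlookup = 1;
    fresh->hashed = true;
    fresh->bad = false;
    return fresh;
}

/*
 * FORGET carries only the nodeid, so the server has to wait for the
 * combined lookup count of the old and the reused inode.
 */
uint64_t forgets_before_free(const struct finode_model *old_inode,
                             const struct finode_model *reused_inode)
{
    return old_inode->nlookup + reused_inode->nlookup;
}
```

      With an inode cached as (nodeid 7, generation 1), a reply for (7, 2)
      returns the fresh inode and leaves the old one unhashed and bad.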
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: Make fuse_fill_super_submount() static · 1b539917
      Greg Kurz committed
      This function used to be called from fuse_dentry_automount(). This code
      was moved to fuse_get_tree_submount() in the same file since then. It
      is unlikely there will ever be another user. No need to be extern in
      this case.
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: Call vfs_get_tree() for submounts · 266eb3f2
      Greg Kurz committed
      We recently fixed an infinite loop by setting the SB_BORN flag on
      submounts along with the write barrier needed by super_cache_count().
      This is the job of vfs_get_tree() and FUSE shouldn't have to care
      about the barrier at all.
      
      Split out some code from fuse_dentry_automount() to the dedicated
      fuse_get_tree_submount() handler for submounts and call vfs_get_tree().
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: add dedicated filesystem context ops for submounts · fe0a7bd8
      Greg Kurz committed
      The creation of a submount is open-coded in fuse_dentry_automount().
      This brings a lot of complexity and we recently had to fix bugs
      because we weren't setting SB_BORN or because we were unlocking
      sb->s_umount before sb was fully configured. Most of these could
      have been avoided by using the mount API instead of open-coding.
      
      Basically, this means coming up with a proper ->get_tree()
      implementation for submounts and calling vfs_get_tree(), or better
      fc_mount().
      
      The creation of the superblock for submounts is quite different from
      the root mount.  In particular, it requires neither allocating a FUSE
      filesystem context nor parsing parameters.
      
      Introduce dedicated context ops for submounts to make this clear.
      This is just a placeholder for now; fuse_get_tree_submount() will
      be populated in a subsequent patch.

      The only visible change is that we stop allocating/freeing a useless FUSE
      filesystem context with submounts.
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • virtiofs: propagate sync() to file server · 2d82ab25
      Greg Kurz committed
      Even if POSIX doesn't mandate it, Linux users legitimately expect sync() to
      flush all data and metadata to physical storage when it is located on the
      same system.  This isn't happening with virtiofs though: sync() inside the
      guest returns right away even though data still needs to be flushed from
      the host page cache.
      
      This is easily demonstrated by doing the following in the guest:
      
      $ dd if=/dev/zero of=/mnt/foo bs=1M count=5K ; strace -T -e sync sync
      5120+0 records in
      5120+0 records out
      5368709120 bytes (5.4 GB, 5.0 GiB) copied, 5.22224 s, 1.0 GB/s
      sync()                                  = 0 <0.024068>
      
      and start the following in the host when the 'dd' command completes
      in the guest:
      
      $ strace -T -e fsync /usr/bin/sync virtiofs/foo
      fsync(3)                                = 0 <10.371640>
      
      There are no good reasons not to honor the expected behavior of sync()
      actually: it gives an unrealistic impression that virtiofs is super fast
      and that data has safely landed on HW, which isn't the case obviously.
      
      Implement a ->sync_fs() superblock operation that sends a new FUSE_SYNCFS
      request type for this purpose.  Provision a 64-bit placeholder for possible
      future extensions.  Since the file server cannot handle the wait == 0 case,
      we skip it to avoid a gratuitous roundtrip.  Note that this is
      per-superblock: a FUSE_SYNCFS is sent for the root mount and for each
      submount.
      
      Like with FUSE_FSYNC and FUSE_FSYNCDIR, lack of support for FUSE_SYNCFS in
      the file server is treated as permanent success.  This ensures
      compatibility with older file servers: the client will get the current
      behavior of sync() not being propagated to the file server.
      
      Note that such an operation allows the file server to DoS sync().  Since a
      typical FUSE file server is an untrusted piece of software running in
      userspace, this is disabled by default.  Only enable it with virtiofs for
      now since virtiofsd is supposedly trusted by the guest kernel.
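
      The request policy described above (skip wait == 0, off by default, and
      a missing server implementation treated as permanent success) can be
      sketched like this.  It is a userspace model with hypothetical names
      (syncfs_conn, sync_fs_model, the two stand-in servers); the real
      ->sync_fs() operates on a super_block and sends a FUSE request.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-connection state. */
struct syncfs_conn {
    bool sync_fs;       /* feature enabled (only virtiofs opts in) */
    bool no_syncfs;     /* server answered ENOSYS before */
    int requests_sent;  /* round trips actually made */
};

/* Stand-in servers: one without FUSE_SYNCFS support, one with it. */
int server_without_syncfs(void) { return -ENOSYS; }
int server_with_syncfs(void) { return 0; }

int sync_fs_model(struct syncfs_conn *c, int wait, int (*send_syncfs)(void))
{
    int err;

    if (!wait)              /* server cannot handle wait == 0 */
        return 0;
    if (!c->sync_fs)        /* disabled by default */
        return 0;
    if (c->no_syncfs)       /* unsupported: permanent success */
        return 0;

    c->requests_sent++;
    err = send_syncfs();
    if (err == -ENOSYS) {
        c->no_syncfs = true;
        err = 0;
    }
    return err;
}
```

      Against a server that lacks the operation, only the first waiting call
      makes a round trip; every later call returns 0 immediately.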
      Reported-by: Robert Krawitz <rlk@redhat.com>
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  6. 16 Apr, 2021 1 commit
  7. 14 Apr, 2021 2 commits
    • virtiofs: split requests that exceed virtqueue size · a7f0d7aa
      Connor Kuehl committed
      If an incoming FUSE request can't fit on the virtqueue, the request is
      placed onto a workqueue so a worker can try to resubmit it later, when
      there will (hopefully) be space for it.
      
      This is fine for requests that aren't larger than a virtqueue's maximum
      capacity.  However, if a request's size exceeds the maximum capacity of the
      virtqueue (even if the virtqueue is empty), it will be doomed to a life of
      being placed on the workqueue, removed, discovered it won't fit, and placed
      on the workqueue yet again.
      
      Furthermore, from section 2.6.5.3.1 (Driver Requirements: Indirect
      Descriptors) of the virtio spec:
      
        "A driver MUST NOT create a descriptor chain longer than the Queue
        Size of the device."
      
      To fix this, limit the number of pages FUSE will use for an overall
      request.  This way, each request can realistically fit on the virtqueue
      when it is decomposed into a scatter-gather list, and it avoids violating
      section 2.6.5.3.1 of the virtio spec.
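
      The page limit can be illustrated with a small helper.  This is a
      sketch under assumptions: limit_request_pages() is a hypothetical name,
      and HEADER_DESCRIPTORS (descriptors reserved for request and reply
      headers) is an assumed value chosen for illustration.

```c
/* Assumed number of descriptors taken by headers; illustrative only. */
#define HEADER_DESCRIPTORS 4u

/*
 * Clamp the per-request page count so that pages plus headers never
 * exceed the virtqueue size.  Returns 0 if the queue is too small to
 * carry any data pages at all.
 */
unsigned int limit_request_pages(unsigned int max_pages,
                                 unsigned int queue_size)
{
    unsigned int room;

    if (queue_size <= HEADER_DESCRIPTORS)
        return 0;
    room = queue_size - HEADER_DESCRIPTORS;
    return max_pages < room ? max_pages : room;
}
```

      For example, with a queue of 128 descriptors a request that would
      otherwise use 256 pages is capped at 124, while a 32-page request on a
      1024-entry queue is left unchanged.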
      Signed-off-by: Connor Kuehl <ckuehl@redhat.com>
      Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: extend FUSE_SETXATTR request · 52a4c95f
      Vivek Goyal committed
      The fuse client needs to send additional information to the file server
      when it calls SETXATTR(system.posix_acl_access), so add an extra flags
      field to the structure.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  8. 08 Mar, 2021 1 commit
  9. 10 Dec, 2020 1 commit
    • fuse: fix bad inode · 5d069dbe
      Miklos Szeredi committed
      Jan Kara's analysis of the syzbot report (edited):
      
        The reproducer opens a directory on FUSE filesystem, it then attaches
        dnotify mark to the open directory.  After that a fuse_do_getattr() call
        finds that attributes returned by the server are inconsistent, and calls
        make_bad_inode() which, among other things does:
      
                inode->i_mode = S_IFREG;
      
        This then confuses dnotify which doesn't tear down its structures
        properly and eventually crashes.
      
      Avoid calling make_bad_inode() on a live inode: switch to a private flag on
      the fuse inode.  Also add the test to ops which the bad_inode_ops would
      have caught.
      
      This bug goes back to the initial merge of fuse in 2.6.14...
      
      Reported-by: syzbot+f427adf9324b92652ccc@syzkaller.appspotmail.com
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Tested-by: Jan Kara <jack@suse.cz>
      Cc: <stable@vger.kernel.org>
  10. 12 Nov, 2020 5 commits
    • fuse: support SB_NOSEC flag to improve write performance · 9d769e6a
      Vivek Goyal committed
      Virtiofs can be slow with small writes if xattrs are enabled and we are
      doing cached writes (no direct I/O).  Ganesh Mahalingam noticed this.
      
      Some debugging showed that file_remove_privs() is called in the cached
      write path on every write.  Every time, it calls
      security_inode_need_killpriv(), which results in a call to
      __vfs_getxattr(XATTR_NAME_CAPS), and this goes to the file server to fetch
      the xattr.  This extra round trip for every write slows down writes
      tremendously.
      
      Normally, to avoid paying this penalty on every write, the VFS caches this
      information in the inode (S_NOSEC).  The VFS sets S_NOSEC if the
      filesystem opted in using the superblock flag SB_NOSEC.  S_NOSEC is
      cleared when a setuid/setgid bit is set or when a security xattr is set
      on the inode, so that the next time a write happens, we check the inode
      again for clearing setuid/setgid bits as well as clearing any
      security.capability xattr.
      
      This seems to work well for local filesystems, but for remote filesystems
      it is possible that the VFS does not have the full picture: a different
      client may set a setuid/setgid bit or the security.capability xattr on
      the file, which means the S_NOSEC information held by the VFS on another
      client will be stale.  So for remote filesystems SB_NOSEC was disabled by
      default.
      
      Commit 9e1f1de0 ("more conservative S_NOSEC handling") mentioned that
      these filesystems can still make use of SB_NOSEC as long as they clear
      S_NOSEC when they are refreshing inode attributes from the server.

      So this patch tries to enable SB_NOSEC on fuse (regular fuse as well as
      virtiofs), and clears S_NOSEC when we are refreshing inode attributes.
      
      This is enabled only if the server supports FUSE_HANDLE_KILLPRIV_V2.  This
      says that the server will clear setuid/setgid/security.capability on
      chown/truncate/write as appropriate.
      
      This should provide tighter coherency because now suid/sgid/
      security.capability will be cleared even if fuse client cache has not seen
      these attrs.
      
      Basic idea is that fuse client will trigger suid/sgid/security.capability
      clearing based on its attr cache.  But even if cache has gone stale, it is
      fine because FUSE_HANDLE_KILLPRIV_V2 will make sure WRITE clear
      suid/sgid/security.capability.
      
      We make this change only if the server supports FUSE_HANDLE_KILLPRIV_V2.
      This should make sure that existing filesystems which might be relying on
      security.capability always being queried from the server are not impacted.
      
      This tighter coherency relies on WRITE showing up on the server (and not
      being cached in the guest).  So writeback_cache mode will not provide that
      tight coherency, and it is not recommended to use the two together.  Having
      said that, it might work reasonably well for a lot of use cases.
      
      This change improves random write performance very significantly.  Running
      virtiofsd with cache=auto and following fio command:
      
      fio --ioengine=libaio --direct=1  --name=test --filename=/mnt/virtiofs/random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randwrite
      
      Bandwidth increases from around 50MB/s to around 250MB/s as a result of
      applying this patch.
      
      Link: https://github.com/kata-containers/runtime/issues/2815
      Reported-by: "Mahalingam, Ganesh" <ganesh.mahalingam@intel.com>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: introduce the notion of FUSE_HANDLE_KILLPRIV_V2 · 63f9909f
      Vivek Goyal committed
      We already have the FUSE_HANDLE_KILLPRIV flag, which says that the file
      server will remove suid/sgid/caps on truncate/chown/write.  But that's a
      little different from what the Linux VFS implements.

      To be consistent with Linux VFS behavior, what we want is:
      
      - caps are always cleared on chown/write/truncate
      - suid is always cleared on chown, while for truncate/write it is cleared
        only if caller does not have CAP_FSETID.
      - sgid is always cleared on chown, while for truncate/write it is cleared
        only if the caller does not have CAP_FSETID and the file has group
        execute permission.
      
      The previous flag did not provide the above semantics, so implement a V2
      of the protocol with the above constraints.
      
      The server does not know whether the caller has CAP_FSETID.  So for
      write()/truncate(), the client will send a special flag to indicate
      whether to kill privileges or not.  These changes are in subsequent
      patches.
      
      FUSE_HANDLE_KILLPRIV_V2 relies on WRITE being sent to server to clear
      suid/sgid/security.capability. But with ->writeback_cache, WRITES are
      cached in guest. So it is not recommended to use FUSE_HANDLE_KILLPRIV_V2
      and writeback_cache together, though it is probably good enough for a
      lot of use cases.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: add fuse_sb_destroy() helper · 6a68d1e1
      Miklos Szeredi committed
      This is to avoid minor code duplication between fuse_kill_sb_anon() and
      fuse_kill_sb_blk().
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: get rid of fuse_mount refcount · 514b5e3f
      Miklos Szeredi committed
      Fuse mount now only ever has a refcount of one (before being freed) so the
      count field is unnecessary.
      
      Remove the refcounting and fold fuse_mount_put() into callers.  The only
      caller of fuse_mount_put() where fm->fc was NULL is fuse_dentry_automount()
      and here the fuse_conn_put() can simply be omitted.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • virtiofs: simplify sb setup · b19d3d00
      Miklos Szeredi committed
      Currently when acquiring an sb for virtiofs fuse_mount_get() is being
      called from virtio_fs_set_super() if a new sb is being filled and
      fuse_mount_put() is called unconditionally after sget_fc() returns.
      
      The exact same result can be obtained by checking whether
      fs_context->s_fs_info was set to NULL (ref transferred to sb->s_fs_info) and
      only calling fuse_mount_put() if the ref wasn't transferred (error or
      matching sb found).
      
      This allows getting rid of virtio_fs_set_super() and fuse_mount_get().
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  11. 12 Oct, 2020 1 commit
  12. 09 Oct, 2020 1 commit
    • fuse: implement crossmounts · bf109c64
      Max Reitz committed
      FUSE servers can indicate crossmount points by setting FUSE_ATTR_SUBMOUNT
      in fuse_attr.flags.  The inode will then be marked as S_AUTOMOUNT, and the
      .d_automount implementation creates a new submount at that location, so
      that the submount gets a distinct st_dev value.
      
      Note that all submounts get a distinct superblock and a distinct st_dev
      value, so for virtio-fs, even if the same filesystem is mounted more than
      once on the host, none of its mount points will have the same st_dev.  We
      need distinct superblocks because the superblock points to the root node,
      but the different host mounts may show different trees (e.g. due to
      submounts in some of them, but not in others).
      
      Right now, this behavior is only enabled when fuse_conn.auto_submounts is
      set, which is the case only for virtio-fs.
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  13. 25 Sep, 2020 2 commits
  14. 18 Sep, 2020 2 commits
    • fuse: Allow fuse_fill_super_common() for submounts · 1866d779
      Max Reitz committed
      Submounts have their own superblock, which needs to be initialized.
      However, they do not have a fuse_fs_context associated with them, and
      the root node's attributes should be taken from the mountpoint's node.
      
      Extend fuse_fill_super_common() to work for submounts by making the @ctx
      parameter optional, and by adding a @submount_finode parameter.
      
      (There is a plain "unsigned" in an existing code block that is being
      indented by this commit.  Extend it to "unsigned int" so checkpatch does
      not complain.)
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: split fuse_mount off of fuse_conn · fcee216b
      Max Reitz committed
      We want to allow submounts for the same fuse_conn, but with different
      superblocks so that each of the submounts has its own device ID.  To do
      so, we need to split all mount-specific information off of fuse_conn
      into a new fuse_mount structure, so that multiple mounts can share a
      single fuse_conn.
      
      We need to take care only to perform connection-level actions once (i.e.
      when the fuse_conn and thus the first fuse_mount are established, or
      when the last fuse_mount and thus the fuse_conn are destroyed).  For
      example, fuse_sb_destroy() must not invoke fuse_send_destroy() until the
      last superblock is released.
      
      To do so, we keep track of which fuse_mount is the root mount and
      perform all fuse_conn-level actions only when this fuse_mount is
      involved.
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  15. 10 Sep, 2020 5 commits
    • virtiofs: serialize truncate/punch_hole and dax fault path · 6ae330ca
      Vivek Goyal committed
      Currently in fuse we don't seem to have any lock which can serialize the
      fault path with the truncate/punch_hole path.  With dax support I need one
      for the following reasons.
      
      1. Dax requirement
      
        DAX fault code relies on inode size being stable for the duration of
        the fault and wants to serialize with truncate/punch_hole; the code
        mentions this explicitly.
      
        static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
                                     const struct iomap_ops *ops)
              /*
               * Check whether offset isn't beyond end of file now. Caller is
               * supposed to hold locks serializing us with truncate / punch hole so
               * this is a reliable test.
               */
              max_pgoff = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
      
      2. Make sure there are no users of pages being truncated/punch_hole
      
        get_user_pages() might take references to page and then do some DMA
        to said pages. Filesystem might truncate those pages without knowing
        that a DMA is in progress or some I/O is in progress. So use
        dax_layout_busy_page() to make sure there are no such references
        and I/O is not in progress on said pages before moving ahead with
        truncation.
      
      3. Limitation of kvm page fault error reporting
      
        If we are truncating the file on the host first and then removing
        mappings in the guest later (truncate page cache etc.), then this could
        lead to a problem with KVM.  Say a mapping is in place in the guest and
        truncation happens on the host.  Now if the guest accesses that mapping,
        then the host will take a fault and kvm will either exit to qemu or spin
        infinitely.
      
        IOW, before we do truncation on host, we need to make sure that guest
        inode does not have any mapping in that region or whole file.
      
      4. virtiofs memory range reclaim
      
        Soon I will introduce the notion of being able to reclaim dax memory
        ranges from a fuse dax inode. There also I need to make sure that
        no I/O or fault is going on in the reclaimed range and nobody is using
        it so that range can be reclaimed without issues.
      
      Currently if we take inode lock, that serializes read/write. But it does
      not do anything for faults. So I add another semaphore fuse_inode->i_mmap_sem
      for this purpose.  It can be used to serialize with faults.
      
      As of now, I am taking this semaphore only in the dax fault path and
      not the regular fault path, because existing code does not have one.
      Maybe existing code can benefit from it as well to take care of some
      races, but we can fix that later if need be.  For now, I am focusing
      only on the DAX path, which is a new path.
      
      Also added logic to take fuse_inode->i_mmap_sem in the
      truncate/punch_hole/open(O_TRUNC) path to make sure file truncation and
      fuse dax fault are mutually exclusive and avoid all the above problems.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • virtiofs: implement dax read/write operations · c2d0ad00
      Vivek Goyal committed
      This patch implements basic DAX support.  mmap() is not implemented
      yet and will come in later patches.  This patch covers implementing
      read/write.
      
      We make use of an interval tree to keep track of per-inode dax mappings.

      Do not use dax for file-extending writes; instead just send a WRITE message
      to the daemon (like we do for the direct I/O path).  This keeps the write
      and the i_size change atomic w.r.t. crashes.
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: Peng Tao <tao.peng@linux.alibaba.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • virtiofs: implement FUSE_INIT map_alignment field · fd1a1dc6
      Stefan Hajnoczi committed
      The device communicates FUSE_SETUPMAPPING/FUSE_REMOVEMAPPING alignment
      constraints via the FUSE_INIT map_alignment field.  Parse this field and
      ensure our DAX mappings meet the alignment constraints.
      
      We don't actually align anything differently since our mappings are
      already 2MB aligned.  Just check the value when the connection is
      established.  If it becomes necessary to honor arbitrary alignments in
      the future we'll have to adjust how mappings are sized.
      
      The upshot of this commit is that we can be confident that mappings will
      work even when emulating x86 on Power and similar combinations where the
      host page sizes are different.
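
      The check at connection setup can be sketched as below.  The sketch
      assumes that map_alignment is expressed as log2 of the required byte
      alignment and that mappings are 2MB (shift 21) as stated above;
      dax_alignment_ok() and DAX_MAPPING_SHIFT are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* 2MB mappings: 1 << 21 bytes. */
#define DAX_MAPPING_SHIFT 21u

/*
 * A 2MB-aligned mapping satisfies every power-of-two alignment up to
 * 2MB, so only a stricter requirement has to be rejected.
 * map_alignment is assumed to be log2 of the alignment in bytes.
 */
bool dax_alignment_ok(uint32_t map_alignment)
{
    return map_alignment <= DAX_MAPPING_SHIFT;
}
```

      Under these assumptions a 4KB (12) or 64KB (16) host page size passes,
      while a requirement of 4MB (22) would make the connection fail.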
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • virtiofs: add a mount option to enable dax · 1dd53957
      Vivek Goyal committed
      Add a mount option to allow using dax with virtio_fs.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • virtiofs: get rid of no_mount_options · f4fd4ae3
      Vivek Goyal committed
      This option was introduced so that for virtio_fs we don't show any mount
      options in fuse_show_options(), because we don't offer any of these
      options to be controlled by the mounter.

      Very soon we are planning to introduce the option "dax", which the mounter
      should be able to specify, so no_mount_options does not work anymore.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  16. 14 Jul, 2020 3 commits
    • fuse: reject options on reconfigure via fsconfig(2) · b330966f
      Miklos Szeredi committed
      Previous patch changed handling of remount/reconfigure to ignore all
      options, including those that are unknown to the fuse kernel fs.  This was
      done for backward compatibility, but this likely only affects the old
      mount(2) API.
      
      The new fsconfig(2) based reconfiguration could possibly be improved.  This
      would make the new API less of a drop in replacement for the old, OTOH this
      is a good chance to get rid of some weirdnesses in the old API.
      
      Several other behaviors might make sense:
      
       1) unknown options are rejected, known options are ignored
      
       2) unknown options are rejected, known options are rejected if the value
       is changed, allowed otherwise
      
       3) all options are rejected
      
      Prior to the backward compatibility fix to ignore all options all known
      options were accepted (1), even if they change the value of a mount
      parameter; fuse_reconfigure() does not look at the config values set by
      fuse_parse_param().
      
      To fix that we'd need to verify that the value provided is the same as set
      in the initial configuration (2).  The major drawback is that this is much
      more complex than just rejecting all attempts at changing options (3);
      i.e. all options signify initial configuration values and don't make sense
      on reconfigure.
      
      This patch opts for (3) with the rationale that no mount options are
      reconfigurable in fuse.
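
      The three candidate behaviors can be compared with a small model.  This
      is only an illustration: reconfigure_option(), enum reconf_policy and
      struct mount_opt are hypothetical names, and "known options" are
      modelled as a table of keys with their current values.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

enum reconf_policy {
    IGNORE_KNOWN = 1,    /* (1) unknown rejected, known ignored */
    REJECT_CHANGED = 2,  /* (2) known rejected only if value changes */
    REJECT_ALL = 3       /* (3) every option rejected */
};

/* Hypothetical table entry: a known option and its current value. */
struct mount_opt {
    const char *key;
    const char *value;
};

int reconfigure_option(enum reconf_policy policy,
                       const struct mount_opt *opts, size_t n,
                       const char *key, const char *value)
{
    if (policy == REJECT_ALL)
        return -EINVAL;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(opts[i].key, key) != 0)
            continue;
        if (policy == IGNORE_KNOWN)
            return 0;
        /* REJECT_CHANGED: accept only an unchanged value */
        return strcmp(opts[i].value, value) == 0 ? 0 : -EINVAL;
    }
    return -EINVAL;  /* unknown option */
}
```

      Policy (2) needs the table of current values to be kept and compared,
      which is the extra complexity the message cites as a reason to prefer
      (3).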
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: ignore 'data' argument of mount(..., MS_REMOUNT) · e8b20a47
      Miklos Szeredi committed
      The command
      
        mount -o remount -o unknownoption /mnt/fuse
      
      succeeds on kernel versions prior to v5.4 and fails on v5.4 and later.
      This is because fuse_parse_param() rejects any unrecognised options
      in case of FS_CONTEXT_FOR_RECONFIGURE, just as for FS_CONTEXT_FOR_MOUNT.
      
      This causes a regression in case the fuse filesystem is in fstab, since
      remount sends all options found there to the kernel; even ones that are
      meant for the initial mount and are consumed by the userspace fuse server.
      
      Fix this by ignoring mount options, just as fuse_remount_fs() did prior to
      the conversion to the new API.
      Reported-by: Stefan Priebe <s.priebe@profihost.ag>
      Fixes: c30da2e9 ("fuse: convert to use the new mount API")
      Cc: <stable@vger.kernel.org> # v5.4
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fuse: use ->reconfigure() instead of ->remount_fs() · 0189a2d3
      Miklos Szeredi committed
      s_op->remount_fs() is only called from legacy_reconfigure(), which is not
      used after being converted to the new API.
      
      Convert to using ->reconfigure().  This restores the previous behavior of
      syncing the filesystem and rejecting MS_MANDLOCK on remount.
      
      Fixes: c30da2e9 ("fuse: convert to use the new mount API")
      Cc: <stable@vger.kernel.org> # v5.4
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  17. 19 May, 2020 2 commits
    • fuse: update attr_version counter on fuse_notify_inval_inode() · 5ddd9ced
      Miklos Szeredi committed
      A GETATTR request can race with FUSE_NOTIFY_INVAL_INODE, resulting in the
      attribute cache being updated with stale information after the
      invalidation.
      
      Fix this by bumping the attribute version in fuse_reverse_inval_inode().
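
      The race and the fix can be modelled with a version counter.  This is a
      userspace sketch with hypothetical names (attr_conn, cached_attr,
      getattr_begin, getattr_finish, notify_inval_inode); the kernel uses an
      atomic counter and takes locks that are omitted here.

```c
#include <stdbool.h>
#include <stdint.h>

struct attr_conn {
    uint64_t attr_version;  /* connection-wide, only ever incremented */
};

struct cached_attr {
    uint64_t version;       /* version at the last update/invalidation */
    uint64_t size;          /* stand-in for the cached attributes */
    bool valid;
};

/* Sample the counter before sending GETATTR. */
uint64_t getattr_begin(const struct attr_conn *c)
{
    return c->attr_version;
}

/* Apply a GETATTR reply unless the inode changed since the request. */
bool getattr_finish(struct attr_conn *c, struct cached_attr *a,
                    uint64_t started, uint64_t size)
{
    if (a->version > started)
        return false;       /* reply is stale, drop it */
    a->size = size;
    a->valid = true;
    a->version = ++c->attr_version;
    return true;
}

/* Invalidation bumps the version so in-flight replies are rejected. */
void notify_inval_inode(struct attr_conn *c, struct cached_attr *a)
{
    a->version = ++c->attr_version;
    a->valid = false;
}
```

      A GETATTR that started before the invalidation is dropped when its
      reply arrives, and a fresh GETATTR issued afterwards is accepted.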
      Reported-by: Krzysztof Rusek <rusek@9livesdata.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • virtiofs: do not use fuse_fill_super_common() for device installation · 7fd3abfa
      Vivek Goyal committed
      fuse_fill_super_common() allocates and installs one fuse_device.  Hence
      virtiofs allocates and installs all fuse devices by itself except one.

      This makes the logic a little twisted.  There does not seem to be any real
      reason why virtiofs can't allocate and install all fuse devices itself.
      
      So opt out of fuse device allocation and installation while calling
      fuse_fill_super_common().
      
      Regular fuse still wants fuse_fill_super_common() to install fuse_device.
      It needs to protect against races where two mounters are trying to mount
      fuse using the same fd.  In that case one will succeed while the other
      will get -EINVAL.
      
      virtiofs does not have this issue because sget_fc() resolves the race
      w.r.t multiple mounters and only one instance of virtio_fs_fill_super()
      should be in progress for same filesystem.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  18. 08 Feb, 2020 3 commits