1. 22 Jun, 2021 3 commits
    • A
      fuse: fix illegal access to inode with reused nodeid · 15db1683
      Amir Goldstein committed
      Server responds to LOOKUP and other ops (READDIRPLUS/CREATE/MKNOD/...)
      with outarg containing nodeid and generation.
      
      If a fuse inode is found in inode cache with the same nodeid but different
      generation, the existing fuse inode should be unhashed and marked "bad" and
      a new inode with the new generation should be hashed instead.
      
      This can happen, for example, with passthrough fuse filesystem that returns
      the real filesystem ino/generation on lookup and where real inode numbers
      can get recycled due to real files being unlinked not via the fuse
      passthrough filesystem.
      
      With current code, this situation will not be detected and an old fuse
      dentry that used to point to an older generation real inode, can be used to
      access a completely new inode, which should be accessed only via the new
      dentry.
      
      Note that because the FORGET message carries the nodeid w/o generation, the
      server should wait to get FORGET counts for the nlookup counts of the old
      and reused inodes combined, before it can free the resources associated to
      that nodeid.
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
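The generation check described in this commit can be illustrated with a small userspace sketch. This is not the kernel code: `sketch_inode`, `sketch_cache`, `sketch_lookup()` and `sketch_forgets_needed()` are hypothetical names, and the one-slot cache is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cached fuse inode (not struct fuse_inode). */
struct sketch_inode {
    uint64_t nodeid;
    uint64_t generation;
    uint64_t nlookup;  /* lookup replies counted against this nodeid */
    bool hashed;       /* still reachable through the cache */
    bool bad;          /* stale: operations on it should fail */
};

/* Assumption: a one-slot "inode cache" is enough for illustration. */
struct sketch_cache {
    struct sketch_inode *slot;
};

/*
 * Handle a LOOKUP reply carrying (nodeid, generation).
 * Same nodeid and same generation: reuse the cached inode.
 * Same nodeid but different generation: unhash the old inode, mark it
 * bad, and hash a new inode (stored in caller-provided `fresh`) instead.
 */
struct sketch_inode *sketch_lookup(struct sketch_cache *cache,
                                   struct sketch_inode *fresh,
                                   uint64_t nodeid, uint64_t generation)
{
    assert(cache != NULL && fresh != NULL);

    struct sketch_inode *old = cache->slot;
    if (old != NULL && old->hashed && old->nodeid == nodeid) {
        if (old->generation == generation) {
            old->nlookup++;
            return old;
        }
        old->hashed = false;
        old->bad = true;
    }

    *fresh = (struct sketch_inode){
        .nodeid = nodeid, .generation = generation,
        .nlookup = 1, .hashed = true, .bad = false,
    };
    cache->slot = fresh;
    return fresh;
}

/*
 * Server side: FORGET carries only the nodeid, so the server has to wait
 * for the combined nlookup count of the old and the reused inode.
 */
uint64_t sketch_forgets_needed(const struct sketch_inode *old,
                               const struct sketch_inode *reused)
{
    return old->nlookup + reused->nlookup;
}
```

An old dentry still pointing at the stale inode would then see `bad` set and fail, instead of silently reaching the new file.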
    • G
      fuse: Switch to fc_mount() for submounts · 29e0e4df
      Greg Kurz committed
      fc_mount() already handles the vfs_get_tree(), sb->s_umount
      unlocking and vfs_create_mount() sequence. Using it greatly
      simplifies fuse_dentry_automount().
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • G
      fuse: Call vfs_get_tree() for submounts · 266eb3f2
      Greg Kurz committed
      We recently fixed an infinite loop by setting the SB_BORN flag on
      submounts along with the write barrier needed by super_cache_count().
      This is the job of vfs_get_tree() and FUSE shouldn't have to care
      about the barrier at all.
      
      Split out some code from fuse_dentry_automount() to the dedicated
      fuse_get_tree_submount() handler for submounts and call vfs_get_tree().
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  2. 09 Jun, 2021 3 commits
    • G
      fuse: Fix infinite loop in sget_fc() · e4a9ccdd
      Greg Kurz committed
      We don't set the SB_BORN flag on submounts. This is wrong as these
      superblocks are then considered as partially constructed or dying
      in the rest of the code and can break some assumptions.
      
      One such case is when you have a virtiofs filesystem with submounts
      and you try to mount it again: virtio_fs_get_tree() tries to obtain
      a superblock with sget_fc(). The logic in sget_fc() is to loop until
      it has either found an existing matching superblock with SB_BORN set
      or to create a brand new one. It is assumed that a superblock without
      SB_BORN is transient and the loop is restarted. Forgetting to set
      SB_BORN on submounts hence causes sget_fc() to retry forever.
      
      Setting SB_BORN requires special care, i.e. a write barrier for
      super_cache_count() which can check SB_BORN without taking any lock.
      We should call vfs_get_tree() to deal with that, but this requires
      a proper ->get_tree() implementation for submounts, which
      is a bigger piece of work. Go for a simple bug fix in the meantime.
      
      Fixes: bf109c64 ("fuse: implement crossmounts")
      Cc: stable@vger.kernel.org # v5.10+
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
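The retry behaviour described in this commit can be modelled in userspace roughly as follows. This is only a sketch of the loop's shape, not sget_fc() itself: the flag value, the struct, and the `max_retries` bound (added so the demonstration terminates) are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SB_BORN 0x1u  /* hypothetical flag value, not the kernel's */

/* Hypothetical, minimal stand-in for a superblock. */
struct sketch_sb {
    unsigned int flags;
    int key;  /* whatever identifies "the same filesystem" */
};

enum sketch_result { SKETCH_FOUND, SKETCH_CREATED, SKETCH_STUCK };

/*
 * Look for a superblock matching `key`.
 *  - no match: a brand new superblock would be created
 *  - match with SB_BORN: reuse it
 *  - match without SB_BORN: assumed transient, so restart the search
 * The real loop has no retry limit; `max_retries` exists only so that this
 * sketch can report the "retry forever" case instead of hanging.
 */
enum sketch_result sketch_sget(const struct sketch_sb *supers, size_t n,
                               int key, unsigned int max_retries)
{
    for (unsigned int attempt = 0; attempt <= max_retries; attempt++) {
        const struct sketch_sb *match = NULL;

        for (size_t i = 0; i < n; i++) {
            if (supers[i].key == key) {
                match = &supers[i];
                break;
            }
        }
        if (match == NULL)
            return SKETCH_CREATED;
        if (match->flags & SKETCH_SB_BORN)
            return SKETCH_FOUND;
        /* Not born yet: treated as transient, go around again. */
    }
    return SKETCH_STUCK;
}
```

A submount superblock that never gets SB_BORN corresponds to the `SKETCH_STUCK` outcome: the state the loop is waiting for never arrives.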
    • G
      fuse: Fix crash if superblock of submount gets killed early · e3a43f2a
      Greg Kurz committed
      As soon as fuse_dentry_automount() does up_write(&sb->s_umount), the
      superblock can theoretically be killed. If this happens before the
      submount was added to the &fc->mounts list, fuse_mount_remove() later
      crashes in list_del_init() because it assumes the submount to be
      already there.
      
      Add the submount before dropping sb->s_umount to fix the inconsistency.
      It is okay to nest fc->killsb under sb->s_umount, we already do this
      on the ->kill_sb() path.
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Fixes: bf109c64 ("fuse: implement crossmounts")
      Cc: stable@vger.kernel.org # v5.10+
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • G
      fuse: Fix crash in fuse_dentry_automount() error path · d92d88f0
      Greg Kurz committed
      If fuse_fill_super_submount() returns an error, the error path
      triggers a crash:
      
      [   26.206673] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [...]
      [   26.226362] RIP: 0010:__list_del_entry_valid+0x25/0x90
      [...]
      [   26.247938] Call Trace:
      [   26.248300]  fuse_mount_remove+0x2c/0x70 [fuse]
      [   26.248892]  virtio_kill_sb+0x22/0x160 [virtiofs]
      [   26.249487]  deactivate_locked_super+0x36/0xa0
      [   26.250077]  fuse_dentry_automount+0x178/0x1a0 [fuse]
      
      The crash happens because fuse_mount_remove() assumes that the FUSE
      mount was already added to the list under the FUSE connection, but this
      is only done after fuse_fill_super_submount() has returned success.
      
      This means that until fuse_fill_super_submount() has returned success,
      the FUSE mount isn't actually owned by the superblock. We should thus
      reclaim ownership by clearing sb->s_fs_info, which will skip the call
      to fuse_mount_remove(), and perform rollback, like virtio_fs_get_tree()
      already does for the root sb.
      
      Fixes: bf109c64 ("fuse: implement crossmounts")
      Cc: stable@vger.kernel.org # v5.10+
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  3. 12 Apr, 2021 2 commits
    • M
      fuse: convert to fileattr · 72227eac
      Miklos Szeredi committed
      Since fuse just passes ioctl args through to/from server, converting to the
      fileattr API is more involved than for most other filesystems.
      
      Both .fileattr_set() and .fileattr_get() need to obtain an open file to
      operate on.  The simplest way is with the following sequence:
      
        FUSE_OPEN
        FUSE_IOCTL
        FUSE_RELEASE
      
      If this turns out to be a performance problem, it could be optimized for
      the case when there's already a file (any file) open for the inode.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • M
      fuse: unsigned open flags · 54d601cb
      Miklos Szeredi committed
      Release helpers used signed int.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  4. 08 Mar, 2021 1 commit
  5. 24 Jan, 2021 4 commits
  6. 10 Dec, 2020 1 commit
    • M
      fuse: fix bad inode · 5d069dbe
      Miklos Szeredi committed
      Jan Kara's analysis of the syzbot report (edited):
      
        The reproducer opens a directory on FUSE filesystem, it then attaches
        dnotify mark to the open directory.  After that a fuse_do_getattr() call
        finds that attributes returned by the server are inconsistent, and calls
        make_bad_inode() which, among other things, does:
      
                inode->i_mode = S_IFREG;
      
        This then confuses dnotify which doesn't tear down its structures
        properly and eventually crashes.
      
      Avoid calling make_bad_inode() on a live inode: switch to a private flag on
      the fuse inode.  Also add the test to ops which the bad_inode_ops would
      have caught.
      
      This bug goes back to the initial merge of fuse in 2.6.14...
      
      Reported-by: syzbot+f427adf9324b92652ccc@syzkaller.appspotmail.com
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Tested-by: Jan Kara <jack@suse.cz>
      Cc: <stable@vger.kernel.org>
  7. 12 Nov, 2020 5 commits
    • V
      fuse: add a flag FUSE_OPEN_KILL_SUIDGID for open() request · 643a666a
      Vivek Goyal committed
      With FUSE_HANDLE_KILLPRIV_V2 support, server will need to kill suid/sgid/
      security.capability on open(O_TRUNC), if server supports
      FUSE_ATOMIC_O_TRUNC.
      
      But server needs to kill suid/sgid only if caller does not have CAP_FSETID.
      Given server does not have this information, client needs to send this info
      to server.
      
      So add a flag FUSE_OPEN_KILL_SUIDGID to fuse_open_in request which tells
      server to kill suid/sgid (only if group execute is set).
      
      This flag is added to the FUSE_OPEN request, as well as the FUSE_CREATE
      request if the create was non-exclusive, since that might result in an
      existing file being opened/truncated.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • V
      fuse: don't send ATTR_MODE to kill suid/sgid for handle_killpriv_v2 · 8981bdfd
      Vivek Goyal committed
      If client does a write() on a suid/sgid file, VFS will first call
      fuse_setattr() with ATTR_KILL_S[UG]ID set.  This requires sending setattr
      to file server with ATTR_MODE set to kill suid/sgid.  But to do that client
      needs to know the latest mode, otherwise it is racy.
      
      To reduce the race window, the current code first calls fuse_do_getattr() to
      get the latest ->i_mode, then resets the suid/sgid bits and sends the rest to
      the server with setattr(ATTR_MODE).  This does not eliminate the race but
      narrows the window significantly.
      
      With fc->handle_killpriv_v2 enabled, it should be possible to remove this
      race completely.  Do not kill suid/sgid with ATTR_MODE at all.  It will be
      killed by server when WRITE request is sent to server soon.  This is
      similar to fc->handle_killpriv logic.  V2 is just more refined version of
      protocol.  Hence this patch does not send ATTR_MODE to kill suid/sgid if
      fc->handle_killpriv_v2 is enabled.
      
      This creates an issue if fc->writeback_cache is enabled.  In that case
      WRITE can be cached in guest and server might not see WRITE request and
      hence will not kill suid/sgid.  Miklos suggested that in such cases, we
      should fall back to a writethrough WRITE instead, which will generate a
      WRITE request and kill suid/sgid.  This patch implements that too.
      
      But this relies on client seeing the suid/sgid set.  If another client sets
      suid/sgid and this client does not see it immediately, then we will not
      fall back to writethrough WRITE.  So this is one limitation with both
      fc->handle_killpriv_v2 and fc->writeback_cache enabled.  The two options
      are not fully compatible, but this might be good enough for many use cases.
      
      Note: This patch is not checking whether security.capability is set or not
            when falling back to writethrough path.  If suid/sgid is not set and
            only security.capability is set, that will be taken care of by
            file_remove_privs() call in ->writeback_cache path.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • V
      fuse: setattr should set FATTR_KILL_SUIDGID · 31792161
      Vivek Goyal committed
      If fc->handle_killpriv_v2 is enabled, we expect file server to clear
      suid/sgid/security.capability upon chown/truncate/write as appropriate.
      
      Upon truncate (ATTR_SIZE), suid/sgid are cleared only if caller does not
      have CAP_FSETID.  File server does not know whether caller has CAP_FSETID
      or not.  Hence set FATTR_KILL_SUIDGID upon truncate to let file server know
      that caller does not have CAP_FSETID and it should kill suid/sgid as
      appropriate.
      
      On chown (ATTR_UID/ATTR_GID) suid/sgid need to be cleared irrespective of
      capabilities of calling process, so set FATTR_KILL_SUIDGID unconditionally
      in that case.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • M
      fuse: always revalidate if exclusive create · df8629af
      Miklos Szeredi committed
      Failure to do so may result in EEXIST even if the file only exists in the
      cache and not in the filesystem.
      
      The atomic nature of O_EXCL mandates that the cached state should be
      ignored and existence verified anew.
      Reported-by: Ken Schalk <kschalk@nvidia.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
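The rule in this commit can be reduced to a small decision function. This is a sketch under assumptions: it works on userspace open(2) flags rather than the kernel's lookup flags, and `sketch_needs_revalidate()` and `cached_entry_fresh` are hypothetical names.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/*
 * Decide whether a cached directory entry must be re-checked with the
 * server before it is trusted.
 *  - exclusive create (O_CREAT | O_EXCL): always revalidate, because a
 *    stale positive entry would produce a spurious EEXIST
 *  - otherwise: revalidate only if the cached entry has expired
 */
bool sketch_needs_revalidate(int open_flags, bool cached_entry_fresh)
{
    bool exclusive_create =
        (open_flags & O_CREAT) && (open_flags & O_EXCL);

    if (exclusive_create)
        return true;
    return !cached_entry_fresh;
}
```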
    • M
      fuse: get rid of fuse_mount refcount · 514b5e3f
      Miklos Szeredi committed
      Fuse mount now only ever has a refcount of one (before being freed) so the
      count field is unnecessary.
      
      Remove the refcounting and fold fuse_mount_put() into callers.  The only
      caller of fuse_mount_put() where fm->fc was NULL is fuse_dentry_automount()
      and here the fuse_conn_put() can simply be omitted.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  8. 09 Oct, 2020 1 commit
    • M
      fuse: implement crossmounts · bf109c64
      Max Reitz committed
      FUSE servers can indicate crossmount points by setting FUSE_ATTR_SUBMOUNT
      in fuse_attr.flags.  The inode will then be marked as S_AUTOMOUNT, and the
      .d_automount implementation creates a new submount at that location, so
      that the submount gets a distinct st_dev value.
      
      Note that all submounts get a distinct superblock and a distinct st_dev
      value, so for virtio-fs, even if the same filesystem is mounted more than
      once on the host, none of its mount points will have the same st_dev.  We
      need distinct superblocks because the superblock points to the root node,
      but the different host mounts may show different trees (e.g. due to
      submounts in some of them, but not in others).
      
      Right now, this behavior is only enabled when fuse_conn.auto_submounts is
      set, which is the case only for virtio-fs.
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  9. 18 Sep, 2020 1 commit
    • M
      fuse: split fuse_mount off of fuse_conn · fcee216b
      Max Reitz committed
      We want to allow submounts for the same fuse_conn, but with different
      superblocks so that each of the submounts has its own device ID.  To do
      so, we need to split all mount-specific information off of fuse_conn
      into a new fuse_mount structure, so that multiple mounts can share a
      single fuse_conn.
      
      We need to take care only to perform connection-level actions once (i.e.
      when the fuse_conn and thus the first fuse_mount are established, or
      when the last fuse_mount and thus the fuse_conn are destroyed).  For
      example, fuse_sb_destroy() must not invoke fuse_send_destroy() until the
      last superblock is released.
      
      To do so, we keep track of which fuse_mount is the root mount and
      perform all fuse_conn-level actions only when this fuse_mount is
      involved.
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  10. 10 Sep, 2020 1 commit
    • V
      virtiofs: serialize truncate/punch_hole and dax fault path · 6ae330ca
      Vivek Goyal committed
      Currently in fuse we don't seem to have any lock which can serialize the fault
      path with the truncate/punch_hole path. With dax support I need one for the
      following reasons.
      
      1. Dax requirement
      
        DAX fault code relies on inode size being stable for the duration of the
        fault and wants to serialize with truncate/punch_hole; the code explicitly
        mentions it.
      
        static vm_fault_t dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
                                     const struct iomap_ops *ops)
              /*
               * Check whether offset isn't beyond end of file now. Caller is
               * supposed to hold locks serializing us with truncate / punch hole so
               * this is a reliable test.
               */
              max_pgoff = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
      
      2. Make sure there are no users of pages being truncated/punch_hole
      
        get_user_pages() might take references to page and then do some DMA
        to said pages. Filesystem might truncate those pages without knowing
        that a DMA is in progress or some I/O is in progress. So use
        dax_layout_busy_page() to make sure there are no such references
        and I/O is not in progress on said pages before moving ahead with
        truncation.
      
      3. Limitation of kvm page fault error reporting
      
        If we are truncating file on host first and then removing mappings in
        guest later (truncate page cache etc), then this could lead to a
        problem with KVM. Say a mapping is in place in guest and truncation
        happens on host. Now if guest accesses that mapping, then host will
        take a fault and kvm will either exit to qemu or spin infinitely.
      
        IOW, before we do truncation on host, we need to make sure that guest
        inode does not have any mapping in that region or whole file.
      
      4. virtiofs memory range reclaim
      
       Soon I will introduce the notion of being able to reclaim dax memory
       ranges from a fuse dax inode. There also I need to make sure that
       no I/O or fault is going on in the reclaimed range and nobody is using
       it so that range can be reclaimed without issues.
      
      Currently if we take inode lock, that serializes read/write. But it does
      not do anything for faults. So I add another semaphore fuse_inode->i_mmap_sem
      for this purpose.  It can be used to serialize with faults.
      
      As of now, I am taking this semaphore only in the dax fault path and
      not the regular fault path because existing code does not have one. Maybe
      existing code can benefit from it as well to take care of some
      races, but that we can fix later if need be. For now, I am focusing
      only on the DAX path, which is a new path.
      
      Also added logic to take fuse_inode->i_mmap_sem in the
      truncate/punch_hole/open(O_TRUNC) path to make sure file truncation and
      fuse dax fault are mutually exclusive and avoid all the above problems.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  11. 19 May, 2020 1 commit
  12. 06 Feb, 2020 1 commit
  13. 12 Nov, 2019 2 commits
  14. 23 Oct, 2019 1 commit
  15. 21 Oct, 2019 1 commit
  16. 24 Sep, 2019 2 commits
  17. 12 Sep, 2019 1 commit
    • M
      fuse: delete dentry if timeout is zero · 8fab0106
      Miklos Szeredi committed
      Don't hold onto a dentry in the lru list if it needs to be looked up again
      anyway at the next access.  Only do this if explicitly enabled, otherwise it
      could result in a performance regression.
      
      A more advanced version of this patch would periodically flush out dentries
      from the lru which have gone stale.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  18. 10 Sep, 2019 2 commits
  19. 13 Feb, 2019 4 commits
  20. 12 Dec, 2018 1 commit
  21. 03 Dec, 2018 2 commits