1. 27 Jul, 2020 4 commits
  2. 25 Jul, 2020 1 commit
  3. 24 Jul, 2020 2 commits
  4. 23 Jul, 2020 1 commit
  5. 22 Jul, 2020 4 commits
    • B
      btrfs: fix mount failure caused by race with umount · 48cfa61b
      Boris Burkov committed
      It is possible to cause a btrfs mount to fail by racing it with a slow
      umount. The crux of the sequence is generic_shutdown_super not yet
      calling sop->put_super before btrfs_mount_root calls btrfs_open_devices.
      If that occurs, btrfs_open_devices will decide the opened counter is
      non-zero, increment it, and skip resetting fs_devices->total_rw_bytes to
      0. From here, mount will call sget which will result in grab_super
      trying to take the super block umount semaphore. That semaphore will be
      held by the slow umount, so mount will block. Before up-ing the
      semaphore, umount will delete the super block, resulting in mount's sget
      reliably allocating a new one, which causes the mount path to dutifully
      fill it out, and increment total_rw_bytes a second time, which causes
      the mount to fail, as we see double the expected bytes.
      
      Here is the sequence laid out in greater detail:
      
      CPU0                                                    CPU1
      down_write sb->s_umount
      btrfs_kill_super
        kill_anon_super(sb)
          generic_shutdown_super(sb);
            shrink_dcache_for_umount(sb);
            sync_filesystem(sb);
            evict_inodes(sb); // SLOW
      
                                                    btrfs_mount_root
                                                      btrfs_scan_one_device
                                                      fs_devices = device->fs_devices
                                                      fs_info->fs_devices = fs_devices
                                                      // fs_devices->opened makes this a no-op
                                                      btrfs_open_devices(fs_devices, mode, fs_type)
                                                      s = sget(fs_type, test, set, flags, fs_info);
                                                        find sb in s_instances
                                                        grab_super(sb);
                                                          down_write(&s->s_umount); // blocks
      
            sop->put_super(sb)
              // sb->fs_devices->opened == 2; no-op
            spin_lock(&sb_lock);
            hlist_del_init(&sb->s_instances);
            spin_unlock(&sb_lock);
            up_write(&sb->s_umount);
                                                          return 0;
                                                        retry lookup
                                                        don't find sb in s_instances (deleted by CPU0)
                                                        s = alloc_super
                                                        return s;
                                                      btrfs_fill_super(s, fs_devices, data)
                                                        open_ctree // fs_devices total_rw_bytes improperly set!
                                                          btrfs_read_chunk_tree
                                                            read_one_dev // increment total_rw_bytes again!!
                                                            super_total_bytes < fs_devices->total_rw_bytes // ERROR!!!
      
      To fix this, we clear total_rw_bytes from within btrfs_read_chunk_tree
      before the calls to read_one_dev, while holding the sb umount semaphore
      and the uuid mutex.
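
      The accounting problem can be modelled in userspace C. The sketch below
      is illustrative only: fs_devices_model, read_chunk_tree_model() and the
      reset_first flag are hypothetical stand-ins, not the kernel structures
      or the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of btrfs_fs_devices that matter here. */
struct fs_devices_model {
	int opened;              /* how many users have the devices open */
	uint64_t total_rw_bytes; /* running sum of writable device sizes */
};

/*
 * Model of reading the chunk tree: every device item adds its size to
 * total_rw_bytes. With reset_first set, the counter is cleared before the
 * device items are read, which is the idea behind the fix. Returns false
 * when the sum exceeds the total recorded in the super block.
 */
bool read_chunk_tree_model(struct fs_devices_model *fs,
			   const uint64_t *dev_bytes, size_t ndevs,
			   uint64_t super_total_bytes, bool reset_first)
{
	assert(fs != NULL && dev_bytes != NULL);

	if (reset_first)
		fs->total_rw_bytes = 0;

	for (size_t i = 0; i < ndevs; i++)
		fs->total_rw_bytes += dev_bytes[i];

	return fs->total_rw_bytes <= super_total_bytes;
}
```

      Starting from a stale total_rw_bytes left over from the earlier mount,
      the model fails the size check unless reset_first is set.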
      
      To reproduce, it is sufficient to dirty a decent number of inodes, then
      quickly umount and mount.
      
        for i in $(seq 0 500)
        do
          dd if=/dev/zero of="/mnt/foo/$i" bs=1M count=1
        done
        umount /mnt/foo&
        mount /mnt/foo
      
      does the trick for me.
      
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Boris Burkov <boris@bur.io>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      48cfa61b
    • R
      btrfs: fix page leaks after failure to lock page for delalloc · 5909ca11
      Robbie Ko committed
      When locking pages for delalloc, we check that each page is still dirty
      and that its mapping still matches. If it does not match, we need to
      return -EAGAIN and release all pages. However, only the current page was
      put; iterate over all the remaining pages and put them too.
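
      The leak can be illustrated with a small reference-counting model. This
      is a sketch under stated assumptions: page_model, lock_pages_model() and
      the put_remaining switch are hypothetical names for illustration and do
      not reproduce the btrfs code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page: a reference count plus the state checked while locking. */
struct page_model {
	int refcount;
	bool dirty;
};

/*
 * Take a reference on every page of the batch, then process them in order.
 * If a page no longer qualifies, return -EAGAIN. With put_remaining clear
 * only the failing page is put (the leak); with it set, the failing page
 * and every page after it are put as well.
 */
int lock_pages_model(struct page_model *pages, size_t n, bool put_remaining)
{
	size_t i;

	assert(pages != NULL);

	/* The batch lookup takes one reference on each page it returns. */
	for (i = 0; i < n; i++)
		pages[i].refcount++;

	for (i = 0; i < n; i++) {
		if (!pages[i].dirty) {
			size_t last = put_remaining ? n : i + 1;

			for (; i < last; i++)
				pages[i].refcount--;
			return -EAGAIN;
		}
		/* Page handled: drop the lookup reference. */
		pages[i].refcount--;
	}
	return 0;
}
```

      With three pages where the second one is clean, the leaky variant leaves
      a reference on the third page; the fixed variant leaves none.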
      
      CC: stable@vger.kernel.org # 4.14+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Robbie Ko <robbieko@synology.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      5909ca11
    • Q
      btrfs: qgroup: fix data leak caused by race between writeback and truncate · fa91e4aa
      Qu Wenruo committed
      [BUG]
      When running tests like generic/013 on a test device with btrfs quota
      enabled, it can frequently lead to a leak of qgroup reserved data space,
      detected at unmount time:
      
        BTRFS warning (device dm-3): qgroup 0/5 has unreleased space, type 0 rsv 4096
        ------------[ cut here ]------------
        WARNING: CPU: 11 PID: 16386 at fs/btrfs/disk-io.c:4142 close_ctree+0x1dc/0x323 [btrfs]
        RIP: 0010:close_ctree+0x1dc/0x323 [btrfs]
        Call Trace:
         btrfs_put_super+0x15/0x17 [btrfs]
         generic_shutdown_super+0x72/0x110
         kill_anon_super+0x18/0x30
         btrfs_kill_super+0x17/0x30 [btrfs]
         deactivate_locked_super+0x3b/0xa0
         deactivate_super+0x40/0x50
         cleanup_mnt+0x135/0x190
         __cleanup_mnt+0x12/0x20
         task_work_run+0x64/0xb0
         __prepare_exit_to_usermode+0x1bc/0x1c0
         __syscall_return_slowpath+0x47/0x230
         do_syscall_64+0x64/0xb0
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        ---[ end trace caf08beafeca2392 ]---
        BTRFS error (device dm-3): qgroup reserved space leaked
      
      [CAUSE]
      In the failing case, the offending operations are:
      2/6: writev f2X[269 1 0 0 0 0] [1006997,67,288] 0
      2/7: truncate f2X[269 1 0 0 48 1026293] 18388 0
      
      The following sequence of events could happen after the writev():
      	CPU1 (writeback)		|		CPU2 (truncate)
      -----------------------------------------------------------------
      btrfs_writepages()			|
      |- extent_write_cache_pages()		|
         |- Got page for 1003520		|
         |  1003520 is Dirty, no writeback	|
         |  So (!clear_page_dirty_for_io())   |
         |  gets called for it		|
         |- Now page 1003520 is Clean.	|
         |					| btrfs_setattr()
         |					| |- btrfs_setsize()
         |					|    |- truncate_setsize()
         |					|       New i_size is 18388
         |- __extent_writepage()		|
         |  |- page_offset() > i_size		|
            |- btrfs_invalidatepage()		|
      	 |- Page is clean, so no qgroup |
      	    callback executed
      
      This means, the qgroup reserved data space is not properly released in
      btrfs_invalidatepage() as the page is Clean.
      
      [FIX]
      Instead of checking the dirty bit of a page, call
      btrfs_qgroup_free_data() unconditionally in btrfs_invalidatepage().
      
      Since qgroup reserved space is bound entirely to the QGROUP_RESERVED bit
      of io_tree rather than to page status, this cannot cause a double free.
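
      Why the unconditional call is safe can be shown with a toy model in
      which the release is keyed on a "reserved" bit. All names here
      (range_model, qgroup_model, free_data_model(), invalidate_model()) are
      hypothetical and only mirror the reasoning above, not the btrfs code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range state: the reserved bit lives apart from page state. */
struct range_model {
	bool reserved;   /* stands in for the QGROUP_RESERVED bit */
	bool page_dirty; /* stands in for the page dirty flag */
};

struct qgroup_model {
	uint64_t rsv;    /* bytes currently reserved */
};

/* Release the reservation only if the reserved bit is still set. */
uint64_t free_data_model(struct qgroup_model *qg, struct range_model *r,
			 uint64_t len)
{
	assert(qg != NULL && r != NULL);

	if (!r->reserved)
		return 0;        /* already released: nothing to do */
	r->reserved = false;
	qg->rsv -= len;
	return len;
}

/* Invalidate a range; the old behaviour only released dirty pages. */
void invalidate_model(struct qgroup_model *qg, struct range_model *r,
		      uint64_t len, bool unconditional)
{
	if (unconditional || r->page_dirty)
		free_data_model(qg, r, len);
}
```

      For a clean page with a live reservation, the conditional variant leaves
      rsv untouched, while repeated unconditional calls release it exactly
      once.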
      
      Fixes: 0b34c261 ("btrfs: qgroup: Prevent qgroup->reserved from going subzero")
      CC: stable@vger.kernel.org # 4.14+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      fa91e4aa
    • F
      btrfs: fix double free on ulist after backref resolution failure · 580c079b
      Filipe Manana committed
      At btrfs_find_all_roots_safe() we allocate a ulist and set the **roots
      argument to point to it. However if later we fail due to an error returned
      by find_parent_nodes(), we free that ulist but leave a dangling pointer in
      the **roots argument. Upon receiving the error, a caller of this function
      can attempt to free the same ulist again, resulting in an invalid memory
      access.
      
      One such scenario is during qgroup accounting:
      
      btrfs_qgroup_account_extents()
      
       --> calls btrfs_find_all_roots() passes &new_roots (a stack allocated
           pointer) to btrfs_find_all_roots()
      
         --> btrfs_find_all_roots() just calls btrfs_find_all_roots_safe()
             passing &new_roots to it
      
           --> allocates ulist and assigns its address to **roots (which
               points to new_roots from btrfs_qgroup_account_extents())
      
           --> find_parent_nodes() returns an error, so we free the ulist
               and leave **roots pointing to it after returning
      
       --> btrfs_qgroup_account_extents() sees btrfs_find_all_roots() returned
           an error and jumps to the label 'cleanup', which just tries to
           free again the same ulist
      
      Stack trace example:
      
       ------------[ cut here ]------------
       BTRFS: tree first key check failed
       WARNING: CPU: 1 PID: 1763215 at fs/btrfs/disk-io.c:422 btrfs_verify_level_key+0xe0/0x180 [btrfs]
       Modules linked in: dm_snapshot dm_thin_pool (...)
       CPU: 1 PID: 1763215 Comm: fsstress Tainted: G        W         5.8.0-rc3-btrfs-next-64 #1
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       RIP: 0010:btrfs_verify_level_key+0xe0/0x180 [btrfs]
       Code: 28 5b 5d (...)
       RSP: 0018:ffffb89b473779a0 EFLAGS: 00010286
       RAX: 0000000000000000 RBX: ffff90397759bf08 RCX: 0000000000000000
       RDX: 0000000000000001 RSI: 0000000000000027 RDI: 00000000ffffffff
       RBP: ffff9039a419c000 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000000 R11: ffffb89b43301000 R12: 000000000000005e
       R13: ffffb89b47377a2e R14: ffffb89b473779af R15: 0000000000000000
       FS:  00007fc47e1e1000(0000) GS:ffff9039ac200000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007fc47e1df000 CR3: 00000003d9e4e001 CR4: 00000000003606e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        read_block_for_search+0xf6/0x350 [btrfs]
        btrfs_next_old_leaf+0x242/0x650 [btrfs]
        resolve_indirect_refs+0x7cf/0x9e0 [btrfs]
        find_parent_nodes+0x4ea/0x12c0 [btrfs]
        btrfs_find_all_roots_safe+0xbf/0x130 [btrfs]
        btrfs_qgroup_account_extents+0x9d/0x390 [btrfs]
        btrfs_commit_transaction+0x4f7/0xb20 [btrfs]
        btrfs_sync_file+0x3d4/0x4d0 [btrfs]
        do_fsync+0x38/0x70
        __x64_sys_fdatasync+0x13/0x20
        do_syscall_64+0x5c/0xe0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7fc47e2d72e3
       Code: Bad RIP value.
       RSP: 002b:00007fffa32098c8 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
       RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc47e2d72e3
       RDX: 00007fffa3209830 RSI: 00007fffa3209830 RDI: 0000000000000003
       RBP: 000000000000072e R08: 0000000000000001 R09: 0000000000000003
       R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000003e8
       R13: 0000000051eb851f R14: 00007fffa3209970 R15: 00005607c4ac8b50
       irq event stamp: 0
       hardirqs last  enabled at (0): [<0000000000000000>] 0x0
       hardirqs last disabled at (0): [<ffffffffb8eb5e85>] copy_process+0x755/0x1eb0
       softirqs last  enabled at (0): [<ffffffffb8eb5e85>] copy_process+0x755/0x1eb0
       softirqs last disabled at (0): [<0000000000000000>] 0x0
       ---[ end trace 8639237550317b48 ]---
       BTRFS error (device sdc): tree first key mismatch detected, bytenr=62324736 parent_transid=94 key expected=(262,108,1351680) has=(259,108,1921024)
       general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
       CPU: 2 PID: 1763215 Comm: fsstress Tainted: G        W         5.8.0-rc3-btrfs-next-64 #1
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       RIP: 0010:ulist_release+0x14/0x60 [btrfs]
       Code: c7 07 00 (...)
       RSP: 0018:ffffb89b47377d60 EFLAGS: 00010282
       RAX: 6b6b6b6b6b6b6b6b RBX: ffff903959b56b90 RCX: 0000000000000000
       RDX: 0000000000000001 RSI: 0000000000270024 RDI: ffff9036e2adc840
       RBP: ffff9036e2adc848 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000000 R11: 0000000000000000 R12: ffff9036e2adc840
       R13: 0000000000000015 R14: ffff9039a419ccf8 R15: ffff90395d605840
       FS:  00007fc47e1e1000(0000) GS:ffff9039ac600000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007f8c1c0a51c8 CR3: 00000003d9e4e004 CR4: 00000000003606e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        ulist_free+0x13/0x20 [btrfs]
        btrfs_qgroup_account_extents+0xf3/0x390 [btrfs]
        btrfs_commit_transaction+0x4f7/0xb20 [btrfs]
        btrfs_sync_file+0x3d4/0x4d0 [btrfs]
        do_fsync+0x38/0x70
        __x64_sys_fdatasync+0x13/0x20
        do_syscall_64+0x5c/0xe0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7fc47e2d72e3
       Code: Bad RIP value.
       RSP: 002b:00007fffa32098c8 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
       RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc47e2d72e3
       RDX: 00007fffa3209830 RSI: 00007fffa3209830 RDI: 0000000000000003
       RBP: 000000000000072e R08: 0000000000000001 R09: 0000000000000003
       R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000003e8
       R13: 0000000051eb851f R14: 00007fffa3209970 R15: 00005607c4ac8b50
       Modules linked in: dm_snapshot dm_thin_pool (...)
       ---[ end trace 8639237550317b49 ]---
       RIP: 0010:ulist_release+0x14/0x60 [btrfs]
       Code: c7 07 00 (...)
       RSP: 0018:ffffb89b47377d60 EFLAGS: 00010282
       RAX: 6b6b6b6b6b6b6b6b RBX: ffff903959b56b90 RCX: 0000000000000000
       RDX: 0000000000000001 RSI: 0000000000270024 RDI: ffff9036e2adc840
       RBP: ffff9036e2adc848 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000000 R11: 0000000000000000 R12: ffff9036e2adc840
       R13: 0000000000000015 R14: ffff9039a419ccf8 R15: ffff90395d605840
       FS:  00007fc47e1e1000(0000) GS:ffff9039ad200000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007f6a776f7d40 CR3: 00000003d9e4e002 CR4: 00000000003606e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fix this by making btrfs_find_all_roots_safe() set *roots to NULL after
      it frees the ulist.
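
      The pattern of the fix can be sketched in userspace C. The names
      ulist_model, find_all_roots_model() and account_extents_model() are
      hypothetical and only mimic the call chain described above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ulist. */
struct ulist_model {
	int nnodes;
};

/*
 * Allocate a list and hand it back through *roots. If the tree walk fails
 * (modelled by a non-zero walk_error), free the list and reset *roots so
 * that the caller is not left with a dangling pointer.
 */
int find_all_roots_model(struct ulist_model **roots, int walk_error)
{
	assert(roots != NULL);

	*roots = calloc(1, sizeof(**roots));
	if (!*roots)
		return -ENOMEM;

	if (walk_error) {
		free(*roots);
		*roots = NULL; /* without this, the caller frees it again */
		return walk_error;
	}
	return 0;
}

/* Caller with a shared cleanup path, reached on success and on error. */
int account_extents_model(int walk_error)
{
	struct ulist_model *new_roots = NULL;
	int ret = find_all_roots_model(&new_roots, walk_error);

	free(new_roots); /* free(NULL) is a no-op, so the error path is safe */
	return ret;
}
```

      Because the callee clears the pointer, the caller's cleanup can stay
      unconditional.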
      
      Fixes: 8da6d581 ("Btrfs: added btrfs_find_all_roots()")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      580c079b
  6. 21 Jul, 2020 4 commits
  7. 20 Jul, 2020 2 commits
  8. 19 Jul, 2020 1 commit
  9. 18 Jul, 2020 2 commits
    • J
      io_uring: ensure double poll additions work with both request types · 807abcb0
      Jens Axboe committed
      The double poll additions were centered around doing POLL_ADD on file
      descriptors that use more than one waitqueue (typically one for read,
      one for write) when being polled. However, double poll can also be
      triggered when we use poll-triggered retry. In that case we cannot
      safely use req->io, as that could be used by the request type itself.
      
      Add a second io_poll_iocb pointer in the structure we allocate for poll
      based retry, and ensure we use the right one from the two paths.
      
      Fixes: 18bceab1 ("io_uring: allow POLL_ADD with double poll_wait() users")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      807abcb0
    • O
      SUNRPC reverting d03727b2 ("NFSv4 fix CLOSE not waiting for direct IO compeletion") · 65caafd0
      Olga Kornievskaia committed
      Revert commit d03727b2 ("NFSv4 fix CLOSE not waiting for direct IO
      compeletion"). That patch made fput(), by calling inode_dio_done() in
      nfs_file_release(), wait uninterruptibly for any outstanding direct I/O
      to the file, but that wait on I/O should be killable.

      The other problem the patch was trying to address, REMOVE returning
      ERR_ACCESS because the file is still open, is supposed to be resolved
      by the server returning ERR_FILE_OPEN and not ERR_ACCESS.

      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      65caafd0
  10. 16 Jul, 2020 12 commits
    • A
      ovl: fix lookup of indexed hardlinks with metacopy · 4518dfcf
      Amir Goldstein committed
      We recently moved setting inode flag OVL_UPPERDATA to ovl_lookup().
      
      When looking up an overlay dentry, upperdentry may be found by index
      and not by name.  In that case, we fail to read the metacopy xattr
      and falsely set the OVL_UPPERDATA flag on the overlay inode.
      
      This caused a regression in xfstest overlay/033 when run with
      OVERLAY_MOUNT_OPTIONS="-o metacopy=on".
      
      Fixes: 28166ab3 ("ovl: initialize OVL_UPPERDATA in ovl_lookup()")
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      4518dfcf
    • A
      ovl: fix unneeded call to ovl_change_flags() · 81a33c1e
      Amir Goldstein committed
      The check for whether the user has changed the overlay file was wrong,
      causing an unneeded call to ovl_change_flags(), including taking f_lock,
      on every file access.
      
      Fixes: d9899030 ("ovl: do not generate duplicate fsnotify events for "fake" path")
      Cc: <stable@vger.kernel.org> # v4.19+
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      81a33c1e
    • D
      afs: Fix interruption of operations · 811f04ba
      David Howells committed
      The afs filesystem driver allows unstarted operations to be cancelled by
      signal, but most of these can easily be restarted (mkdir for example).  The
      primary culprits for reproducing this are those applications that use
      SIGALRM to display a progress counter.
      
      File lock-extension operation is marked uninterruptible as we have a
      limited time in which to do it, and the release op is marked
      uninterruptible also as if we fail to unlock a file, we'll have to wait 20
      mins before anyone can lock it again.
      
      The store operation logs a warning if it gets interrupted, e.g.:
      
      	kAFS: Unexpected error from FS.StoreData -4
      
      because it's run from the background - but it can also be run from
      fdatasync()-type things.  However, store operations aren't marked
      uninterruptible at the moment.
      
      Fix this in the following ways:
      
       (1) Mark store operations as uninterruptible.  It might make sense to
           relax this for certain situations, but I'm not sure how to make sure
           that background store ops aren't affected by signals to foreground
           processes that happen to trigger them.
      
       (2) In afs_get_io_locks(), where we're getting the serialisation lock for
           talking to the fileserver, return ERESTARTSYS rather than EINTR
           because a lot of the operations (e.g. mkdir) are restartable if we
           haven't yet started sending the op to the server.
      
      Fixes: e49c7b2f ("afs: Build an abstraction around an "operation" concept")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      811f04ba
    • A
      ovl: fix mount option checks for nfs_export with no upperdir · f0e1266e
      Amir Goldstein committed
      Without the upperdir mount option there is no index dir, so the
      nfs_export => index dependency checks in mount option parsing are
      incorrect.

      Allow the combination nfs_export=on,index=off with no upperdir, and move
      the redirect_dir=nofollow dependency check for the non-upper mount case
      to mount option parsing.

      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      f0e1266e
    • A
      ovl: force read-only sb on failure to create index dir · 470c1563
      Amir Goldstein committed
      With the index feature enabled, on failure to create the index dir,
      overlay is mounted read-only.  However, we do not prevent the user from
      remounting overlay read-write.  Fix that by setting ofs->workdir to NULL,
      which prevents remount read-write.

      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      470c1563
    • A
      ovl: fix regression with re-formatted lower squashfs · a888db31
      Amir Goldstein committed
      Commit 9df085f3 ("ovl: relax requirement for non null uuid of lower
      fs") relaxed the requirement for non null uuid with single lower layer to
      allow enabling index and nfs_export features with single lower squashfs.
      
      Fabian reported a regression in a setup when overlay re-uses an existing
      upper layer and re-formats the lower squashfs image.  Because squashfs
      has no uuid, the origin xattr in upper layer are decoded from the new
      lower layer where they may resolve to a wrong origin file and user may
      get an ESTALE or EIO error on lookup.
      
      To avoid the reported regression while still allowing the new features
      with single lower squashfs, do not allow decoding origin with lower null
      uuid unless user opted-in to one of the new features that require
      following the lower inode of non-dir upper (index, xino, metacopy).

      Reported-by: Fabian <godi.beat@gmx.net>
      Link: https://lore.kernel.org/linux-unionfs/32532923.JtPX5UtSzP@fgdesktop/
      Fixes: 9df085f3 ("ovl: relax requirement for non null uuid of lower fs")
      Cc: stable@vger.kernel.org # v4.20+
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      a888db31
    • A
      ovl: fix oops in ovl_indexdir_cleanup() with nfs_export=on · 20396365
      Amir Goldstein committed
      Mounting with nfs_export=on, xfstests overlay/031 triggers a kernel panic
      since v5.8-rc1 overlayfs updates.
      
       overlayfs: orphan index entry (index/00fb1..., ftype=4000, nlink=2)
       BUG: kernel NULL pointer dereference, address: 0000000000000030
       RIP: 0010:ovl_cleanup_and_whiteout+0x28/0x220 [overlay]
      
      Bisection points at commit c21c839b ("ovl: whiteout inode sharing").
      
      Minimal reproducer:
      --------------------------------------------------
      rm -rf l u w m
      mkdir -p l u w m
      mkdir -p l/testdir
      touch l/testdir/testfile
      mount -t overlay -o lowerdir=l,upperdir=u,workdir=w,nfs_export=on overlay m
      echo 1 > m/testdir/testfile
      umount m
      rm -rf u/testdir
      mount -t overlay -o lowerdir=l,upperdir=u,workdir=w,nfs_export=on overlay m
      umount m
      --------------------------------------------------
      
      When mounting with nfs_export=on, if we fail to verify an orphan index,
      we clean this index from indexdir by calling ovl_cleanup_and_whiteout().
      This dereferences ofs->workdir, which was earlier set to NULL.

      The design was that ofs->workdir would point at ofs->indexdir, but we
      assign ofs->indexdir to ofs->workdir only after ovl_indexdir_cleanup().
      There is no reason not to do it sooner, because once we get success from
      ofs->indexdir = ovl_workdir_create(... there is no turning back.

      Reported-and-tested-by: Murphy Zhou <jencce.kernel@gmail.com>
      Fixes: c21c839b ("ovl: whiteout inode sharing")
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      20396365
    • A
      ovl: relax WARN_ON() when decoding lower directory file handle · 124c2de2
      Amir Goldstein committed
      Decoding a lower directory file handle to overlay path with cold
      inode/dentry cache may go as follows:
      
      1. Decode real lower file handle to lower dir path
      2. Check if lower dir is indexed (was copied up)
      3. If indexed, get the upper dir path from index
      4. Lookup upper dir path in overlay
      5. If overlay path found, verify that overlay lower is the lower dir
         from step 1
      
      On failure to verify step 5 above, user will get an ESTALE error and a
      WARN_ON will be printed.
      
      A mismatch in step 5 could be a result of lower directory that was renamed
      while overlay was offline, after that lower directory has been copied up
      and indexed.
      
      This is a scripted reproducer based on xfstest overlay/052:
      
        # Create lower subdir
        create_dirs
        create_test_files $lower/lowertestdir/subdir
        mount_dirs
        # Copy up lower dir and encode lower subdir file handle
        touch $SCRATCH_MNT/lowertestdir
        test_file_handles $SCRATCH_MNT/lowertestdir/subdir -p -o $tmp.fhandle
        # Rename lower dir offline
        unmount_dirs
        mv $lower/lowertestdir $lower/lowertestdir.new/
        mount_dirs
        # Attempt to decode lower subdir file handle
        test_file_handles $SCRATCH_MNT -p -i $tmp.fhandle
      
      Since this WARN_ON() can be triggered by user we need to relax it.
      
      Fixes: 4b91c30a ("ovl: lookup connected ancestor of dir in inode cache")
      Cc: <stable@vger.kernel.org> # v4.16+
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      124c2de2
    • Y
      ovl: remove not used argument in ovl_check_origin · d78a0dcf
      youngjun committed
      The 'ctrp' out-parameter of ovl_check_origin() is not used by the
      caller, so remove this argument.

      Signed-off-by: youngjun <her0gyugyu@gmail.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      d78a0dcf
    • Y
      ovl: change ovl_copy_up_flags static · 5ac8e802
      youngjun committed
      "ovl_copy_up_flags" is used in copy_up.c.
      So make it static.

      Signed-off-by: youngjun <her0gyugyu@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      5ac8e802
    • Y
      ovl: inode reference leak in ovl_is_inuse true case. · 24f14009
      youngjun committed
      When ovl_is_inuse() returns true, the trap inode reference is not put.
      Put it, and add a comment explaining the ordering of ovl_is_inuse()
      after ovl_setup_trap().
      
      Fixes: 0be0bfd2 ("ovl: fix regression caused by overlapping layers detection")
      Cc: <stable@vger.kernel.org> # v4.19+
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: youngjun <her0gyugyu@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      24f14009
    • P
      io_uring: fix recvmsg memory leak with buffer selection · 681fda8d
      Pavel Begunkov committed
      io_recvmsg() doesn't free memory allocated for struct io_buffer. This can
      cause a leak when used with automatic buffer selection.

      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      681fda8d
  11. 15 Jul, 2020 1 commit
  12. 14 Jul, 2020 6 commits
    • V
      fuse: don't ignore errors from fuse_writepages_fill() · 7779b047
      Vasily Averin committed
      fuse_writepages() ignores some errors returned by fuse_writepages_fill().
      I believe this is a bug: if .writepages is called with WB_SYNC_ALL it
      should either guarantee that all data was successfully saved or return
      an error.
      
      Fixes: 26d614df ("fuse: Implement writepages callback")
      Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      7779b047
    • M
      fuse: clean up condition for writepage sending · 6ddf3af9
      Miklos Szeredi committed
      fuse_writepages_fill() uses the following construction:
      
      if (wpa && ap->num_pages &&
          (A || B || C)) {
              action;
      } else if (wpa && D) {
              if (E) {
                      the same action;
              }
      }
      
       - ap->num_pages check is always true and can be removed
      
       - "if" and "else if" calls the same action and can be merged.
      
      Move checking A, B, C, D, E conditions to a helper, add comments.
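
      The equivalence behind this cleanup can be checked exhaustively. The
      sketch below is illustrative: send_cond, need_send_old(),
      need_send_new() and count_mismatches() are hypothetical names, and the
      always-true ap->num_pages check is already dropped from both variants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bundle of the abstract conditions from the description. */
struct send_cond {
	bool wpa, a, b, c, d, e;
};

/* Original shape: an "if" and an "else if" that lead to the same action. */
bool need_send_old(const struct send_cond *s)
{
	assert(s != NULL);

	if (s->wpa && (s->a || s->b || s->c)) {
		return true;
	} else if (s->wpa && s->d) {
		if (s->e)
			return true;
	}
	return false;
}

/* Merged shape: one helper that answers "should the action run?". */
bool need_send_new(const struct send_cond *s)
{
	assert(s != NULL);

	if (!s->wpa)
		return false;
	return s->a || s->b || s->c || (s->d && s->e);
}

/* Compare both shapes over all 64 combinations of the six conditions. */
int count_mismatches(void)
{
	int mismatches = 0;

	for (unsigned int bits = 0; bits < 64; bits++) {
		struct send_cond s = {
			.wpa = bits & 1, .a = bits & 2, .b = bits & 4,
			.c = bits & 8, .d = bits & 16, .e = bits & 32,
		};

		if (need_send_old(&s) != need_send_new(&s))
			mismatches++;
	}
	return mismatches;
}
```

      Both shapes agree on every input, which is what allows the two branches
      to be merged.
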
      Original-patch-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      6ddf3af9
    • M
      fuse: reject options on reconfigure via fsconfig(2) · b330966f
      Miklos Szeredi committed
      Previous patch changed handling of remount/reconfigure to ignore all
      options, including those that are unknown to the fuse kernel fs.  This was
      done for backward compatibility, but this likely only affects the old
      mount(2) API.
      
      The new fsconfig(2) based reconfiguration could possibly be improved.  This
      would make the new API less of a drop in replacement for the old, OTOH this
      is a good chance to get rid of some weirdnesses in the old API.
      
      Several other behaviors might make sense:
      
       1) unknown options are rejected, known options are ignored
      
       2) unknown options are rejected, known options are rejected if the value
       is changed, allowed otherwise
      
       3) all options are rejected
      
      Prior to the backward compatibility fix to ignore all options all known
      options were accepted (1), even if they change the value of a mount
      parameter; fuse_reconfigure() does not look at the config values set by
      fuse_parse_param().
      
      To fix that we'd need to verify that the value provided is the same as set
      in the initial configuration (2).  The major drawback is that this is much
      more complex than just rejecting all attempts at changing options (3);
      i.e. all options signify initial configuration values and don't make sense
      on reconfigure.
      
      This patch opts for (3) with the rationale that no mount options are
      reconfigurable in fuse.

      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      b330966f
    • M
      fuse: ignore 'data' argument of mount(..., MS_REMOUNT) · e8b20a47
      Miklos Szeredi committed
      The command
      
        mount -o remount -o unknownoption /mnt/fuse
      
      succeeds on kernel versions prior to v5.4 and fails on v5.4 and later.
      This is because fuse_parse_param() rejects any unrecognised options
      in case of FS_CONTEXT_FOR_RECONFIGURE, just as for FS_CONTEXT_FOR_MOUNT.
      
      This causes a regression in case the fuse filesystem is in fstab, since
      remount sends all options found there to the kernel; even ones that are
      meant for the initial mount and are consumed by the userspace fuse server.
      
      Fix this by ignoring mount options, just as fuse_remount_fs() did prior to
      the conversion to the new API.

      Reported-by: Stefan Priebe <s.priebe@profihost.ag>
      Fixes: c30da2e9 ("fuse: convert to use the new mount API")
      Cc: <stable@vger.kernel.org> # v5.4
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      e8b20a47
    • M
      fuse: use ->reconfigure() instead of ->remount_fs() · 0189a2d3
      Miklos Szeredi committed
      s_op->remount_fs() is only called from legacy_reconfigure(), which is not
      used after being converted to the new API.
      
      Convert to using ->reconfigure().  This restores the previous behavior of
      syncing the filesystem and rejecting MS_MANDLOCK on remount.
      
      Fixes: c30da2e9 ("fuse: convert to use the new mount API")
      Cc: <stable@vger.kernel.org> # v5.4
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      0189a2d3
    • M
      fuse: fix warning in tree_insert() and clean up writepage insertion · c146024e
      Miklos Szeredi committed
      fuse_writepages_fill() calls tree_insert() with ap->num_pages = 0 which
      triggers the following warning:
      
       WARNING: CPU: 1 PID: 17211 at fs/fuse/file.c:1728 tree_insert+0xab/0xc0 [fuse]
       RIP: 0010:tree_insert+0xab/0xc0 [fuse]
       Call Trace:
        fuse_writepages_fill+0x5da/0x6a0 [fuse]
        write_cache_pages+0x171/0x470
        fuse_writepages+0x8a/0x100 [fuse]
        do_writepages+0x43/0xe0
      
      Fix up the warning and clean up the code around rb-tree insertion:
      
       - Rename tree_insert() to fuse_insert_writeback() and make it return the
         conflicting entry in case of failure
      
       - Re-add tree_insert() as a wrapper around fuse_insert_writeback()
      
       - Rename fuse_writepage_in_flight() to fuse_writepage_add() and reverse
         the meaning of the return value to mean
      
          + "true" in case the writepage entry was successfully added
      
          + "false" in case it was in-flight queued on an existing writepage
             entry's auxiliary list or the existing writepage entry's temporary
             page updated
      
         Switch from fuse_find_writeback() + tree_insert() to
         fuse_insert_writeback()
      
       - Move setting orig_pages to before inserting/updating the entry; this may
         result in the orig_pages value being discarded later in case of an
         in-flight request
      
       - In case of a new writepage entry use fuse_writepage_add()
         unconditionally, only set data->wpa if the entry was added.
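
      An insertion that reports the conflicting entry instead of a bare
      failure can be sketched with a plain binary search tree keyed on page
      index ranges. This is a simplified model: wb_entry and
      insert_writeback_model() are hypothetical, and the kernel uses an
      rb-tree rather than the unbalanced tree shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical writeback entry covering the inclusive range [first, last]. */
struct wb_entry {
	unsigned long first, last;
	struct wb_entry *left, *right;
};

/*
 * Insert e into the tree rooted at *root. Returns NULL when the entry was
 * added, or the already present entry whose range overlaps e.
 */
struct wb_entry *insert_writeback_model(struct wb_entry **root,
					struct wb_entry *e)
{
	struct wb_entry **p = root;

	assert(root != NULL && e != NULL && e->first <= e->last);

	while (*p) {
		struct wb_entry *cur = *p;

		if (e->last < cur->first)
			p = &cur->left;
		else if (e->first > cur->last)
			p = &cur->right;
		else
			return cur; /* overlap: hand back the conflict */
	}

	e->left = e->right = NULL;
	*p = e;
	return NULL;
}
```

      Returning the conflicting entry lets a caller queue onto it directly,
      without a separate lookup followed by an insert.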
      
      Fixes: 6b2fb799 ("fuse: optimize writepages search")
      Reported-by: kernel test robot <rong.a.chen@intel.com>
      Original-patch-by: Vasily Averin <vvs@virtuozzo.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      c146024e