提交 · 89d7da9bc592aa6a341d00f2d949615a89bb1eb7 · openeuler / Kernel

27 7月, 2020 4 次提交

btrfs: get mapping tree directly from fsinfo in find_first_block_group · 89d7da9b

由 Johannes Thumshirn 提交于 6月 02, 2020

We already have an fs_info in our function parameters, there's no need
to do the maths again and get fs_info from the extent_root just to get
the mapping_tree.

Instead directly grab the mapping_tree from fs_info.
Reviewed-by: NNikolay Borisov <nborisov@suse.com>
Signed-off-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

89d7da9b

btrfs: simplify checks when adding excluded ranges · 96f9b0f2

由 Nikolay Borisov 提交于 4月 03, 2020

Adresses held in 'logical' array are always guaranteed to fall within
the boundaries of the block group. That is, 'start' can never be
smaller than cache->start. This invariant follows from the way the
address are calculated in btrfs_rmap_block:

    stripe_nr = physical - map->stripes[i].physical;
    stripe_nr = div64_u64(stripe_nr, map->stripe_len);
    bytenr = chunk_start + stripe_nr * io_stripe_size;

I.e it's always some IO stripe within the given chunk.

Exploit this invariant to simplify the body of the loop by removing the
unnecessary 'if' since its 'else' part is the one always executed.
Signed-off-by: NNikolay Borisov <nborisov@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

96f9b0f2

btrfs: read stripe len directly in btrfs_rmap_block · 9e22b925

由 Nikolay Borisov 提交于 4月 03, 2020

extent_map::orig_block_len contains the size of a physical stripe when
it's used to describe block groups (calculated in read_one_chunk via
calc_stripe_length or calculated in decide_stripe_size and then assigned
to extent_map::orig_block_len in create_chunk). Exploit this fact to get
the size directly rather than opencoding the calculations. No functional
changes.
Signed-off-by: NNikolay Borisov <nborisov@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

9e22b925

btrfs: don't balance btree inode pages from buffered write path · 6a3c7f5c

由 Nikolay Borisov 提交于 5月 28, 2020

The call to btrfs_btree_balance_dirty has been there since the early
days of BTRFS, when the btree was directly modified from the write path,
hence dirtied btree inode pages. With the implementation of b888db2b
("Btrfs: Add delayed allocation to the extent based page tree code")
13 years ago the btree is no longer modified from the write path, hence
there is no point in calling this function. Just remove it.
Signed-off-by: NNikolay Borisov <nborisov@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

6a3c7f5c

25 7月, 2020 1 次提交

squashfs: fix length field overlap check in metadata reading · 2910c59f

由 Phillip Lougher 提交于 7月 23, 2020

This is a regression introduced by the "migrate from ll_rw_block usage
to BIO" patch.

Squashfs packs structures on byte boundaries, and due to that the length
field (of the metadata block) may not be fully in the current block.
The new code rewrote and introduced a faulty check for that edge case.

Fixes: 93e72b3c ("squashfs: migrate from ll_rw_block usage to BIO")
Reported-by: NBernd Amend <bernd.amend@gmail.com>
Signed-off-by: NPhillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Adrien Schildknecht <adrien+dev@schischi.me>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Daniel Rosenberg <drosen@google.com>
Link: http://lkml.kernel.org/r/20200717195536.16069-1-phillip@squashfs.org.ukSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2910c59f

24 7月, 2020 2 次提交

Revert "cifs: Fix the target file was deleted when rename failed." · 0e670518

由 Steve French 提交于 7月 23, 2020

This reverts commit 9ffad926.

Upon additional testing with older servers, it was found that
the original commit introduced a regression when using the old SMB1
dialect and rsyncing over an existing file.

The patch will need to be respun to address this, likely including
a larger refactoring of the SMB1 and SMB3 rename code paths to make
it less confusing and also to address some additional rename error
cases that SMB3 may be able to workaround.
Signed-off-by: NSteve French <stfrench@microsoft.com>
Reported-by: NPatrick Fernie <patrick.fernie@gmail.com>
CC: Stable <stable@vger.kernel.org>
Acked-by: NRonnie Sahlberg <lsahlber@redhat.com>
Acked-by: NPavel Shilovsky <pshilov@microsoft.com>
Acked-by: NZhang Xiaoxu <zhangxiaoxu5@huawei.com>

0e670518

io_uring: missed req_init_async() for IOSQE_ASYNC · 3e863ea3

由 Pavel Begunkov 提交于 7月 23, 2020

IOSQE_ASYNC branch of io_queue_sqe() is another place where an
unitialised req->work can be accessed (i.e. prior io_req_init_async()).
Nothing really bad though, it just looses IO_WQ_WORK_CONCURRENT flag.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

3e863ea3

23 7月, 2020 1 次提交

nfsd4: fix NULL dereference in nfsd/clients display code · 9affa435

由 J. Bruce Fields 提交于 7月 15, 2020

We hold the cl_lock here, and that's enough to keep stateid's from going
away, but it's not enough to prevent the files they point to from going
away.  Take fi_lock and a reference and check for NULL, as we do in
other code.
Reported-by: NNeilBrown <neilb@suse.de>
Fixes: 78599c42 ("nfsd4: add file to display list of client's opens")
Reviewed-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

9affa435

22 7月, 2020 4 次提交

btrfs: fix mount failure caused by race with umount · 48cfa61b

由 Boris Burkov 提交于 7月 16, 2020

It is possible to cause a btrfs mount to fail by racing it with a slow
umount. The crux of the sequence is generic_shutdown_super not yet
calling sop->put_super before btrfs_mount_root calls btrfs_open_devices.
If that occurs, btrfs_open_devices will decide the opened counter is
non-zero, increment it, and skip resetting fs_devices->total_rw_bytes to
0. From here, mount will call sget which will result in grab_super
trying to take the super block umount semaphore. That semaphore will be
held by the slow umount, so mount will block. Before up-ing the
semaphore, umount will delete the super block, resulting in mount's sget
reliably allocating a new one, which causes the mount path to dutifully
fill it out, and increment total_rw_bytes a second time, which causes
the mount to fail, as we see double the expected bytes.

Here is the sequence laid out in greater detail:

CPU0                                                    CPU1
down_write sb->s_umount
btrfs_kill_super
  kill_anon_super(sb)
    generic_shutdown_super(sb);
      shrink_dcache_for_umount(sb);
      sync_filesystem(sb);
      evict_inodes(sb); // SLOW

                                              btrfs_mount_root
                                                btrfs_scan_one_device
                                                fs_devices = device->fs_devices
                                                fs_info->fs_devices = fs_devices
                                                // fs_devices-opened makes this a no-op
                                                btrfs_open_devices(fs_devices, mode, fs_type)
                                                s = sget(fs_type, test, set, flags, fs_info);
                                                  find sb in s_instances
                                                  grab_super(sb);
                                                    down_write(&s->s_umount); // blocks

      sop->put_super(sb)
        // sb->fs_devices->opened == 2; no-op
      spin_lock(&sb_lock);
      hlist_del_init(&sb->s_instances);
      spin_unlock(&sb_lock);
      up_write(&sb->s_umount);
                                                    return 0;
                                                  retry lookup
                                                  don't find sb in s_instances (deleted by CPU0)
                                                  s = alloc_super
                                                  return s;
                                                btrfs_fill_super(s, fs_devices, data)
                                                  open_ctree // fs_devices total_rw_bytes improperly set!
                                                    btrfs_read_chunk_tree
                                                      read_one_dev // increment total_rw_bytes again!!
                                                      super_total_bytes < fs_devices->total_rw_bytes // ERROR!!!

To fix this, we clear total_rw_bytes from within btrfs_read_chunk_tree
before the calls to read_one_dev, while holding the sb umount semaphore
and the uuid mutex.

To reproduce, it is sufficient to dirty a decent number of inodes, then
quickly umount and mount.

  for i in $(seq 0 500)
  do
    dd if=/dev/zero of="/mnt/foo/$i" bs=1M count=1
  done
  umount /mnt/foo&
  mount /mnt/foo

does the trick for me.

CC: stable@vger.kernel.org # 4.4+
Signed-off-by: NBoris Burkov <boris@bur.io>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

48cfa61b

btrfs: fix page leaks after failure to lock page for delalloc · 5909ca11

由 Robbie Ko 提交于 7月 20, 2020

When locking pages for delalloc, we check if it's dirty and mapping still
matches. If it does not match, we need to return -EAGAIN and release all
pages. Only the current page was put though, iterate over all the
remaining pages too.

CC: stable@vger.kernel.org # 4.14+
Reviewed-by: NFilipe Manana <fdmanana@suse.com>
Reviewed-by: NNikolay Borisov <nborisov@suse.com>
Signed-off-by: NRobbie Ko <robbieko@synology.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

5909ca11

btrfs: qgroup: fix data leak caused by race between writeback and truncate · fa91e4aa

由 Qu Wenruo 提交于 7月 17, 2020

[BUG]
When running tests like generic/013 on test device with btrfs quota
enabled, it can normally lead to data leak, detected at unmount time:

  BTRFS warning (device dm-3): qgroup 0/5 has unreleased space, type 0 rsv 4096
  ------------[ cut here ]------------
  WARNING: CPU: 11 PID: 16386 at fs/btrfs/disk-io.c:4142 close_ctree+0x1dc/0x323 [btrfs]
  RIP: 0010:close_ctree+0x1dc/0x323 [btrfs]
  Call Trace:
   btrfs_put_super+0x15/0x17 [btrfs]
   generic_shutdown_super+0x72/0x110
   kill_anon_super+0x18/0x30
   btrfs_kill_super+0x17/0x30 [btrfs]
   deactivate_locked_super+0x3b/0xa0
   deactivate_super+0x40/0x50
   cleanup_mnt+0x135/0x190
   __cleanup_mnt+0x12/0x20
   task_work_run+0x64/0xb0
   __prepare_exit_to_usermode+0x1bc/0x1c0
   __syscall_return_slowpath+0x47/0x230
   do_syscall_64+0x64/0xb0
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  ---[ end trace caf08beafeca2392 ]---
  BTRFS error (device dm-3): qgroup reserved space leaked

[CAUSE]
In the offending case, the offending operations are:
2/6: writev f2X[269 1 0 0 0 0] [1006997,67,288] 0
2/7: truncate f2X[269 1 0 0 48 1026293] 18388 0

The following sequence of events could happen after the writev():
	CPU1 (writeback)		|		CPU2 (truncate)
-----------------------------------------------------------------
btrfs_writepages()			|
|- extent_write_cache_pages()		|
   |- Got page for 1003520		|
   |  1003520 is Dirty, no writeback	|
   |  So (!clear_page_dirty_for_io())   |
   |  gets called for it		|
   |- Now page 1003520 is Clean.	|
   |					| btrfs_setattr()
   |					| |- btrfs_setsize()
   |					|    |- truncate_setsize()
   |					|       New i_size is 18388
   |- __extent_writepage()		|
   |  |- page_offset() > i_size		|
      |- btrfs_invalidatepage()		|
	 |- Page is clean, so no qgroup |
	    callback executed

This means, the qgroup reserved data space is not properly released in
btrfs_invalidatepage() as the page is Clean.

[FIX]
Instead of checking the dirty bit of a page, call
btrfs_qgroup_free_data() unconditionally in btrfs_invalidatepage().

As qgroup rsv are completely bound to the QGROUP_RESERVED bit of
io_tree, not bound to page status, thus we won't cause double freeing
anyway.

Fixes: 0b34c261 ("btrfs: qgroup: Prevent qgroup->reserved from going subzero")
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

fa91e4aa

btrfs: fix double free on ulist after backref resolution failure · 580c079b

由 Filipe Manana 提交于 7月 13, 2020

At btrfs_find_all_roots_safe() we allocate a ulist and set the **roots
argument to point to it. However if later we fail due to an error returned
by find_parent_nodes(), we free that ulist but leave a dangling pointer in
the **roots argument. Upon receiving the error, a caller of this function
can attempt to free the same ulist again, resulting in an invalid memory
access.

One such scenario is during qgroup accounting:

btrfs_qgroup_account_extents()

 --> calls btrfs_find_all_roots() passes &new_roots (a stack allocated
     pointer) to btrfs_find_all_roots()

   --> btrfs_find_all_roots() just calls btrfs_find_all_roots_safe()
       passing &new_roots to it

     --> allocates ulist and assigns its address to **roots (which
         points to new_roots from btrfs_qgroup_account_extents())

     --> find_parent_nodes() returns an error, so we free the ulist
         and leave **roots pointing to it after returning

 --> btrfs_qgroup_account_extents() sees btrfs_find_all_roots() returned
     an error and jumps to the label 'cleanup', which just tries to
     free again the same ulist

Stack trace example:

 ------------[ cut here ]------------
 BTRFS: tree first key check failed
 WARNING: CPU: 1 PID: 1763215 at fs/btrfs/disk-io.c:422 btrfs_verify_level_key+0xe0/0x180 [btrfs]
 Modules linked in: dm_snapshot dm_thin_pool (...)
 CPU: 1 PID: 1763215 Comm: fsstress Tainted: G        W         5.8.0-rc3-btrfs-next-64 #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 RIP: 0010:btrfs_verify_level_key+0xe0/0x180 [btrfs]
 Code: 28 5b 5d (...)
 RSP: 0018:ffffb89b473779a0 EFLAGS: 00010286
 RAX: 0000000000000000 RBX: ffff90397759bf08 RCX: 0000000000000000
 RDX: 0000000000000001 RSI: 0000000000000027 RDI: 00000000ffffffff
 RBP: ffff9039a419c000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: ffffb89b43301000 R12: 000000000000005e
 R13: ffffb89b47377a2e R14: ffffb89b473779af R15: 0000000000000000
 FS:  00007fc47e1e1000(0000) GS:ffff9039ac200000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fc47e1df000 CR3: 00000003d9e4e001 CR4: 00000000003606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  read_block_for_search+0xf6/0x350 [btrfs]
  btrfs_next_old_leaf+0x242/0x650 [btrfs]
  resolve_indirect_refs+0x7cf/0x9e0 [btrfs]
  find_parent_nodes+0x4ea/0x12c0 [btrfs]
  btrfs_find_all_roots_safe+0xbf/0x130 [btrfs]
  btrfs_qgroup_account_extents+0x9d/0x390 [btrfs]
  btrfs_commit_transaction+0x4f7/0xb20 [btrfs]
  btrfs_sync_file+0x3d4/0x4d0 [btrfs]
  do_fsync+0x38/0x70
  __x64_sys_fdatasync+0x13/0x20
  do_syscall_64+0x5c/0xe0
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7fc47e2d72e3
 Code: Bad RIP value.
 RSP: 002b:00007fffa32098c8 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc47e2d72e3
 RDX: 00007fffa3209830 RSI: 00007fffa3209830 RDI: 0000000000000003
 RBP: 000000000000072e R08: 0000000000000001 R09: 0000000000000003
 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000003e8
 R13: 0000000051eb851f R14: 00007fffa3209970 R15: 00005607c4ac8b50
 irq event stamp: 0
 hardirqs last  enabled at (0): [<0000000000000000>] 0x0
 hardirqs last disabled at (0): [<ffffffffb8eb5e85>] copy_process+0x755/0x1eb0
 softirqs last  enabled at (0): [<ffffffffb8eb5e85>] copy_process+0x755/0x1eb0
 softirqs last disabled at (0): [<0000000000000000>] 0x0
 ---[ end trace 8639237550317b48 ]---
 BTRFS error (device sdc): tree first key mismatch detected, bytenr=62324736 parent_transid=94 key expected=(262,108,1351680) has=(259,108,1921024)
 general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
 CPU: 2 PID: 1763215 Comm: fsstress Tainted: G        W         5.8.0-rc3-btrfs-next-64 #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 RIP: 0010:ulist_release+0x14/0x60 [btrfs]
 Code: c7 07 00 (...)
 RSP: 0018:ffffb89b47377d60 EFLAGS: 00010282
 RAX: 6b6b6b6b6b6b6b6b RBX: ffff903959b56b90 RCX: 0000000000000000
 RDX: 0000000000000001 RSI: 0000000000270024 RDI: ffff9036e2adc840
 RBP: ffff9036e2adc848 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9036e2adc840
 R13: 0000000000000015 R14: ffff9039a419ccf8 R15: ffff90395d605840
 FS:  00007fc47e1e1000(0000) GS:ffff9039ac600000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f8c1c0a51c8 CR3: 00000003d9e4e004 CR4: 00000000003606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  ulist_free+0x13/0x20 [btrfs]
  btrfs_qgroup_account_extents+0xf3/0x390 [btrfs]
  btrfs_commit_transaction+0x4f7/0xb20 [btrfs]
  btrfs_sync_file+0x3d4/0x4d0 [btrfs]
  do_fsync+0x38/0x70
  __x64_sys_fdatasync+0x13/0x20
  do_syscall_64+0x5c/0xe0
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7fc47e2d72e3
 Code: Bad RIP value.
 RSP: 002b:00007fffa32098c8 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc47e2d72e3
 RDX: 00007fffa3209830 RSI: 00007fffa3209830 RDI: 0000000000000003
 RBP: 000000000000072e R08: 0000000000000001 R09: 0000000000000003
 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000003e8
 R13: 0000000051eb851f R14: 00007fffa3209970 R15: 00005607c4ac8b50
 Modules linked in: dm_snapshot dm_thin_pool (...)
 ---[ end trace 8639237550317b49 ]---
 RIP: 0010:ulist_release+0x14/0x60 [btrfs]
 Code: c7 07 00 (...)
 RSP: 0018:ffffb89b47377d60 EFLAGS: 00010282
 RAX: 6b6b6b6b6b6b6b6b RBX: ffff903959b56b90 RCX: 0000000000000000
 RDX: 0000000000000001 RSI: 0000000000270024 RDI: ffff9036e2adc840
 RBP: ffff9036e2adc848 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: ffff9036e2adc840
 R13: 0000000000000015 R14: ffff9039a419ccf8 R15: ffff90395d605840
 FS:  00007fc47e1e1000(0000) GS:ffff9039ad200000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f6a776f7d40 CR3: 00000003d9e4e002 CR4: 00000000003606e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fix this by making btrfs_find_all_roots_safe() set *roots to NULL after
it frees the ulist.

Fixes: 8da6d581 ("Btrfs: added btrfs_find_all_roots()")
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

580c079b

21 7月, 2020 4 次提交

exfat: fix name_hash computation on big endian systems · db415f7a

由 Ilya Ponetayev 提交于 7月 16, 2020

On-disk format for name_hash field is LE, so it must be explicitly
transformed on BE system for proper result.

Fixes: 370e812b ("exfat: add nls operations")
Cc: stable@vger.kernel.org # v5.7
Signed-off-by: NChen Minqiang <ptpt52@gmail.com>
Signed-off-by: NIlya Ponetayev <i.ponetaev@ndmsystems.com>
Reviewed-by: NSungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com>

db415f7a

exfat: fix wrong size update of stream entry by typo · 41e3928f

由 Hyeongseok Kim 提交于 7月 08, 2020

The stream.size field is updated to the value of create timestamp
of the file entry. Fix this to use correct stream entry pointer.

Fixes: 29bbb14b ("exfat: fix incorrect update of stream entry in __exfat_truncate()")
Signed-off-by: NHyeongseok Kim <hyeongseok@gmail.com>
Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com>

41e3928f

exfat: fix wrong hint_stat initialization in exfat_find_dir_entry() · d2fa0c33

由 Namjae Jeon 提交于 7月 03, 2020

We found the wrong hint_stat initialization in exfat_find_dir_entry().
It should be initialized when cluster is EXFAT_EOF_CLUSTER.

Fixes: ca061973 ("exfat: add directory operations")
Cc: stable@vger.kernel.org # v5.7
Reviewed-by: NSungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com>

d2fa0c33

exfat: fix overflow issue in exfat_cluster_to_sector() · 43946b70

由 Namjae Jeon 提交于 7月 03, 2020

An overflow issue can occur while calculating sector in
exfat_cluster_to_sector(). It needs to cast clus's type to sector_t
before left shifting.

Fixes: 1acf1a56 ("exfat: add in-memory and on-disk structures and headers")
Cc: stable@vger.kernel.org # v5.7
Reviewed-by: NSungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com>

43946b70

20 7月, 2020 2 次提交

zonefs: count pages after truncating the iterator · 89ee7237

由 Johannes Thumshirn 提交于 7月 16, 2020

Count pages after possibly truncating the iterator to the maximum zone
append size, not before.
Signed-off-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>

89ee7237

zonefs: Fix compilation warning · 01b2651c

由 Damien Le Moal 提交于 7月 20, 2020

Avoid the compilation warning "Variable 'ret' is reassigned a value
before the old one has been used." in zonefs_create_zgroup() by setting
ret for the error path only if an error happens.
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>

01b2651c

19 7月, 2020 1 次提交

io_uring: always allow drain/link/hardlink/async sqe flags · 61710e43

由 Daniele Albano 提交于 7月 18, 2020

We currently filter these for timeout_remove/async_cancel/files_update,
but we only should be filtering for fixed file and buffer select. This
also causes a second read of sqe->flags, which isn't needed.

Just check req->flags for the relevant bits. This then allows these
commands to be used in links, for example, like everything else.
Signed-off-by: NDaniele Albano <d.albano@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

61710e43

18 7月, 2020 2 次提交

io_uring: ensure double poll additions work with both request types · 807abcb0

由 Jens Axboe 提交于 7月 17, 2020

The double poll additions were centered around doing POLL_ADD on file
descriptors that use more than one waitqueue (typically one for read,
one for write) when being polled. However, it can also end up being
triggered for when we use poll triggered retry. For that case, we cannot
safely use req->io, as that could be used by the request type itself.

Add a second io_poll_iocb pointer in the structure we allocate for poll
based retry, and ensure we use the right one from the two paths.

Fixes: 18bceab1 ("io_uring: allow POLL_ADD with double poll_wait() users")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

807abcb0

SUNRPC reverting ("NFSv4 fix CLOSE not waiting for direct IO compeletion") · 65caafd0

由 Olga Kornievskaia 提交于 7月 15, 2020

Reverting commit d03727b2 "NFSv4 fix CLOSE not waiting for
direct IO compeletion". This patch made it so that fput() by calling
inode_dio_done() in nfs_file_release() would wait uninterruptably
for any outstanding directIO to the file (but that wait on IO should
be killable).

The problem the patch was also trying to address was REMOVE returning
ERR_ACCESS because the file is still opened, is supposed to be resolved
by server returning ERR_FILE_OPEN and not ERR_ACCESS.
Signed-off-by: NOlga Kornievskaia <kolga@netapp.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

65caafd0

16 7月, 2020 12 次提交

ovl: fix lookup of indexed hardlinks with metacopy · 4518dfcf

由 Amir Goldstein 提交于 7月 15, 2020

We recently moved setting inode flag OVL_UPPERDATA to ovl_lookup().

When looking up an overlay dentry, upperdentry may be found by index
and not by name.  In that case, we fail to read the metacopy xattr
and falsly set the OVL_UPPERDATA on the overlay inode.

This caused a regression in xfstest overlay/033 when run with
OVERLAY_MOUNT_OPTIONS="-o metacopy=on".

Fixes: 28166ab3 ("ovl: initialize OVL_UPPERDATA in ovl_lookup()")
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

4518dfcf

ovl: fix unneeded call to ovl_change_flags() · 81a33c1e

由 Amir Goldstein 提交于 6月 18, 2020

The check if user has changed the overlay file was wrong, causing unneeded
call to ovl_change_flags() including taking f_lock on every file access.

Fixes: d9899030 ("ovl: do not generate duplicate fsnotify events for "fake" path")
Cc: <stable@vger.kernel.org> # v4.19+
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

81a33c1e

afs: Fix interruption of operations · 811f04ba

由 David Howells 提交于 7月 08, 2020

The afs filesystem driver allows unstarted operations to be cancelled by
signal, but most of these can easily be restarted (mkdir for example).  The
primary culprits for reproducing this are those applications that use
SIGALRM to display a progress counter.

File lock-extension operation is marked uninterruptible as we have a
limited time in which to do it, and the release op is marked
uninterruptible also as if we fail to unlock a file, we'll have to wait 20
mins before anyone can lock it again.

The store operation logs a warning if it gets interruption, e.g.:

	kAFS: Unexpected error from FS.StoreData -4

because it's run from the background - but it can also be run from
fdatasync()-type things.  However, store options aren't marked
interruptible at the moment.

Fix this in the following ways:

 (1) Mark store operations as uninterruptible.  It might make sense to
     relax this for certain situations, but I'm not sure how to make sure
     that background store ops aren't affected by signals to foreground
     processes that happen to trigger them.

 (2) In afs_get_io_locks(), where we're getting the serialisation lock for
     talking to the fileserver, return ERESTARTSYS rather than EINTR
     because a lot of the operations (e.g. mkdir) are restartable if we
     haven't yet started sending the op to the server.

Fixes: e49c7b2f ("afs: Build an abstraction around an "operation" concept")
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

811f04ba

ovl: fix mount option checks for nfs_export with no upperdir · f0e1266e

由 Amir Goldstein 提交于 7月 13, 2020

Without upperdir mount option, there is no index dir and the dependency
checks nfs_export => index for mount options parsing are incorrect.

Allow the combination nfs_export=on,index=off with no upperdir and move
the check for dependency redirect_dir=nofollow for non-upper mount case
to mount options parsing.
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

f0e1266e

ovl: force read-only sb on failure to create index dir · 470c1563

由 Amir Goldstein 提交于 7月 13, 2020

With index feature enabled, on failure to create index dir, overlay is
being mounted read-only.  However, we do not forbid user to remount overlay
read-write.  Fix that by setting ofs->workdir to NULL, which prevents
remount read-write.
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

470c1563

ovl: fix regression with re-formatted lower squashfs · a888db31

由 Amir Goldstein 提交于 7月 08, 2020

Commit 9df085f3 ("ovl: relax requirement for non null uuid of lower
fs") relaxed the requirement for non null uuid with single lower layer to
allow enabling index and nfs_export features with single lower squashfs.

Fabian reported a regression in a setup when overlay re-uses an existing
upper layer and re-formats the lower squashfs image. Because squashfs
has no uuid, the origin xattr in upper layer are decoded from the new
lower layer where they may resolve to a wrong origin file and user may
get an ESTALE or EIO error on lookup.

To avoid the reported regression while still allowing the new features
with single lower squashfs, do not allow decoding origin with lower null
uuid unless user opted-in to one of the new features that require
following the lower inode of non-dir upper (index, xino, metacopy).
Reported-by: NFabian <godi.beat@gmx.net>
Link: https://lore.kernel.org/linux-unionfs/32532923.JtPX5UtSzP@fgdesktop/
Fixes: 9df085f3 ("ovl: relax requirement for non null uuid of lower fs")
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

a888db31

ovl: fix oops in ovl_indexdir_cleanup() with nfs_export=on · 20396365

由 Amir Goldstein 提交于 6月 21, 2020

Mounting with nfs_export=on, xfstests overlay/031 triggers a kernel panic
since v5.8-rc1 overlayfs updates.

 overlayfs: orphan index entry (index/00fb1..., ftype=4000, nlink=2)
 BUG: kernel NULL pointer dereference, address: 0000000000000030
 RIP: 0010:ovl_cleanup_and_whiteout+0x28/0x220 [overlay]

Bisect point at commit c21c839b ("ovl: whiteout inode sharing")

Minimal reproducer:
--------------------------------------------------
rm -rf l u w m
mkdir -p l u w m
mkdir -p l/testdir
touch l/testdir/testfile
mount -t overlay -o lowerdir=l,upperdir=u,workdir=w,nfs_export=on overlay m
echo 1 > m/testdir/testfile
umount m
rm -rf u/testdir
mount -t overlay -o lowerdir=l,upperdir=u,workdir=w,nfs_export=on overlay m
umount m
--------------------------------------------------

When mount with nfs_export=on, and fail to verify an orphan index, we're
cleaning this index from indexdir by calling ovl_cleanup_and_whiteout().
This dereferences ofs->workdir, that was earlier set to NULL.

The design was that ovl->workdir will point at ovl->indexdir, but we are
assigning ofs->indexdir to ofs->workdir only after ovl_indexdir_cleanup().
There is no reason not to do it sooner, because once we get success from
ofs->indexdir = ovl_workdir_create(... there is no turning back.
Reported-and-tested-by: NMurphy Zhou <jencce.kernel@gmail.com>
Fixes: c21c839b ("ovl: whiteout inode sharing")
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

20396365

ovl: relax WARN_ON() when decoding lower directory file handle · 124c2de2

由 Amir Goldstein 提交于 6月 17, 2020

Decoding a lower directory file handle to overlay path with cold
inode/dentry cache may go as follows:

1. Decode real lower file handle to lower dir path
2. Check if lower dir is indexed (was copied up)
3. If indexed, get the upper dir path from index
4. Lookup upper dir path in overlay
5. If overlay path found, verify that overlay lower is the lower dir
   from step 1

On failure to verify step 5 above, user will get an ESTALE error and a
WARN_ON will be printed.

A mismatch in step 5 could be a result of lower directory that was renamed
while overlay was offline, after that lower directory has been copied up
and indexed.

This is a scripted reproducer based on xfstest overlay/052:

  # Create lower subdir
  create_dirs
  create_test_files $lower/lowertestdir/subdir
  mount_dirs
  # Copy up lower dir and encode lower subdir file handle
  touch $SCRATCH_MNT/lowertestdir
  test_file_handles $SCRATCH_MNT/lowertestdir/subdir -p -o $tmp.fhandle
  # Rename lower dir offline
  unmount_dirs
  mv $lower/lowertestdir $lower/lowertestdir.new/
  mount_dirs
  # Attempt to decode lower subdir file handle
  test_file_handles $SCRATCH_MNT -p -i $tmp.fhandle

Since this WARN_ON() can be triggered by user we need to relax it.

Fixes: 4b91c30a ("ovl: lookup connected ancestor of dir in inode cache")
Cc: <stable@vger.kernel.org> # v4.16+
Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

124c2de2

ovl: remove not used argument in ovl_check_origin · d78a0dcf

由 youngjun 提交于 6月 21, 2020

ovl_check_origin outparam 'ctrp' argument not used by caller.  So remove
this argument.
Signed-off-by: Nyoungjun <her0gyugyu@gmail.com>
Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

d78a0dcf

ovl: change ovl_copy_up_flags static · 5ac8e802

由 youngjun 提交于 6月 21, 2020

"ovl_copy_up_flags" is used in copy_up.c.
so, change it static.
Signed-off-by: Nyoungjun <her0gyugyu@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

5ac8e802

ovl: inode reference leak in ovl_is_inuse true case. · 24f14009

由 youngjun 提交于 6月 16, 2020

When "ovl_is_inuse" true case, trap inode reference not put.  plus adding
the comment explaining sequence of ovl_is_inuse after ovl_setup_trap.

Fixes: 0be0bfd2 ("ovl: fix regression caused by overlapping layers detection")
Cc: <stable@vger.kernel.org> # v4.19+
Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: Nyoungjun <her0gyugyu@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

24f14009

io_uring: fix recvmsg memory leak with buffer selection · 681fda8d

由 Pavel Begunkov 提交于 7月 15, 2020

io_recvmsg() doesn't free memory allocated for struct io_buffer. This can
causes a leak when used with automatic buffer selection.
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

681fda8d

15 7月, 2020 1 次提交

fuse: Fix parameter for FS_IOC_{GET,SET}FLAGS · 31070f6c

由 Chirantan Ekbote 提交于 7月 14, 2020

The ioctl encoding for this parameter is a long but the documentation says
it should be an int and the kernel drivers expect it to be an int.  If the
fuse driver treats this as a long it might end up scribbling over the stack
of a userspace process that only allocated enough space for an int.

This was previously discussed in [1] and a patch for fuse was proposed in
[2].  From what I can tell the patch in [2] was nacked in favor of adding
new, "fixed" ioctls and using those from userspace.  However there is still
no "fixed" version of these ioctls and the fact is that it's sometimes
infeasible to change all userspace to use the new one.

Handling the ioctls specially in the fuse driver seems like the most
pragmatic way for fuse servers to support them without causing crashes in
userspace applications that call them.

[1]: https://lore.kernel.org/linux-fsdevel/20131126200559.GH20559@hall.aurel32.net/T/
[2]: https://sourceforge.net/p/fuse/mailman/message/31771759/Signed-off-by: NChirantan Ekbote <chirantan@chromium.org>
Fixes: 59efec7b ("fuse: implement ioctl support")
Cc: <stable@vger.kernel.org>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

31070f6c

14 7月, 2020 6 次提交

fuse: don't ignore errors from fuse_writepages_fill() · 7779b047

由 Vasily Averin 提交于 6月 25, 2020

fuse_writepages() ignores some errors taken from fuse_writepages_fill() I
believe it is a bug: if .writepages is called with WB_SYNC_ALL it should
either guarantee that all data was successfully saved or return error.

Fixes: 26d614df ("fuse: Implement writepages callback")
Signed-off-by: NVasily Averin <vvs@virtuozzo.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

7779b047

fuse: clean up condition for writepage sending · 6ddf3af9

由 Miklos Szeredi 提交于 7月 14, 2020

fuse_writepages_fill uses following construction:

if (wpa && ap->num_pages &&
    (A || B || C)) {
        action;
} else if (wpa && D) {
        if (E) {
                the same action;
        }
}

 - ap->num_pages check is always true and can be removed

 - "if" and "else if" calls the same action and can be merged.

Move checking A, B, C, D, E conditions to a helper, add comments.
Original-patch-by: NVasily Averin <vvs@virtuozzo.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

6ddf3af9

fuse: reject options on reconfigure via fsconfig(2) · b330966f

由 Miklos Szeredi 提交于 7月 14, 2020

Previous patch changed handling of remount/reconfigure to ignore all
options, including those that are unknown to the fuse kernel fs.  This was
done for backward compatibility, but this likely only affects the old
mount(2) API.

The new fsconfig(2) based reconfiguration could possibly be improved.  This
would make the new API less of a drop in replacement for the old, OTOH this
is a good chance to get rid of some weirdnesses in the old API.

Several other behaviors might make sense:

 1) unknown options are rejected, known options are ignored

 2) unknown options are rejected, known options are rejected if the value
 is changed, allowed otherwise

 3) all options are rejected

Prior to the backward compatibility fix to ignore all options all known
options were accepted (1), even if they change the value of a mount
parameter; fuse_reconfigure() does not look at the config values set by
fuse_parse_param().

To fix that we'd need to verify that the value provided is the same as set
in the initial configuration (2).  The major drawback is that this is much
more complex than just rejecting all attempts at changing options (3);
i.e. all options signify initial configuration values and don't make sense
on reconfigure.

This patch opts for (3) with the rationale that no mount options are
reconfigurable in fuse.
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

b330966f

fuse: ignore 'data' argument of mount(..., MS_REMOUNT) · e8b20a47

由 Miklos Szeredi 提交于 7月 14, 2020

The command

  mount -o remount -o unknownoption /mnt/fuse

succeeds on kernel versions prior to v5.4 and fails on kernel version at or
after.  This is because fuse_parse_param() rejects any unrecognised options
in case of FS_CONTEXT_FOR_RECONFIGURE, just as for FS_CONTEXT_FOR_MOUNT.

This causes a regression in case the fuse filesystem is in fstab, since
remount sends all options found there to the kernel; even ones that are
meant for the initial mount and are consumed by the userspace fuse server.

Fix this by ignoring mount options, just as fuse_remount_fs() did prior to
the conversion to the new API.
Reported-by: NStefan Priebe <s.priebe@profihost.ag>
Fixes: c30da2e9 ("fuse: convert to use the new mount API")
Cc: <stable@vger.kernel.org> # v5.4
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

e8b20a47

fuse: use ->reconfigure() instead of ->remount_fs() · 0189a2d3

由 Miklos Szeredi 提交于 7月 14, 2020

s_op->remount_fs() is only called from legacy_reconfigure(), which is not
used after being converted to the new API.

Convert to using ->reconfigure().  This restores the previous behavior of
syncing the filesystem and rejecting MS_MANDLOCK on remount.

Fixes: c30da2e9 ("fuse: convert to use the new mount API")
Cc: <stable@vger.kernel.org> # v5.4
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

0189a2d3

fuse: fix warning in tree_insert() and clean up writepage insertion · c146024e

由 Miklos Szeredi 提交于 7月 14, 2020

fuse_writepages_fill() calls tree_insert() with ap->num_pages = 0 which
triggers the following warning:

 WARNING: CPU: 1 PID: 17211 at fs/fuse/file.c:1728 tree_insert+0xab/0xc0 [fuse]
 RIP: 0010:tree_insert+0xab/0xc0 [fuse]
 Call Trace:
  fuse_writepages_fill+0x5da/0x6a0 [fuse]
  write_cache_pages+0x171/0x470
  fuse_writepages+0x8a/0x100 [fuse]
  do_writepages+0x43/0xe0

Fix up the warning and clean up the code around rb-tree insertion:

 - Rename tree_insert() to fuse_insert_writeback() and make it return the
   conflicting entry in case of failure

 - Re-add tree_insert() as a wrapper around fuse_insert_writeback()

 - Rename fuse_writepage_in_flight() to fuse_writepage_add() and reverse
   the meaning of the return value to mean

    + "true" in case the writepage entry was successfully added

    + "false" in case it was in-fligt queued on an existing writepage
       entry's auxiliary list or the existing writepage entry's temporary
       page updated

   Switch from fuse_find_writeback() + tree_insert() to
   fuse_insert_writeback()

 - Move setting orig_pages to before inserting/updating the entry; this may
   result in the orig_pages value being discarded later in case of an
   in-flight request

 - In case of a new writepage entry use fuse_writepage_add()
   unconditionally, only set data->wpa if the entry was added.

Fixes: 6b2fb799 ("fuse: optimize writepages search")
Reported-by: Nkernel test robot <rong.a.chen@intel.com>
Original-path-by: NVasily Averin <vvs@virtuozzo.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>

c146024e

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功