1. 26 10月, 2019 1 次提交
    • C
      f2fs: cache global IPU bio · 0b20fcec
      Chao Yu 提交于
      In commit 8648de2c ("f2fs: add bio cache for IPU"), we added
      f2fs_submit_ipu_bio() in __write_data_page() as below:
      
      __write_data_page()
      
      	if (!S_ISDIR(inode->i_mode) && !IS_NOQUOTA(inode)) {
      		f2fs_submit_ipu_bio(sbi, bio, page);
      		....
      	}
      
      in order to avoid below deadlock:
      
      Thread A				Thread B
      - __write_data_page (inode x, page y)
       - f2fs_do_write_data_page
        - set_page_writeback        ---- set writeback flag in page y
        - f2fs_inplace_write_data
       - f2fs_balance_fs
      					 - lock gc_mutex
       - lock gc_mutex
      					  - f2fs_gc
      					   - do_garbage_collect
      					    - gc_data_segment
      					     - move_data_page
      					      - f2fs_wait_on_page_writeback
      					       - wait_on_page_writeback  --- wait writeback of page y
      
      However, the bio submission breaks the merge of IPU IOs.
      
      So in this patch let's add a global bio cache for merged IPU pages,
      then f2fs_wait_on_page_writeback() is able to submit bio if a
      writebacked page is cached in global bio cache.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      0b20fcec
  2. 23 10月, 2019 4 次提交
  3. 05 10月, 2019 1 次提交
    • C
      f2fs: fix to update time in lazytime mode · fe1897ea
      Chao Yu 提交于
      generic/018 reports an inconsistent status of atime, the
      testcase is as below:
      - open file with O_SYNC
      - write file to construct fraged space
      - calc md5 of file
      - record {a,c,m}time
      - defrag file --- do nothing
      - umount & mount
      - check {a,c,m}time
      
      The root cause is, as f2fs enables lazytime by default, atime
      update will dirty vfs inode, rather than dirtying f2fs inode (by set
      with FI_DIRTY_INODE), so later f2fs_write_inode() called from VFS will
      fail to update inode page due to our skip:
      
      f2fs_write_inode()
      	if (is_inode_flag_set(inode, FI_DIRTY_INODE))
      		return 0;
      
      So eventually, after evict(), we lose last atime for ever.
      
      To fix this issue, we need to check whether {a,c,m,cr}time is
      consistent in between inode cache and inode page, and only skip
      f2fs_update_inode() if f2fs inode is not dirty and time is
      consistent as well.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      fe1897ea
  4. 04 10月, 2019 1 次提交
    • E
      vfs: Fix EOVERFLOW testing in put_compat_statfs64 · cc3a7bfe
      Eric Sandeen 提交于
      Today, put_compat_statfs64() disallows nearly any field value over
      2^32 if f_bsize is only 32 bits, but that makes no sense.
      compat_statfs64 is there for the explicit purpose of providing 64-bit
      fields for f_files, f_ffree, etc.  And f_bsize is always only 32 bits.
      
      As a result, 32-bit userspace gets -EOVERFLOW for i.e.  large file
      counts even with -D_FILE_OFFSET_BITS=64 set.
      
      In reality, only f_bsize and f_frsize can legitimately overflow
      (fields like f_type and f_namelen should never be large), so test
      only those fields.
      
      This bug was discussed at length some time ago, and this is the proposal
      Al suggested at https://lkml.org/lkml/2018/8/6/640.  It seemed to get
      dropped amid the discussion of other related changes, but this
      part seems obviously correct on its own, so I've picked it up and
      sent it, for expediency.
      
      Fixes: 64d2ab32 ("vfs: fix put_compat_statfs64() does not handle errors")
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cc3a7bfe
  5. 01 10月, 2019 4 次提交
  6. 30 9月, 2019 1 次提交
    • L
      Revert "Revert "ext4: make __ext4_get_inode_loc plug"" · 02f03c42
      Linus Torvalds 提交于
      This reverts commit 72dbcf72.
      
      Instead of waiting forever for entropy that may just not happen, we now
      try to actively generate entropy when required, and are thus hopefully
      avoiding the problem that caused the nice ext4 IO pattern fix to be
      reverted.
      
      So revert the revert.
      
      Cc: Ahmed S. Darwish <darwish.07@gmail.com>
      Cc: Ted Ts'o <tytso@mit.edu>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Alexander E. Patrakov <patrakov@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      02f03c42
  7. 27 9月, 2019 8 次提交
  8. 26 9月, 2019 16 次提交
  9. 25 9月, 2019 4 次提交
    • M
      sched/membarrier: Fix p->mm->membarrier_state racy load · 227a4aad
      Mathieu Desnoyers 提交于
      The membarrier_state field is located within the mm_struct, which
      is not guaranteed to exist when used from runqueue-lock-free iteration
      on runqueues by the membarrier system call.
      
      Copy the membarrier_state from the mm_struct into the scheduler runqueue
      when the scheduler switches between mm.
      
      When registering membarrier for mm, after setting the registration bit
      in the mm membarrier state, issue a synchronize_rcu() to ensure the
      scheduler observes the change. In order to take care of the case
      where a runqueue keeps executing the target mm without swapping to
      other mm, iterate over each runqueue and issue an IPI to copy the
      membarrier_state from the mm_struct into each runqueue which have the
      same mm which state has just been modified.
      
      Move the mm membarrier_state field closer to pgd in mm_struct to use
      a cache line already touched by the scheduler switch_mm.
      
      The membarrier_execve() (now membarrier_exec_mmap) hook now needs to
      clear the runqueue's membarrier state in addition to clear the mm
      membarrier state, so move its implementation into the scheduler
      membarrier code so it can access the runqueue structure.
      
      Add memory barrier in membarrier_exec_mmap() prior to clearing
      the membarrier state, ensuring memory accesses executed prior to exec
      are not reordered with the stores clearing the membarrier state.
      
      As suggested by Linus, move all membarrier.c RCU read-side locks outside
      of the for each cpu loops.
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kirill Tkhai <tkhai@yandex.ru>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190919173705.2181-5-mathieu.desnoyers@efficios.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      227a4aad
    • Q
      btrfs: Fix a regression which we can't convert to SINGLE profile · fab27359
      Qu Wenruo 提交于
      [BUG]
      With v5.3 kernel, we can't convert to SINGLE profile:
      
        # btrfs balance start -f -dconvert=single $mnt
        ERROR: error during balancing '/mnt/btrfs': Invalid argument
        # dmesg -t | tail
        validate_convert_profile: data profile=0x1000000000000 allowed=0x20 is_valid=1 final=0x1000000000000 ret=1
        BTRFS error (device dm-3): balance: invalid convert data profile single
      
      [CAUSE]
      With the extra debug output added, it shows that the @allowed bit is
      lacking the special in-memory only SINGLE profile bit.
      
      Thus we fail at that (profile & ~allowed) check.
      
      This regression is caused by commit 081db89b ("btrfs: use raid_attr
      to get allowed profiles for balance conversion") and the fact that we
      don't use any bit to indicate SINGLE profile on-disk, but uses special
      in-memory only bit to help distinguish different profiles.
      
      [FIX]
      Add that BTRFS_AVAIL_ALLOC_BIT_SINGLE to @allowed, so the code should be
      the same as it was and fix the regression.
      Reported-by: NChris Murphy <lists@colorremedies.com>
      Fixes: 081db89b ("btrfs: use raid_attr to get allowed profiles for balance conversion")
      CC: stable@vger.kernel.org # 5.3+
      Reviewed-by: NAnand Jain <anand.jain@oracle.com>
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      fab27359
    • Q
      btrfs: relocation: fix use-after-free on dead relocation roots · 1fac4a54
      Qu Wenruo 提交于
      [BUG]
      One user reported a reproducible KASAN report about use-after-free:
      
        BTRFS info (device sdi1): balance: start -dvrange=1256811659264..1256811659265
        BTRFS info (device sdi1): relocating block group 1256811659264 flags data|raid0
        ==================================================================
        BUG: KASAN: use-after-free in btrfs_init_reloc_root+0x2cd/0x340 [btrfs]
        Write of size 8 at addr ffff88856f671710 by task kworker/u24:10/261579
      
        CPU: 2 PID: 261579 Comm: kworker/u24:10 Tainted: P           OE     5.2.11-arch1-1-kasan #4
        Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99 Extreme4, BIOS P3.80 04/06/2018
        Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
        Call Trace:
         dump_stack+0x7b/0xba
         print_address_description+0x6c/0x22e
         ? btrfs_init_reloc_root+0x2cd/0x340 [btrfs]
         __kasan_report.cold+0x1b/0x3b
         ? btrfs_init_reloc_root+0x2cd/0x340 [btrfs]
         kasan_report+0x12/0x17
         __asan_report_store8_noabort+0x17/0x20
         btrfs_init_reloc_root+0x2cd/0x340 [btrfs]
         record_root_in_trans+0x2a0/0x370 [btrfs]
         btrfs_record_root_in_trans+0xf4/0x140 [btrfs]
         start_transaction+0x1ab/0xe90 [btrfs]
         btrfs_join_transaction+0x1d/0x20 [btrfs]
         btrfs_finish_ordered_io+0x7bf/0x18a0 [btrfs]
         ? lock_repin_lock+0x400/0x400
         ? __kmem_cache_shutdown.cold+0x140/0x1ad
         ? btrfs_unlink_subvol+0x9b0/0x9b0 [btrfs]
         finish_ordered_fn+0x15/0x20 [btrfs]
         normal_work_helper+0x1bd/0xca0 [btrfs]
         ? process_one_work+0x819/0x1720
         ? kasan_check_read+0x11/0x20
         btrfs_endio_write_helper+0x12/0x20 [btrfs]
         process_one_work+0x8c9/0x1720
         ? pwq_dec_nr_in_flight+0x2f0/0x2f0
         ? worker_thread+0x1d9/0x1030
         worker_thread+0x98/0x1030
         kthread+0x2bb/0x3b0
         ? process_one_work+0x1720/0x1720
         ? kthread_park+0x120/0x120
         ret_from_fork+0x35/0x40
      
        Allocated by task 369692:
         __kasan_kmalloc.part.0+0x44/0xc0
         __kasan_kmalloc.constprop.0+0xba/0xc0
         kasan_kmalloc+0x9/0x10
         kmem_cache_alloc_trace+0x138/0x260
         btrfs_read_tree_root+0x92/0x360 [btrfs]
         btrfs_read_fs_root+0x10/0xb0 [btrfs]
         create_reloc_root+0x47d/0xa10 [btrfs]
         btrfs_init_reloc_root+0x1e2/0x340 [btrfs]
         record_root_in_trans+0x2a0/0x370 [btrfs]
         btrfs_record_root_in_trans+0xf4/0x140 [btrfs]
         start_transaction+0x1ab/0xe90 [btrfs]
         btrfs_start_transaction+0x1e/0x20 [btrfs]
         __btrfs_prealloc_file_range+0x1c2/0xa00 [btrfs]
         btrfs_prealloc_file_range+0x13/0x20 [btrfs]
         prealloc_file_extent_cluster+0x29f/0x570 [btrfs]
         relocate_file_extent_cluster+0x193/0xc30 [btrfs]
         relocate_data_extent+0x1f8/0x490 [btrfs]
         relocate_block_group+0x600/0x1060 [btrfs]
         btrfs_relocate_block_group+0x3a0/0xa00 [btrfs]
         btrfs_relocate_chunk+0x9e/0x180 [btrfs]
         btrfs_balance+0x14e4/0x2fc0 [btrfs]
         btrfs_ioctl_balance+0x47f/0x640 [btrfs]
         btrfs_ioctl+0x119d/0x8380 [btrfs]
         do_vfs_ioctl+0x9f5/0x1060
         ksys_ioctl+0x67/0x90
         __x64_sys_ioctl+0x73/0xb0
         do_syscall_64+0xa5/0x370
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        Freed by task 369692:
         __kasan_slab_free+0x14f/0x210
         kasan_slab_free+0xe/0x10
         kfree+0xd8/0x270
         btrfs_drop_snapshot+0x154c/0x1eb0 [btrfs]
         clean_dirty_subvols+0x227/0x340 [btrfs]
         relocate_block_group+0x972/0x1060 [btrfs]
         btrfs_relocate_block_group+0x3a0/0xa00 [btrfs]
         btrfs_relocate_chunk+0x9e/0x180 [btrfs]
         btrfs_balance+0x14e4/0x2fc0 [btrfs]
         btrfs_ioctl_balance+0x47f/0x640 [btrfs]
         btrfs_ioctl+0x119d/0x8380 [btrfs]
         do_vfs_ioctl+0x9f5/0x1060
         ksys_ioctl+0x67/0x90
         __x64_sys_ioctl+0x73/0xb0
         do_syscall_64+0xa5/0x370
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        The buggy address belongs to the object at ffff88856f671100
         which belongs to the cache kmalloc-4k of size 4096
        The buggy address is located 1552 bytes inside of
         4096-byte region [ffff88856f671100, ffff88856f672100)
        The buggy address belongs to the page:
        page:ffffea0015bd9c00 refcount:1 mapcount:0 mapping:ffff88864400e600 index:0x0 compound_mapcount: 0
        flags: 0x2ffff0000010200(slab|head)
        raw: 02ffff0000010200 dead000000000100 dead000000000200 ffff88864400e600
        raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
        page dumped because: kasan: bad access detected
      
        Memory state around the buggy address:
         ffff88856f671600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
         ffff88856f671680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        >ffff88856f671700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                 ^
         ffff88856f671780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
         ffff88856f671800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        ==================================================================
        BTRFS info (device sdi1): 1 enospc errors during balance
        BTRFS info (device sdi1): balance: ended with status: -28
      
      [CAUSE]
      The problem happens when finish_ordered_io() get called with balance
      still running, while the reloc root of that subvolume is already dead.
      (Tree is swap already done, but tree not yet deleted for possible qgroup
      usage.)
      
      That means root->reloc_root still exists, but that reloc_root can be
      under btrfs_drop_snapshot(), thus we shouldn't access it.
      
      The following race could cause the use-after-free problem:
      
                      CPU1              |                CPU2
      --------------------------------------------------------------------------
                                        | relocate_block_group()
                                        | |- unset_reloc_control(rc)
                                        | |- btrfs_commit_transaction()
      btrfs_finish_ordered_io()         | |- clean_dirty_subvols()
      |- btrfs_join_transaction()       |    |
         |- record_root_in_trans()      |    |
            |- btrfs_init_reloc_root()  |    |
               |- if (root->reloc_root) |    |
               |                        |    |- root->reloc_root = NULL
               |                        |    |- btrfs_drop_snapshot(reloc_root);
               |- reloc_root->last_trans|
                       = trans->transid |
      	    ^^^^^^^^^^^^^^^^^^^^^^
                  Use after free
      
      [FIX]
      Fix it by the following modifications:
      
      - Test if the root has dead reloc tree before accessing root->reloc_root
        If the root has BTRFS_ROOT_DEAD_RELOC_TREE, then we don't need to
        create or update root->reloc_tree
      
      - Clear the BTRFS_ROOT_DEAD_RELOC_TREE flag until we have fully dropped
        reloc tree
        To co-operate with above modification, so as long as
        BTRFS_ROOT_DEAD_RELOC_TREE is still set, we won't try to re-create
        reloc tree at record_root_in_trans().
      Reported-by: NCebtenzzre <cebtenzzre@gmail.com>
      Fixes: d2311e69 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
      CC: stable@vger.kernel.org # 5.1+
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      1fac4a54
    • S
      smb3: Add missing reparse tags · 131ea1ed
      Steve French 提交于
      Additional reparse tags were described for WSL and file sync.
      Add missing defines for these tags. Some will be useful for
      POSIX extensions (as discussed at Storage Developer Conference).
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Reviewed-by: NAurelien Aptel <aaptel@suse.com>
      131ea1ed