1. 16 6月, 2020 5 次提交
    • E
      ext4: rename s_journal_flag_rwsem to s_writepages_rwsem · 1b7c46e9
      Eric Biggers 提交于
      task #28557685
      
      commit bbd55937de8f2754adc5792b0f8e5ff7d9c0420e upstream.
      
      In preparation for making s_journal_flag_rwsem synchronize
      ext4_writepages() with changes to both the EXTENTS and JOURNAL_DATA
      flags (rather than just JOURNAL_DATA as it does currently), rename it to
      s_writepages_rwsem.
      
      Link: https://lore.kernel.org/r/20200219183047.47417-2-ebiggers@kernel.orgSigned-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      1b7c46e9
    • J
      ext4: fix checksum errors with indexed dirs · 20bd22e8
      Jan Kara 提交于
      task #28557685
      
      commit 48a34311953d921235f4d7bbd2111690d2e469cf upstream.
      
      DIR_INDEX has been introduced as a compat ext4 feature. That means that
      even kernels / tools that don't understand the feature may modify the
      filesystem. This works because for kernels not understanding indexed dir
      format, internal htree nodes appear just as empty directory entries.
      Index dir aware kernels then check the htree structure is still
      consistent before using the data. This all worked reasonably well until
      metadata checksums were introduced. The problem is that these
      effectively made DIR_INDEX only ro-compatible because internal htree
      nodes store checksums in a different place than normal directory blocks.
      Thus any modification ignorant to DIR_INDEX (or just clearing
      EXT4_INDEX_FL from the inode) will effectively cause checksum mismatch
      and trigger kernel errors. So we have to be more careful when dealing
      with indexed directories on filesystems with checksumming enabled.
      
      1) We just disallow loading any directory inodes with EXT4_INDEX_FL when
      DIR_INDEX is not enabled. This is harsh but it should be very rare (it
      means someone disabled DIR_INDEX on existing filesystem and didn't run
      e2fsck), e2fsck can fix the problem, and we don't want to answer the
      difficult question: "Should we rather corrupt the directory more or
      should we ignore that DIR_INDEX feature is not set?"
      
      2) When we find out htree structure is corrupted (but the filesystem and
      the directory should in support htrees), we continue just ignoring htree
      information for reading but we refuse to add new entries to the
      directory to avoid corrupting it more.
      
      Link: https://lore.kernel.org/r/20200210144316.22081-1-jack@suse.cz
      Fixes: dbe89444 ("ext4: Calculate and verify checksums for htree nodes")
      Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      20bd22e8
    • Q
      ext4: fix a data race at inode->i_blocks · 7b067e44
      Qian Cai 提交于
      task #28557685
      
      commit 28936b62e71e41600bab319f262ea9f9b1027629 upstream.
      
      inode->i_blocks could be accessed concurrently as noticed by KCSAN,
      
       BUG: KCSAN: data-race in ext4_do_update_inode [ext4] / inode_add_bytes
      
       write to 0xffff9a00d4b982d0 of 8 bytes by task 22100 on cpu 118:
        inode_add_bytes+0x65/0xf0
        __inode_add_bytes at fs/stat.c:689
        (inlined by) inode_add_bytes at fs/stat.c:702
        ext4_mb_new_blocks+0x418/0xca0 [ext4]
        ext4_ext_map_blocks+0x1a6b/0x27b0 [ext4]
        ext4_map_blocks+0x1a9/0x950 [ext4]
        _ext4_get_block+0xfc/0x270 [ext4]
        ext4_get_block_unwritten+0x33/0x50 [ext4]
        __block_write_begin_int+0x22e/0xae0
        __block_write_begin+0x39/0x50
        ext4_write_begin+0x388/0xb50 [ext4]
        ext4_da_write_begin+0x35f/0x8f0 [ext4]
        generic_perform_write+0x15d/0x290
        ext4_buffered_write_iter+0x11f/0x210 [ext4]
        ext4_file_write_iter+0xce/0x9e0 [ext4]
        new_sync_write+0x29c/0x3b0
        __vfs_write+0x92/0xa0
        vfs_write+0x103/0x260
        ksys_write+0x9d/0x130
        __x64_sys_write+0x4c/0x60
        do_syscall_64+0x91/0xb05
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
       read to 0xffff9a00d4b982d0 of 8 bytes by task 8 on cpu 65:
        ext4_do_update_inode+0x4a0/0xf60 [ext4]
        ext4_inode_blocks_set at fs/ext4/inode.c:4815
        ext4_mark_iloc_dirty+0xaf/0x160 [ext4]
        ext4_mark_inode_dirty+0x129/0x3e0 [ext4]
        ext4_convert_unwritten_extents+0x253/0x2d0 [ext4]
        ext4_convert_unwritten_io_end_vec+0xc5/0x150 [ext4]
        ext4_end_io_rsv_work+0x22c/0x350 [ext4]
        process_one_work+0x54f/0xb90
        worker_thread+0x80/0x5f0
        kthread+0x1cd/0x1f0
        ret_from_fork+0x27/0x50
      
       4 locks held by kworker/u256:0/8:
        #0: ffff9a025abc4328 ((wq_completion)ext4-rsv-conversion){+.+.}, at: process_one_work+0x443/0xb90
        #1: ffffab5a862dbe20 ((work_completion)(&ei->i_rsv_conversion_work)){+.+.}, at: process_one_work+0x443/0xb90
        #2: ffff9a025a9d0f58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2]
        #3: ffff9a00d4b985d8 (&(&ei->i_raw_lock)->rlock){+.+.}, at: ext4_do_update_inode+0xaa/0xf60 [ext4]
       irq event stamp: 3009267
       hardirqs last  enabled at (3009267): [<ffffffff980da9b7>] __find_get_block+0x107/0x790
       hardirqs last disabled at (3009266): [<ffffffff980da8f9>] __find_get_block+0x49/0x790
       softirqs last  enabled at (3009230): [<ffffffff98a0034c>] __do_softirq+0x34c/0x57c
       softirqs last disabled at (3009223): [<ffffffff97cc67a2>] irq_exit+0xa2/0xc0
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 65 PID: 8 Comm: kworker/u256:0 Tainted: G L 5.6.0-rc2-next-20200221+ #7
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
       Workqueue: ext4-rsv-conversion ext4_end_io_rsv_work [ext4]
      
      The plain read is outside of inode->i_lock critical section which
      results in a data race. Fix it by adding READ_ONCE() there.
      
      Link: https://lore.kernel.org/r/20200222043258.2279-1-cai@lca.pwSigned-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      7b067e44
    • Q
      ext4: fix a data race in EXT4_I(inode)->i_disksize · ff11a028
      Qian Cai 提交于
      task #28557685
      
      commit 35df4299a6487f323b0aca120ea3f485dfee2ae3 upstream.
      
      EXT4_I(inode)->i_disksize could be accessed concurrently as noticed by
      KCSAN,
      
       BUG: KCSAN: data-race in ext4_write_end [ext4] / ext4_writepages [ext4]
      
       write to 0xffff91c6713b00f8 of 8 bytes by task 49268 on cpu 127:
        ext4_write_end+0x4e3/0x750 [ext4]
        ext4_update_i_disksize at fs/ext4/ext4.h:3032
        (inlined by) ext4_update_inode_size at fs/ext4/ext4.h:3046
        (inlined by) ext4_write_end at fs/ext4/inode.c:1287
        generic_perform_write+0x208/0x2a0
        ext4_buffered_write_iter+0x11f/0x210 [ext4]
        ext4_file_write_iter+0xce/0x9e0 [ext4]
        new_sync_write+0x29c/0x3b0
        __vfs_write+0x92/0xa0
        vfs_write+0x103/0x260
        ksys_write+0x9d/0x130
        __x64_sys_write+0x4c/0x60
        do_syscall_64+0x91/0xb47
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
       read to 0xffff91c6713b00f8 of 8 bytes by task 24872 on cpu 37:
        ext4_writepages+0x10ac/0x1d00 [ext4]
        mpage_map_and_submit_extent at fs/ext4/inode.c:2468
        (inlined by) ext4_writepages at fs/ext4/inode.c:2772
        do_writepages+0x5e/0x130
        __writeback_single_inode+0xeb/0xb20
        writeback_sb_inodes+0x429/0x900
        __writeback_inodes_wb+0xc4/0x150
        wb_writeback+0x4bd/0x870
        wb_workfn+0x6b4/0x960
        process_one_work+0x54c/0xbe0
        worker_thread+0x80/0x650
        kthread+0x1e0/0x200
        ret_from_fork+0x27/0x50
      
       Reported by Kernel Concurrency Sanitizer on:
       CPU: 37 PID: 24872 Comm: kworker/u261:2 Tainted: G        W  O L 5.5.0-next-20200204+ #5
       Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
       Workqueue: writeback wb_workfn (flush-7:0)
      
      Since only the read is operating as lockless (outside of the
      "i_data_sem"), load tearing could introduce a logic bug. Fix it by
      adding READ_ONCE() for the read and WRITE_ONCE() for the write.
      Signed-off-by: NQian Cai <cai@lca.pw>
      Link: https://lore.kernel.org/r/1581085751-31793-1-git-send-email-cai@lca.pwSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      ff11a028
    • D
      ext4: unlock on error in ext4_expand_extra_isize() · b23f8091
      Dan Carpenter 提交于
      task #28557685
      
      commit 7f420d64a08c1dcd65b27be82a27cf2bdb2e7847 upstream.
      
      We need to unlock the xattr before returning on this error path.
      
      Cc: stable@kernel.org # 4.13
      Fixes: c03b45b8 ("ext4, project: expand inode extra size if possible")
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Link: https://lore.kernel.org/r/20191213185010.6k7yl2tck3wlsdkt@kili.mountainSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NJeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      b23f8091
  2. 18 3月, 2020 11 次提交
  3. 17 1月, 2020 1 次提交
  4. 15 1月, 2020 1 次提交
  5. 27 12月, 2019 4 次提交
  6. 18 12月, 2019 2 次提交
    • Y
      ext4: fix a bug in ext4_wait_for_tail_page_commit · b1ec93dd
      yangerkun 提交于
      commit 565333a1554d704789e74205989305c811fd9c7a upstream.
      
      No need to wait for any commit once the page is fully truncated.
      Besides, it may confuse e.g. concurrent ext4_writepage() with the page
      still be dirty (will be cleared by truncate_pagecache() in
      ext4_setattr()) but buffers has been freed; and then trigger a bug
      show as below:
      
      [   26.057508] ------------[ cut here ]------------
      [   26.058531] kernel BUG at fs/ext4/inode.c:2134!
      ...
      [   26.088130] Call trace:
      [   26.088695]  ext4_writepage+0x914/0xb28
      [   26.089541]  writeout.isra.4+0x1b4/0x2b8
      [   26.090409]  move_to_new_page+0x3b0/0x568
      [   26.091338]  __unmap_and_move+0x648/0x988
      [   26.092241]  unmap_and_move+0x48c/0xbb8
      [   26.093096]  migrate_pages+0x220/0xb28
      [   26.093945]  kernel_mbind+0x828/0xa18
      [   26.094791]  __arm64_sys_mbind+0xc8/0x138
      [   26.095716]  el0_svc_common+0x190/0x490
      [   26.096571]  el0_svc_handler+0x60/0xd0
      [   26.097423]  el0_svc+0x8/0xc
      
      Run the procedure (generate by syzkaller) parallel with ext3.
      
      void main()
      {
      	int fd, fd1, ret;
      	void *addr;
      	size_t length = 4096;
      	int flags;
      	off_t offset = 0;
      	char *str = "12345";
      
      	fd = open("a", O_RDWR | O_CREAT);
      	assert(fd >= 0);
      
      	/* Truncate to 4k */
      	ret = ftruncate(fd, length);
      	assert(ret == 0);
      
      	/* Journal data mode */
      	flags = 0xc00f;
      	ret = ioctl(fd, _IOW('f', 2, long), &flags);
      	assert(ret == 0);
      
      	/* Truncate to 0 */
      	fd1 = open("a", O_TRUNC | O_NOATIME);
      	assert(fd1 >= 0);
      
      	addr = mmap(NULL, length, PROT_WRITE | PROT_READ,
      					MAP_SHARED, fd, offset);
      	assert(addr != (void *)-1);
      
      	memcpy(addr, str, 5);
      	mbind(addr, length, 0, 0, 0, MPOL_MF_MOVE);
      }
      
      And the bug will be triggered once we seen the below order.
      
      reproduce1                         reproduce2
      
      ...                            |   ...
      truncate to 4k                 |
      change to journal data mode    |
                                     |   memcpy(set page dirty)
      truncate to 0:                 |
      ext4_setattr:                  |
      ...                            |
      ext4_wait_for_tail_page_commit |
                                     |   mbind(trigger bug)
      truncate_pagecache(clean dirty)|   ...
      ...                            |
      
      mbind will call ext4_writepage() since the page still be dirty, and then
      report the bug since the buffers has been free. Fix it by return
      directly once offset equals to 0 which means the page has been fully
      truncated.
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: Nyangerkun <yangerkun@huawei.com>
      Link: https://lore.kernel.org/r/20190919063508.1045-1-yangerkun@huawei.comReviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b1ec93dd
    • J
      ext4: Fix credit estimate for final inode freeing · 595a92a4
      Jan Kara 提交于
      commit 65db869c754e7c271691dd5feabf884347e694f5 upstream.
      
      Estimate for the number of credits needed for final freeing of inode in
      ext4_evict_inode() was to small. We may modify 4 blocks (inode & sb for
      orphan deletion, bitmap & group descriptor for inode freeing) and not
      just 3.
      
      [ Fixed minor whitespace nit. -- TYT ]
      
      Fixes: e50e5129 ("ext4: xattr-in-inode support")
      CC: stable@vger.kernel.org
      Signed-off-by: NJan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20191105164437.32602-6-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      595a92a4
  7. 05 12月, 2019 1 次提交
  8. 05 10月, 2019 1 次提交
    • T
      ext4: fix punch hole for inline_data file systems · 091c754d
      Theodore Ts'o 提交于
      commit c1e8220bd316d8ae8e524df39534b8a412a45d5e upstream.
      
      If a program attempts to punch a hole on an inline data file, we need
      to convert it to a normal file first.
      
      This was detected using ext4/032 using the adv configuration.  Simple
      reproducer:
      
      mke2fs -Fq -t ext4 -O inline_data /dev/vdc
      mount /vdc
      echo "" > /vdc/testfile
      xfs_io -c 'truncate 33554432' /vdc/testfile
      xfs_io -c 'fpunch 0 1048576' /vdc/testfile
      umount /vdc
      e2fsck -fy /dev/vdc
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      091c754d
  9. 16 9月, 2019 1 次提交
  10. 28 7月, 2019 2 次提交
  11. 31 5月, 2019 2 次提交
  12. 22 5月, 2019 2 次提交
  13. 17 1月, 2019 2 次提交
    • T
      ext4: fix special inode number checks in __ext4_iget() · 5dc41af3
      Theodore Ts'o 提交于
      commit 191ce17876c9367819c4b0a25b503c0f6d9054d8 upstream.
      
      The check for special (reserved) inode number checks in __ext4_iget()
      was broken by commit 8a363970d1dc: ("ext4: avoid declaring fs
      inconsistent due to invalid file handles").  This was caused by a
      botched reversal of the sense of the flag now known as
      EXT4_IGET_SPECIAL (when it was previously named EXT4_IGET_NORMAL).
      Fix the logic appropriately.
      
      Fixes: 8a363970d1dc ("ext4: avoid declaring fs inconsistent...")
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5dc41af3
    • T
      ext4: make sure enough credits are reserved for dioread_nolock writes · 7c2ea25e
      Theodore Ts'o 提交于
      commit 812c0cab2c0dfad977605dbadf9148490ca5d93f upstream.
      
      There are enough credits reserved for most dioread_nolock writes;
      however, if the extent tree is sufficiently deep, and/or quota is
      enabled, the code was not allowing for all eventualities when
      reserving journal credits for the unwritten extent conversion.
      
      This problem can be seen using xfstests ext4/034:
      
         WARNING: CPU: 1 PID: 257 at fs/ext4/ext4_jbd2.c:271 __ext4_handle_dirty_metadata+0x10c/0x180
         Workqueue: ext4-rsv-conversion ext4_end_io_rsv_work
         RIP: 0010:__ext4_handle_dirty_metadata+0x10c/0x180
         	...
         EXT4-fs: ext4_free_blocks:4938: aborting transaction: error 28 in __ext4_handle_dirty_metadata
         EXT4: jbd2_journal_dirty_metadata failed: handle type 11 started at line 4921, credits 4/0, errcode -28
         EXT4-fs error (device dm-1) in ext4_free_blocks:4950: error 28
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7c2ea25e
  14. 10 1月, 2019 2 次提交
    • T
      ext4: check for shutdown and r/o file system in ext4_write_inode() · 0cb4f655
      Theodore Ts'o 提交于
      commit 18f2c4fcebf2582f96cbd5f2238f4f354a0e4847 upstream.
      
      If the file system has been shut down or is read-only, then
      ext4_write_inode() needs to bail out early.
      
      Also use jbd2_complete_transaction() instead of ext4_force_commit() so
      we only force a commit if it is needed.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0cb4f655
    • T
      ext4: avoid declaring fs inconsistent due to invalid file handles · 26366388
      Theodore Ts'o 提交于
      commit 8a363970d1dc38c4ec4ad575c862f776f468d057 upstream.
      
      If we receive a file handle, either from NFS or open_by_handle_at(2),
      and it points at an inode which has not been initialized, and the file
      system has metadata checksums enabled, we shouldn't try to get the
      inode, discover the checksum is invalid, and then declare the file
      system as being inconsistent.
      
      This can be reproduced by creating a test file system via "mke2fs -t
      ext4 -O metadata_csum /tmp/foo.img 8M", mounting it, cd'ing into that
      directory, and then running the following program.
      
      #define _GNU_SOURCE
      #include <fcntl.h>
      
      struct handle {
      	struct file_handle fh;
      	unsigned char fid[MAX_HANDLE_SZ];
      };
      
      int main(int argc, char **argv)
      {
      	struct handle h = {{8, 1 }, { 12, }};
      
      	open_by_handle_at(AT_FDCWD, &h.fh, O_RDONLY);
      	return 0;
      }
      
      Google-Bug-Id: 120690101
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      26366388
  15. 21 11月, 2018 1 次提交
  16. 16 9月, 2018 2 次提交
    • T
      ext4, dax: set ext4_dax_aops for dax files · cce6c9f7
      Toshi Kani 提交于
      Sync syscall to DAX file needs to flush processor cache, but it
      currently does not flush to existing DAX files.  This is because
      'ext4_da_aops' is set to address_space_operations of existing DAX
      files, instead of 'ext4_dax_aops', since S_DAX flag is set after
      ext4_set_aops() in the open path.
      
        New file
        --------
        lookup_open
          ext4_create
            __ext4_new_inode
              ext4_set_inode_flags   // Set S_DAX flag
            ext4_set_aops            // Set aops to ext4_dax_aops
      
        Existing file
        -------------
        lookup_open
          ext4_lookup
            ext4_iget
              ext4_set_aops          // Set aops to ext4_da_aops
              ext4_set_inode_flags   // Set S_DAX flag
      
      Change ext4_iget() to initialize i_flags before ext4_set_aops().
      
      Fixes: 5f0663bb ("ext4, dax: introduce ext4_dax_aops")
      Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Suggested-by: NJan Kara <jack@suse.cz>
      Cc: stable@vger.kernel.org
      cce6c9f7
    • T
      ext4, dax: add ext4_bmap to ext4_dax_aops · 94dbb631
      Toshi Kani 提交于
      Ext4 mount path calls .bmap to the journal inode. This currently
      works for the DAX mount case because ext4_iget() always set
      'ext4_da_aops' to any regular files.
      
      In preparation to fix ext4_iget() to set 'ext4_dax_aops' for ext4
      DAX files, add ext4_bmap() to 'ext4_dax_aops', since bmap works for
      DAX inodes.
      
      Fixes: 5f0663bb ("ext4, dax: introduce ext4_dax_aops")
      Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Suggested-by: NJan Kara <jack@suse.cz>
      Cc: stable@vger.kernel.org
      94dbb631