1. 26 Jul, 2022 1 commit
  2. 22 Jun, 2022 1 commit
  3. 23 May, 2022 1 commit
    • ext4: Fix warning in ext4_da_release_space · 782a6ba7
      Committed by Ye Bin
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I58KLD
      CVE: NA
      
      ---------------------------
      
      We hit the following issue:
      WARNING: CPU: 2 PID: 1936 at fs/ext4/inode.c:1511 ext4_da_release_space+0x1b9/0x266
      Modules linked in:
      CPU: 2 PID: 1936 Comm: dd Not tainted 5.10.0+ #344
      RIP: 0010:ext4_da_release_space+0x1b9/0x266
      RSP: 0018:ffff888127307848 EFLAGS: 00010292
      RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff843f67cc
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffed1024e60ed9
      RBP: ffff888124dc8140 R08: 0000000000000083 R09: ffffed1075da6d23
      R10: ffff8883aed36917 R11: ffffed1075da6d22 R12: ffff888124dc83f0
      R13: ffff888124dc844c R14: ffff888124dc8168 R15: 000000000000000c
      FS:  00007f6b7247d740(0000) GS:ffff8883aed00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffc1a0b7dd8 CR3: 00000001065ce000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       ext4_es_remove_extent+0x187/0x230
       mpage_release_unused_pages+0x3af/0x470
       ext4_writepages+0xb9b/0x1160
       do_writepages+0xbb/0x1e0
       __filemap_fdatawrite_range+0x1b1/0x1f0
       file_write_and_wait_range+0x80/0xe0
       ext4_sync_file+0x13d/0x800
       vfs_fsync_range+0x75/0x140
       do_fsync+0x4d/0x90
       __x64_sys_fsync+0x1d/0x30
       do_syscall_64+0x33/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The above issue may happen as follows:
      	process1                        process2
      ext4_da_write_begin
        ext4_da_reserve_space
          ext4_es_insert_delayed_block[1/1]
                                          ext4_da_write_begin
      				      ext4_es_insert_delayed_block[0/1]
      ext4_writepages
        ****Delayed block allocation failed****
        mpage_release_unused_pages
          ext4_es_remove_extent[1/1]
            ext4_da_release_space [reserved 0]
      
      ext4_da_write_begin
        ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk)
         ->As the [0, 1] extent exists, this returns true
                                         ext4_writepages
      				   ****Delayed block allocation failed****
                                           mpage_release_unused_pages
      				       ext4_es_remove_extent[0/1]
      				         ext4_da_release_space [reserved 1]
      					   ei->i_reserved_data_blocks [1->0]
      
        ext4_es_insert_delayed_block[1/1]
      
      ext4_writepages
        ****Delayed block allocation failed****
        mpage_release_unused_pages
        ext4_es_remove_extent[1/1]
         ext4_da_release_space [reserved 1]
          ei->i_reserved_data_blocks[0, -1]
          ->As ei->i_reserved_data_blocks is already zero but to_free is 1,
          this triggers the warning.
      
      To solve the above issue, introduce i_clu_lock to serialize insertion
      and removal of delayed blocks in cluster delayed-allocation mode.
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
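The interleaving above can be replayed deterministically in user space. The following is only a toy model under stated assumptions: one cluster of two blocks, a single reservation counter standing in for `ei->i_reserved_data_blocks`, and hypothetical `toy_*` names that do not exist in the kernel. It shows why a stale "cluster already has a delayed block" decision underflows the counter, and why keeping the check and the insert atomic (which is what a lock such as `i_clu_lock` is meant to enforce) avoids it.

```c
#include <stdbool.h>

/* Toy model (not kernel code): one cluster made of blocks 0 and 1. */
struct toy_inode {
    bool delayed[2]; /* is block N recorded as a delayed extent? */
    int reserved;    /* stands in for ei->i_reserved_data_blocks */
    int warnings;    /* counts would-be WARN hits on underflow */
};

/* Rough analogue of ext4_es_scan_clu(): any delayed block in the cluster? */
static bool toy_cluster_has_delayed(const struct toy_inode *in)
{
    return in->delayed[0] || in->delayed[1];
}

/* Step 1 of a delayed write: a reservation is needed only for an empty cluster. */
static bool toy_needs_reserve(const struct toy_inode *in)
{
    return !toy_cluster_has_delayed(in);
}

/* Step 2: record the delayed block, reserving only if step 1 said so. */
static void toy_insert(struct toy_inode *in, int blk, bool reserve)
{
    if (reserve)
        in->reserved++;
    in->delayed[blk] = true;
}

/* Failed writeback: drop the block; release once the cluster is empty. */
static void toy_remove(struct toy_inode *in, int blk)
{
    in->delayed[blk] = false;
    if (toy_cluster_has_delayed(in))
        return;             /* another delayed block still pins the cluster */
    if (in->reserved == 0) {
        in->warnings++;     /* to_free is 1 but nothing is reserved */
        return;
    }
    in->reserved--;
}

/* Replays the racy interleaving; returns the number of warnings. */
static int toy_replay_race(void)
{
    struct toy_inode in = {0};
    bool stale;

    toy_insert(&in, 1, toy_needs_reserve(&in)); /* p1: reserved 0 -> 1 */
    toy_insert(&in, 0, toy_needs_reserve(&in)); /* p2: cluster already reserved */
    toy_remove(&in, 1);                         /* p1: block 0 remains, frees 0 */
    stale = toy_needs_reserve(&in);             /* p1: sees block 0, so false */
    toy_remove(&in, 0);                         /* p2: reserved 1 -> 0 */
    toy_insert(&in, 1, stale);                  /* p1: inserts without reserving */
    toy_remove(&in, 1);                         /* p1: reserved is 0, warning */
    return in.warnings;
}

/* Same work, but check + insert form one atomic step, as under a lock. */
static int toy_replay_serialized(void)
{
    struct toy_inode in = {0};

    toy_insert(&in, 1, toy_needs_reserve(&in));
    toy_insert(&in, 0, toy_needs_reserve(&in));
    toy_remove(&in, 1);
    toy_remove(&in, 0);                         /* p2 finishes first */
    toy_insert(&in, 1, toy_needs_reserve(&in)); /* p1: fresh check, reserves */
    toy_remove(&in, 1);                         /* reserved 1 -> 0, no warning */
    return in.warnings;
}
```

Calling `toy_replay_race()` reports one warning, while `toy_replay_serialized()` reports none; the only difference is whether another removal can slip in between the check and the insert.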
  4. 17 May, 2022 1 commit
  5. 10 May, 2022 1 commit
    • ext4: fix use-after-free in ext4_search_dir · 9668ea4d
      Committed by Ye Bin
      mainline inclusion
      from mainline-v5.18-rc4
      commit c186f088
      category: bugfix
      bugzilla: 186477, https://gitee.com/openeuler/kernel/issues/I55UHT
      CVE: NA
      
      -------------------------------------------------
      
      We hit the following issue:
      EXT4-fs (loop0): mounted filesystem without journal. Opts: ,errors=continue
      
      ==================================================================
      BUG: KASAN: use-after-free in ext4_search_dir fs/ext4/namei.c:1394 [inline]
      BUG: KASAN: use-after-free in search_dirblock fs/ext4/namei.c:1199 [inline]
      BUG: KASAN: use-after-free in __ext4_find_entry+0xdca/0x1210 fs/ext4/namei.c:1553
      Read of size 1 at addr ffff8881317c3005 by task syz-executor117/2331
      
      CPU: 1 PID: 2331 Comm: syz-executor117 Not tainted 5.10.0+ #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      Call Trace:
       __dump_stack lib/dump_stack.c:83 [inline]
       dump_stack+0x144/0x187 lib/dump_stack.c:124
       print_address_description+0x7d/0x630 mm/kasan/report.c:387
       __kasan_report+0x132/0x190 mm/kasan/report.c:547
       kasan_report+0x47/0x60 mm/kasan/report.c:564
       ext4_search_dir fs/ext4/namei.c:1394 [inline]
       search_dirblock fs/ext4/namei.c:1199 [inline]
       __ext4_find_entry+0xdca/0x1210 fs/ext4/namei.c:1553
       ext4_lookup_entry fs/ext4/namei.c:1622 [inline]
       ext4_lookup+0xb8/0x3a0 fs/ext4/namei.c:1690
       __lookup_hash+0xc5/0x190 fs/namei.c:1451
       do_rmdir+0x19e/0x310 fs/namei.c:3760
       do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x445e59
      Code: 4d c7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b c7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fff2277fac8 EFLAGS: 00000246 ORIG_RAX: 0000000000000054
      RAX: ffffffffffffffda RBX: 0000000000400280 RCX: 0000000000445e59
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000200000c0
      RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000002
      R10: 00007fff2277f990 R11: 0000000000000246 R12: 0000000000000000
      R13: 431bde82d7b634db R14: 0000000000000000 R15: 0000000000000000
      
      The buggy address belongs to the page:
      page:0000000048cd3304 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x1317c3
      flags: 0x200000000000000()
      raw: 0200000000000000 ffffea0004526588 ffffea0004528088 0000000000000000
      raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8881317c2f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff8881317c2f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      >ffff8881317c3000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                         ^
       ffff8881317c3080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff8881317c3100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      ==================================================================
      
      ext4_search_dir:
        ...
        de = (struct ext4_dir_entry_2 *)search_buf;
        dlimit = search_buf + buf_size;
        while ((char *) de < dlimit) {
        ...
          if ((char *) de + de->name_len <= dlimit &&
      	 ext4_match(dir, fname, de)) {
      	    ...
          }
        ...
          de_len = ext4_rec_len_from_disk(de->rec_len, dir->i_sb->s_blocksize);
          if (de_len <= 0)
            return -1;
          offset += de_len;
          de = (struct ext4_dir_entry_2 *) ((char *) de + de_len);
        }
      
      Assume:
      de=0xffff8881317c2fff
      dlimit=0xffff8881317c3000
      
      Reading 'de->name_len' then accesses address 0xffff8881317c3005, which is
      out of range and triggers the use-after-free.
      To solve this issue, 'dlimit' must reserve 8 bytes, because 'de->name_len'
      is read to judge whether '(char *) de + de->name_len' is out of range.
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20220324064816.1209985-1-yebin10@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
      Reviewed-by: yebin <yebin10@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
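The "reserve 8 bytes" idea can be sketched in user space. This is a simplified illustration, not the kernel function: the `toy_*` names and `TOY_BASE_DIR_LEN` are hypothetical, `rec_len` is stored in host byte order, and the entry layout is assumed to be a fixed 8-byte header (inode 4 bytes, rec_len 2, name_len 1, file_type 1) followed by the name. With that layout `name_len` sits at offset 6, which matches the arithmetic in the message: 0xffff8881317c2fff + 6 = 0xffff8881317c3005.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed fixed header size: inode(4) + rec_len(2) + name_len(1) + file_type(1). */
#define TOY_BASE_DIR_LEN 8

/* Test helper: writes one toy entry (header plus name) at offset 'off'. */
static void toy_put_entry(unsigned char *buf, size_t off, uint16_t rec_len,
                          const char *name)
{
    uint8_t nlen = (uint8_t)strlen(name);

    memset(buf + off, 0, TOY_BASE_DIR_LEN);
    memcpy(buf + off + 4, &rec_len, sizeof(rec_len)); /* host byte order */
    buf[off + 6] = nlen;
    memcpy(buf + off + TOY_BASE_DIR_LEN, name, nlen);
}

/* Returns the offset of the entry whose name matches, or -1 if none. */
static long toy_search_dir(const unsigned char *buf, size_t buf_size,
                           const char *name, size_t name_len)
{
    size_t off = 0;

    /*
     * Only dereference an entry when a whole header fits before the limit.
     * Checking just "off < buf_size" would allow reading name_len past the
     * end of the buffer when an entry starts in the last few bytes.
     */
    while (buf_size >= TOY_BASE_DIR_LEN &&
           off <= buf_size - TOY_BASE_DIR_LEN) {
        uint16_t rec_len;
        uint8_t nlen = buf[off + 6];

        memcpy(&rec_len, buf + off + 4, sizeof(rec_len));

        /* The name itself must also end inside the buffer. */
        if (off + TOY_BASE_DIR_LEN + nlen <= buf_size &&
            nlen == name_len &&
            memcmp(buf + off + TOY_BASE_DIR_LEN, name, name_len) == 0)
            return (long)off;

        if (rec_len == 0)
            return -1; /* corrupt entry: avoid looping forever */
        off += rec_len;
    }
    return -1;
}
```

In a 32-byte buffer whose second entry has a `rec_len` that lands the cursor at offset 31, the loop exits instead of reading a header that would straddle the end of the buffer.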
  6. 27 Apr, 2022 1 commit
  7. 26 Jan, 2022 1 commit
  8. 15 Nov, 2021 1 commit
  9. 06 Jul, 2021 1 commit
  10. 22 Apr, 2021 1 commit
    • ext4: shrink race window in ext4_should_retry_alloc() · 560f326e
      Committed by Eric Whitney
      stable inclusion
      from stable-5.10.28
      commit 4b3139576a20e27fccb9a103ca5503b02e1ac655
      bugzilla: 51779
      
      --------------------------------
      
      [ Upstream commit efc61345 ]
      
      When generic/371 is run on kvm-xfstests using 5.10 and 5.11 kernels, it
      fails at significant rates on the two test scenarios that disable
      delayed allocation (ext3conv and data_journal) and force actual block
      allocation for the fallocate and pwrite functions in the test.  The
      failure rate on 5.10 for both ext3conv and data_journal on one test
      system typically runs about 85%.  On 5.11, the failure rate on ext3conv
      sometimes drops to as low as 1% while the rate on data_journal
      increases to nearly 100%.
      
      The observed failures are largely due to ext4_should_retry_alloc()
      cutting off block allocation retries when s_mb_free_pending (used to
      indicate that a transaction in progress will free blocks) is 0.
      However, free space is usually available when this occurs during runs
      of generic/371.  It appears that a thread attempting to allocate
      blocks is just missing transaction commits in other threads that
      increase the free cluster count and reset s_mb_free_pending while
      the allocating thread isn't running.  Explicitly testing for free space
      availability avoids this race.
      
      The current code uses a post-increment operator in the conditional
      expression that determines whether the retry limit has been exceeded.
      This means that the conditional expression uses the value of the
      retry counter before it's increased, resulting in an extra retry cycle.
      The current code actually retries twice before hitting its retry limit
      rather than once.
      
      Increasing the retry limit to 3 from the current actual maximum retry
      count of 2 in combination with the change described above reduces the
      observed failure rate to less than 0.1% on both ext3conv and
      data_journal with what should be limited impact on users sensitive to
      the overhead caused by retries.
      
      A per filesystem percpu counter exported via sysfs is added to allow
      users or developers to track the number of times the retry limit is
      exceeded without resorting to debugging methods.  This should provide
      some insight into worst case retry behavior.
      Signed-off-by: Eric Whitney <enwlinux@gmail.com>
      Link: https://lore.kernel.org/r/20210218151132.19678-1-enwlinux@gmail.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
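The post-increment effect described in the message can be checked with a few lines of C. This is a simplified model, assuming the limit check has the shape `counter++ > limit`; the `toy_*` helpers are hypothetical and only count how many retries a caller would be granted before the check trips.

```c
/* Counts granted retries when the check is "retries++ > limit". */
static int toy_retries_post_increment(int limit)
{
    int retries = 0, granted = 0;

    /* The comparison sees the value *before* the increment. */
    while (!(retries++ > limit))
        granted++;
    return granted;
}

/* Counts granted retries when the check is "++retries > limit". */
static int toy_retries_pre_increment(int limit)
{
    int retries = 0, granted = 0;

    /* The comparison sees the value *after* the increment. */
    while (!(++retries > limit))
        granted++;
    return granted;
}
```

With a limit of 1, the post-increment form grants two retries and the pre-increment form grants one, which is the off-by-one the message describes. With the pre-increment form, a limit of 3 grants exactly three retries.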
  11. 13 Apr, 2021 1 commit
    • ext4: fix rename whiteout with fast commit · 4f957e17
      Committed by Harshad Shirwadkar
      stable inclusion
      from stable-5.10.26
      commit 35ecf664fd6c14b679586bd5a7ccc8a725b043aa
      bugzilla: 51363
      
      --------------------------------
      
      commit 8210bb29 upstream.
      
      This patch adds rename whiteout support in fast commits. Note that the
      whiteout object that gets created is actually a char device, which
      implies that the function ext4_inode_journal_mode(struct inode *inode)
      would return "JOURNAL_DATA" for this inode. As a consequence, the
      fast commit code treats creation of the whiteout object as a
      fast-commit ineligible operation and thus falls back to full
      commits. With this patch, this can be observed by running fast commits
      with rename whiteout and seeing the stats generated by the ext4_fc_stats
      tracepoint as follows:
      
      ext4_fc_stats: dev 254:32 fc ineligible reasons:
      XATTR:0, CROSS_RENAME:0, JOURNAL_FLAG_CHANGE:0, NO_MEM:0, SWAP_BOOT:0,
      RESIZE:0, RENAME_DIR:0, FALLOC_RANGE:0, INODE_JOURNAL_DATA:16;
      num_commits:6, ineligible: 6, numblks: 3
      
      So in short, this patch guarantees that in case of rename whiteout, we
      fall back to full commits.
      
      Amir mentioned that instead of creating a new whiteout object for
      every rename, we can create a static whiteout object with irrelevant
      nlink. That would keep fast commits from falling back to full
      commits. But until this happens, this patch will ensure correctness by
      falling back to full commits.
      
      Fixes: 8016e29f ("ext4: fast commit recovery path")
      Cc: stable@kernel.org
      Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
      Link: https://lore.kernel.org/r/20210316221921.1124955-1-harshadshirwadkar@gmail.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
  12. 09 Apr, 2021 2 commits
  13. 20 Nov, 2020 1 commit
  14. 12 Nov, 2020 1 commit
  15. 07 Nov, 2020 4 commits
  16. 29 Oct, 2020 2 commits
  17. 22 Oct, 2020 5 commits
  18. 18 Oct, 2020 9 commits
  19. 22 Sep, 2020 1 commit
    • fscrypt: handle test_dummy_encryption in more logical way · ac4acb1f
      Committed by Eric Biggers
      The behavior of the test_dummy_encryption mount option is that when a
      new file (or directory or symlink) is created in an unencrypted
      directory, it's automatically encrypted using a dummy encryption policy.
      That's it; in particular, the encryption (or lack thereof) of existing
      files (or directories or symlinks) doesn't change.
      
      Unfortunately the implementation of test_dummy_encryption is a bit weird
      and confusing.  When test_dummy_encryption is enabled and a file is
      being created in an unencrypted directory, we set up an encryption key
      (->i_crypt_info) for the directory.  This isn't actually used to do any
      encryption, however, since the directory is still unencrypted!  Instead,
      ->i_crypt_info is only used for inheriting the encryption policy.
      
      One consequence of this is that the filesystem ends up providing a
      "dummy context" (policy + nonce) instead of a "dummy policy".  In
      commit ed318a6c ("fscrypt: support test_dummy_encryption=v2"), I
      mistakenly thought this was required.  However, actually the nonce only
      ends up being used to derive a key that is never used.
      
      Another consequence of this implementation is that it allows for
      'inode->i_crypt_info != NULL && !IS_ENCRYPTED(inode)', which is an edge
      case that can be forgotten about.  For example, currently
      FS_IOC_GET_ENCRYPTION_POLICY on an unencrypted directory may return the
      dummy encryption policy when the filesystem is mounted with
      test_dummy_encryption.  That seems like the wrong thing to do, since
      again, the directory itself is not actually encrypted.
      
      Therefore, switch to a more logical and maintainable implementation
      where the dummy encryption policy inheritance is done without setting up
      keys for unencrypted directories.  This involves:
      
      - Adding a function fscrypt_policy_to_inherit() which returns the
        encryption policy to inherit from a directory.  This can be a real
        policy, a dummy policy, or no policy.
      
      - Replacing struct fscrypt_dummy_context, ->get_dummy_context(), etc.
        with struct fscrypt_dummy_policy, ->get_dummy_policy(), etc.
      
      - Making fscrypt_fname_encrypted_size() take an fscrypt_policy instead
        of an inode.
      Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Acked-by: Jeff Layton <jlayton@kernel.org>
      Link: https://lore.kernel.org/r/20200917041136.178600-13-ebiggers@kernel.org
      Signed-off-by: Eric Biggers <ebiggers@google.com>
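The three-way outcome described for fscrypt_policy_to_inherit() (a real policy, a dummy policy, or no policy) can be illustrated with a small sketch. The types and the `toy_policy_to_inherit()` helper below are hypothetical stand-ins, not the fscrypt API; the point is only that the decision needs the directory's encryption state and the mount's dummy policy, and never has to set up a key for an unencrypted directory.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an encryption policy. */
struct toy_policy {
    int version;
};

/* Hypothetical stand-in for the state consulted on a parent directory. */
struct toy_dir {
    bool encrypted;                        /* models IS_ENCRYPTED(dir) */
    const struct toy_policy *own_policy;   /* valid when 'encrypted' is true */
    const struct toy_policy *dummy_policy; /* non-NULL with test_dummy_encryption */
};

/*
 * Returns the policy a new file created in 'dir' should inherit:
 *   - the directory's own policy if the directory is encrypted,
 *   - the dummy policy if the directory is unencrypted and one is configured,
 *   - NULL (no policy) otherwise.
 */
static const struct toy_policy *toy_policy_to_inherit(const struct toy_dir *dir)
{
    if (dir->encrypted)
        return dir->own_policy;
    return dir->dummy_policy; /* NULL when no dummy policy is configured */
}
```

Because the unencrypted directory never gains key material in this model, the `i_crypt_info != NULL && !IS_ENCRYPTED(inode)` edge case mentioned in the message has no analogue here.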
  20. 20 Aug, 2020 1 commit
    • ext4: limit the length of per-inode prealloc list · 27bc446e
      Committed by brookxu
      In the scenario of writing sparse files, the per-inode prealloc list may
      be very long, resulting in high overhead for ext4_mb_use_preallocated().
      To circumvent this problem, we limit the maximum length of per-inode
      prealloc list to 512 and allow users to modify it.
      
      After patching, we observed that the share of CPU time spent in sys
      dropped and system throughput increased significantly. We created a
      process to write a sparse file, and its running time on the fixed
      kernel was significantly reduced, as follows:
      
      Running time on unfixed kernel:
      [root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat
      real    0m2.051s
      user    0m0.008s
      sys     0m2.026s
      
      Running time on fixed kernel:
      [root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat
      real    0m0.471s
      user    0m0.004s
      sys     0m0.395s
      Signed-off-by: Chunguang Xu <brookxu@tencent.com>
      Link: https://lore.kernel.org/r/d7a98178-056b-6db5-6bce-4ead23f4a257@gmail.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
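The general idea of capping a per-inode list can be sketched as follows. This is a minimal illustration, not the ext4 implementation: the `toy_*` types are hypothetical, and evicting the oldest entry is an assumption made for the example (the message only says that the maximum length is limited to 512 and is tunable). A bounded list keeps any linear scan over it bounded as well, which is the overhead the message attributes to ext4_mb_use_preallocated().

```c
#include <stdlib.h>

/* Hypothetical stand-in for one preallocation entry. */
struct toy_pa {
    struct toy_pa *next;
    int id;
};

/* Hypothetical per-inode list with a tunable cap. */
struct toy_pa_list {
    struct toy_pa *head;
    int len;
    int max_len; /* the message uses a default of 512 */
};

/*
 * Adds a new entry at the head, then trims entries from the tail until the
 * list is back within max_len. Returns 0 on success, -1 on allocation failure.
 */
static int toy_pa_add(struct toy_pa_list *list, int id)
{
    struct toy_pa *pa = malloc(sizeof(*pa));

    if (!pa)
        return -1;
    pa->id = id;
    pa->next = list->head;
    list->head = pa;
    list->len++;

    while (list->len > list->max_len) {
        struct toy_pa **pp = &list->head;

        /* Walk to the link that points at the last (oldest) entry. */
        while ((*pp)->next)
            pp = &(*pp)->next;
        free(*pp);
        *pp = NULL;
        list->len--;
    }
    return 0;
}

/* Frees every entry and resets the list. */
static void toy_pa_free_all(struct toy_pa_list *list)
{
    while (list->head) {
        struct toy_pa *next = list->head->next;

        free(list->head);
        list->head = next;
    }
    list->len = 0;
}
```

With `max_len` set to 3, adding entries 0 through 4 leaves the list holding 4, 3, 2 in that order; the two oldest entries have been trimmed.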
  21. 19 Aug, 2020 1 commit
  22. 08 Aug, 2020 2 commits