1. 08 Sep, 2020 1 commit
    • fs: Don't invalidate page buffers in block_write_full_page() · 6dbf7bb5
      Jan Kara committed
      If block_write_full_page() is called for a page that is beyond current
      inode size, it will truncate page buffers for the page and return 0.
      This logic was added in 2.5.62 in commit 81eb69062588 ("fix ext3
      BUG due to race with truncate") in the history.git tree to fix a problem
      with ext3 in data=ordered mode. This particular problem no longer
      exists because ext3 is long gone and ext4 handles ordered data
      differently. Also, buffers are normally invalidated by the truncate code,
      so there is no need to handle this specially in ->writepage() code.
      
      This invalidation of page buffers in block_write_full_page() causes
      problems for filesystems (e.g. ext4 or ocfs2) when the block device is
      shrunk underneath the filesystem and metadata buffers get discarded
      while still being tracked by the journalling layer. Although this is
      obviously "not supported", it can cause kernel crashes like:
      
      [ 7986.689400] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      [ 7986.697197] PGD 0 P4D 0
      [ 7986.699724] Oops: 0002 [#1] SMP PTI
      [ 7986.703200] CPU: 4 PID: 203778 Comm: jbd2/dm-3-8 Kdump: loaded Tainted: G O     --------- -  - 4.18.0-147.5.0.5.h126.eulerosv2r9.x86_64 #1
      [ 7986.716438] Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 1.57 08/11/2015
      [ 7986.723462] RIP: 0010:jbd2_journal_grab_journal_head+0x1b/0x40 [jbd2]
      ...
      [ 7986.810150] Call Trace:
      [ 7986.812595]  __jbd2_journal_insert_checkpoint+0x23/0x70 [jbd2]
      [ 7986.818408]  jbd2_journal_commit_transaction+0x155f/0x1b60 [jbd2]
      [ 7986.836467]  kjournald2+0xbd/0x270 [jbd2]
      
      which is not great. The crash happens because bh->b_private is suddenly
      NULL although the BH_JBD flag is still set. This is because
      block_invalidatepage() cleared the BH_Mapped flag, and a subsequent bh
      lookup found the buffer without BH_Mapped set and called
      init_page_buffers(), which rewrote bh->b_private. So just remove the
      invalidation in block_write_full_page().
      
      Note that the buffer cache invalidation performed when a block device
      changes size already avoids similar problems by using
      invalidate_mapping_pages(), which skips busy buffers. So it was only this
      odd block_write_full_page() behavior that could tear down bdev buffers
      underneath the filesystem.
      Reported-by: Ye Bin <yebin10@huawei.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      CC: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 24 Aug, 2020 1 commit
  3. 08 Aug, 2020 1 commit
    • fs: prevent BUG_ON in submit_bh_wbc() · 377254b2
      Xianting Tian committed
      If a device is hot-removed (for example, when a physical device is
      unplugged from a PCIe slot or an nbd device's network is shut down),
      this can result in a BUG_ON() crash in submit_bh_wbc().  This is
      because when the block device dies, the buffer heads have their
      Buffer_Mapped flag cleared, leading to the crash in
      submit_bh_wbc().
      
      We had attempted to work around this problem in commit a17712c8
      ("ext4: check superblock mapped prior to committing").  Unfortunately,
      it's still possible to hit the BUG_ON(!buffer_mapped(bh)) if the
      device dies between the work-around check in ext4_commit_super()
      and the point where submit_bh_wbc() is finally called:
      
      Code path:
      ext4_commit_super
          if 'buffer_mapped(sbh)' is false, return <== commit a17712c8
                lock_buffer(sbh)
                ...
                unlock_buffer(sbh)
                     __sync_dirty_buffer(sbh,...
                          lock_buffer(sbh)
                              if 'buffer_mapped(sbh)' is false, return <== added by this patch
                                  submit_bh(...,sbh)
                                      submit_bh_wbc(...,sbh,...)
      
      [100722.966497] kernel BUG at fs/buffer.c:3095! <== 'BUG_ON(!buffer_mapped(bh))' in submit_bh_wbc()
      [100722.966503] invalid opcode: 0000 [#1] SMP
      [100722.966566] task: ffff8817e15a9e40 task.stack: ffffc90024744000
      [100722.966574] RIP: 0010:submit_bh_wbc+0x180/0x190
      [100722.966575] RSP: 0018:ffffc90024747a90 EFLAGS: 00010246
      [100722.966576] RAX: 0000000000620005 RBX: ffff8818a80603a8 RCX: 0000000000000000
      [100722.966576] RDX: ffff8818a80603a8 RSI: 0000000000020800 RDI: 0000000000000001
      [100722.966577] RBP: ffffc90024747ac0 R08: 0000000000000000 R09: ffff88207f94170d
      [100722.966578] R10: 00000000000437c8 R11: 0000000000000001 R12: 0000000000020800
      [100722.966578] R13: 0000000000000001 R14: 000000000bf9a438 R15: ffff88195f333000
      [100722.966580] FS:  00007fa2eee27700(0000) GS:ffff88203d840000(0000) knlGS:0000000000000000
      [100722.966580] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [100722.966581] CR2: 0000000000f0b008 CR3: 000000201a622003 CR4: 00000000007606e0
      [100722.966582] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [100722.966583] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [100722.966583] PKRU: 55555554
      [100722.966583] Call Trace:
      [100722.966588]  __sync_dirty_buffer+0x6e/0xd0
      [100722.966614]  ext4_commit_super+0x1d8/0x290 [ext4]
      [100722.966626]  __ext4_std_error+0x78/0x100 [ext4]
      [100722.966635]  ? __ext4_journal_get_write_access+0xca/0x120 [ext4]
      [100722.966646]  ext4_reserve_inode_write+0x58/0xb0 [ext4]
      [100722.966655]  ? ext4_dirty_inode+0x48/0x70 [ext4]
      [100722.966663]  ext4_mark_inode_dirty+0x53/0x1e0 [ext4]
      [100722.966671]  ? __ext4_journal_start_sb+0x6d/0xf0 [ext4]
      [100722.966679]  ext4_dirty_inode+0x48/0x70 [ext4]
      [100722.966682]  __mark_inode_dirty+0x17f/0x350
      [100722.966686]  generic_update_time+0x87/0xd0
      [100722.966687]  touch_atime+0xa9/0xd0
      [100722.966690]  generic_file_read_iter+0xa09/0xcd0
      [100722.966694]  ? page_cache_tree_insert+0xb0/0xb0
      [100722.966704]  ext4_file_read_iter+0x4a/0x100 [ext4]
      [100722.966707]  ? __inode_security_revalidate+0x4f/0x60
      [100722.966709]  __vfs_read+0xec/0x160
      [100722.966711]  vfs_read+0x8c/0x130
      [100722.966712]  SyS_pread64+0x87/0xb0
      [100722.966716]  do_syscall_64+0x67/0x1b0
      [100722.966719]  entry_SYSCALL64_slow_path+0x25/0x25
      
      To address this, add the check of 'buffer_mapped(bh)' to
      __sync_dirty_buffer().  This also has the benefit of fixing this for
      other file systems.
      
      With this addition, we can drop the workaround in ext4_commit_super().
      
      [ Commit description rewritten by tytso. ]
      Signed-off-by: Xianting Tian <xianting_tian@126.com>
      Link: https://lore.kernel.org/r/1596211825-8750-1-git-send-email-xianting_tian@126.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
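      
      The check described in the commit above can be pictured with a small
      userspace model. This is only a sketch: struct toy_bh, toy_submit() and
      toy_sync_dirty_buffer() are hypothetical stand-ins, not kernel code, and
      the -5 return value is an assumption, since the commit text does not say
      which error is returned.
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      
      /* Hypothetical stand-in for struct buffer_head: only the state needed here. */
      struct toy_bh {
          bool mapped;    /* models buffer_mapped(bh) */
          bool dirty;     /* models buffer_dirty(bh) */
          int submitted;  /* counts calls that reached toy_submit() */
      };
      
      /* Models submit_bh_wbc(), which contains BUG_ON(!buffer_mapped(bh)). */
      static void toy_submit(struct toy_bh *bh)
      {
          assert(bh->mapped); /* stands in for the BUG_ON */
          bh->submitted++;
      }
      
      /*
       * Models the patched write-out path: if the buffer lost its mapping
       * (for example because the device died), return an error instead of
       * reaching the assertion in toy_submit().
       */
      static int toy_sync_dirty_buffer(struct toy_bh *bh)
      {
          if (!bh->mapped)
              return -5; /* assumed error value, not taken from the commit text */
          if (!bh->dirty)
              return 0;  /* nothing to write */
          bh->dirty = false;
          toy_submit(bh);
          return 0;
      }
      ```
      
      Calling toy_sync_dirty_buffer() on an unmapped buffer returns the error
      and never reaches toy_submit(), which is the property the patch relies on
      for every filesystem that goes through this helper.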
  4. 09 Jul, 2020 1 commit
  5. 01 Jul, 2020 1 commit
  6. 03 Jun, 2020 2 commits
  7. 18 Apr, 2020 1 commit
  8. 16 Apr, 2020 1 commit
    • ext4: use non-movable memory for superblock readahead · d87f6392
      Roman Gushchin committed
      Since commit a8ac900b ("ext4: use non-movable memory for the
      superblock"), buffers for the ext4 superblock have been allocated using
      the sb_bread_unmovable() helper, which allocates buffer heads
      out of non-movable memory blocks. This was necessary to avoid blocking
      page migration and causing CMA allocation failures.
      
      However, commit 85c8f176 ("ext4: preload block group descriptors")
      broke this by introducing pre-reading of the ext4 superblock.
      The problem is that __breadahead() uses __getblk() underneath,
      which allocates buffer heads out of movable memory.
      
      This resulted in page migration failures that I've seen on a machine
      with an ext4 partition and a preallocated CMA area.
      
      Fix this by introducing sb_breadahead_unmovable() and
      __breadahead_gfp() helpers which use non-movable memory for buffer
      head allocations and use them for the ext4 superblock readahead.
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Fixes: 85c8f176 ("ext4: preload block group descriptors")
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Link: https://lore.kernel.org/r/20200229001411.128010-1-guro@fb.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  9. 28 Mar, 2020 1 commit
  10. 25 Mar, 2020 1 commit
  11. 25 Jan, 2020 1 commit
  12. 09 Jan, 2020 1 commit
    • fs: move guard_bio_eod() after bio_set_op_attrs · 83c9c547
      Ming Lei committed
      Commit 85a8ce62 ("block: add bio_truncate to fix guard_bio_eod")
      added bio_truncate() for handling bio EOD. However, bio_truncate()
      doesn't use the 'op' parameter passed by guard_bio_eod()'s callers.
      
      So bio_truncate() may retrieve the wrong 'op', and zeroing of pages may
      not be done for a READ bio.
      
      Fix this issue by moving guard_bio_eod() after bio_set_op_attrs()
      in submit_bh_wbc() so that bio_truncate() can always retrieve the correct
      op info.
      
      Also remove the 'op' parameter from guard_bio_eod() because it is no
      longer used.
      
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: linux-fsdevel@vger.kernel.org
      Fixes: 85a8ce62 ("block: add bio_truncate to fix guard_bio_eod")
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      
      Fold in kerneldoc and bio_op() change.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  13. 29 Dec, 2019 1 commit
    • block: add bio_truncate to fix guard_bio_eod · 85a8ce62
      Ming Lei committed
      Some filesystems, such as vfat, may send a bio that crosses the device
      boundary. Worse, an IO request starting within device boundaries
      can contain more than one segment past EOD.
      
      Commit dce30ca9 ("fs: fix guard_bio_eod to check for real EOD errors")
      tried to fix this issue by returning -EIO in this situation. However,
      this approach gives fs user code no chance to handle -EIO, so
      sync_inodes_sb() may hang forever.
      
      Also, the current truncation of the last segment is dangerous because it
      updates the last bvec: the bvec table is then no longer immutable, and fs
      bio users may be unable to retrieve the truncated pages via
      bio_for_each_segment_all() in their .end_io callback.
      
      Fix this issue by supporting multi-segment truncation. The
      approach is also simpler:
      
      - just update the bio size, since the block layer can build the correct
      bvec from the updated bio size. The bvec table then becomes truly immutable.
      
      - zero all truncated segments for a read bio
      
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: linux-fsdevel@vger.kernel.org
      Fixed-by: dce30ca9 ("fs: fix guard_bio_eod to check for real EOD errors")
      Reported-by: syzbot+2b9e54155c8c25d8d165@syzkaller.appspotmail.com
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
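      
      The two steps listed in the commit above (shrink only the recorded size,
      and zero the truncated part for reads) can be sketched in userspace.
      This is a toy model under stated assumptions: struct toy_seg, struct
      toy_bio and toy_bio_truncate() are hypothetical names, and the real
      bio_truncate() works on pages and bvec iterators rather than plain
      byte arrays.
      
      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <string.h>
      
      /* Hypothetical stand-in for one bvec: a byte range in memory. */
      struct toy_seg {
          unsigned char *data;
          size_t len;
      };
      
      /* Hypothetical stand-in for a bio: a segment table plus a byte count. */
      struct toy_bio {
          struct toy_seg *segs;
          size_t nr_segs;
          size_t size;   /* bytes the I/O covers */
          int is_read;   /* non-zero for a read */
      };
      
      /*
       * Truncate the I/O to new_size bytes. The segment table is left
       * untouched (it stays immutable); only the size is updated. For reads,
       * every byte past new_size is zeroed, however many segments that spans.
       */
      static void toy_bio_truncate(struct toy_bio *bio, size_t new_size)
      {
          size_t offset = 0;
      
          assert(bio != NULL);
          if (new_size >= bio->size)
              return;
      
          if (bio->is_read) {
              for (size_t i = 0; i < bio->nr_segs; i++) {
                  struct toy_seg *s = &bio->segs[i];
                  size_t end = offset + s->len;
      
                  if (end > new_size) {
                      /* Bytes of this segment that remain valid. */
                      size_t keep = offset < new_size ? new_size - offset : 0;
      
                      memset(s->data + keep, 0, s->len - keep);
                  }
                  offset = end;
              }
          }
          bio->size = new_size;
      }
      ```
      
      Because no segment length is modified, a completion handler that walks
      all segments still sees every original segment, which is the
      immutability property the commit message asks for.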
  14. 01 Dec, 2019 2 commits
  15. 15 Nov, 2019 1 commit
    • fs/buffer.c: support fscrypt in block_read_full_page() · 31fb992c
      Eric Biggers committed
      After each filesystem block (as represented by a buffer_head) has been
      read from disk by block_read_full_page(), decrypt it if needed.  The
      decryption is done on the fscrypt_read_workqueue.
      
      This is the final change needed to support ext4 encryption with
      blocksize != PAGE_SIZE, and it's a fairly small change now that
      CONFIG_FS_ENCRYPTION is a bool and fs/crypto/ exposes functions to
      decrypt individual blocks and to enqueue work on the fscrypt workqueue.
      
      Don't try to add fs-verity support yet, as the fs/verity/ support layer
      isn't ready for sub-page blocks yet.  Just add fscrypt support for now.
      
      Almost all the new code is compiled away when CONFIG_FS_ENCRYPTION=n.
      
      Cc: Chandan Rajendra <chandan@linux.ibm.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Link: https://lore.kernel.org/r/20191023033312.361355-2-ebiggers@kernel.org
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  16. 10 Jul, 2019 1 commit
  17. 28 Jun, 2019 1 commit
  18. 21 May, 2019 1 commit
  19. 01 May, 2019 2 commits
  20. 01 Mar, 2019 1 commit
    • fs: fix guard_bio_eod to check for real EOD errors · dce30ca9
      Carlos Maiolino committed
      guard_bio_eod() can truncate a segment in a bio to allow it to do IO on
      odd last sectors of a device.
      
      It already checks whether the IO starts past EOD, but it does not consider
      that an IO request starting within device boundaries can
      contain more than one segment past EOD.
      
      In such cases, truncated_bytes can be bigger than PAGE_SIZE, and will
      underflow bvec->bv_len.
      
      Fix this by checking if truncated_bytes is lower than PAGE_SIZE.
      
      This situation has been found on filesystems such as isofs and vfat,
      which don't check the device size before mount. If the device is
      smaller than the filesystem itself, a readahead on such a filesystem
      that spans EOD can trigger this situation, leading to a call to
      zero_user() with a wrong size, possibly corrupting memory.
      
      I didn't see any crash, and didn't let the system run long enough to
      check whether memory corruption would be hit somewhere, but adding
      instrumentation to guard_bio_eod() to check the truncated_bytes size was
      enough to see the error.
      The following script can trigger the error.
      
      MNT=/mnt
      IMG=./DISK.img
      DEV=/dev/loop0
      
      mkfs.vfat $IMG
      mount $IMG $MNT
      cp -R /etc $MNT &> /dev/null
      umount $MNT
      
      losetup -D
      
      losetup --find --show --sizelimit 16247280 $IMG
      mount $DEV $MNT
      
      find $MNT -type f -exec cat {} + >/dev/null
      
      Kudos to Eric Sandeen for coming up with the reproducer above
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
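      
      The underflow described in the commit above is plain unsigned
      arithmetic and can be shown outside the kernel. This is a minimal
      sketch: toy_unchecked_trim() and toy_trim_last_segment() are
      hypothetical helpers, and the guard compares against the segment length
      rather than PAGE_SIZE, which is an assumption made for the illustration
      (the commit text phrases the bound in terms of PAGE_SIZE).
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      
      /*
       * What happens without a guard: if truncated_bytes exceeds bv_len, the
       * unsigned subtraction wraps around to a very large length.
       */
      static unsigned int toy_unchecked_trim(unsigned int bv_len,
                                             unsigned int truncated_bytes)
      {
          return bv_len - truncated_bytes;
      }
      
      /*
       * Guarded variant: refuse to trim more than the last segment holds.
       * Returns false (and leaves *bv_len alone) in the case the fix rejects.
       */
      static bool toy_trim_last_segment(unsigned int *bv_len,
                                        unsigned int truncated_bytes)
      {
          assert(bv_len != NULL);
          if (truncated_bytes > *bv_len)
              return false;
          *bv_len -= truncated_bytes;
          return true;
      }
      ```
      
      With a 4096-byte segment and 8192 truncated bytes, the unchecked helper
      yields a length far larger than the segment, which is the wrong size
      that would then be handed to the zeroing step.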
  21. 15 Feb, 2019 1 commit
  22. 07 Feb, 2019 1 commit
    • fs: ratelimit __find_get_block_slow() failure message. · 43636c80
      Tetsuo Handa committed
      When something makes __find_get_block_slow() hit the all_mapped path, it
      calls printk() more than 100 times per second. There is no need to print
      the same message at such a high frequency; doing so just invites stall
      warnings, or at least bloats log files.
      
        [  399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
        [  399.873324][T15342] b_state=0x00000029, b_size=512
        [  399.878403][T15342] device loop0 blocksize: 4096
        [  399.883296][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
        [  399.890400][T15342] b_state=0x00000029, b_size=512
        [  399.895595][T15342] device loop0 blocksize: 4096
        [  399.900556][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8
        [  399.907471][T15342] b_state=0x00000029, b_size=512
        [  399.912506][T15342] device loop0 blocksize: 4096
      
      This patch limits the message to at most once per second and
      concatenates the three lines into one.
      
        [  399.866302][T15342] __find_get_block_slow() failed. block=1, b_blocknr=8, b_state=0x00000029, b_size=512, device loop0 blocksize: 4096
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
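      
      The "at most once per second" behaviour from the commit above can be
      modelled with a tiny userspace limiter. This is a sketch only: struct
      toy_ratelimit and toy_ratelimit_ok() are hypothetical names and do not
      reproduce the kernel's own ratelimit helpers; the caller passes the
      current time in whole seconds.
      
      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <time.h>
      
      /* Hypothetical limiter state: when a message was last allowed through. */
      struct toy_ratelimit {
          time_t last;   /* time of the last allowed message, in seconds */
          bool primed;   /* false until the first message has been allowed */
      };
      
      /*
       * Returns true if a message may be printed at time 'now'. The first
       * call always passes; afterwards at most one call per second passes.
       */
      static bool toy_ratelimit_ok(struct toy_ratelimit *rl, time_t now)
      {
          assert(rl != NULL);
          if (rl->primed && now - rl->last < 1)
              return false;
          rl->last = now;
          rl->primed = true;
          return true;
      }
      ```
      
      A caller would wrap the single concatenated message in a check such as
      if (toy_ratelimit_ok(&rl, time(NULL))), so repeated failures within the
      same second produce one line instead of hundreds.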
  23. 05 Jan, 2019 1 commit
  24. 08 Dec, 2018 1 commit
  25. 02 Nov, 2018 1 commit
  26. 21 Oct, 2018 1 commit
  27. 22 Sep, 2018 1 commit
  28. 30 Aug, 2018 1 commit
  29. 18 Aug, 2018 1 commit
  30. 20 Jun, 2018 2 commits
  31. 02 Jun, 2018 1 commit
  32. 12 Apr, 2018 2 commits
  33. 11 Apr, 2018 1 commit
  34. 06 Apr, 2018 1 commit
  35. 19 Mar, 2018 1 commit
    • buffer.c: call thaw_super during emergency thaw · 08fdc8a0
      Mateusz Guzik committed
      There are 2 distinct freezing mechanisms: one operates on block
      devices and the other directly on super blocks. Both end up with the
      same result, but thawing only one of them does not thaw the other.
      
      In particular, fsfreeze --freeze uses the ioctl variant that goes to the
      super block. Prior to this patch, emergency thaw did not perform the
      corresponding thaw, so filesystems frozen with this method remained
      frozen.
      
      The patch is a hack which adds blind unfreezing.
      
      In order to keep the super block write-locked the whole time, the code
      is shuffled around and the newly introduced __iterate_supers is
      employed.
      Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>