1. 18 Jan 2020 (2 commits)
    • f2fs: fix to add swap extent correctly · 3e5e479a
      Chao Yu committed
      As Youling reported in the mailing list:
      
      https://www.linuxquestions.org/questions/linux-newbie-8/the-file-system-f2fs-is-broken-4175666043/
      
      https://www.linux.org/threads/the-file-system-f2fs-is-broken.26490/
      
      There is a test case that can corrupt an f2fs image:
      - dd if=/dev/zero of=/swapfile bs=1M count=4096
      - chmod 600 /swapfile
      - mkswap /swapfile
      - swapon --discard /swapfile
      
      The root cause is that f2fs_swap_activate() intends to return zero to
      setup_swap_extents() in order to enable SWP_FS mode (the swap file goes
      through the fs). In this flow, setup_swap_extents() sets up a swap extent
      with the wrong block address range, so discard_swap() erases an incorrect
      address range.
      
      Because f2fs_swap_activate() has pinned the swapfile, its data block
      addresses will not change, so it is safe to let swap handle IO through
      the raw device. We can therefore get rid of SWP_FS mode and initialize
      the swap extents inside f2fs_swap_activate(); this way, a later
      discard_swap() trims the right address range.
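      For illustration only, here is a hedged sketch of that approach: walk the
      pinned file's mapped ranges and register them directly with the kernel's
      add_swap_extent() helper, returning the extent count so that
      setup_swap_extents() does not fall back to SWP_FS mode (names and error
      handling are simplified and not taken from the actual patch):

      	static int f2fs_setup_swap_extents_sketch(struct swap_info_struct *sis,
      						  struct inode *inode, sector_t *span)
      	{
      		pgoff_t lblk = 0, last = i_size_read(inode) >> PAGE_SHIFT;
      		int nr_extents = 0;

      		while (lblk < last) {
      			struct f2fs_map_blocks map = {
      				.m_lblk = lblk, .m_len = last - lblk,
      			};
      			int ret = f2fs_map_blocks(inode, &map, 0, F2FS_GET_BLOCK_FIEMAP);

      			if (ret || !(map.m_flags & F2FS_MAP_MAPPED))
      				return ret ? ret : -EINVAL;	/* holes are not allowed */

      			/* map [lblk, lblk + m_len) directly to raw device blocks */
      			ret = add_swap_extent(sis, lblk, map.m_len, map.m_pblk);
      			if (ret < 0)
      				return ret;
      			nr_extents += ret;
      			lblk += map.m_len;
      		}
      		*span = last;
      		return nr_extents;	/* non-zero: swap IO bypasses the fs */
      	}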
      
      Fixes: 4969c06a ("f2fs: support swap file w/ DIO")
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: support data compression · 4c8ff709
      Chao Yu committed
      This patch tries to support compression in f2fs.
      
      - A new term, cluster, is defined as the basic unit of compression; a
      file can be logically divided into multiple clusters. One cluster
      includes 4 << n (n >= 0) logical pages, the compression size equals the
      cluster size, and each cluster can be compressed or left uncompressed.
      
      - In the cluster metadata layout, a special flag indicates whether a
      cluster is compressed or normal. For a compressed cluster, the following
      metadata maps the cluster to [1, 4 << n - 1] physical blocks, where f2fs
      stores data including the compress header and the compressed data.
      
      - In order to eliminate write amplification during overwrite, f2fs only
      supports compression on write-once files; data can be compressed only
      when all logical blocks in the file are valid and the cluster's
      compression ratio is lower than the specified threshold.
      
      - To enable compression on a regular inode, there are three ways:
      * chattr +c file
      * chattr +c dir; touch dir/file
      * mount w/ -o compress_extension=ext; touch file.ext
      
      Compress metadata layout:
                                   [Dnode Structure]
                   +-----------------------------------------------+
                   | cluster 1 | cluster 2 | ......... | cluster N |
                   +-----------------------------------------------+
                   .           .                       .           .
             .                       .                .                      .
        .         Compressed Cluster       .        .        Normal Cluster            .
      +----------+---------+---------+---------+  +---------+---------+---------+---------+
      |compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
      +----------+---------+---------+---------+  +---------+---------+---------+---------+
                 .                             .
               .                                           .
             .                                                           .
            +-------------+-------------+----------+----------------------------+
            | data length | data chksum | reserved |      compressed data       |
            +-------------+-------------+----------+----------------------------+
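      Based on the header fields shown above, the per-cluster compress header
      can be pictured roughly as the struct below (a sketch only; the exact
      field widths are assumptions and the real definition lives in
      fs/f2fs/f2fs.h):

      	struct compress_data_sketch {
      		__le32 clen;		/* data length: size of compressed data */
      		__le32 chksum;		/* data chksum: checksum of compressed data */
      		__le32 reserved[4];	/* reserved for future use (assumed width) */
      		u8 cdata[];		/* compressed data follows the header */
      	};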
      
      Changelog:
      
      20190326:
      - fix error handling of read_end_io().
      - remove unneeded comments in f2fs_encrypt_one_page().
      
      20190327:
      - fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
      - don't jump into loop directly to avoid uninitialized variables.
      - add TODO tag in error path of f2fs_write_cache_pages().
      
      20190328:
      - fix wrong merge condition in f2fs_read_multi_pages().
      - check compressed file in f2fs_post_read_required().
      
      20190401
      - allow overwrite on non-compressed cluster.
      - check cluster meta before writing compressed data.
      
      20190402
      - don't preallocate blocks for compressed file.
      
      - add lz4 compress algorithm
      - process multiple post read works in one workqueue
        Until now, f2fs has processed post-read work in multiple workqueues,
        which shows low performance due to the scheduling overhead of
        multiple workqueues executing in order.
      
      20190921
      - compress: support buffered overwrite
      C: compress cluster flag
      V: valid block address
      N: NEW_ADDR
      
      One cluster contains 4 blocks
      
       before overwrite   after overwrite
      
      - VVVV		->	CVNN
      - CVNN		->	VVVV
      
      - CVNN		->	CVNN
      - CVNN		->	CVVV
      
      - CVVV		->	CVNN
      - CVVV		->	CVVV
      
      20191029
      - add Kconfig option F2FS_FS_COMPRESSION to isolate compression-related
      code, and F2FS_FS_{LZO,LZ4} to cover the backend algorithms.
      Note: the lzo backend will be removed if Jaegeuk agrees to that too.
      - update code according to Eric's comments.
      
      20191101
      - apply fixes from Jaegeuk
      
      20191113
      - apply fixes from Jaegeuk
      - split workqueue for fsverity
      
      20191216
      - apply fixes from Jaegeuk
      
      20200117
      - fix to avoid NULL pointer dereference
      
      [Jaegeuk Kim]
      - add tracepoint for f2fs_{,de}compress_pages()
      - fix many bugs and add some compression stats
      - fix overwrite/mmap bugs
      - address 32bit build error, reported by Geert.
      - bug fixes when handling errors and i_compressed_blocks
      
      Reported-by: <noreply@ellerman.id.au>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  2. 16 Jan 2020 (1 commit)
    • f2fs: introduce private bioset · f543805f
      Chao Yu committed
      In a low-memory scenario, we can allocate multiple bios without
      submitting any of them.
      
      - f2fs_write_checkpoint()
       - block_operations()
        - f2fs_sync_node_pages()
         step 1) flush cold nodes, allocate new bio from mempool
         - bio_alloc()
          - mempool_alloc()
         step 2) flush hot nodes, allocate a bio from mempool
         - bio_alloc()
          - mempool_alloc()
         step 3) flush warm nodes, and get stuck in the below call path
         - bio_alloc()
          - mempool_alloc()
           - loop waiting for a mempool element to be released; we only
             reserved memory for two bio allocations, but the two bios
             allocated above may never be submitted.
      
      So we need to avoid using the default bioset. In this patch we introduce
      a private bioset, in which we enlarge the mempool element count to the
      total number of log headers, so that we can be sure there is enough
      reserved memory in the pool when allocating/holding multiple bios.
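      A hedged sketch of what a private bioset looks like with the bio_set API
      of that era (bioset_init()/bio_alloc_bioset()); F2FS_BIO_POOL_SIZE is an
      illustrative name for the pool size, not the actual constant:

      	static struct bio_set f2fs_bioset;

      	int __init f2fs_init_bioset(void)
      	{
      		/* reserve one mempool element per log header so that every
      		 * in-flight bio has backing memory even under pressure */
      		return bioset_init(&f2fs_bioset, F2FS_BIO_POOL_SIZE,
      				   0, BIOSET_NEED_BVECS);
      	}

      	struct bio *f2fs_bio_alloc_sketch(int npages)
      	{
      		/* allocations from the private pool do not compete with the
      		 * global fs_bio_set used by everyone else */
      		return bio_alloc_bioset(GFP_NOIO, npages, &f2fs_bioset);
      	}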
      Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  3. 13 Dec 2019 (1 commit)
  4. 10 Dec 2019 (1 commit)
  5. 20 Nov 2019 (1 commit)
  6. 08 Nov 2019 (1 commit)
    • f2fs: fix potential overflow · 1f0d5c91
      Chao Yu committed
      We expect a 64-bit result from the statement below; however, on a 32-bit
      machine the left shift of a pgoff_t variable may overflow. Fix it by
      forcing a type cast.
      
      page->index << PAGE_SHIFT;
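      For illustration, a minimal userspace demonstration of what the cast
      changes when the index type is 32-bit (values chosen arbitrarily):

      	#include <stdio.h>
      	#include <stdint.h>

      	int main(void)
      	{
      		uint32_t index = 0x00180000;	/* page index on a 32-bit pgoff_t */
      		unsigned int page_shift = 12;	/* PAGE_SHIFT for 4K pages */

      		/* shift happens in 32 bits, high bits are lost */
      		uint64_t wrong = index << page_shift;
      		/* cast first, so the shift happens in 64 bits */
      		uint64_t right = (uint64_t)index << page_shift;

      		printf("wrong=0x%llx right=0x%llx\n",
      		       (unsigned long long)wrong, (unsigned long long)right);
      		return 0;	/* prints wrong=0x80000000 right=0x180000000 */
      	}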
      
      Fixes: 26de9b11 ("f2fs: avoid unnecessary updating inode during fsync")
      Fixes: 0a2aa8fb ("f2fs: refactor __exchange_data_block for speed up")
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  7. 26 Oct 2019 (1 commit)
    • f2fs: cache global IPU bio · 0b20fcec
      Chao Yu committed
      In commit 8648de2c ("f2fs: add bio cache for IPU"), we added
      f2fs_submit_ipu_bio() in __write_data_page() as below:
      
      __write_data_page()
      
      	if (!S_ISDIR(inode->i_mode) && !IS_NOQUOTA(inode)) {
      		f2fs_submit_ipu_bio(sbi, bio, page);
      		....
      	}
      
      in order to avoid the below deadlock:
      
      Thread A				Thread B
      - __write_data_page (inode x, page y)
       - f2fs_do_write_data_page
        - set_page_writeback        ---- set writeback flag in page y
        - f2fs_inplace_write_data
       - f2fs_balance_fs
      					 - lock gc_mutex
       - lock gc_mutex
      					  - f2fs_gc
      					   - do_garbage_collect
      					    - gc_data_segment
      					     - move_data_page
      					      - f2fs_wait_on_page_writeback
      					       - wait_on_page_writeback  --- wait writeback of page y
      
      However, the bio submission breaks the merge of IPU IOs.
      
      So in this patch let's add a global bio cache for merged IPU pages; then
      f2fs_wait_on_page_writeback() is able to submit the bio if a page under
      writeback is cached in the global bio cache.
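      A rough sketch of the idea (the field, the lock, and the
      ipu_bio_has_page() helper are illustrative assumptions, not the actual
      implementation): keep the merged IPU bio in the sb_info and let the
      writeback waiter flush it when it covers the page being waited on.

      	/* called from f2fs_wait_on_page_writeback() before it sleeps */
      	static void submit_cached_ipu_bio_sketch(struct f2fs_sb_info *sbi,
      						 struct page *page)
      	{
      		struct bio *bio = NULL;

      		spin_lock(&sbi->ipu_bio_lock);			/* assumed lock */
      		if (sbi->ipu_bio && ipu_bio_has_page(sbi->ipu_bio, page)) {
      			/* ipu_bio_has_page(): hypothetical page-membership check */
      			bio = sbi->ipu_bio;
      			sbi->ipu_bio = NULL;
      		}
      		spin_unlock(&sbi->ipu_bio_lock);

      		if (bio)
      			submit_bio(bio);	/* page's writeback can now finish */
      	}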
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  8. 16 Sep 2019 (3 commits)
  9. 07 Sep 2019 (2 commits)
  10. 23 Aug 2019 (3 commits)
  11. 17 Aug 2019 (1 commit)
    • f2fs: fix livelock in swapfile writes · 75a037f3
      Jaegeuk Kim committed
      This patch fixes a livelock in the below call path when writing swap pages.
      
      [46374.617256] c2    701  __switch_to+0xe4/0x100
      [46374.617265] c2    701  __schedule+0x80c/0xbc4
      [46374.617273] c2    701  schedule+0x74/0x98
      [46374.617281] c2    701  rwsem_down_read_failed+0x190/0x234
      [46374.617291] c2    701  down_read+0x58/0x5c
      [46374.617300] c2    701  f2fs_map_blocks+0x138/0x9a8
      [46374.617310] c2    701  get_data_block_dio_write+0x74/0x104
      [46374.617320] c2    701  __blockdev_direct_IO+0x1350/0x3930
      [46374.617331] c2    701  f2fs_direct_IO+0x55c/0x8bc
      [46374.617341] c2    701  __swap_writepage+0x1d0/0x3e8
      [46374.617351] c2    701  swap_writepage+0x44/0x54
      [46374.617360] c2    701  shrink_page_list+0x140/0xe80
      [46374.617371] c2    701  shrink_inactive_list+0x510/0x918
      [46374.617381] c2    701  shrink_node_memcg+0x2d4/0x804
      [46374.617391] c2    701  shrink_node+0x10c/0x2f8
      [46374.617400] c2    701  do_try_to_free_pages+0x178/0x38c
      [46374.617410] c2    701  try_to_free_pages+0x348/0x4b8
      [46374.617419] c2    701  __alloc_pages_nodemask+0x7f8/0x1014
      [46374.617429] c2    701  pagecache_get_page+0x184/0x2cc
      [46374.617438] c2    701  f2fs_new_node_page+0x60/0x41c
      [46374.617449] c2    701  f2fs_new_inode_page+0x50/0x7c
      [46374.617460] c2    701  f2fs_init_inode_metadata+0x128/0x530
      [46374.617472] c2    701  f2fs_add_inline_entry+0x138/0xd64
      [46374.617480] c2    701  f2fs_do_add_link+0xf4/0x178
      [46374.617488] c2    701  f2fs_create+0x1e4/0x3ac
      [46374.617497] c2    701  path_openat+0xdc0/0x1308
      [46374.617507] c2    701  do_filp_open+0x78/0x124
      [46374.617516] c2    701  do_sys_open+0x134/0x248
      [46374.617525] c2    701  SyS_openat+0x14/0x20
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  12. 13 Aug 2019 (1 commit)
    • f2fs: add fs-verity support · 95ae251f
      Eric Biggers committed
      Add fs-verity support to f2fs.  fs-verity is a filesystem feature that
      enables transparent integrity protection and authentication of read-only
      files.  It uses a dm-verity like mechanism at the file level: a Merkle
      tree is used to verify any block in the file in log(filesize) time.  It
      is implemented mainly by helper functions in fs/verity/.  See
      Documentation/filesystems/fsverity.rst for the full documentation.
      
      The f2fs support for fs-verity consists of:
      
      - Adding a filesystem feature flag and an inode flag for fs-verity.
      
      - Implementing the fsverity_operations to support enabling verity on an
        inode and reading/writing the verity metadata.
      
      - Updating ->readpages() to verify data as it's read from verity files
        and to support reading verity metadata pages.
      
      - Updating ->write_begin(), ->write_end(), and ->writepages() to support
        writing verity metadata pages.
      
      - Calling the fs-verity hooks for ->open(), ->setattr(), and ->ioctl().
      
      Like ext4, f2fs stores the verity metadata (Merkle tree and
      fsverity_descriptor) past the end of the file, starting at the first 64K
      boundary beyond i_size.  This approach works because (a) verity files
      are readonly, and (b) pages fully beyond i_size aren't visible to
      userspace but can be read/written internally by f2fs with only some
      relatively small changes to f2fs.  Extended attributes cannot be used
      because (a) f2fs limits the total size of an inode's xattr entries to
      4096 bytes, which wouldn't be enough for even a single Merkle tree
      block, and (b) f2fs encryption doesn't encrypt xattrs, yet the verity
      metadata *must* be encrypted when the file is because it contains hashes
      of the plaintext data.
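      In other words, the verity metadata position is simply i_size rounded up
      to the next 64K boundary; a sketch of that calculation (the kernel code
      uses its round_up() helper for the same thing):

      	/* first 64K boundary at or beyond i_size */
      	static inline long long verity_metadata_pos_sketch(long long i_size)
      	{
      		return (i_size + 65535) & ~65535LL;
      	}
      	/* e.g. i_size = 1000000 -> metadata begins at 1048576 (16 * 64K) */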
      Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Acked-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
  13. 19 Jul 2019 (1 commit)
  14. 10 Jul 2019 (1 commit)
  15. 03 Jul 2019 (2 commits)
  16. 29 May 2019 (2 commits)
    • fscrypt: support encrypting multiple filesystem blocks per page · 53bc1d85
      Eric Biggers committed
      Rename fscrypt_encrypt_page() to fscrypt_encrypt_pagecache_blocks() and
      redefine its behavior to encrypt all filesystem blocks from the given
      region of the given page, rather than assuming that the region consists
      of just one filesystem block.  Also remove the 'inode' and 'lblk_num'
      parameters, since they can be retrieved from the page as it's already
      assumed to be a pagecache page.
      
      This is in preparation for allowing encryption on ext4 filesystems with
      blocksize != PAGE_SIZE.
      
      This is based on work by Chandan Rajendra.
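      A hedged usage sketch of the renamed helper from a filesystem's write
      path (error handling trimmed); all blocks backing the page are encrypted
      into one bounce page in a single call:

      	struct page *bounce_page;

      	bounce_page = fscrypt_encrypt_pagecache_blocks(page, PAGE_SIZE, 0,
      						       GFP_NOFS);
      	if (IS_ERR(bounce_page))
      		return PTR_ERR(bounce_page);
      	/* submit bounce_page for I/O instead of the plaintext pagecache page */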
      Reviewed-by: Chandan Rajendra <chandan@linux.ibm.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
    • fscrypt: simplify bounce page handling · d2d0727b
      Eric Biggers committed
      Currently, bounce page handling for writes to encrypted files is
      unnecessarily complicated.  A fscrypt_ctx is allocated along with each
      bounce page, page_private(bounce_page) points to this fscrypt_ctx, and
      fscrypt_ctx::w::control_page points to the original pagecache page.
      
      However, because writes don't use the fscrypt_ctx for anything else,
      there's no reason why page_private(bounce_page) can't just point to the
      original pagecache page directly.
      
      Therefore, this patch makes this change.  In the process, it also cleans
      up the API exposed to filesystems that allows testing whether a page is
      a bounce page, getting the pagecache page from a bounce page, and
      freeing a bounce page.
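      With that change, the lookup helper exposed to filesystems reduces to
      something like the sketch below (page_private() of a bounce page now
      holds the pagecache page directly):

      	static inline struct page *fscrypt_pagecache_page(struct page *bounce_page)
      	{
      		/* the bounce page's private data is the original pagecache page */
      		return (struct page *)page_private(bounce_page);
      	}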
      Reviewed-by: Chandan Rajendra <chandan@linux.ibm.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
  17. 23 May 2019 (2 commits)
    • f2fs: fix to avoid deadloop if data_flush is on · 040d2bb3
      Chao Yu committed
      As Hagbard Celine reported:
      
      [  615.697824] INFO: task kworker/u16:5:344 blocked for more than 120 seconds.
      [  615.697825]       Not tainted 5.0.15-gentoo-f2fslog #4
      [  615.697826] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
      disables this message.
      [  615.697827] kworker/u16:5   D    0   344      2 0x80000000
      [  615.697831] Workqueue: writeback wb_workfn (flush-259:0)
      [  615.697832] Call Trace:
      [  615.697836]  ? __schedule+0x2c5/0x8b0
      [  615.697839]  schedule+0x32/0x80
      [  615.697841]  schedule_preempt_disabled+0x14/0x20
      [  615.697842]  __mutex_lock.isra.8+0x2ba/0x4d0
      [  615.697845]  ? log_store+0xf5/0x260
      [  615.697848]  f2fs_write_data_pages+0x133/0x320
      [  615.697851]  ? trace_hardirqs_on+0x2c/0xe0
      [  615.697854]  do_writepages+0x41/0xd0
      [  615.697857]  __filemap_fdatawrite_range+0x81/0xb0
      [  615.697859]  f2fs_sync_dirty_inodes+0x1dd/0x200
      [  615.697861]  f2fs_balance_fs_bg+0x2a7/0x2c0
      [  615.697863]  ? up_read+0x5/0x20
      [  615.697865]  ? f2fs_do_write_data_page+0x2cb/0x940
      [  615.697867]  f2fs_balance_fs+0xe5/0x2c0
      [  615.697869]  __write_data_page+0x1c8/0x6e0
      [  615.697873]  f2fs_write_cache_pages+0x1e0/0x450
      [  615.697878]  f2fs_write_data_pages+0x14b/0x320
      [  615.697880]  ? trace_hardirqs_on+0x2c/0xe0
      [  615.697883]  do_writepages+0x41/0xd0
      [  615.697885]  __filemap_fdatawrite_range+0x81/0xb0
      [  615.697887]  f2fs_sync_dirty_inodes+0x1dd/0x200
      [  615.697889]  f2fs_balance_fs_bg+0x2a7/0x2c0
      [  615.697891]  f2fs_write_node_pages+0x51/0x220
      [  615.697894]  do_writepages+0x41/0xd0
      [  615.697897]  __writeback_single_inode+0x3d/0x3d0
      [  615.697899]  writeback_sb_inodes+0x1e8/0x410
      [  615.697902]  __writeback_inodes_wb+0x5d/0xb0
      [  615.697904]  wb_writeback+0x28f/0x340
      [  615.697906]  ? cpumask_next+0x16/0x20
      [  615.697908]  wb_workfn+0x33e/0x420
      [  615.697911]  process_one_work+0x1a1/0x3d0
      [  615.697913]  worker_thread+0x30/0x380
      [  615.697915]  ? process_one_work+0x3d0/0x3d0
      [  615.697916]  kthread+0x116/0x130
      [  615.697918]  ? kthread_create_worker_on_cpu+0x70/0x70
      [  615.697921]  ret_from_fork+0x3a/0x50
      
      There is still a deadloop in the below condition:
      
      Thread A
      - do_writepages
       - f2fs_write_node_pages
        - f2fs_balance_fs_bg
         - f2fs_sync_dirty_inodes
          - f2fs_write_cache_pages
           - mutex_lock(&sbi->writepages)	-- lock once
           - __write_data_page
            - f2fs_balance_fs_bg
             - f2fs_sync_dirty_inodes
              - f2fs_write_data_pages
               - mutex_lock(&sbi->writepages)	-- lock again
      
      Thread A			Thread B
      - do_writepages
       - f2fs_write_node_pages
        - f2fs_balance_fs_bg
         - f2fs_sync_dirty_inodes
          - .cp_task = current
      				- f2fs_sync_dirty_inodes
      				 - .cp_task = current
      				 - filemap_fdatawrite
      				 - .cp_task = NULL
          - filemap_fdatawrite
           - f2fs_write_cache_pages
            - enter f2fs_balance_fs_bg since .cp_task is NULL
          - .cp_task = NULL
      
      Change as below to avoid this:
      - add a condition to avoid holding the .writepages mutex lock in the
      data flush path
      - introduce a mutex lock sbi.flush_lock to exclude concurrent data
      flushes in the background (see the sketch below)
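      A minimal sketch of the second point, assuming an sbi->flush_lock mutex
      that serializes background data flushes (simplified, not the exact
      patch):

      	void f2fs_balance_fs_bg_sketch(struct f2fs_sb_info *sbi)
      	{
      		/* ... other background cleanup elided ... */
      		if (test_opt(sbi, DATA_FLUSH)) {
      			mutex_lock(&sbi->flush_lock);	/* one flusher at a time */
      			f2fs_sync_dirty_inodes(sbi, FILE_INODE);
      			mutex_unlock(&sbi->flush_lock);
      		}
      	}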
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: add bio cache for IPU · 8648de2c
      Chao Yu committed
      SQLite in WAL mode may trigger sequential IPU writes to the db-wal file.
      After commit d1b3e72d ("f2fs: submit bio of in-place-update pages"), we
      lost the chance to merge pages in the internally managed bio cache,
      resulting in the submission of more small-sized IOs.
      
      So let's add a temporary bio in writepages() to cache mergeable write IOs
      as much as possible.
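      A hedged sketch of the idea: carry one bio across the write_cache_pages
      loop so consecutive in-place updates can merge, and flush whatever is
      left at the end. for_each_dirty_page() is an illustrative placeholder,
      and the extra bio parameter on __write_data_page() is part of the sketch,
      not a quoted signature.

      	static int f2fs_write_cache_pages_sketch(struct address_space *mapping,
      						 struct writeback_control *wbc)
      	{
      		struct bio *bio = NULL;		/* cached across pages */
      		struct page *page;
      		int ret = 0;

      		for_each_dirty_page(mapping, wbc, page) {
      			/* assumed to append contiguous IPU writes to *bio,
      			 * otherwise submit the cached bio and start a new one */
      			ret = __write_data_page(page, wbc, &bio);
      			if (ret)
      				break;
      		}
      		if (bio)
      			submit_bio(bio);	/* flush the remaining cached IO */
      		return ret;
      	}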
      
      Test case:
      1. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync"
      2. xfs_io -f /mnt/f2fs/file -c "pwrite 0 65536" -c "fsync"
      
      Before:
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65552, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65560, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65568, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65576, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65584, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65592, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65600, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65608, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65616, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65624, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65632, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65640, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65648, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65656, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65664, size = 4096
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57352, size = 4096
      
      After:
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), DATA, sector = 65544, size = 65536
      f2fs_submit_write_bio: dev = (251,0)/(251,0), rw = WRITE(S), NODE, sector = 57368, size = 4096
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  18. 09 May 2019 (4 commits)
  19. 30 Apr 2019 (1 commit)
  20. 17 Apr 2019 (1 commit)
  21. 06 Apr 2019 (2 commits)
    • f2fs: fix potential recursive call when enabling data_flush · 186857c5
      Chao Yu committed
      As Hagbard Celine reported:
      
      Hi, this is a long-standing bug that I've hit before on older kernels,
      but I was not able to get the syslog saved because of the nature of
      the bug. This time I had booted from a pen-drive, and was able to save
      the log to its efi-partition.
      What I did to trigger it was to create a partition and format it f2fs,
      then mount it with options:
      "rw,relatime,lazytime,background_gc=on,disable_ext_identify,discard,heap,user_xattr,inline_xattr,acl,inline_data,inline_dentry,flush_merge,data_flush,extent_cache,mode=adaptive,active_logs=6,whint_mode=fs-based,alloc_mode=default,fsync_mode=strict".
      Then I unpacked a big .tar.xz to the partition (I used a
      gentoo-stage3-tarball as I was in process of installing Gentoo).
      
      Same options just without data_flush gives no problems.
      
      Mar 20 20:54:01 usbgentoo kernel: FAT-fs (nvme0n1p4): Volume was not
      properly unmounted. Some data may be corrupt. Please run fsck.
      Mar 20 21:05:23 usbgentoo kernel: kworker/dying (1588) used greatest
      stack depth: 12064 bytes left
      Mar 20 21:06:40 usbgentoo kernel: BUG: stack guard page was hit at
      00000000a4b0733c (stack is 0000000056016422..0000000096e7463f)
      Mar 20 21:06:40 usbgentoo kernel: kernel stack overflow
      
      ......
      
      Mar 20 21:06:40 usbgentoo kernel: Call Trace:
      Mar 20 21:06:40 usbgentoo kernel:  read_node_page+0x71/0xf0
      Mar 20 21:06:40 usbgentoo kernel:  ? xas_load+0x8/0x50
      Mar 20 21:06:40 usbgentoo kernel:  __get_node_page+0x73/0x2a0
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_get_dnode_of_data+0x34e/0x580
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_write_inline_data+0x5e/0x2a0
      Mar 20 21:06:40 usbgentoo kernel:  __write_data_page+0x421/0x690
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_write_cache_pages+0x1cf/0x460
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_write_data_pages+0x2b3/0x2e0
      Mar 20 21:06:40 usbgentoo kernel:  ? f2fs_inode_chksum_verify+0x1d/0xc0
      Mar 20 21:06:40 usbgentoo kernel:  ? read_node_page+0x71/0xf0
      Mar 20 21:06:40 usbgentoo kernel:  do_writepages+0x3c/0xd0
      Mar 20 21:06:40 usbgentoo kernel:  __filemap_fdatawrite_range+0x7c/0xb0
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_sync_dirty_inodes+0xf2/0x200
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_balance_fs_bg+0x2a3/0x2c0
      Mar 20 21:06:40 usbgentoo kernel:  ? f2fs_inode_dirtied+0x21/0xc0
      Mar 20 21:06:40 usbgentoo kernel:  f2fs_balance_fs+0xd6/0x2b0
      Mar 20 21:06:40 usbgentoo kernel:  __write_data_page+0x4fb/0x690
      
      ......
      
      Mar 20 21:06:40 usbgentoo kernel:  __writeback_single_inode+0x2a1/0x340
      Mar 20 21:06:40 usbgentoo kernel:  ? soft_cursor+0x1b4/0x220
      Mar 20 21:06:40 usbgentoo kernel:  writeback_sb_inodes+0x1d5/0x3e0
      Mar 20 21:06:40 usbgentoo kernel:  __writeback_inodes_wb+0x58/0xa0
      Mar 20 21:06:40 usbgentoo kernel:  wb_writeback+0x250/0x2e0
      Mar 20 21:06:40 usbgentoo kernel:  ? 0xffffffff8c000000
      Mar 20 21:06:40 usbgentoo kernel:  ? cpumask_next+0x16/0x20
      Mar 20 21:06:40 usbgentoo kernel:  wb_workfn+0x2f6/0x3b0
      Mar 20 21:06:40 usbgentoo kernel:  ? __switch_to_asm+0x40/0x70
      Mar 20 21:06:40 usbgentoo kernel:  process_one_work+0x1f5/0x3f0
      Mar 20 21:06:40 usbgentoo kernel:  worker_thread+0x28/0x3c0
      Mar 20 21:06:40 usbgentoo kernel:  ? rescuer_thread+0x330/0x330
      Mar 20 21:06:40 usbgentoo kernel:  kthread+0x10e/0x130
      Mar 20 21:06:40 usbgentoo kernel:  ? kthread_create_on_node+0x60/0x60
      Mar 20 21:06:40 usbgentoo kernel:  ret_from_fork+0x35/0x40
      
      The root cause is that we run into infinite recursive calls between
      f2fs_balance_fs_bg() and writepage(), as described below:
      
      - f2fs_write_data_pages		--- A
       - __write_data_page
        - f2fs_balance_fs
         - f2fs_balance_fs_bg		--- B
          - f2fs_sync_dirty_inodes
           - filemap_fdatawrite
            - f2fs_write_data_pages	--- A
      ...
                - f2fs_balance_fs_bg	--- B
      ...
      
      In order to fix this issue, let's detect such a condition in
      __write_data_page() and simply skip calling f2fs_balance_fs() recursively.
      Reported-by: Hagbard Celine <hagbardcelin@gmail.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: Fix use of number of devices · 0916878d
      Damien Le Moal committed
      For a single device mount using a zoned block device, the zone
      information for the device is stored in the sbi->devs single entry
      array and sbi->s_ndevs is set to 1. This differs from a single device
      mount using a regular block device which does not allocate sbi->devs
      and sets sbi->s_ndevs to 0.
      
      However, the sbi->s_ndevs == 0 condition is used throughout the code to
      differentiate a single device mount from a multi-device mount where
      sbi->s_ndevs is always larger than 1. This results in problems with
      single zoned block device volumes as these are treated as multi-device
      mounts but do not have the start_blk and end_blk information set. One
      of the problem observed is skipping of zone discard issuing resulting in
      write commands being issued to full zones or unaligned to a zone write
      pointer.
      
      Fix this problem by simply treating the cases sbi->s_ndevs == 0 (single
      regular block device mount) and sbi->s_ndevs == 1 (single zoned block
      device mount) in the same manner. This is done by introducing the
      helper function f2fs_is_multi_device() and using this helper in place
      of direct tests of sbi->s_ndevs value, improving code readability.
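      The helper presumably reduces to a single comparison, along these lines:

      	static inline bool f2fs_is_multi_device(struct f2fs_sb_info *sbi)
      	{
      		/* 0 (single regular bdev) and 1 (single zoned bdev) are both
      		 * treated as a single-device mount */
      		return sbi->s_ndevs > 1;
      	}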
      
      Fixes: 7bb3a371 ("f2fs: Fix zoned block device support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  22. 13 Mar 2019 (5 commits)
  23. 06 Mar 2019 (1 commit)
    • f2fs: fix potential data inconsistence of checkpoint · c42d28ce
      Chao Yu committed
      Previously, we changed the lock from cp_rwsem to node_change; this solved
      the deadlock issue caused by the below race condition:
      
      Thread A			Thread B
      - f2fs_setattr
       - f2fs_lock_op  -- read_lock
       - dquot_transfer
        - __dquot_transfer
         - dquot_acquire
          - commit_dqblk
           - f2fs_quota_write
            - f2fs_write_begin
             - f2fs_write_failed
      				- write_checkpoint
      				 - block_operations
      				  - f2fs_lock_all  -- write_lock
              - f2fs_truncate_blocks
               - f2fs_lock_op  -- read_lock
      
      But it breaks the semantics of cp_rwsem in other callers like:
      - f2fs_file_write_iter -> f2fs_write_begin -> f2fs_write_failed
      - f2fs_direct_IO -> f2fs_write_failed
      
      We allow truncating a dnode w/o cp_rwsem held, resulting in an incorrect
      SIT bitmap update, which can cause further data corruption.
      
      So this patch reverts the previous fix and tries to fix the deadlock by
      skipping the call to f2fs_truncate_blocks() in f2fs_write_failed() only
      for quota files, keeping the preallocated data/node blocks at the tail of
      the quota file; we can expect that the preallocated space will be used to
      store quota info soon after.
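      A hedged sketch of what the f2fs_write_failed() change amounts to
      (locking and details omitted): quota files keep their preallocated blocks
      instead of being truncated without cp_rwsem held.

      	static void f2fs_write_failed_sketch(struct address_space *mapping, loff_t to)
      	{
      		struct inode *inode = mapping->host;

      		if (to <= i_size_read(inode))
      			return;

      		/* keep preallocated data/node blocks at the tail of quota
      		 * files; they will be reused for quota info soon */
      		if (IS_NOQUOTA(inode))
      			return;

      		truncate_pagecache(inode, i_size_read(inode));
      		f2fs_truncate_blocks(inode, i_size_read(inode), true);
      	}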
      
      Fixes: af033b2a ("f2fs: guarantee journalled quota data by checkpoint")
      Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
      Signed-off-by: Sheng Yong <shengyong1@huawei.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>