1. 09 Jun, 2020 2 commits
  2. 05 Jun, 2020 1 commit
    • f2fs: fix retry logic in f2fs_write_cache_pages() · e78790f8
      Sahitya Tummala committed
      In case a compressed file is getting overwritten, the current retry
      logic doesn't include the current page to be retried now as it sets
      the new start index as 0 and new end index as writeback_index - 1.
      This causes the corresponding cluster to be uncompressed and written
      as normal pages without compression. Fix this by allowing writeback to
      be retried for the current page as well (in case a compressed page is
      retried due to an index mismatch with the cluster index), so that
      this cluster can be written compressed in case of overwrite.
      
      Also, align f2fs_write_cache_pages() according to the change -
      <64081362>("mm/page-writeback.c: fix range_cyclic writeback vs
      writepages deadlock").
      Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  3. 12 May, 2020 3 commits
  4. 08 May, 2020 2 commits
  5. 24 Apr, 2020 1 commit
  6. 18 Apr, 2020 1 commit
  7. 17 Apr, 2020 1 commit
  8. 31 Mar, 2020 5 commits
    • f2fs: fix to use f2fs_readpage_limit() in f2fs_read_multi_pages() · 7496affa
      Chao Yu committed
      Multipage read flow should consider fsverity, so it needs to use
      f2fs_readpage_limit() instead of i_size_read() to check EOF condition.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to avoid double unlock · 74878565
      Chao Yu committed
      On an image that has both the verity and compression features, if
      compressed pages and non-compressed pages are mixed in one bio, we may
      double unlock a non-compressed page in the flow below:
      
      - f2fs_post_read_work
       - f2fs_decompress_work
        - f2fs_decompress_bio
         - __read_end_io
          - unlock_page
       - fsverity_enqueue_verify_work
        - f2fs_verity_work
         - f2fs_verify_bio
          - unlock_page
      
      So it should skip handling non-compressed page in f2fs_decompress_work()
      if verity is on.
      
      Besides, add missing dec_page_count() in f2fs_verify_bio().
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix NULL pointer dereference in f2fs_verity_work() · 79bbefb1
      Chao Yu committed
      If both the compression and fsverity features are on, generic/572 will
      report the NULL pointer dereference bug below.
      
       BUG: kernel NULL pointer dereference, address: 0000000000000018
       RIP: 0010:f2fs_verity_work+0x60/0x90 [f2fs]
       #PF: supervisor read access in kernel mode
       Workqueue: fsverity_read_queue f2fs_verity_work [f2fs]
       RIP: 0010:f2fs_verity_work+0x60/0x90 [f2fs]
       Call Trace:
        process_one_work+0x16c/0x3f0
        worker_thread+0x4c/0x440
        ? rescuer_thread+0x350/0x350
        kthread+0xf8/0x130
        ? kthread_unpark+0x70/0x70
        ret_from_fork+0x35/0x40
      
      There are two issues in f2fs_verity_work():
      - it needs to traverse and verify all pages in the bio.
      - if pages in the bio belong to a non-compressed cluster, accessing the
      decompress IO context stored in page private will cause a NULL
      pointer dereference.
      
      Fix them.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to avoid potential deadlock · b13f67ff
      Chao Yu committed
      We should always check the F2FS_I(inode)->cp_task condition prior to other
      conditions in __should_serialize_io() to avoid the deadloop described in
      commit 040d2bb3 ("f2fs: fix to avoid deadloop if data_flush is on");
      however, we broke this rule when we added compression support. Fix it.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: delete DIO read lock · ad8d6a02
      DongDongJu committed
      This lock can cause contention under multiple 4k random read IOs on a single inode.
      
      example) fio --output=test --name=test --numjobs=60 --filename=/media/samsung960pro/file_test --rw=randread --bs=4k
       --direct=1 --time_based --runtime=7 --ioengine=libaio --iodepth=256 --group_reporting --size=10G
      
      This commit removes that possible lock contention.
      Signed-off-by: Dongjoo Seo <commisori28@gmail.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  9. 20 Mar, 2020 6 commits
  10. 11 Mar, 2020 2 commits
    • f2fs: fix inconsistent comments · 7a88ddb5
      Chao Yu committed
      Lack of maintenance on comments may mislead developers, fix them.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: cover last_disk_size update with spinlock · c10c9820
      Chao Yu committed
      This change solves the hung task issue below:
      
      INFO: task kworker/u16:1:58 blocked for more than 122 seconds.
            Not tainted 5.6.0-rc2-00590-g9983bdae4974e #11
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      kworker/u16:1   D    0    58      2 0x00000000
      Workqueue: writeback wb_workfn (flush-179:0)
      Backtrace:
       (__schedule) from [<c0913234>] (schedule+0x78/0xf4)
       (schedule) from [<c017ec74>] (rwsem_down_write_slowpath+0x24c/0x4c0)
       (rwsem_down_write_slowpath) from [<c0915f2c>] (down_write+0x6c/0x70)
       (down_write) from [<c0435b80>] (f2fs_write_single_data_page+0x608/0x7ac)
       (f2fs_write_single_data_page) from [<c0435fd8>] (f2fs_write_cache_pages+0x2b4/0x7c4)
       (f2fs_write_cache_pages) from [<c043682c>] (f2fs_write_data_pages+0x344/0x35c)
       (f2fs_write_data_pages) from [<c0267ee8>] (do_writepages+0x3c/0xd4)
       (do_writepages) from [<c0310cbc>] (__writeback_single_inode+0x44/0x454)
       (__writeback_single_inode) from [<c03112d0>] (writeback_sb_inodes+0x204/0x4b0)
       (writeback_sb_inodes) from [<c03115cc>] (__writeback_inodes_wb+0x50/0xe4)
       (__writeback_inodes_wb) from [<c03118f4>] (wb_writeback+0x294/0x338)
       (wb_writeback) from [<c0312dac>] (wb_workfn+0x35c/0x54c)
       (wb_workfn) from [<c014f2b8>] (process_one_work+0x214/0x544)
       (process_one_work) from [<c014f634>] (worker_thread+0x4c/0x574)
       (worker_thread) from [<c01564fc>] (kthread+0x144/0x170)
       (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
      Reported-and-tested-by: Ondřej Jirman <megi@xff.cz>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  11. 03 Feb, 2020 1 commit
    • fs: Enable bmap() function to properly return errors · 30460e1e
      Carlos Maiolino committed
      Currently, bmap() returns either the physical block number related to
      the requested file offset, or 0 if an error occurs or the requested
      offset maps into a hole.
      This patch makes the changes needed for bmap() to properly return
      errors: the return value becomes an error code, and a pointer
      must now be passed to bmap() to be filled with the mapped physical block.
      
      It will change the behavior of bmap() on return:
      
      - negative value in case of error
      - zero on success or map fell into a hole
      
      In case of a hole, the *block will be zero too
      
      Since this is a prep patch, for now the only error return is -EINVAL if
      ->bmap doesn't exist.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
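The calling convention described in the bmap() commit above can be sketched in user-space C. This is only an illustration under stated assumptions: `demo_inode`, `demo_bmap()`, and `demo_linear_map()` are hypothetical stand-ins, not kernel code, and the sketch assumes `*block` carries the logical block in and the physical block out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t; /* stand-in for the kernel type of the same name */

/* Hypothetical inode stand-in that carries only what the sketch needs. */
struct demo_inode {
	/* Per-filesystem mapping hook; NULL models "->bmap doesn't exist". */
	sector_t (*bmap_op)(const struct demo_inode *inode, sector_t lblock);
};

/*
 * Returns a negative errno on error and 0 on success.
 * Assumption of this sketch: *block holds the logical block on entry and
 * is overwritten with the physical block (0 for a hole) on success.
 */
static int demo_bmap(const struct demo_inode *inode, sector_t *block)
{
	if (!inode->bmap_op)
		return -EINVAL;

	*block = inode->bmap_op(inode, *block);
	return 0;
}

/* Toy mapping: logical blocks 0..9 live at 1000..1009, the rest are holes. */
static sector_t demo_linear_map(const struct demo_inode *inode, sector_t lblock)
{
	(void)inode;
	return lblock < 10 ? 1000 + lblock : 0;
}
```

With this shape a caller can tell "no mapping hook" (negative return, `*block` untouched) apart from "offset falls in a hole" (zero return, `*block` set to 0), which the old single return value could not express.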
  12. 18 Jan, 2020 4 commits
    • f2fs: fix deadlock allocating bio_post_read_ctx from mempool · 644c8c92
      Eric Biggers committed
      Without any form of coordination, any case where multiple allocations
      from the same mempool are needed at a time to make forward progress can
      deadlock under memory pressure.
      
      This is the case for struct bio_post_read_ctx, as one can be allocated
      to decrypt a Merkle tree page during fsverity_verify_bio(), which itself
      is running from a post-read callback for a data bio which has its own
      struct bio_post_read_ctx.
      
      Fix this by freeing first bio_post_read_ctx before calling
      fsverity_verify_bio().  This works because verity (if enabled) is always
      the last post-read step.
      
      This deadlock can be reproduced by trying to read from an encrypted
      verity file after reducing NUM_PREALLOC_POST_READ_CTXS to 1 and patching
      mempool_alloc() to pretend that pool->alloc() always fails.
      
      Note that since NUM_PREALLOC_POST_READ_CTXS is actually 128, to actually
      hit this bug in practice would require reading from lots of encrypted
      verity files at the same time.  But it's theoretically possible, as N
      available objects doesn't guarantee forward progress when > N/2 threads
      each need 2 objects at a time.
      
      Fixes: 95ae251f ("f2fs: add fs-verity support")
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: remove unneeded check for error allocating bio_post_read_ctx · e8ce5749
      Eric Biggers committed
      Since allocating an object from a mempool never fails when
      __GFP_DIRECT_RECLAIM (which is included in GFP_NOFS) is set, the check
      for failure to allocate a bio_post_read_ctx is unnecessary.  Remove it.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to add swap extent correctly · 3e5e479a
      Chao Yu committed
      As Youling reported in mailing list:
      
      https://www.linuxquestions.org/questions/linux-newbie-8/the-file-system-f2fs-is-broken-4175666043/
      
      https://www.linux.org/threads/the-file-system-f2fs-is-broken.26490/
      
      There is a test case that can corrupt an f2fs image:
      - dd if=/dev/zero of=/swapfile bs=1M count=4096
      - chmod 600 /swapfile
      - mkswap /swapfile
      - swapon --discard /swapfile
      
      The root cause is that f2fs_swap_activate() intends to return zero
      to setup_swap_extents() to enable SWP_FS mode (swap file goes through
      fs). In this flow, setup_swap_extents() sets up the swap extent with a
      wrong block address range, resulting in discard_swap() erasing an
      incorrect address range.
      
      Because f2fs_swap_activate() has pinned the swapfile, its data block
      addresses will not change, so it's safe to let swap handle IO through
      the raw device. We can therefore get rid of SWP_FS mode and initialize
      swap extents inside f2fs_swap_activate(); this way, a later
      discard_swap() trims the right address range.
      
      Fixes: 4969c06a ("f2fs: support swap file w/ DIO")
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: support data compression · 4c8ff709
      Chao Yu committed
      This patch tries to support compression in f2fs.
      
      - A new term, cluster, is defined as the basic unit of compression; a file
      can be divided into multiple clusters logically. One cluster includes
      4 << n (n >= 0) logical pages, the compression size is also the cluster
      size, and each cluster can be compressed or not.
      
      - In the cluster metadata layout, one special flag indicates whether a
      cluster is a compressed one or a normal one. For a compressed cluster, the
      following metadata maps the cluster to [1, (4 << n) - 1] physical blocks,
      in which f2fs stores data including the compress header and compressed data.
      
      - In order to eliminate write amplification during overwrite, F2FS only
      supports compression on write-once files; data can be compressed only when
      all logical blocks in file are valid and the cluster compress ratio is lower
      than the specified threshold.
      
      - To enable compression on regular inode, there are three ways:
      * chattr +c file
      * chattr +c dir; touch dir/file
      * mount w/ -o compress_extension=ext; touch file.ext
      
      Compress metadata layout:
                                   [Dnode Structure]
                   +-----------------------------------------------+
                   | cluster 1 | cluster 2 | ......... | cluster N |
                   +-----------------------------------------------+
                   .           .                       .           .
             .                       .                .                      .
        .         Compressed Cluster       .        .        Normal Cluster            .
      +----------+---------+---------+---------+  +---------+---------+---------+---------+
      |compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
      +----------+---------+---------+---------+  +---------+---------+---------+---------+
                 .                             .
               .                                           .
             .                                                           .
            +-------------+-------------+----------+----------------------------+
            | data length | data chksum | reserved |      compressed data       |
            +-------------+-------------+----------+----------------------------+
      
      Changelog:
      
      20190326:
      - fix error handling of read_end_io().
      - remove unneeded comments in f2fs_encrypt_one_page().
      
      20190327:
      - fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
      - don't jump into loop directly to avoid uninitialized variables.
      - add TODO tag in error path of f2fs_write_cache_pages().
      
      20190328:
      - fix wrong merge condition in f2fs_read_multi_pages().
      - check compressed file in f2fs_post_read_required().
      
      20190401
      - allow overwrite on non-compressed cluster.
      - check cluster meta before writing compressed data.
      
      20190402
      - don't preallocate blocks for compressed file.
      
      - add lz4 compress algorithm
      - process multiple post read works in one workqueue
        Now f2fs supports processing post read work in multiple workqueue,
        it shows low performance due to schedule overhead of multiple
        workqueue executing orderly.
      
      20190921
      - compress: support buffered overwrite
      C: compress cluster flag
      V: valid block address
      N: NEW_ADDR
      
      One cluster contain 4 blocks
      
       before overwrite   after overwrite
      
      - VVVV		->	CVNN
      - CVNN		->	VVVV
      
      - CVNN		->	CVNN
      - CVNN		->	CVVV
      
      - CVVV		->	CVNN
      - CVVV		->	CVVV
      
      20191029
      - add kconfig F2FS_FS_COMPRESSION to isolate compression related
      codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
      note that: will remove lzo backend if Jaegeuk agreed that too.
      - update codes according to Eric's comments.
      
      20191101
      - apply fixes from Jaegeuk
      
      20191113
      - apply fixes from Jaegeuk
      - split workqueue for fsverity
      
      20191216
      - apply fixes from Jaegeuk
      
      20200117
      - fix to avoid NULL pointer dereference
      
      [Jaegeuk Kim]
      - add tracepoint for f2fs_{,de}compress_pages()
      - fix many bugs and add some compression stats
      - fix overwrite/mmap bugs
      - address 32bit build error, reported by Geert.
      - bug fixes when handling errors and i_compressed_blocks
      
      Reported-by: <noreply@ellerman.id.au>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
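The cluster arithmetic from the compression commit above (4 << n pages per cluster, at most (4 << n) - 1 physical blocks for a compressed cluster) can be written out as a small C sketch. The `demo_*` helpers are hypothetical names for illustration, not f2fs functions. Note that in C the expression `4 << n - 1` parses as `4 << (n - 1)`, so the upper bound needs explicit parentheses.

```c
#include <stdint.h>

/* Logical pages per cluster for a given n (n >= 0): 4 << n. */
static inline uint32_t demo_cluster_pages(unsigned int n)
{
	return 4u << n;
}

/*
 * Upper bound of the [1, (4 << n) - 1] range of physical blocks that a
 * compressed cluster maps to. The parentheses matter: without them C
 * would evaluate 4 << (n - 1).
 */
static inline uint32_t demo_max_compressed_blocks(unsigned int n)
{
	return (4u << n) - 1;
}

/* Index of the cluster that contains a given logical page index. */
static inline uint64_t demo_cluster_index(uint64_t page_index, unsigned int n)
{
	return page_index / demo_cluster_pages(n);
}

/* First logical page index covered by a given cluster. */
static inline uint64_t demo_cluster_start(uint64_t cluster, unsigned int n)
{
	return cluster * demo_cluster_pages(n);
}
```

For n = 0 this gives 4-page clusters, which matches the 4-block clusters used in the overwrite table of the changelog (for example VVVV -> CVNN).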
  13. 16 Jan, 2020 1 commit
    • f2fs: introduce private bioset · f543805f
      Chao Yu committed
      In low memory scenario, we can allocate multiple bios without
      submitting any of them.
      
      - f2fs_write_checkpoint()
       - block_operations()
        - f2fs_sync_node_pages()
         step 1) flush cold nodes, allocate new bio from mempool
         - bio_alloc()
          - mempool_alloc()
         step 2) flush hot nodes, allocate a bio from mempool
         - bio_alloc()
          - mempool_alloc()
         step 3) flush warm nodes, be stuck in below call path
         - bio_alloc()
          - mempool_alloc()
           - loop to wait mempool element release, as we only
             reserved memory for two bio allocation, however above
             allocated two bios may never be submitted.
      
      So we need to avoid using the default bioset. In this patch we introduce
      a private bioset, in which we enlarge the mempool element count to the
      total number of log headers, so that we can make sure we have enough
      backup memory pool in the scenario of allocating/holding multiple
      bios.
      Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  14. 15 Jan, 2020 1 commit
    • fs-verity: implement readahead of Merkle tree pages · fd39073d
      Eric Biggers committed
      When fs-verity verifies data pages, currently it reads each Merkle tree
      page synchronously using read_mapping_page().
      
      Therefore, when the Merkle tree pages aren't already cached, fs-verity
      causes an extra 4 KiB I/O request for every 512 KiB of data (assuming
      that the Merkle tree uses SHA-256 and 4 KiB blocks).  This results in
      more I/O requests and performance loss than is strictly necessary.
      
      Therefore, implement readahead of the Merkle tree pages.
      
      For simplicity, we take advantage of the fact that the kernel already
      does readahead of the file's *data*, just like it does for any other
      file.  Due to this, we don't really need a separate readahead state
      (struct file_ra_state) just for the Merkle tree, but rather we just need
      to piggy-back on the existing data readahead requests.
      
      We also only really need to bother with the first level of the Merkle
      tree, since the usual fan-out factor is 128, so normally over 99% of
      Merkle tree I/O requests are for the first level.
      
      Therefore, make fsverity_verify_bio() enable readahead of the first
      Merkle tree level, for up to 1/4 the number of pages in the bio, when it
      sees that the REQ_RAHEAD flag is set on the bio.  The readahead size is
      then passed down to ->read_merkle_tree_page() for the filesystem to
      (optionally) implement if it sees that the requested page is uncached.
      
      While we're at it, also make build_merkle_tree_level() set the Merkle
      tree readahead size, since it's easy to do there.
      
      However, for now don't set the readahead size in fsverity_verify_page(),
      since currently it's only used to verify holes on ext4 and f2fs, and it
      would need parameters added to know how much to read ahead.
      
      This patch significantly improves fs-verity sequential read performance.
      Some quick benchmarks with 'cat'-ing a 250MB file after dropping caches:
      
          On an ARM64 phone (using sha256-ce):
              Before: 217 MB/s
              After: 263 MB/s
              (compare to sha256sum of non-verity file: 357 MB/s)
      
          In an x86_64 VM (using sha256-avx2):
              Before: 173 MB/s
              After: 215 MB/s
              (compare to sha256sum of non-verity file: 223 MB/s)
      
      Link: https://lore.kernel.org/r/20200106205533.137005-1-ebiggers@kernel.org
      Reviewed-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
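The figures quoted in the fs-verity commit above (one 4 KiB Merkle tree I/O per 512 KiB of data, a fan-out of 128, over 99% of tree I/O at the first level, readahead capped at 1/4 of the bio's pages) follow from simple arithmetic. A minimal C sketch, assuming SHA-256 (32-byte digests) and 4 KiB blocks as the message does; the `DEMO_*` constants and `demo_*` helpers are hypothetical names, not fs-verity code.

```c
#include <stdint.h>

#define DEMO_BLOCK_SIZE  4096u /* 4 KiB Merkle tree and data block size */
#define DEMO_DIGEST_SIZE 32u   /* SHA-256 digest size in bytes */

/* Hashes that fit in one tree block, i.e. the fan-out: 4096 / 32. */
static inline uint32_t demo_hashes_per_block(void)
{
	return DEMO_BLOCK_SIZE / DEMO_DIGEST_SIZE;
}

/* Bytes of file data covered by one first-level tree block. */
static inline uint64_t demo_data_bytes_per_tree_block(void)
{
	return (uint64_t)demo_hashes_per_block() * DEMO_BLOCK_SIZE;
}

/* Readahead cap from the message: up to 1/4 the number of pages in the bio. */
static inline unsigned int demo_max_ra_pages(unsigned int bio_pages)
{
	return bio_pages / 4;
}

/*
 * Number of tree blocks at a level (0 = first level, directly above the
 * data blocks); each level needs ceil(previous / fan-out) blocks.
 */
static uint64_t demo_level_blocks(uint64_t data_blocks, unsigned int level)
{
	uint64_t n = data_blocks;
	uint32_t fanout = demo_hashes_per_block();

	for (unsigned int i = 0; i <= level; i++)
		n = (n + fanout - 1) / fanout;
	return n;
}
```

For a file of 64000 data blocks (250 MiB at 4 KiB per block) this yields 500 first-level blocks, 4 second-level blocks and 1 root block, so the first level accounts for 500 of 505 tree blocks, which is consistent with the "over 99%" remark.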
  15. 13 Dec, 2019 1 commit
  16. 10 Dec, 2019 1 commit
  17. 20 Nov, 2019 1 commit
  18. 08 Nov, 2019 1 commit
    • f2fs: fix potential overflow · 1f0d5c91
      Chao Yu committed
      We expect a 64-bit calculation result from the statement below; however,
      on a 32-bit machine, a looped left shift operation on a pgoff_t type
      variable may cause an overflow issue. Fix it by forcing a type cast.
      
      page->index << PAGE_SHIFT;
      
      Fixes: 26de9b11 ("f2fs: avoid unnecessary updating inode during fsync")
      Fixes: 0a2aa8fb ("f2fs: refactor __exchange_data_block for speed up")
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
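The overflow described in the commit above can be reproduced in user space. The sketch below models a 32-bit pgoff_t with `uint32_t` and assumes a page shift of 12 (4 KiB pages); `demo_pgoff_t` and the two helpers are hypothetical names, and the exact cast used by the kernel patch is not shown in the message, so the 64-bit cast here is an assumption.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12 /* assumed: 4 KiB pages */

/* Models pgoff_t (unsigned long) as it would be on a 32-bit machine. */
typedef uint32_t demo_pgoff_t;

/*
 * Buggy form: the shift is evaluated in 32 bits, so the high bits are
 * lost before the result is widened to 64 bits.
 */
static inline int64_t demo_offset_truncated(demo_pgoff_t index)
{
	demo_pgoff_t shifted = index << DEMO_PAGE_SHIFT;

	return (int64_t)shifted;
}

/* Fixed form: widen first, then shift, so the full offset is kept. */
static inline int64_t demo_offset_fixed(demo_pgoff_t index)
{
	return (int64_t)index << DEMO_PAGE_SHIFT;
}
```

For page index 0x100000 (the page at the 4 GiB boundary) the truncated form wraps to 0 while the fixed form returns 4294967296; for small indexes the two agree, which is why the bug only shows up on large files.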
  19. 26 Oct, 2019 1 commit
    • f2fs: cache global IPU bio · 0b20fcec
      Chao Yu committed
      In commit 8648de2c ("f2fs: add bio cache for IPU"), we added
      f2fs_submit_ipu_bio() in __write_data_page() as below:
      
      __write_data_page()
      
      	if (!S_ISDIR(inode->i_mode) && !IS_NOQUOTA(inode)) {
      		f2fs_submit_ipu_bio(sbi, bio, page);
      		....
      	}
      
      in order to avoid below deadlock:
      
      Thread A				Thread B
      - __write_data_page (inode x, page y)
       - f2fs_do_write_data_page
        - set_page_writeback        ---- set writeback flag in page y
        - f2fs_inplace_write_data
       - f2fs_balance_fs
      					 - lock gc_mutex
       - lock gc_mutex
      					  - f2fs_gc
      					   - do_garbage_collect
      					    - gc_data_segment
      					     - move_data_page
      					      - f2fs_wait_on_page_writeback
      					       - wait_on_page_writeback  --- wait writeback of page y
      
      However, the bio submission breaks the merge of IPU IOs.
      
      So in this patch let's add a global bio cache for merged IPU pages,
      then f2fs_wait_on_page_writeback() is able to submit bio if a
      writebacked page is cached in global bio cache.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  20. 16 Sep, 2019 3 commits
  21. 07 Sep, 2019 1 commit