1. 09 1月, 2019 1 次提交
  2. 29 12月, 2018 1 次提交
  3. 27 12月, 2018 2 次提交
    • C
      f2fs: check PageWriteback flag for ordered case · bae0ee7a
      Chao Yu 提交于
      For all ordered cases in f2fs_wait_on_page_writeback(), we need to
      check PageWriteback status, so let's clean up to relocate the check
      into f2fs_wait_on_page_writeback().
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      bae0ee7a
    • J
      f2fs: use kvmalloc, if kmalloc is failed · 5222595d
      Jaegeuk Kim 提交于
      One report says memalloc failure during mount.
      
       (unwind_backtrace) from [<c010cd4c>] (show_stack+0x10/0x14)
       (show_stack) from [<c049c6b8>] (dump_stack+0x8c/0xa0)
       (dump_stack) from [<c024fcf0>] (warn_alloc+0xc4/0x160)
       (warn_alloc) from [<c0250218>] (__alloc_pages_nodemask+0x3f4/0x10d0)
       (__alloc_pages_nodemask) from [<c0270450>] (kmalloc_order_trace+0x2c/0x120)
       (kmalloc_order_trace) from [<c03fa748>] (build_node_manager+0x35c/0x688)
       (build_node_manager) from [<c03de494>] (f2fs_fill_super+0xf0c/0x16cc)
       (f2fs_fill_super) from [<c02a5864>] (mount_bdev+0x15c/0x188)
       (mount_bdev) from [<c03da624>] (f2fs_mount+0x18/0x20)
       (f2fs_mount) from [<c02a68b8>] (mount_fs+0x158/0x19c)
       (mount_fs) from [<c02c3c9c>] (vfs_kern_mount+0x78/0x134)
       (vfs_kern_mount) from [<c02c76ac>] (do_mount+0x474/0xca4)
       (do_mount) from [<c02c8264>] (SyS_mount+0x94/0xbc)
       (SyS_mount) from [<c0108180>] (ret_fast_syscall+0x0/0x48)
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      5222595d
  4. 14 12月, 2018 1 次提交
  5. 27 11月, 2018 6 次提交
    • J
      f2fs: fix m_may_create to make OPU DIO write correctly · f4f0b677
      Jia Zhu 提交于
      Previously, we added a parameter @map.m_may_create to trigger OPU
      allocation and call f2fs_balance_fs() correctly.
      
      But in get_more_blocks(), @create has been overwritten by below code.
      So the function f2fs_map_blocks() will not allocate new block address
      but directly go out. Meanwile,there are several functions calling
      f2fs_map_blocks() directly and @map.m_may_create not initialized.
      CODE:
      create = dio->op == REQ_OP_WRITE;
      	if (dio->flags & DIO_SKIP_HOLES) {
      		if (fs_startblk <= ((i_size_read(dio->inode) - 1) >>
      						i_blkbits))
      			create = 0;
      	}
      
      This patch fixes it.
      Signed-off-by: NJia Zhu <zhujia13@huawei.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      f4f0b677
    • J
      f2fs: fix to update new block address correctly for OPU · 73c0a927
      Jia Zhu 提交于
      Previously, we allocated a new block address for OPU mode in direct_IO.
      
      But the new address couldn't be assigned to @map->m_pblk correctly.
      
      This patch fix it.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 511f52d02f05 ("f2fs: allow out-place-update for direct IO in LFS mode")
      Signed-off-by: NJia Zhu <zhujia13@huawei.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      73c0a927
    • S
      f2fs: fix race between write_checkpoint and write_begin · 2866fb16
      Sheng Yong 提交于
      The following race could lead to inconsistent SIT bitmap:
      
      Task A                          Task B
      ======                          ======
      f2fs_write_checkpoint
        block_operations
          f2fs_lock_all
            down_write(node_change)
            down_write(node_write)
            ... sync ...
            up_write(node_change)
                                      f2fs_file_write_iter
                                        set_inode_flag(FI_NO_PREALLOC)
                                        ......
                                        f2fs_write_begin(index=0, has inline data)
                                          prepare_write_begin
                                            __do_map_lock(AIO) => down_read(node_change)
                                            f2fs_convert_inline_page => update SIT
                                            __do_map_lock(AIO) => up_read(node_change)
        f2fs_flush_sit_entries <= inconsistent SIT
        finish write checkpoint
        sudden-power-off
      
      If SPO occurs after checkpoint is finished, SIT bitmap will be set
      incorrectly.
      Signed-off-by: NSheng Yong <shengyong1@huawei.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      2866fb16
    • Y
      f2fs: only flush the single temp bio cache which owns the target page · 1e771e83
      Yunlong Song 提交于
      Previously, when f2fs finds which temp bio cache owns the target page,
      it will flush all the three temp bio caches, but we only need to flush
      one single bio cache indeed, which can help to keep bio merged.
      Signed-off-by: NYunlong Song <yunlong.song@huawei.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      1e771e83
    • C
      f2fs: fix out-place-update DIO write · f9d6d059
      Chao Yu 提交于
      In get_more_blocks(), we may override @create as below code:
      
      	create = dio->op == REQ_OP_WRITE;
      	if (dio->flags & DIO_SKIP_HOLES) {
      		if (fs_startblk <= ((i_size_read(dio->inode) - 1) >>
      						i_blkbits))
      			create = 0;
      	}
      
      But in f2fs_map_blocks(), we only trigger f2fs_balance_fs() if @create
      is 1, so in LFS mode, dio overwrite under LFS mode can easily run out
      of free segments, result in below panic.
      
       Call Trace:
        allocate_segment_by_default+0xa8/0x270 [f2fs]
        f2fs_allocate_data_block+0x1ea/0x5c0 [f2fs]
        __allocate_data_block+0x306/0x480 [f2fs]
        f2fs_map_blocks+0x6f6/0x920 [f2fs]
        __get_data_block+0x4f/0xb0 [f2fs]
        get_data_block_dio_write+0x50/0x60 [f2fs]
        do_blockdev_direct_IO+0xcd5/0x21e0
        __blockdev_direct_IO+0x3a/0x3c
        f2fs_direct_IO+0x1ff/0x4a0 [f2fs]
        generic_file_direct_write+0xd9/0x160
        __generic_file_write_iter+0xbb/0x1e0
        f2fs_file_write_iter+0xaf/0x220 [f2fs]
        __vfs_write+0xd0/0x130
        vfs_write+0xb2/0x1b0
        SyS_pwrite64+0x69/0xa0
        ? vtime_user_exit+0x29/0x70
        do_syscall_64+0x6e/0x160
        entry_SYSCALL64_slow_path+0x25/0x25
       RIP: new_curseg+0x36f/0x380 [f2fs] RSP: ffffac570393f7a8
      
      So this patch introduces a parameter map.m_may_create to indicate that
      f2fs_map_blocks() is called from write or read path, which can give the
      right hint to let f2fs_map_blocks() trigger OPU allocation and call
      f2fs_balanc_fs() correctly.
      
      BTW, it disables physical address preallocation for direct IO in
      f2fs_preallocate_blocks, which is redundant to OPU allocation of
      f2fs_map_blocks.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      f9d6d059
    • C
      f2fs: add to account direct IO · 02b16d0a
      Chao Yu 提交于
      This patch adds f2fs_dio_submit_bio() to hook submit_io/end_io functions
      in direct IO path, in order to account DIO.
      
      Later, we will add this count into is_idle() to let background GC/Discard
      thread be aware of DIO.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      02b16d0a
  6. 23 10月, 2018 7 次提交
    • C
      f2fs: guarantee journalled quota data by checkpoint · af033b2a
      Chao Yu 提交于
      For journalled quota mode, let checkpoint to flush dquot dirty data
      and quota file data to guarntee persistence of all quota sysfile in
      last checkpoint, by this way, we can avoid corrupting quota sysfile
      when encountering SPO.
      
      The implementation is as below:
      
      1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
      cached dquot metadata changes in quota subsystem, and later checkpoint
      should:
       a) flush dquot metadata into quota file.
       b) flush quota file to storage to keep file usage be consistent.
      
      2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
      operation failed due to -EIO or -ENOSPC, so later,
       a) checkpoint will skip syncing dquot metadata.
       b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
          hint for fsck repairing.
      
      3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
      data updating is very heavy, it may cause hungtask in block_operation().
      To avoid this, if our retry time exceed threshold, let's just skip
      flushing and retry in next checkpoint().
      Signed-off-by: NWeichao Guo <guoweichao@huawei.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: avoid warnings and set fsck flag]
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      af033b2a
    • S
      f2fs: fix data corruption issue with hardware encryption · 1e78e8bd
      Sahitya Tummala 提交于
      Direct IO can be used in case of hardware encryption. The following
      scenario results into data corruption issue in this path -
      
      Thread A -                          Thread B-
      -> write file#1 in direct IO
                                          -> GC gets kicked in
                                          -> GC submitted bio on meta mapping
      				       for file#1, but pending completion
      -> write file#1 again with new data
         in direct IO
                                          -> GC bio gets completed now
                                          -> GC writes old data to the new
                                             location and thus file#1 is
      				       corrupted.
      
      Fix this by submitting and waiting for pending io on meta mapping
      for direct IO case in f2fs_map_blocks().
      Signed-off-by: NSahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      1e78e8bd
    • C
      f2fs: fix to spread clear_cold_data() · 2baf0781
      Chao Yu 提交于
      We need to drop PG_checked flag on page as well when we clear PG_uptodate
      flag, in order to avoid treating the page as GCing one later.
      Signed-off-by: NWeichao Guo <guoweichao@huawei.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      2baf0781
    • J
      Revert "f2fs: fix to clear PG_checked flag in set_page_dirty()" · 164a63fa
      Jaegeuk Kim 提交于
      This reverts commit 66110abc.
      
      If we clear the cold data flag out of the writeback flow, we can miscount
      -1 by end_io, which incurs a deadlock caused by all I/Os being blocked during
      heavy GC.
      
      Balancing F2FS Async:
       - IO (CP:    1, Data:   -1, Flush: (   0    0    1), Discard: (   ...
      
      GC thread:                              IRQ
      - move_data_page()
       - set_page_dirty()
        - clear_cold_data()
                                              - f2fs_write_end_io()
                                               - type = WB_DATA_TYPE(page);
                                                 here, we get wrong type
                                               - dec_page_count(sbi, type);
       - f2fs_wait_on_page_writeback()
      
      Cc: <stable@vger.kernel.org>
      Reported-and-Tested-by: NPark Ju Hyung <qkrwngud825@gmail.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      164a63fa
    • J
      f2fs: account read IOs and use IO counts for is_idle · 5f9abab4
      Jaegeuk Kim 提交于
      This patch adds issued read IO counts which is under block layer.
      
      Chao modified a bit, since:
      
      Below race can cause reversed reference on F2FS_RD_DATA, there is
      the same issue in f2fs_submit_page_bio(), fix them by relocate
      __submit_bio() and inc_page_count.
      
      Thread A			Thread B
      - f2fs_write_begin
       - f2fs_submit_page_read
       - __submit_bio
      				- f2fs_read_end_io
      				 - __read_end_io
      				 - dec_page_count(, F2FS_RD_DATA)
       - inc_page_count(, F2FS_RD_DATA)
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      5f9abab4
    • C
      f2fs: fix to account IO correctly for cgroup writeback · 78efac53
      Chao Yu 提交于
      Now, we have supported cgroup writeback, it depends on correctly IO
      account of specified filesystem.
      
      But in commit d1b3e72d ("f2fs: submit bio of in-place-update pages"),
      we split write paths from f2fs_submit_page_mbio() to two:
      - f2fs_submit_page_bio() for IPU path
      - f2fs_submit_page_bio() for OPU path
      
      But still we account write IO only in f2fs_submit_page_mbio(), result in
      incorrect IO account, fix it by adding missing IO account in IPU path.
      
      Fixes: d1b3e72d ("f2fs: submit bio of in-place-update pages")
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      78efac53
    • C
      f2fs: fix to account IO correctly · 4c58ed07
      Chao Yu 提交于
      Below race can cause reversed reference on dirty count, fix it by
      relocating __submit_bio() and inc_page_count().
      
      Thread A				Thread B
      - f2fs_inplace_write_data
       - f2fs_submit_page_bio
        - __submit_bio
      					- f2fs_write_end_io
      					 - dec_page_count
        - inc_page_count
      
      Cc: <stable@vger.kernel.org>
      Fixes: d1b3e72d ("f2fs: submit bio of in-place-update pages")
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      4c58ed07
  7. 21 10月, 2018 2 次提交
  8. 17 10月, 2018 1 次提交
    • D
      f2fs: checkpoint disabling · 4354994f
      Daniel Rosenberg 提交于
      Note that, it requires "f2fs: return correct errno in f2fs_gc".
      
      This adds a lightweight non-persistent snapshotting scheme to f2fs.
      
      To use, mount with the option checkpoint=disable, and to return to
      normal operation, remount with checkpoint=enable. If the filesystem
      is shut down before remounting with checkpoint=enable, it will revert
      back to its apparent state when it was first mounted with
      checkpoint=disable. This is useful for situations where you wish to be
      able to roll back the state of the disk in case of some critical
      failure.
      Signed-off-by: NDaniel Rosenberg <drosen@google.com>
      [Jaegeuk Kim: use SB_RDONLY instead of MS_RDONLY]
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      4354994f
  9. 03 10月, 2018 1 次提交
    • J
      f2fs: clear PageError on the read path · fb7d70db
      Jaegeuk Kim 提交于
      When running fault injection test, I hit somewhat wrong behavior in f2fs_gc ->
      gc_data_segment():
      
      0. fault injection generated some PageError'ed pages
      
      1. gc_data_segment
       -> f2fs_get_read_data_page(REQ_RAHEAD)
      
      2. move_data_page
       -> f2fs_get_lock_data_page()
        -> f2f_get_read_data_page()
         -> f2fs_submit_page_read()
          -> submit_bio(READ)
        -> return EIO due to PageError
        -> fail to move data
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      fb7d70db
  10. 01 10月, 2018 3 次提交
    • C
      f2fs: allow out-place-update for direct IO in LFS mode · f847c699
      Chao Yu 提交于
      Normally, DIO uses in-pllace-update, but in LFS mode, f2fs doesn't
      allow triggering any in-place-update writes, so we fallback direct
      write to buffered write, result in bad performance of large size
      write.
      
      This patch adds to support triggering out-place-update for direct IO
      to enhance its performance.
      
      Note that it needs to exclude direct read IO during direct write,
      since new data writing to new block address will no be valid until
      write finished.
      
      storage: zram
      
      time xfs_io -f -d /mnt/f2fs/file -c "pwrite 0 1073741824" -c "fsync"
      
      Before:
      real	0m13.061s
      user	0m0.327s
      sys	0m12.486s
      
      After:
      real	0m6.448s
      user	0m0.228s
      sys	0m6.212s
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      f847c699
    • C
      f2fs: refactor ->page_mkwrite() flow · 39a86958
      Chao Yu 提交于
      Thread A				Thread B
      - f2fs_vm_page_mkwrite
      					- f2fs_setattr
      					 - down_write(i_mmap_sem)
      					 - truncate_setsize
      					 - f2fs_truncate
      					 - up_write(i_mmap_sem)
       - f2fs_reserve_block
       reserve NEW_ADDR
       - skip dirty page due to truncation
      
      1. we don't need to rserve new block address for a truncated page.
      2. dn.data_blkaddr is used out of node page lock coverage.
      
      Refactor ->page_mkwrite() flow to fix above issues:
      - use __do_map_lock() to avoid racing checkpoint()
      - lock data page in prior to dnode page
      - cover f2fs_reserve_block with i_mmap_sem lock
      - wait page writeback before zeroing page
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      39a86958
    • C
      Revert: "f2fs: check last page index in cached bio to decide submission" · bab475c5
      Chao Yu 提交于
      There is one case that we can leave bio in f2fs, result in hanging
      page writeback waiter.
      
      Thread A				Thread B
      - f2fs_write_cache_pages
       - f2fs_submit_page_write
       page #0 cached in bio #0 of cold log
       - f2fs_submit_page_write
       page #1 cached in bio #1 of warm log
      					- f2fs_write_cache_pages
      					 - f2fs_submit_page_write
      					 bio is full, submit bio #1 contain page #1
       - f2fs_submit_merged_write_cond(, page #1)
       fail to submit bio #0 due to page #1 is not in any cached bios.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      bab475c5
  11. 27 9月, 2018 1 次提交
  12. 13 9月, 2018 2 次提交
  13. 08 9月, 2018 1 次提交
  14. 06 9月, 2018 1 次提交
  15. 21 8月, 2018 3 次提交
    • C
      f2fs: readahead encrypted block during GC · 6aa58d8a
      Chao Yu 提交于
      During GC, for each encrypted block, we will read block synchronously
      into meta page, and then submit it into current cold data log area.
      
      So this block read model with 4k granularity can make poor performance,
      like migrating non-encrypted block, let's readahead encrypted block
      as well to improve migration performance.
      
      To implement this, we choose meta page that its index is old block
      address of the encrypted block, and readahead ciphertext into this
      page, later, if readaheaded page is still updated, we will load its
      data into target meta page, and submit the write IO.
      
      Note that for OPU, truncation, deletion, we need to invalid meta
      page after we invalid old block address, to make sure we won't load
      invalid data from target meta page during encrypted block migration.
      
      for ((i = 0; i < 1000; i++))
      do {
              xfs_io -f /mnt/f2fs/dir/$i -c "pwrite 0 128k" -c "fsync";
      } done
      
      for ((i = 0; i < 1000; i+=2))
      do {
              rm /mnt/f2fs/dir/$i;
      } done
      
      ret = ioctl(fd, F2FS_IOC_GARBAGE_COLLECT, 0);
      
      Before:
                    gc-6549  [001] d..1 214682.212797: block_rq_insert: 8,32 RA 32768 () 786400 + 64 [gc]
                    gc-6549  [001] d..1 214682.212802: block_unplug: [gc] 1
                    gc-6549  [001] .... 214682.213892: block_bio_queue: 8,32 R 67494144 + 8 [gc]
                    gc-6549  [001] .... 214682.213899: block_getrq: 8,32 R 67494144 + 8 [gc]
                    gc-6549  [001] .... 214682.213902: block_plug: [gc]
                    gc-6549  [001] d..1 214682.213905: block_rq_insert: 8,32 R 4096 () 67494144 + 8 [gc]
                    gc-6549  [001] d..1 214682.213908: block_unplug: [gc] 1
                    gc-6549  [001] .... 214682.226405: block_bio_queue: 8,32 R 67494152 + 8 [gc]
                    gc-6549  [001] .... 214682.226412: block_getrq: 8,32 R 67494152 + 8 [gc]
                    gc-6549  [001] .... 214682.226414: block_plug: [gc]
                    gc-6549  [001] d..1 214682.226417: block_rq_insert: 8,32 R 4096 () 67494152 + 8 [gc]
                    gc-6549  [001] d..1 214682.226420: block_unplug: [gc] 1
                    gc-6549  [001] .... 214682.226904: block_bio_queue: 8,32 R 67494160 + 8 [gc]
                    gc-6549  [001] .... 214682.226910: block_getrq: 8,32 R 67494160 + 8 [gc]
                    gc-6549  [001] .... 214682.226911: block_plug: [gc]
                    gc-6549  [001] d..1 214682.226914: block_rq_insert: 8,32 R 4096 () 67494160 + 8 [gc]
                    gc-6549  [001] d..1 214682.226916: block_unplug: [gc] 1
      
      After:
                    gc-5678  [003] .... 214327.025906: block_bio_queue: 8,32 R 67493824 + 8 [gc]
                    gc-5678  [003] .... 214327.025908: block_bio_backmerge: 8,32 R 67493824 + 8 [gc]
                    gc-5678  [003] .... 214327.025915: block_bio_queue: 8,32 R 67493832 + 8 [gc]
                    gc-5678  [003] .... 214327.025917: block_bio_backmerge: 8,32 R 67493832 + 8 [gc]
                    gc-5678  [003] .... 214327.025923: block_bio_queue: 8,32 R 67493840 + 8 [gc]
                    gc-5678  [003] .... 214327.025925: block_bio_backmerge: 8,32 R 67493840 + 8 [gc]
                    gc-5678  [003] .... 214327.025932: block_bio_queue: 8,32 R 67493848 + 8 [gc]
                    gc-5678  [003] .... 214327.025934: block_bio_backmerge: 8,32 R 67493848 + 8 [gc]
                    gc-5678  [003] .... 214327.025941: block_bio_queue: 8,32 R 67493856 + 8 [gc]
                    gc-5678  [003] .... 214327.025943: block_bio_backmerge: 8,32 R 67493856 + 8 [gc]
                    gc-5678  [003] .... 214327.025953: block_bio_queue: 8,32 R 67493864 + 8 [gc]
                    gc-5678  [003] .... 214327.025955: block_bio_backmerge: 8,32 R 67493864 + 8 [gc]
                    gc-5678  [003] .... 214327.025962: block_bio_queue: 8,32 R 67493872 + 8 [gc]
                    gc-5678  [003] .... 214327.025964: block_bio_backmerge: 8,32 R 67493872 + 8 [gc]
                    gc-5678  [003] .... 214327.025970: block_bio_queue: 8,32 R 67493880 + 8 [gc]
                    gc-5678  [003] .... 214327.025972: block_bio_backmerge: 8,32 R 67493880 + 8 [gc]
                    gc-5678  [003] .... 214327.026000: block_bio_queue: 8,32 WS 34123776 + 2048 [gc]
                    gc-5678  [003] .... 214327.026019: block_getrq: 8,32 WS 34123776 + 2048 [gc]
                    gc-5678  [003] d..1 214327.026021: block_rq_insert: 8,32 R 131072 () 67493632 + 256 [gc]
                    gc-5678  [003] d..1 214327.026023: block_unplug: [gc] 1
                    gc-5678  [003] d..1 214327.026026: block_rq_issue: 8,32 R 131072 () 67493632 + 256 [gc]
                    gc-5678  [003] .... 214327.026046: block_plug: [gc]
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      6aa58d8a
    • J
      f2fs: avoid fi->i_gc_rwsem[WRITE] lock in f2fs_gc · 6f8d4455
      Jaegeuk Kim 提交于
      The f2fs_gc() called by f2fs_balance_fs() requires to be called outside of
      fi->i_gc_rwsem[WRITE], since f2fs_gc() can try to grab it in a loop.
      
      If it hits the miximum retrials in GC, let's give a chance to release
      gc_mutex for a short time in order not to go into live lock in the worst
      case.
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      6f8d4455
    • J
      f2fs: fix performance issue observed with multi-thread sequential read · 853137ce
      Jaegeuk Kim 提交于
      This reverts the commit - "b93f7712 - f2fs: remove writepages lock"
      to fix the drop in sequential read throughput.
      
      Test: ./tiotest -t 32 -d /data/tio_tmp -f 32 -b 524288 -k 1 -k 3 -L
      device: UFS
      
      Before -
      read throughput: 185 MB/s
      total read requests: 85177 (of these ~80000 are 4KB size requests).
      total write requests: 2546 (of these ~2208 requests are written in 512KB).
      
      After -
      read throughput: 758 MB/s
      total read requests: 2417 (of these ~2042 are 512KB reads).
      total write requests: 2701 (of these ~2034 requests are written in 512KB).
      Signed-off-by: NSahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      853137ce
  16. 18 8月, 2018 1 次提交
  17. 15 8月, 2018 1 次提交
    • A
      f2fs: rework fault injection handling to avoid a warning · 7fa750a1
      Arnd Bergmann 提交于
      When CONFIG_F2FS_FAULT_INJECTION is disabled, we get a warning about an
      unused label:
      
      fs/f2fs/segment.c: In function '__submit_discard_cmd':
      fs/f2fs/segment.c:1059:1: error: label 'submit' defined but not used [-Werror=unused-label]
      
      This could be fixed by adding another #ifdef around it, but the more
      reliable way of doing this seems to be to remove the other #ifdefs
      where that is easily possible.
      
      By defining time_to_inject() as a trivial stub, most of the checks for
      CONFIG_F2FS_FAULT_INJECTION can go away. This also leads to nicer
      formatting of the code.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      7fa750a1
  18. 14 8月, 2018 2 次提交
    • C
      f2fs: fix avoid race between truncate and background GC · a33c1502
      Chao Yu 提交于
      Thread A				Background GC
      - f2fs_setattr isize to 0
       - truncate_setsize
      					- gc_data_segment
      					 - f2fs_get_read_data_page page #0
      					  - set_page_dirty
      					  - set_cold_data
       - f2fs_truncate
      
      - f2fs_setattr isize to 4k
      - read 4k <--- hit data in cached page #0
      
      Above race condition can cause read out invalid data in a truncated
      page, fix it by i_gc_rwsem[WRITE] lock.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      a33c1502
    • C
      f2fs: fix to do sanity check with block address in main area v2 · 91291e99
      Chao Yu 提交于
      This patch adds f2fs_is_valid_blkaddr() in below functions to do sanity
      check with block address to avoid pentential panic:
      - f2fs_grab_read_bio()
      - __written_first_block()
      
      https://bugzilla.kernel.org/show_bug.cgi?id=200465
      
      - Reproduce
      
      - POC (poc.c)
          #define _GNU_SOURCE
          #include <sys/types.h>
          #include <sys/mount.h>
          #include <sys/mman.h>
          #include <sys/stat.h>
          #include <sys/xattr.h>
      
          #include <dirent.h>
          #include <errno.h>
          #include <error.h>
          #include <fcntl.h>
          #include <stdio.h>
          #include <stdlib.h>
          #include <string.h>
          #include <unistd.h>
      
          #include <linux/falloc.h>
          #include <linux/loop.h>
      
          static void activity(char *mpoint) {
      
            char *xattr;
            int err;
      
            err = asprintf(&xattr, "%s/foo/bar/xattr", mpoint);
      
            char buf2[113];
            memset(buf2, 0, sizeof(buf2));
            listxattr(xattr, buf2, sizeof(buf2));
      
          }
      
          int main(int argc, char *argv[]) {
            activity(argv[1]);
            return 0;
          }
      
      - kernel message
      [  844.718738] F2FS-fs (loop0): Mounted with checkpoint version = 2
      [  846.430929] F2FS-fs (loop0): access invalid blkaddr:1024
      [  846.431058] WARNING: CPU: 1 PID: 1249 at fs/f2fs/checkpoint.c:154 f2fs_is_valid_blkaddr+0x10f/0x160
      [  846.431059] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd input_leds joydev soundcore serio_raw i2c_piix4 mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear qxl ttm crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops pcbc drm 8139too aesni_intel 8139cp floppy psmouse mii aes_x86_64 crypto_simd pata_acpi cryptd glue_helper
      [  846.431310] CPU: 1 PID: 1249 Comm: a.out Not tainted 4.18.0-rc3+ #1
      [  846.431312] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [  846.431315] RIP: 0010:f2fs_is_valid_blkaddr+0x10f/0x160
      [  846.431316] Code: 00 eb ed 31 c0 83 fa 05 75 ae 48 83 ec 08 48 8b 3f 89 f1 48 c7 c2 fc 0b 0f 8b 48 c7 c6 8b d7 09 8b 88 44 24 07 e8 61 8b ff ff <0f> 0b 0f b6 44 24 07 48 83 c4 08 eb 81 4c 8b 47 10 8b 8f 38 04 00
      [  846.431347] RSP: 0018:ffff961c414a7bc0 EFLAGS: 00010282
      [  846.431349] RAX: 0000000000000000 RBX: ffffc5f787b8ea80 RCX: 0000000000000000
      [  846.431350] RDX: 0000000000000000 RSI: ffff89dfffd165d8 RDI: ffff89dfffd165d8
      [  846.431351] RBP: ffff961c414a7c20 R08: 0000000000000001 R09: 0000000000000248
      [  846.431353] R10: 0000000000000000 R11: 0000000000000248 R12: 0000000000000007
      [  846.431369] R13: ffff89dff5492800 R14: ffff89dfae3aa000 R15: ffff89dff4ff88d0
      [  846.431372] FS:  00007f882e2fb700(0000) GS:ffff89dfffd00000(0000) knlGS:0000000000000000
      [  846.431373] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  846.431374] CR2: 0000000001a88008 CR3: 00000001eb572000 CR4: 00000000000006e0
      [  846.431384] Call Trace:
      [  846.431426]  f2fs_iget+0x6f4/0xe70
      [  846.431430]  ? f2fs_find_entry+0x71/0x90
      [  846.431432]  f2fs_lookup+0x1aa/0x390
      [  846.431452]  __lookup_slow+0x97/0x150
      [  846.431459]  lookup_slow+0x35/0x50
      [  846.431462]  walk_component+0x1c6/0x470
      [  846.431479]  ? memcg_kmem_charge_memcg+0x70/0x90
      [  846.431488]  ? page_add_file_rmap+0x13/0x200
      [  846.431491]  path_lookupat+0x76/0x230
      [  846.431501]  ? __alloc_pages_nodemask+0xfc/0x280
      [  846.431504]  filename_lookup+0xb8/0x1a0
      [  846.431534]  ? _cond_resched+0x16/0x40
      [  846.431541]  ? kmem_cache_alloc+0x160/0x1d0
      [  846.431549]  ? path_listxattr+0x41/0xa0
      [  846.431551]  path_listxattr+0x41/0xa0
      [  846.431570]  do_syscall_64+0x55/0x100
      [  846.431583]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  846.431607] RIP: 0033:0x7f882de1c0d7
      [  846.431607] Code: f0 ff ff 73 01 c3 48 8b 0d be dd 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 c2 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 dd 2b 00 f7 d8 64 89 01 48
      [  846.431639] RSP: 002b:00007ffe8e66c238 EFLAGS: 00000202 ORIG_RAX: 00000000000000c2
      [  846.431641] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f882de1c0d7
      [  846.431642] RDX: 0000000000000071 RSI: 00007ffe8e66c280 RDI: 0000000001a880c0
      [  846.431643] RBP: 00007ffe8e66c300 R08: 0000000001a88010 R09: 0000000000000000
      [  846.431645] R10: 00000000000001ab R11: 0000000000000202 R12: 0000000000400550
      [  846.431646] R13: 00007ffe8e66c400 R14: 0000000000000000 R15: 0000000000000000
      [  846.431648] ---[ end trace abca54df39d14f5c ]---
      [  846.431651] F2FS-fs (loop0): invalid blkaddr: 1024, type: 5, run fsck to fix.
      [  846.431762] WARNING: CPU: 1 PID: 1249 at fs/f2fs/f2fs.h:2697 f2fs_iget+0xd17/0xe70
      [  846.431763] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd input_leds joydev soundcore serio_raw i2c_piix4 mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear qxl ttm crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops pcbc drm 8139too aesni_intel 8139cp floppy psmouse mii aes_x86_64 crypto_simd pata_acpi cryptd glue_helper
      [  846.431797] CPU: 1 PID: 1249 Comm: a.out Tainted: G        W         4.18.0-rc3+ #1
      [  846.431798] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [  846.431800] RIP: 0010:f2fs_iget+0xd17/0xe70
      [  846.431801] Code: ff ff 48 63 d8 e9 e1 f6 ff ff 48 8b 45 c8 41 b8 05 00 00 00 48 c7 c2 d8 e8 0e 8b 48 c7 c6 1d b0 0a 8b 48 8b 38 e8 f9 b4 00 00 <0f> 0b 48 8b 45 c8 f0 80 48 48 04 e9 d8 f9 ff ff 0f 0b 48 8b 43 18
      [  846.431832] RSP: 0018:ffff961c414a7bd0 EFLAGS: 00010282
      [  846.431834] RAX: 0000000000000000 RBX: ffffc5f787b8ea80 RCX: 0000000000000006
      [  846.431835] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff89dfffd165d0
      [  846.431836] RBP: ffff961c414a7c20 R08: 0000000000000000 R09: 0000000000000273
      [  846.431837] R10: 0000000000000000 R11: ffff89dfad50ca60 R12: 0000000000000007
      [  846.431838] R13: ffff89dff5492800 R14: ffff89dfae3aa000 R15: ffff89dff4ff88d0
      [  846.431840] FS:  00007f882e2fb700(0000) GS:ffff89dfffd00000(0000) knlGS:0000000000000000
      [  846.431841] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  846.431842] CR2: 0000000001a88008 CR3: 00000001eb572000 CR4: 00000000000006e0
      [  846.431846] Call Trace:
      [  846.431850]  ? f2fs_find_entry+0x71/0x90
      [  846.431853]  f2fs_lookup+0x1aa/0x390
      [  846.431856]  __lookup_slow+0x97/0x150
      [  846.431858]  lookup_slow+0x35/0x50
      [  846.431874]  walk_component+0x1c6/0x470
      [  846.431878]  ? memcg_kmem_charge_memcg+0x70/0x90
      [  846.431880]  ? page_add_file_rmap+0x13/0x200
      [  846.431882]  path_lookupat+0x76/0x230
      [  846.431884]  ? __alloc_pages_nodemask+0xfc/0x280
      [  846.431886]  filename_lookup+0xb8/0x1a0
      [  846.431890]  ? _cond_resched+0x16/0x40
      [  846.431891]  ? kmem_cache_alloc+0x160/0x1d0
      [  846.431894]  ? path_listxattr+0x41/0xa0
      [  846.431896]  path_listxattr+0x41/0xa0
      [  846.431898]  do_syscall_64+0x55/0x100
      [  846.431901]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  846.431902] RIP: 0033:0x7f882de1c0d7
      [  846.431903] Code: f0 ff ff 73 01 c3 48 8b 0d be dd 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 c2 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 dd 2b 00 f7 d8 64 89 01 48
      [  846.431934] RSP: 002b:00007ffe8e66c238 EFLAGS: 00000202 ORIG_RAX: 00000000000000c2
      [  846.431936] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f882de1c0d7
      [  846.431937] RDX: 0000000000000071 RSI: 00007ffe8e66c280 RDI: 0000000001a880c0
      [  846.431939] RBP: 00007ffe8e66c300 R08: 0000000001a88010 R09: 0000000000000000
      [  846.431940] R10: 00000000000001ab R11: 0000000000000202 R12: 0000000000400550
      [  846.431941] R13: 00007ffe8e66c400 R14: 0000000000000000 R15: 0000000000000000
      [  846.431943] ---[ end trace abca54df39d14f5d ]---
      [  846.432033] F2FS-fs (loop0): access invalid blkaddr:1024
      [  846.432051] WARNING: CPU: 1 PID: 1249 at fs/f2fs/checkpoint.c:154 f2fs_is_valid_blkaddr+0x10f/0x160
      [  846.432051] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd input_leds joydev soundcore serio_raw i2c_piix4 mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear qxl ttm crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops pcbc drm 8139too aesni_intel 8139cp floppy psmouse mii aes_x86_64 crypto_simd pata_acpi cryptd glue_helper
      [  846.432085] CPU: 1 PID: 1249 Comm: a.out Tainted: G        W         4.18.0-rc3+ #1
      [  846.432086] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [  846.432089] RIP: 0010:f2fs_is_valid_blkaddr+0x10f/0x160
      [  846.432089] Code: 00 eb ed 31 c0 83 fa 05 75 ae 48 83 ec 08 48 8b 3f 89 f1 48 c7 c2 fc 0b 0f 8b 48 c7 c6 8b d7 09 8b 88 44 24 07 e8 61 8b ff ff <0f> 0b 0f b6 44 24 07 48 83 c4 08 eb 81 4c 8b 47 10 8b 8f 38 04 00
      [  846.432120] RSP: 0018:ffff961c414a7900 EFLAGS: 00010286
      [  846.432122] RAX: 0000000000000000 RBX: 0000000000000400 RCX: 0000000000000006
      [  846.432123] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff89dfffd165d0
      [  846.432124] RBP: ffff89dff5492800 R08: 0000000000000001 R09: 000000000000029d
      [  846.432125] R10: ffff961c414a7820 R11: 000000000000029d R12: 0000000000000400
      [  846.432126] R13: 0000000000000000 R14: ffff89dff4ff88d0 R15: 0000000000000000
      [  846.432128] FS:  00007f882e2fb700(0000) GS:ffff89dfffd00000(0000) knlGS:0000000000000000
      [  846.432130] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  846.432131] CR2: 0000000001a88008 CR3: 00000001eb572000 CR4: 00000000000006e0
      [  846.432135] Call Trace:
      [  846.432151]  f2fs_wait_on_block_writeback+0x20/0x110
      [  846.432158]  f2fs_grab_read_bio+0xbc/0xe0
      [  846.432161]  f2fs_submit_page_read+0x21/0x280
      [  846.432163]  f2fs_get_read_data_page+0xb7/0x3c0
      [  846.432165]  f2fs_get_lock_data_page+0x29/0x1e0
      [  846.432167]  f2fs_get_new_data_page+0x148/0x550
      [  846.432170]  f2fs_add_regular_entry+0x1d2/0x550
      [  846.432178]  ? __switch_to+0x12f/0x460
      [  846.432181]  f2fs_add_dentry+0x6a/0xd0
      [  846.432184]  f2fs_do_add_link+0xe9/0x140
      [  846.432186]  __recover_dot_dentries+0x260/0x280
      [  846.432189]  f2fs_lookup+0x343/0x390
      [  846.432193]  __lookup_slow+0x97/0x150
      [  846.432195]  lookup_slow+0x35/0x50
      [  846.432208]  walk_component+0x1c6/0x470
      [  846.432212]  ? memcg_kmem_charge_memcg+0x70/0x90
      [  846.432215]  ? page_add_file_rmap+0x13/0x200
      [  846.432217]  path_lookupat+0x76/0x230
      [  846.432219]  ? __alloc_pages_nodemask+0xfc/0x280
      [  846.432221]  filename_lookup+0xb8/0x1a0
      [  846.432224]  ? _cond_resched+0x16/0x40
      [  846.432226]  ? kmem_cache_alloc+0x160/0x1d0
      [  846.432228]  ? path_listxattr+0x41/0xa0
      [  846.432230]  path_listxattr+0x41/0xa0
      [  846.432233]  do_syscall_64+0x55/0x100
      [  846.432235]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  846.432237] RIP: 0033:0x7f882de1c0d7
      [  846.432237] Code: f0 ff ff 73 01 c3 48 8b 0d be dd 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 c2 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 dd 2b 00 f7 d8 64 89 01 48
      [  846.432269] RSP: 002b:00007ffe8e66c238 EFLAGS: 00000202 ORIG_RAX: 00000000000000c2
      [  846.432271] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f882de1c0d7
      [  846.432272] RDX: 0000000000000071 RSI: 00007ffe8e66c280 RDI: 0000000001a880c0
      [  846.432273] RBP: 00007ffe8e66c300 R08: 0000000001a88010 R09: 0000000000000000
      [  846.432274] R10: 00000000000001ab R11: 0000000000000202 R12: 0000000000400550
      [  846.432275] R13: 00007ffe8e66c400 R14: 0000000000000000 R15: 0000000000000000
      [  846.432277] ---[ end trace abca54df39d14f5e ]---
      [  846.432279] F2FS-fs (loop0): invalid blkaddr: 1024, type: 5, run fsck to fix.
      [  846.432376] WARNING: CPU: 1 PID: 1249 at fs/f2fs/f2fs.h:2697 f2fs_wait_on_block_writeback+0xb1/0x110
      [  846.432376] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd input_leds joydev soundcore serio_raw i2c_piix4 mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear qxl ttm crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops pcbc drm 8139too aesni_intel 8139cp floppy psmouse mii aes_x86_64 crypto_simd pata_acpi cryptd glue_helper
      [  846.432410] CPU: 1 PID: 1249 Comm: a.out Tainted: G        W         4.18.0-rc3+ #1
      [  846.432411] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [  846.432413] RIP: 0010:f2fs_wait_on_block_writeback+0xb1/0x110
      [  846.432414] Code: 66 90 f0 ff 4b 34 74 59 5b 5d c3 48 8b 7d 00 41 b8 05 00 00 00 89 d9 48 c7 c2 d8 e8 0e 8b 48 c7 c6 1d b0 0a 8b e8 df bc fd ff <0f> 0b f0 80 4d 48 04 e9 67 ff ff ff 48 8b 03 48 c1 e8 37 83 e0 07
      [  846.432445] RSP: 0018:ffff961c414a7910 EFLAGS: 00010286
      [  846.432447] RAX: 0000000000000000 RBX: 0000000000000400 RCX: 0000000000000006
      [  846.432448] RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffff89dfffd165d0
      [  846.432449] RBP: ffff89dff5492800 R08: 0000000000000000 R09: 00000000000002d1
      [  846.432450] R10: ffff961c414a7820 R11: ffff89dfad50cf80 R12: 0000000000000400
      [  846.432451] R13: 0000000000000000 R14: ffff89dff4ff88d0 R15: 0000000000000000
      [  846.432453] FS:  00007f882e2fb700(0000) GS:ffff89dfffd00000(0000) knlGS:0000000000000000
      [  846.432454] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  846.432455] CR2: 0000000001a88008 CR3: 00000001eb572000 CR4: 00000000000006e0
      [  846.432459] Call Trace:
      [  846.432463]  f2fs_grab_read_bio+0xbc/0xe0
      [  846.432464]  f2fs_submit_page_read+0x21/0x280
      [  846.432466]  f2fs_get_read_data_page+0xb7/0x3c0
      [  846.432468]  f2fs_get_lock_data_page+0x29/0x1e0
      [  846.432470]  f2fs_get_new_data_page+0x148/0x550
      [  846.432473]  f2fs_add_regular_entry+0x1d2/0x550
      [  846.432475]  ? __switch_to+0x12f/0x460
      [  846.432477]  f2fs_add_dentry+0x6a/0xd0
      [  846.432480]  f2fs_do_add_link+0xe9/0x140
      [  846.432483]  __recover_dot_dentries+0x260/0x280
      [  846.432485]  f2fs_lookup+0x343/0x390
      [  846.432488]  __lookup_slow+0x97/0x150
      [  846.432490]  lookup_slow+0x35/0x50
      [  846.432505]  walk_component+0x1c6/0x470
      [  846.432509]  ? memcg_kmem_charge_memcg+0x70/0x90
      [  846.432511]  ? page_add_file_rmap+0x13/0x200
      [  846.432513]  path_lookupat+0x76/0x230
      [  846.432515]  ? __alloc_pages_nodemask+0xfc/0x280
      [  846.432517]  filename_lookup+0xb8/0x1a0
      [  846.432520]  ? _cond_resched+0x16/0x40
      [  846.432522]  ? kmem_cache_alloc+0x160/0x1d0
      [  846.432525]  ? path_listxattr+0x41/0xa0
      [  846.432526]  path_listxattr+0x41/0xa0
      [  846.432529]  do_syscall_64+0x55/0x100
      [  846.432531]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  846.432533] RIP: 0033:0x7f882de1c0d7
      [  846.432533] Code: f0 ff ff 73 01 c3 48 8b 0d be dd 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 c2 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 dd 2b 00 f7 d8 64 89 01 48
      [  846.432565] RSP: 002b:00007ffe8e66c238 EFLAGS: 00000202 ORIG_RAX: 00000000000000c2
      [  846.432567] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f882de1c0d7
      [  846.432568] RDX: 0000000000000071 RSI: 00007ffe8e66c280 RDI: 0000000001a880c0
      [  846.432569] RBP: 00007ffe8e66c300 R08: 0000000001a88010 R09: 0000000000000000
      [  846.432570] R10: 00000000000001ab R11: 0000000000000202 R12: 0000000000400550
      [  846.432571] R13: 00007ffe8e66c400 R14: 0000000000000000 R15: 0000000000000000
      [  846.432573] ---[ end trace abca54df39d14f5f ]---
      [  846.434280] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      [  846.434424] PGD 80000001ebd3a067 P4D 80000001ebd3a067 PUD 1eb1ae067 PMD 0
      [  846.434551] Oops: 0000 [#1] SMP PTI
      [  846.434697] CPU: 0 PID: 44 Comm: kworker/u5:0 Tainted: G        W         4.18.0-rc3+ #1
      [  846.434805] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [  846.435000] Workqueue: fscrypt_read_queue decrypt_work
      [  846.435174] RIP: 0010:fscrypt_do_page_crypto+0x6e/0x2d0
      [  846.435351] Code: 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 88 00 00 00 31 c0 e8 43 c2 e0 ff 49 8b 86 48 02 00 00 85 ed c7 44 24 70 00 00 00 00 <48> 8b 58 08 0f 84 14 02 00 00 48 8b 78 10 48 8b 0c 24 48 c7 84 24
      [  846.435696] RSP: 0018:ffff961c40f9bd60 EFLAGS: 00010206
      [  846.435870] RAX: 0000000000000000 RBX: ffffc5f787719b80 RCX: ffffc5f787719b80
      [  846.436051] RDX: ffffffff8b9f4b88 RSI: ffffffff8b0ae622 RDI: ffff961c40f9bdb8
      [  846.436261] RBP: 0000000000001000 R08: ffffc5f787719b80 R09: 0000000000001000
      [  846.436433] R10: 0000000000000018 R11: fefefefefefefeff R12: ffffc5f787719b80
      [  846.436562] R13: ffffc5f787719b80 R14: ffff89dff4ff88d0 R15: 0ffff89dfaddee60
      [  846.436658] FS:  0000000000000000(0000) GS:ffff89dfffc00000(0000) knlGS:0000000000000000
      [  846.436758] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  846.436898] CR2: 0000000000000008 CR3: 00000001eddd0000 CR4: 00000000000006f0
      [  846.437001] Call Trace:
      [  846.437181]  ? check_preempt_wakeup+0xf2/0x230
      [  846.437276]  ? check_preempt_curr+0x7c/0x90
      [  846.437370]  fscrypt_decrypt_page+0x48/0x4d
      [  846.437466]  __fscrypt_decrypt_bio+0x5b/0x90
      [  846.437542]  decrypt_work+0x12/0x20
      [  846.437651]  process_one_work+0x15e/0x3d0
      [  846.437740]  worker_thread+0x4c/0x440
      [  846.437848]  kthread+0xf8/0x130
      [  846.437938]  ? rescuer_thread+0x350/0x350
      [  846.438022]  ? kthread_associate_blkcg+0x90/0x90
      [  846.438117]  ret_from_fork+0x35/0x40
      [  846.438201] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd input_leds joydev soundcore serio_raw i2c_piix4 mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear qxl ttm crct10dif_pclmul crc32_pclmul drm_kms_helper ghash_clmulni_intel syscopyarea sysfillrect sysimgblt fb_sys_fops pcbc drm 8139too aesni_intel 8139cp floppy psmouse mii aes_x86_64 crypto_simd pata_acpi cryptd glue_helper
      [  846.438653] CR2: 0000000000000008
      [  846.438713] ---[ end trace abca54df39d14f60 ]---
      [  846.438796] RIP: 0010:fscrypt_do_page_crypto+0x6e/0x2d0
      [  846.438844] Code: 00 65 48 8b 04 25 28 00 00 00 48 89 84 24 88 00 00 00 31 c0 e8 43 c2 e0 ff 49 8b 86 48 02 00 00 85 ed c7 44 24 70 00 00 00 00 <48> 8b 58 08 0f 84 14 02 00 00 48 8b 78 10 48 8b 0c 24 48 c7 84 24
      [  846.439084] RSP: 0018:ffff961c40f9bd60 EFLAGS: 00010206
      [  846.439176] RAX: 0000000000000000 RBX: ffffc5f787719b80 RCX: ffffc5f787719b80
      [  846.440927] RDX: ffffffff8b9f4b88 RSI: ffffffff8b0ae622 RDI: ffff961c40f9bdb8
      [  846.442083] RBP: 0000000000001000 R08: ffffc5f787719b80 R09: 0000000000001000
      [  846.443284] R10: 0000000000000018 R11: fefefefefefefeff R12: ffffc5f787719b80
      [  846.444448] R13: ffffc5f787719b80 R14: ffff89dff4ff88d0 R15: 0ffff89dfaddee60
      [  846.445558] FS:  0000000000000000(0000) GS:ffff89dfffc00000(0000) knlGS:0000000000000000
      [  846.446687] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  846.447796] CR2: 0000000000000008 CR3: 00000001eddd0000 CR4: 00000000000006f0
      
      - Location
      https://elixir.bootlin.com/linux/v4.18-rc4/source/fs/crypto/crypto.c#L149
      	struct crypto_skcipher *tfm = ci->ci_ctfm;
      Here ci can be NULL
      
      Note that this issue maybe require CONFIG_F2FS_FS_ENCRYPTION=y to reproduce.
      
      Reported-by Wen Xu <wen.xu@gatech.edu>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      91291e99
  19. 11 8月, 2018 2 次提交
    • C
      f2fs: fix to avoid broken of dnode block list · 50fa53ec
      Chao Yu 提交于
      f2fs recovery flow is relying on dnode block link list, it means fsynced
      file recovery depends on previous dnode's persistence in the list, so
      during fsync() we should wait on all regular inode's dnode writebacked
      before issuing flush.
      
      By this way, we can avoid dnode block list being broken by out-of-order
      IO submission due to IO scheduler or driver.
      
      Sheng Yong helps to do the test with this patch:
      
      Target:/data (f2fs, -)
      64MB / 32768KB / 4KB / 8
      
      1 / PERSIST / Index
      
      Base:
      	SEQ-RD(MB/s)	SEQ-WR(MB/s)	RND-RD(IOPS)	RND-WR(IOPS)	Insert(TPS)	Update(TPS)	Delete(TPS)
      1	867.82		204.15		41440.03	41370.54	680.8		1025.94		1031.08
      2	871.87		205.87		41370.3		40275.2		791.14		1065.84		1101.7
      3	866.52		205.69		41795.67	40596.16	694.69		1037.16		1031.48
      Avg	868.7366667	205.2366667	41535.33333	40747.3		722.21		1042.98		1054.753333
      
      After:
      	SEQ-RD(MB/s)	SEQ-WR(MB/s)	RND-RD(IOPS)	RND-WR(IOPS)	Insert(TPS)	Update(TPS)	Delete(TPS)
      1	798.81		202.5		41143		40613.87	602.71		838.08		913.83
      2	805.79		206.47		40297.2		41291.46	604.44		840.75		924.27
      3	814.83		206.17		41209.57	40453.62	602.85		834.66		927.91
      Avg	806.4766667	205.0466667	40883.25667	40786.31667	603.3333333	837.83		922.0033333
      
      Patched/Original:
      	0.928332713	0.999074239	0.984300676	1.000957528	0.835398753	0.803303994	0.874141189
      
      It looks like atomic write will suffer performance regression.
      
      I suspect that the criminal is that we forcing to wait all dnode being in
      storage cache before we issue PREFLUSH+FUA.
      
      BTW, will commit ("f2fs: don't need to wait for node writes for atomic write")
      cause the problem: we will lose data of last transaction after SPO, even if
      atomic write return no error:
      
      - atomic_open();
      - write() P1, P2, P3;
      - atomic_commit();
       - writeback data: P1, P2, P3;
       - writeback node: N1, N2, N3;  <--- If N1, N2 is not writebacked, N3 with fsync_mark is
      writebacked, In SPOR, we won't find N3 since node chain is broken, turns out that losing
      last transaction.
       - preflush + fua;
      - power-cut
      
      If we don't wait dnode writeback for atomic_write:
      
      	SEQ-RD(MB/s)	SEQ-WR(MB/s)	RND-RD(IOPS)	RND-WR(IOPS)	Insert(TPS)	Update(TPS)	Delete(TPS)
      1	779.91		206.03		41621.5		40333.16	716.9		1038.21		1034.85
      2	848.51		204.35		40082.44	39486.17	791.83		1119.96		1083.77
      3	772.12		206.27		41335.25	41599.65	723.29		1055.07		971.92
      Avg	800.18		205.55		41013.06333	40472.99333	744.0066667	1071.08		1030.18
      
      Patched/Original:
      	0.92108464	1.001526693	0.987425886	0.993268102	1.030180511	1.026942031	0.976702294
      
      SQLite's performance recovers.
      
      Jaegeuk:
      "Practically, I don't see db corruption becase of this. We can excuse to lose
      the last transaction."
      
      Finally, we decide to keep original implementation of atomic write interface
      sematics that we don't wait all dnode writeback before preflush+fua submission.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      50fa53ec
    • C
      f2fs: fix to clear PG_checked flag in set_page_dirty() · 66110abc
      Chao Yu 提交于
      PG_checked flag will be set on data page during GC, later, we can
      recognize such page by the flag and migrate page to cold segment.
      
      But previously, we don't clear this flag when invalidating data page,
      after page redirtying, we will write it into wrong log.
      
      Let's clear PG_checked flag in set_page_dirty() to avoid this.
      Signed-off-by: NWeichao Guo <guoweichao@huawei.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      66110abc
  20. 02 8月, 2018 1 次提交
    • J
      f2fs: don't allow any writes on aborted atomic writes · 455e3a58
      Jaegeuk Kim 提交于
      In order to prevent abusing atomic writes by abnormal users, we've added a
      threshold, 20% over memory footprint, which disallows further atomic writes.
      Previously, however, SQLite doesn't know the files became normal, so that
      it could write stale data and commit on revoked normal database file.
      
      Once f2fs detects such the abnormal behavior, this patch tries to avoid further
      writes in write_begin().
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      455e3a58