1. 06 4月, 2019 3 次提交
    • D
      f2fs: improve discard handling with multi-device volumes · 7f3d7719
      Damien Le Moal 提交于
      f2fs_hw_support_discard() only tests if the super block device supports
      discard. However, for a multi-device volume, not all disks used may
      support discard. Improve the check performed to test all devices of
      the volume and report discard as supported if at least one device of
      the volume supports discard. To implement this, introduce the helper
      function f2fs_bdev_support_discard(), which returns true for zoned block
      devices (where discard is processed as a zone reset) and for regular
      disks supporting the discard command.
      
      f2fs_bdev_support_discard() is also used in __queue_discard_cmd() to
      handle discard command issuing for a particular device of the volume.
      That is, prevent issuing a discard command for block devices that do
      not support it.
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      7f3d7719
    • D
      f2fs: Reduce zoned block device memory usage · 95175daf
      Damien Le Moal 提交于
      For zoned block devices, an array of zone types for each device is
      allocated and initialized in order to determine if a section is stored
      on a sequential zone (zone reset needed) or a conventional zone (no
      zone reset needed and regular discard applies). Considering this usage,
      the zone types stored in memory can be replaced with a bitmap to
      indicate an equivalent information, that is, if a zone is sequential or
      not. This reduces the memory usage for each zoned device by roughly 8:
      on a 14TB disk with zones of 256 MB, the zone type array consumes
      13x4KB pages while the bitmap uses only 2x4KB pages.
      
      This patch changes the f2fs_dev_info structure blkz_type field to the
      bitmap blkz_seq. Access to this bitmap is done using the helper
      function f2fs_blkz_is_seq(), which is a rewrite of the function
      get_blkz_type().
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      95175daf
    • D
      f2fs: Fix use of number of devices · 0916878d
      Damien Le Moal 提交于
      For a single device mount using a zoned block device, the zone
      information for the device is stored in the sbi->devs single entry
      array and sbi->s_ndevs is set to 1. This differs from a single device
      mount using a regular block device which does not allocate sbi->devs
      and sets sbi->s_ndevs to 0.
      
      However, sbi->s_devs == 0 condition is used throughout the code to
      differentiate a single device mount from a multi-device mount where
      sbi->s_ndevs is always larger than 1. This results in problems with
      single zoned block device volumes as these are treated as multi-device
      mounts but do not have the start_blk and end_blk information set. One
      of the problem observed is skipping of zone discard issuing resulting in
      write commands being issued to full zones or unaligned to a zone write
      pointer.
      
      Fix this problem by simply treating the cases sbi->s_ndevs == 0 (single
      regular block device mount) and sbi->s_ndevs == 1 (single zoned block
      device mount) in the same manner. This is done by introducing the
      helper function f2fs_is_multi_device() and using this helper in place
      of direct tests of sbi->s_ndevs value, improving code readability.
      
      Fixes: 7bb3a371 ("f2fs: Fix zoned block device support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      0916878d
  2. 13 3月, 2019 3 次提交
  3. 06 3月, 2019 2 次提交
    • C
      f2fs: fix to check inline_xattr_size boundary correctly · 500e0b28
      Chao Yu 提交于
      We use below condition to check inline_xattr_size boundary:
      
      	if (!F2FS_OPTION(sbi).inline_xattr_size ||
      		F2FS_OPTION(sbi).inline_xattr_size >=
      				DEF_ADDRS_PER_INODE -
      				F2FS_TOTAL_EXTRA_ATTR_SIZE -
      				DEF_INLINE_RESERVED_SIZE -
      				DEF_MIN_INLINE_SIZE)
      
      There is there problems in that check:
      - we should allow inline_xattr_size equaling to min size of inline
      {data,dentry} area.
      - F2FS_TOTAL_EXTRA_ATTR_SIZE and inline_xattr_size are based on
      different size unit, previous one is 4 bytes, latter one is 1 bytes.
      - DEF_MIN_INLINE_SIZE only indicate min size of inline data area,
      however, we need to consider min size of inline dentry area as well,
      minimal inline dentry should at least contain two entries: '.' and
      '..', so that min inline_dentry size is 40 bytes.
      
      .bitmap		1 * 1 = 1
      .reserved	1 * 1 = 1
      .dentry		11 * 2 = 22
      .filename	8 * 2 = 16
      total		40
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      500e0b28
    • C
      f2fs: fix potential data inconsistence of checkpoint · c42d28ce
      Chao Yu 提交于
      Previously, we changed lock from cp_rwsem to node_change, it solved
      the deadlock issue which was caused by below race condition:
      
      Thread A			Thread B
      - f2fs_setattr
       - f2fs_lock_op  -- read_lock
       - dquot_transfer
        - __dquot_transfer
         - dquot_acquire
          - commit_dqblk
           - f2fs_quota_write
            - f2fs_write_begin
             - f2fs_write_failed
      				- write_checkpoint
      				 - block_operations
      				  - f2fs_lock_all  -- write_lock
              - f2fs_truncate_blocks
               - f2fs_lock_op  -- read_lock
      
      But it breaks the sematics of cp_rwsem, in other callers like:
      - f2fs_file_write_iter -> f2fs_write_begin -> f2fs_write_failed
      - f2fs_direct_IO -> f2fs_write_failed
      
      We allow to truncate dnode w/o cp_rwsem held, result in incorrect sit
      bitmap update, which can cause further data corruption.
      
      So this patch reverts previous fix implementation, and try to fix
      deadlock by skipping calling f2fs_truncate_blocks() in f2fs_write_failed()
      only for quota file, and keep the preallocated data/node in the tail of
      quota file, we can expecte that the preallocated space can be used to
      store quota info latter soon.
      
      Fixes: af033b2a ("f2fs: guarantee journalled quota data by checkpoint")
      Signed-off-by: NGao Xiang <gaoxiang25@huawei.com>
      Signed-off-by: NSheng Yong <shengyong1@huawei.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      c42d28ce
  4. 16 2月, 2019 2 次提交
  5. 05 2月, 2019 1 次提交
  6. 24 1月, 2019 2 次提交
  7. 23 1月, 2019 2 次提交
  8. 22 1月, 2019 1 次提交
  9. 09 1月, 2019 2 次提交
  10. 27 12月, 2018 6 次提交
    • C
      f2fs: check PageWriteback flag for ordered case · bae0ee7a
      Chao Yu 提交于
      For all ordered cases in f2fs_wait_on_page_writeback(), we need to
      check PageWriteback status, so let's clean up to relocate the check
      into f2fs_wait_on_page_writeback().
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      bae0ee7a
    • C
      f2fs: clean up structure extent_node · c0362117
      Chao Yu 提交于
      The union in struct extent_node wass only to indicate below fields
      
      	struct rb_node rb_node;
      	union {
      		struct {
      			unsigned int fofs;
      			unsigned int len;
      		...
      	...
      
      can be parsed as fields in struct rb_entry, but they were never be
      used explicitly before, so let's remove them for cleanup.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      c0362117
    • S
      f2fs: fix sbi->extent_list corruption issue · e4589fa5
      Sahitya Tummala 提交于
      When there is a failure in f2fs_fill_super() after/during
      the recovery of fsync'd nodes, it frees the current sbi and
      retries again. This time the mount is successful, but the files
      that got recovered before retry, still holds the extent tree,
      whose extent nodes list is corrupted since sbi and sbi->extent_list
      is freed up. The list_del corruption issue is observed when the
      file system is getting unmounted and when those recoverd files extent
      node is being freed up in the below context.
      
      list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
      <...>
      kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
      lr : __list_del_entry_valid+0x94/0xb4
      pc : __list_del_entry_valid+0x94/0xb4
      <...>
      Call trace:
      __list_del_entry_valid+0x94/0xb4
      __release_extent_node+0xb0/0x114
      __free_extent_tree+0x58/0x7c
      f2fs_shrink_extent_tree+0xdc/0x3b0
      f2fs_leave_shrinker+0x28/0x7c
      f2fs_put_super+0xfc/0x1e0
      generic_shutdown_super+0x70/0xf4
      kill_block_super+0x2c/0x5c
      kill_f2fs_super+0x44/0x50
      deactivate_locked_super+0x60/0x8c
      deactivate_super+0x68/0x74
      cleanup_mnt+0x40/0x78
      __cleanup_mnt+0x1c/0x28
      task_work_run+0x48/0xd0
      do_notify_resume+0x678/0xe98
      work_pending+0x8/0x14
      
      Fix this by not creating extents for those recovered files if shrinker is
      not registered yet. Once mount is successful and shrinker is registered,
      those files can have extents again.
      Signed-off-by: NSahitya Tummala <stummala@codeaurora.org>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      e4589fa5
    • J
      f2fs: correct wrong spelling, issing_* · 72691af6
      Jaegeuk Kim 提交于
      Let's use "queued" instead of "issuing".
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      72691af6
    • J
      f2fs: use kvmalloc, if kmalloc is failed · 5222595d
      Jaegeuk Kim 提交于
      One report says memalloc failure during mount.
      
       (unwind_backtrace) from [<c010cd4c>] (show_stack+0x10/0x14)
       (show_stack) from [<c049c6b8>] (dump_stack+0x8c/0xa0)
       (dump_stack) from [<c024fcf0>] (warn_alloc+0xc4/0x160)
       (warn_alloc) from [<c0250218>] (__alloc_pages_nodemask+0x3f4/0x10d0)
       (__alloc_pages_nodemask) from [<c0270450>] (kmalloc_order_trace+0x2c/0x120)
       (kmalloc_order_trace) from [<c03fa748>] (build_node_manager+0x35c/0x688)
       (build_node_manager) from [<c03de494>] (f2fs_fill_super+0xf0c/0x16cc)
       (f2fs_fill_super) from [<c02a5864>] (mount_bdev+0x15c/0x188)
       (mount_bdev) from [<c03da624>] (f2fs_mount+0x18/0x20)
       (f2fs_mount) from [<c02a68b8>] (mount_fs+0x158/0x19c)
       (mount_fs) from [<c02c3c9c>] (vfs_kern_mount+0x78/0x134)
       (vfs_kern_mount) from [<c02c76ac>] (do_mount+0x474/0xca4)
       (do_mount) from [<c02c8264>] (SyS_mount+0x94/0xbc)
       (SyS_mount) from [<c0108180>] (ret_fast_syscall+0x0/0x48)
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      5222595d
    • Y
      f2fs: remove redundant comment of unused wio_mutex · af56b487
      Yunlong Song 提交于
      Commit 089842de ("f2fs: remove codes of unused wio_mutex") removes codes
      of unused wio_mutex, but missing the comment, so delete it.
      Signed-off-by: NYunlong Song <yunlong.song@huawei.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      af56b487
  11. 14 12月, 2018 1 次提交
  12. 28 11月, 2018 1 次提交
  13. 27 11月, 2018 9 次提交
  14. 23 10月, 2018 5 次提交
    • C
      f2fs: fix to keep project quota consistent · 78130819
      Chao Yu 提交于
      This patch does below changes to keep consistence of project quota data
      in sudden power-cut case:
      - update inode.i_projid and project quota atomically under lock_op() in
      f2fs_ioc_setproject()
      - recover inode.i_projid and project quota in recover_inode()
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      78130819
    • C
      f2fs: guarantee journalled quota data by checkpoint · af033b2a
      Chao Yu 提交于
      For journalled quota mode, let checkpoint to flush dquot dirty data
      and quota file data to guarntee persistence of all quota sysfile in
      last checkpoint, by this way, we can avoid corrupting quota sysfile
      when encountering SPO.
      
      The implementation is as below:
      
      1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
      cached dquot metadata changes in quota subsystem, and later checkpoint
      should:
       a) flush dquot metadata into quota file.
       b) flush quota file to storage to keep file usage be consistent.
      
      2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
      operation failed due to -EIO or -ENOSPC, so later,
       a) checkpoint will skip syncing dquot metadata.
       b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
          hint for fsck repairing.
      
      3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
      data updating is very heavy, it may cause hungtask in block_operation().
      To avoid this, if our retry time exceed threshold, let's just skip
      flushing and retry in next checkpoint().
      Signed-off-by: NWeichao Guo <guoweichao@huawei.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: avoid warnings and set fsck flag]
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      af033b2a
    • S
      f2fs: fix data corruption issue with hardware encryption · 1e78e8bd
      Sahitya Tummala 提交于
      Direct IO can be used in case of hardware encryption. The following
      scenario results into data corruption issue in this path -
      
      Thread A -                          Thread B-
      -> write file#1 in direct IO
                                          -> GC gets kicked in
                                          -> GC submitted bio on meta mapping
      				       for file#1, but pending completion
      -> write file#1 again with new data
         in direct IO
                                          -> GC bio gets completed now
                                          -> GC writes old data to the new
                                             location and thus file#1 is
      				       corrupted.
      
      Fix this by submitting and waiting for pending io on meta mapping
      for direct IO case in f2fs_map_blocks().
      Signed-off-by: NSahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      1e78e8bd
    • C
      f2fs: spread f2fs_set_inode_flags() · 9149a5eb
      Chao Yu 提交于
      This patch changes codes as below:
      - use f2fs_set_inode_flags() to update i_flags atomically to avoid
      potential race.
      - synchronize F2FS_I(inode)->i_flags to inode->i_flags in
      f2fs_new_inode().
      - use f2fs_set_inode_flags() to simply codes in f2fs_quota_{on,off}.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      9149a5eb
    • J
      f2fs: account read IOs and use IO counts for is_idle · 5f9abab4
      Jaegeuk Kim 提交于
      This patch adds issued read IO counts which is under block layer.
      
      Chao modified a bit, since:
      
      Below race can cause reversed reference on F2FS_RD_DATA, there is
      the same issue in f2fs_submit_page_bio(), fix them by relocate
      __submit_bio() and inc_page_count.
      
      Thread A			Thread B
      - f2fs_write_begin
       - f2fs_submit_page_read
       - __submit_bio
      				- f2fs_read_end_io
      				 - __read_end_io
      				 - dec_page_count(, F2FS_RD_DATA)
       - inc_page_count(, F2FS_RD_DATA)
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      5f9abab4