1. 26 Nov, 2016 1 commit
    • f2fs: support multiple devices · 3c62be17
      Committed by Jaegeuk Kim
      This patch implements multiple device support for f2fs.
      Given multiple devices by mkfs.f2fs, f2fs shows them entirely as one big
      volume under one f2fs instance.
      
      Internal block management is very simple, but we will modify block allocation
      and background GC policy to boost IO speed by exploiting them according to
      each device's speed.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
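
A minimal sketch of how one block address space might be laid over several devices, as the commit message describes. This is not the f2fs implementation; `build_ranges` and `locate` are hypothetical names, and the sketch assumes devices are simply concatenated in the order given.

```python
from bisect import bisect_right
from typing import List, Tuple


def build_ranges(device_blocks: List[int]) -> List[int]:
    """Return the first global block address of each device (assumed layout)."""
    starts, total = [], 0
    for nblocks in device_blocks:
        starts.append(total)
        total += nblocks
    return starts


def locate(starts: List[int], device_blocks: List[int], blkaddr: int) -> Tuple[int, int]:
    """Map a global block address to (device index, block offset on that device)."""
    total = starts[-1] + device_blocks[-1]
    if not 0 <= blkaddr < total:
        raise ValueError("block address outside the volume")
    # The owning device is the last one whose start is <= blkaddr.
    idx = bisect_right(starts, blkaddr) - 1
    return idx, blkaddr - starts[idx]
```

With three devices of 100, 50 and 200 blocks, global block 100 would map to the first block of the second device.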
  2. 24 Nov, 2016 12 commits
    • f2fs: remove checkpoint in f2fs_freeze · b4b9d34c
      Committed by Jaegeuk Kim
      The generic freeze_super() calls sync_filesystems() before f2fs_freeze().
      So, basically we don't need to do a checkpoint in f2fs_freeze(). But, in xfs/068,
      it triggers the circular locking problem below due to gc_mutex for checkpoint.
      
      ======================================================
      [ INFO: possible circular locking dependency detected ]
      4.9.0-rc1+ #132 Tainted: G           OE
      -------------------------------------------------------
      
      1. wait for __sb_start_write() by
      
       [<ffffffff9845f353>] dump_stack+0x85/0xc2
       [<ffffffff980e80bf>] print_circular_bug+0x1cf/0x230
       [<ffffffff980eb4d0>] __lock_acquire+0x19e0/0x1bc0
       [<ffffffff980ebdcb>] lock_acquire+0x11b/0x220
       [<ffffffffc08c7c3b>] ? f2fs_drop_inode+0x9b/0x160 [f2fs]
       [<ffffffff9826bdd0>] __sb_start_write+0x130/0x200
       [<ffffffffc08c7c3b>] ? f2fs_drop_inode+0x9b/0x160 [f2fs]
       [<ffffffffc08c7c3b>] f2fs_drop_inode+0x9b/0x160 [f2fs]
       [<ffffffff98289991>] iput+0x171/0x2c0
       [<ffffffffc08cfccf>] f2fs_sync_inode_meta+0x3f/0xf0 [f2fs]
       [<ffffffffc08cfe04>] block_operations+0x84/0x110 [f2fs]
       [<ffffffffc08cff78>] write_checkpoint+0xe8/0xf20 [f2fs]
       [<ffffffff980e979d>] ? trace_hardirqs_on+0xd/0x10
       [<ffffffffc08c6de9>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
       [<ffffffff9803e9d9>] ? sched_clock+0x9/0x10
       [<ffffffffc08c6de9>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
       [<ffffffffc08c6df5>] f2fs_sync_fs+0x85/0x190 [f2fs]
       [<ffffffff982a4f90>] ? do_fsync+0x70/0x70
       [<ffffffff982a4f90>] ? do_fsync+0x70/0x70
       [<ffffffff982a4fb0>] sync_fs_one_sb+0x20/0x30
       [<ffffffff9826ca3e>] iterate_supers+0xae/0x100
       [<ffffffff982a50b5>] sys_sync+0x55/0x90
       [<ffffffff9890b345>] entry_SYSCALL_64_fastpath+0x23/0xc6
      
      2. wait for sbi->gc_mutex by
      
       [<ffffffff980ebdcb>] lock_acquire+0x11b/0x220
       [<ffffffff989063d6>] mutex_lock_nested+0x76/0x3f0
       [<ffffffffc08c6de9>] f2fs_sync_fs+0x79/0x190 [f2fs]
       [<ffffffffc08c7a6c>] f2fs_freeze+0x1c/0x20 [f2fs]
       [<ffffffff9826b6ef>] freeze_super+0xcf/0x190
       [<ffffffff9827eebc>] do_vfs_ioctl+0x53c/0x6a0
       [<ffffffff9827f099>] SyS_ioctl+0x79/0x90
       [<ffffffff9890b345>] entry_SYSCALL_64_fastpath+0x23/0xc6
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: Cache zoned block devices zone type · 178053e2
      Committed by Damien Le Moal
      With the zoned block device feature enabled, section discard
      needs to do a zone reset for sections contained in sequential
      zones, and a regular discard (if supported) for sections
      stored in conventional zones. Avoid the need for a costly
      report zones to obtain a section's zone type when discarding it
      by caching the types of the device zones in the super block
      information. This cache is initialized at mount time for mounts
      with the zoned block device feature enabled.
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
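
The caching idea can be sketched as follows. This is an illustration only, not kernel code: `ZoneCache`, `report_zones` and the returned action strings are hypothetical, and the sketch assumes one section per zone.

```python
from enum import Enum
from typing import Callable, List


class ZoneType(Enum):
    CONVENTIONAL = 1
    SEQUENTIAL = 2


class ZoneCache:
    """Caches zone types once, so discards need no further zone reports."""

    def __init__(self, report_zones: Callable[[], List[ZoneType]]):
        # The costly report is issued a single time, at "mount".
        self._types = list(report_zones())

    def discard_action(self, zone_index: int, discard_supported: bool) -> str:
        """Pick the operation used to discard the section in this zone."""
        if self._types[zone_index] is ZoneType.SEQUENTIAL:
            return "zone_reset"
        # Conventional zones fall back to a regular discard, if available.
        return "discard" if discard_supported else "none"
```

Every later discard is then a plain list lookup rather than a device query.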
    • f2fs: Do not allow adaptive mode for host-managed zoned block devices · 3adc57e9
      Committed by Damien Le Moal
      The LFS mode is mandatory for host-managed zoned block devices, as
      update-in-place optimizations are not possible for segments in
      sequential zones.
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: Always enable discard for zoned blocks devices · 96ba2dec
      Committed by Damien Le Moal
      Zone write pointer reset acts as discard for zoned block
      devices. So if the zoned block device feature is enabled,
      always declare that discard is enabled, even if the device
      does not actually support the command.
      For the same reason, prevent the use of the "nodiscard" mount
      option.
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: Suppress discard warning message for zoned block devices · 0ab02998
      Committed by Damien Le Moal
      For zoned block devices, discard is replaced by zone reset. So
      do not warn if the device does not support discard.
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: Check zoned block feature for host-managed zoned block devices · d1b959c8
      Committed by Damien Le Moal
      The F2FS_FEATURE_BLKZONED feature indicates that the drive was formatted
      with zone alignment optimization. This is optional for host-aware
      devices, but mandatory for host-managed zoned block devices.
      So check that the feature is set in this latter case.
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: Use generic zoned block device terminology · 0bfd7a09
      Committed by Damien Le Moal
      SMR stands for "Shingled Magnetic Recording", which makes sense
      only for hard disk drives (spinning rust). The ZBC/ZAC standards
      enable management of SMR disks, but solid state drives may also
      support those standards. So rename the HMSMR feature to BLKZONED
      to avoid HDD-centric terminology. For the same reason, rename
      f2fs_sb_mounted_hmsmr to f2fs_sb_mounted_blkzoned.
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: Add missing break in switch-case · 487df616
      Committed by Damien Le Moal
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: avoid infinite loop in the EIO case on recover_orphan_inodes · 09922800
      Committed by Jaegeuk Kim
      This patch should fix the infinite loop case below.
      
      F2FS-fs : inject IO error in f2fs_read_end_io+0xf3/0x120 [f2fs]
      F2FS-fs (nvme0n1p1): recover_orphan_inode: orphan failed (ino=39ac1a), run fsck to fix.
      ...
      [<ffffffffc0b11ede>] sync_meta_pages+0xae/0x270 [f2fs]
      [<ffffffffc0b288dd>] ? flush_sit_entries+0x8d/0x960 [f2fs]
      [<ffffffffc0b13801>] write_checkpoint+0x361/0xf20 [f2fs]
      [<ffffffffb40e979d>] ? trace_hardirqs_on+0xd/0x10
      [<ffffffffc0b0a199>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
      [<ffffffffc0b0a1a5>] f2fs_sync_fs+0x85/0x190 [f2fs]
      [<ffffffffc0b2560e>] f2fs_balance_fs_bg+0x7e/0x1c0 [f2fs]
      [<ffffffffc0b216c4>] f2fs_write_node_pages+0x34/0x320 [f2fs]
      [<ffffffffb41dff21>] do_writepages+0x21/0x30
      [<ffffffffb429edb1>] __writeback_single_inode+0x61/0x760
      [<ffffffffb490a937>] ? _raw_spin_unlock+0x27/0x40
      [<ffffffffb42a0805>] writeback_single_inode+0xd5/0x190
      [<ffffffffb42a0959>] write_inode_now+0x99/0xc0
      [<ffffffffb4289a16>] iput+0x1f6/0x2c0
      [<ffffffffc0b0e3be>] f2fs_fill_super+0xe0e/0x1300 [f2fs]
      [<ffffffffb426c394>] ? sget_userns+0x4f4/0x530
      [<ffffffffb426c692>] mount_bdev+0x182/0x1b0
      [<ffffffffc0b0d5b0>] ? f2fs_commit_super+0x100/0x100 [f2fs]
      [<ffffffffc0b0a375>] f2fs_mount+0x15/0x20 [f2fs]
      [<ffffffffb426d038>] mount_fs+0x38/0x170
      [<ffffffffb428ec9b>] vfs_kern_mount+0x6b/0x160
      [<ffffffffb4291d9e>] do_mount+0x1be/0xd60
      [<ffffffffb4291a57>] ? copy_mount_options+0xb7/0x220
      [<ffffffffb4292c54>] SyS_mount+0x94/0xd0
      [<ffffffffb490b345>] entry_SYSCALL_64_fastpath+0x23/0xc6
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: remove percpu_count due to performance regression · 35782b23
      Committed by Jaegeuk Kim
      This patch removes percpu_count usage due to a performance regression in iozone.
      
      Fixes: 523be8a6 ("f2fs: use percpu_counter for page counters")
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: keep dirty inodes selectively for checkpoint · 7c45729a
      Committed by Jaegeuk Kim
      This is to avoid a no-free-segment bug during checkpoint caused by a large
      number of dirty inodes.
      
      The case was reported by Chao as follows.
      1. mount with the lazytime option
      2. write 4k files until the disk is full
      3. sync the filesystem
      4. read all files in the image
      5. umount
      
      In this case, we actually don't need to flush dirty inodes to inode pages during
      checkpoint.
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to release discard entries during checkpoint · 2dd15654
      Committed by Chao Yu
      In f2fs_fill_super, if any IO error occurs during recovery,
      cached discard entries will be leaked. To avoid this, make
      write_checkpoint() release the memory itself. In addition, move
      clear_prefree_segments to write_checkpoint for readability.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  3. 01 Oct, 2016 9 commits
    • f2fs: support checkpoint error injection · 0f348028
      Committed by Chao Yu
      This patch adds support for checkpoint error injection in f2fs for testing
      fatal error tolerance. It is useful because f2fs itself can simulate an abnormal
      power-off, instead of running applications calling the godown ioctl.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to recover old fault injection config in ->remount_fs · 2443b8b3
      Committed by Chao Yu
      In ->remount_fs, we didn't restore the original fault injection config when
      we encountered an error; fix it.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: do fault injection initialization in default_options · 36dbd328
      Committed by Chao Yu
      Do fault injection initialization in default_options to keep it consistent
      with the configuration of other default options.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: support configuring fault injection per superblock · 1ecc0c5c
      Committed by Chao Yu
      Previously, we only supported global fault injection configuration, so
      configuring the type/rate of fault injection through sysfs or a mount
      option influenced all f2fs partitions in use.
      
      This does not make sense: it is inconvenient if a developer wants
      to test separate partitions with different fault injection rates/types
      simultaneously, and it is not possible to enable fault injection in one
      partition while disabling it in another.
      
      From now on, we move the global fault injection configuration from the module
      into the per-superblock info, so injection testing can be more flexible.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
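
To illustrate why per-superblock state helps, here is a small model in which each mounted instance carries its own injection settings. It is a sketch under assumptions, not the kernel's data structures: `FaultInfo`, `SuperBlock`, `time_to_inject` and the fault type numbers are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field

# Hypothetical fault type numbers, used as bit positions in the type mask.
FAULT_ALLOC = 0
FAULT_IO = 1


@dataclass
class FaultInfo:
    inject_rate: int = 0  # inject once per `inject_rate` eligible ops; 0 disables
    inject_type: int = 0  # bitmask of enabled fault types
    inject_ops: int = 0   # eligible operations seen since the last injection


@dataclass
class SuperBlock:
    fault: FaultInfo = field(default_factory=FaultInfo)

    def time_to_inject(self, fault_type: int) -> bool:
        """Decide, for this instance only, whether to inject a fault now."""
        fi = self.fault
        if fi.inject_rate == 0 or not fi.inject_type & (1 << fault_type):
            return False
        fi.inject_ops += 1
        if fi.inject_ops >= fi.inject_rate:
            fi.inject_ops = 0
            return True
        return False
```

Two instances can then run with different rates, or with injection disabled on one of them, without affecting each other.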
    • f2fs: adjust display format of segment bit · d32853de
      Committed by Chao Yu
      Just adjust the segment bit info printed in procfs.
      
      Before:
      1008      5|0  |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      1009      3|183|0 0 61 20 20 0 0 21 80 c0 2 e4 e 54 0 21 21 17 a 44 d0 28 e4 50 40 30 8 0 2d 32 0 5 b0 80 1 43 2 8e f8 7b 2 25 93 bf e0 73 8e 9a 19 44 60 ff e4 cc e6 8e bf f9 ff 5 3d 31 3d 13
      1010      3|1  |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      
      After:
      1008      5|0  | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      1009      4|434| ff 7d ff bf d9 3f ff e7 ff bf d7 bf ff bb be ff fb df f7 fb fa bf fb fe bb df dd ff fe ef ff fe ef e2 27 bf ab bf fb df fd bd bf fb db fc ff ff 3f ff ff bf ff 5f db 3f fb fb bf fb bf 4f ff ef
      1010      4|422| ff bb fe ff ef d7 ee ff ff fc bf ef 7d eb ec fd fb 3f 97 7f ef ff af ff db ff ff 69 bf ff f6 e7 ff fb f7 7b fb df be ff ff ef f3 fe ff ff df fe f7 fa ff b7 77 be fe fb a9 7f 87 a2 ac c7 ff 75
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
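
The difference between the two layouts can be reproduced with a short sketch. The format strings here are inferred from the sample output above, not taken from the patch, and the function names are hypothetical.

```python
def format_before(bitmap: bytes) -> str:
    """Old style: unpadded hex bytes separated by single spaces."""
    return " ".join("%x" % b for b in bitmap)


def format_after(bitmap: bytes) -> str:
    """New style: each byte preceded by a space and zero-padded to two digits."""
    return "".join(" %02x" % b for b in bitmap)
```

Zero-padding keeps every byte the same width, so the columns of different segments line up.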
    • f2fs: remove dirty inode pages in error path · bb5dada7
      Committed by Jaegeuk Kim
      When getting EIO while handling orphan inodes, we can get some dirty node
      pages. Then, f2fs_write_node_pages() called by iput(node_inode) will try
      to flush node pages. But in this case we should prevent that, since
      we will try again from the start.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: handle errors during recover_orphan_inodes · d41065e2
      Committed by Jaegeuk Kim
      This patch handles EIO during recover_orphan_inode(), given the panic
      below.
      
      F2FS-fs : inject IO error in f2fs_read_end_io+0xe6/0x100 [f2fs]
      ------------[ cut here ]------------
      RIP: 0010:[<ffffffffc0b244e3>]  [<ffffffffc0b244e3>] f2fs_evict_inode+0x433/0x470 [f2fs]
      RSP: 0018:ffff92f8b7fb7c30  EFLAGS: 00010246
      RAX: ffff92fb88a13500 RBX: ffff92f890566ea0 RCX: 00000000fd3c255c
      RDX: 0000000000000001 RSI: ffff92fb88a13d90 RDI: ffff92fb8ee127e8
      RBP: ffff92f8b7fb7c58 R08: 0000000000000001 R09: ffff92fb88a13d58
      R10: 000000005a6a9373 R11: 0000000000000001 R12: 00000000fffffffb
      R13: ffff92fb8ee12000 R14: 00000000000034ca R15: ffff92fb8ee12620
      FS:  00007f1fefd8e880(0000) GS:ffff92fb95600000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fc211d34cdb CR3: 000000012d43a000 CR4: 00000000001406e0
      Stack:
       ffff92f890566ea0 ffff92f890567078 ffffffffc0b5a0c0 ffff92f890566f28
       ffff92fb888b2000 ffff92f8b7fb7c80 ffffffffbc27ff55 ffff92f890566ea0
       ffff92fb8bf10000 ffffffffc0b5a0c0 ffff92f8b7fb7cb0 ffffffffbc28090d
      Call Trace:
       [<ffffffffbc27ff55>] evict+0xc5/0x1a0
       [<ffffffffbc28090d>] iput+0x1ad/0x2c0
       [<ffffffffc0b3304c>] recover_orphan_inodes+0x10c/0x2e0 [f2fs]
       [<ffffffffc0b2e0f4>] f2fs_fill_super+0x884/0x1150 [f2fs]
       [<ffffffffbc2644ac>] mount_bdev+0x18c/0x1c0
       [<ffffffffc0b2d870>] ? f2fs_commit_super+0x100/0x100 [f2fs]
       [<ffffffffc0b2a755>] f2fs_mount+0x15/0x20 [f2fs]
       [<ffffffffbc264e49>] mount_fs+0x39/0x170
       [<ffffffffbc28555b>] vfs_kern_mount+0x6b/0x160
       [<ffffffffbc2881df>] do_mount+0x1cf/0xd00
       [<ffffffffbc287f2c>] ? copy_mount_options+0xac/0x170
       [<ffffffffbc289003>] SyS_mount+0x83/0xd0
       [<ffffffffbc8ee880>] entry_SYSCALL_64_fastpath+0x23/0xc1
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: introduce cp_lock to protect updating of ckpt_flags · aaec2b1d
      Committed by Chao Yu
      This patch introduces a spinlock to protect updates of the ckpt_flags
      field in struct f2fs_checkpoint, avoiding incorrect updates under race
      conditions.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: add __is_set_ckpt_flags likewise __set_ckpt_flags]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
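
A user-space analogue of the locking pattern, as a rough sketch: every read-modify-write of the flags word happens under one lock. A `threading.Lock` stands in for the kernel spinlock, and the class, method names and flag values are illustrative only.

```python
import threading

# Illustrative flag bits for the example.
CP_UMOUNT_FLAG = 0x1
CP_ERROR_FLAG = 0x8


class Checkpoint:
    def __init__(self) -> None:
        self.ckpt_flags = 0
        self.cp_lock = threading.Lock()  # stands in for the kernel spinlock

    def set_flag(self, flag: int) -> None:
        with self.cp_lock:  # the read-modify-write must not interleave
            self.ckpt_flags |= flag

    def clear_flag(self, flag: int) -> None:
        with self.cp_lock:
            self.ckpt_flags &= ~flag

    def is_set(self, flag: int) -> bool:
        with self.cp_lock:
            return bool(self.ckpt_flags & flag)
```

Without the lock, two concurrent updates could each read the old value and one of the changes would be lost.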
    • f2fs: use crc and cp version to determine roll-forward recovery · a468f0ef
      Committed by Jaegeuk Kim
      Previously, we used only cp_version to detect recoverable dnodes.
      To avoid matching a stale cp_version left as garbage, we needed to truncate the
      next dnode during checkpoint, resulting in additional discard or data writes.
      If we can distinguish this by using crc in addition to cp_version, we can
      remove this overhead.
      
      There is a backward compatibility concern because this changes the node_footer
      layout. So, this patch introduces a new checkpoint flag, CP_CRC_RECOVERY_FLAG, to
      detect the new layout. The new layout is activated only when this flag is set.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
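
One way to picture the idea is a tag that combines the checkpoint version with the checkpoint CRC, used only when the flag is set. The packing below (CRC in the upper 32 bits) and the flag value are assumptions made for illustration, and `recovery_tag` and `is_recoverable` are hypothetical helpers rather than the patch's functions.

```python
CP_CRC_RECOVERY_FLAG = 0x40  # placeholder value for the example


def recovery_tag(cp_version: int, cp_crc: int, ckpt_flags: int) -> int:
    """Value a dnode footer is compared against during roll-forward."""
    if ckpt_flags & CP_CRC_RECOVERY_FLAG:
        # Assumed new layout: CRC in the upper 32 bits, version in the lower.
        return ((cp_crc & 0xFFFFFFFF) << 32) | (cp_version & 0xFFFFFFFF)
    # Old layout: the version alone.
    return cp_version


def is_recoverable(footer_tag: int, cp_version: int, cp_crc: int, ckpt_flags: int) -> bool:
    return footer_tag == recovery_tag(cp_version, cp_crc, ckpt_flags)
```

A leftover dnode that happens to carry the same version number but was written under a different checkpoint CRC no longer matches under the new layout.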
  4. 23 Sep, 2016 1 commit
  5. 30 Aug, 2016 1 commit
  6. 19 Aug, 2016 1 commit
  7. 16 Jul, 2016 1 commit
    • f2fs: fix to avoid data update racing between GC and DIO · 82e0a5aa
      Committed by Chao Yu
      Data in a file can be operated on by GC and DIO simultaneously, so we
      face the race cases below:
      
      For write case:
      Thread A				Thread B
      - generic_file_direct_write
       - invalidate_inode_pages2_range
       - f2fs_direct_IO
        - do_blockdev_direct_IO
         - do_direct_IO
          - get_more_blocks
      					- f2fs_gc
      					 - do_garbage_collect
      					  - gc_data_segment
      					   - move_data_page
      					    - do_write_data_page
      					    migrate data block to new block address
         - dio_bio_submit
         update user data to old block address
      
      For read case:
      Thread A                                Thread B
      - generic_file_direct_write
       - invalidate_inode_pages2_range
       - f2fs_direct_IO
        - do_blockdev_direct_IO
         - do_direct_IO
          - get_more_blocks
      					- f2fs_balance_fs
      					 - f2fs_gc
      					  - do_garbage_collect
      					   - gc_data_segment
      					    - move_data_page
      					     - do_write_data_page
      					     migrate data block to new block address
      					  - write_checkpoint
      					   - do_checkpoint
      					    - clear_prefree_segments
      					     - f2fs_issue_discard
                                                   discard old block address
         - dio_bio_submit
         update user buffer from obsolete block address
      
      In order to fix this, for any one file, DIO and GC should be mutually
      exclusive.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
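
The write race above, and the effect of per-file exclusion, can be modelled in a few lines. This is a toy model under stated assumptions, not the kernel fix: the `File` class, its `lock`, and the choice to have GC skip a busy file (rather than wait) are all hypothetical.

```python
import threading
from typing import Callable, Optional


class File:
    def __init__(self) -> None:
        self.lock = threading.Lock()  # hypothetical per-file DIO/GC lock
        self.blkaddr = 10             # current block address of the data
        self.disk = {10: b"old"}      # block address -> contents


def gc_migrate(f: File, new_addr: int) -> bool:
    """Move the block to new_addr, skipping the file if DIO holds the lock."""
    if not f.lock.acquire(blocking=False):
        return False
    try:
        f.disk[new_addr] = f.disk.pop(f.blkaddr)
        f.blkaddr = new_addr
        return True
    finally:
        f.lock.release()


def dio_write(f: File, data: bytes, between: Optional[Callable[[], None]] = None) -> None:
    """Look up the block address and write to it while holding the lock."""
    with f.lock:
        addr = f.blkaddr      # corresponds to get_more_blocks
        if between:
            between()         # the window in which GC used to migrate the block
        f.disk[addr] = data   # corresponds to dio_bio_submit
```

Because the address lookup and the write sit inside one critical section, a migration attempted in between is refused and the data cannot land on an obsolete block.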
  8. 09 Jul, 2016 4 commits
  9. 07 Jul, 2016 2 commits
  10. 14 Jun, 2016 1 commit
  11. 09 Jun, 2016 1 commit
  12. 08 Jun, 2016 2 commits
  13. 03 Jun, 2016 4 commits