1. 12 Sep, 2020 3 commits
    • f2fs: change i_compr_blocks of inode to atomic value · c2759eba
      Daeho Jeong authored
      writepages() can be invoked concurrently for the same file by different
      threads, such as a thread fsyncing the file and a kworker kernel thread.
      So changing i_compr_blocks without protection is racy, and we need to
      protect it by making it an atomic type. Also, i_compr_blocks does not
      need a 64-bit value, so we use a plain atomic value rather than
      atomic64.
      Signed-off-by: Daeho Jeong <daehojeong@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
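A minimal userspace sketch of the idea in this commit, using C11 atomics as a stand-in for the kernel's atomic_t. The struct and the demo_* helper names are hypothetical and are not taken from the f2fs source; the point is only that every update goes through an atomic read-modify-write, so two concurrent writepages() callers cannot lose an update.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the inode field; not the f2fs structure. */
struct demo_inode {
    atomic_int compr_blocks; /* 32-bit atomic, i.e. "atomic, not atomic64" */
};

/* Atomically account for newly compressed blocks. */
void demo_add_compr_blocks(struct demo_inode *inode, int blocks)
{
    atomic_fetch_add(&inode->compr_blocks, blocks);
}

/* Atomically release previously accounted blocks. */
void demo_sub_compr_blocks(struct demo_inode *inode, int blocks)
{
    atomic_fetch_sub(&inode->compr_blocks, blocks);
}

/* Read the current counter value. */
int demo_read_compr_blocks(struct demo_inode *inode)
{
    return atomic_load(&inode->compr_blocks);
}
```

A plain `int` updated with `+=` from two threads would be a data race; the atomic type removes that race without adding a lock.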
    • f2fs: ignore compress mount option on image w/o compression feature · 69c0dd29
      Chao Yu authored
      Ignore the compress mount option on an image without the compression
      feature, to stay consistent with the behavior when passing the compress
      mount option to a kernel without the compression feature, so that mount
      does not fail in that case.
      Reported-by: Kyungmin Park <kyungmin.park@samsung.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: support age threshold based garbage collection · 093749e2
      Chao Yu authored
      There are several issues in the current background GC algorithm:
      - the valid block count is one of the key factors in the cost
      calculation, so if a segment has few valid blocks, the CB algorithm
      will still choose it as a victim even if it is young or sits in a
      hot segment, which is not appropriate.
      - GCed data/nodes go to existing logs regardless of whether the data
      already there has the same update frequency, so hot and cold data
      may be mixed again.
      - the GC allocator mainly uses LFS type segments, which consumes free
      segments more quickly.

      This patch introduces a new algorithm named age threshold based
      garbage collection to solve the above issues. There are three main
      steps:

      1. select a source victim:
      - set an age threshold, and select candidates based on the threshold:
      e.g.
       0 means youngest, 100 means oldest; if we set the age threshold to 80,
       then select dirty segments whose age is in the range [80, 100] as
       candidates;
      - set a candidate_ratio threshold, and select candidates based on the
      ratio, so that we can shrink the candidates to the oldest segments;
      - select the target segment with the fewest valid blocks in order to
      migrate blocks with minimum cost;

      2. select a target victim:
      - select candidates based on the age threshold;
      - set a candidate_radius threshold, and search for candidates whose age
      is around the source victim's; the search radius should be less than
      the radius threshold.
      - select the target segment with the most valid blocks in order to avoid
      migrating the current target segment.

      3. merge valid blocks from the source victim into the target victim with
      the SSR allocator.
      
      Test steps:
      - create 160 dirty segments:
       * half of them have 128 valid blocks per segment
       * the rest have 384 valid blocks per segment
      - run background GC

      Benefit: GC count and block movement count both decrease significantly:
      
      - Before:
        - Valid: 86
        - Dirty: 1
        - Prefree: 11
        - Free: 6001 (6001)
      
      GC calls: 162 (BG: 220)
        - data segments : 160 (160)
        - node segments : 2 (2)
      Try to move 41454 blocks (BG: 41454)
        - data blocks : 40960 (40960)
        - node blocks : 494 (494)
      
      IPU: 0 blocks
      SSR: 0 blocks in 0 segments
      LFS: 41364 blocks in 81 segments
      
      - After:
      
        - Valid: 87
        - Dirty: 0
        - Prefree: 4
        - Free: 6008 (6008)
      
      GC calls: 75 (BG: 76)
        - data segments : 74 (74)
        - node segments : 1 (1)
      Try to move 12813 blocks (BG: 12813)
        - data blocks : 12544 (12544)
        - node blocks : 269 (269)
      
      IPU: 0 blocks
      SSR: 12032 blocks in 77 segments
      LFS: 855 blocks in 2 segments
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
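The source and target selection steps described in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the kernel implementation: the struct, the demo_* names, and the "within radius, inclusive" comparison are hypothetical, and the candidate_ratio narrowing and the SSR merge step are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-segment summary; field names are illustrative only. */
struct demo_seg {
    unsigned age;          /* 0 = youngest, 100 = oldest */
    unsigned valid_blocks; /* valid blocks left in the segment */
};

/* Step 1: among segments with age >= threshold, pick the fewest valid blocks. */
int demo_pick_source(const struct demo_seg *segs, size_t n,
                     unsigned age_threshold)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (segs[i].age < age_threshold)
            continue;
        if (best < 0 || segs[i].valid_blocks < segs[best].valid_blocks)
            best = (int)i;
    }
    return best; /* -1 when no segment is old enough */
}

/*
 * Step 2: among the other old segments whose age is within `radius` of the
 * source (inclusive bound is an assumption), pick the most valid blocks.
 */
int demo_pick_target(const struct demo_seg *segs, size_t n, int src,
                     unsigned age_threshold, unsigned radius)
{
    int best = -1;
    if (src < 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        unsigned diff;
        if ((int)i == src || segs[i].age < age_threshold)
            continue;
        diff = segs[i].age > segs[src].age ? segs[i].age - segs[src].age
                                           : segs[src].age - segs[i].age;
        if (diff > radius)
            continue;
        if (best < 0 || segs[i].valid_blocks > segs[best].valid_blocks)
            best = (int)i;
    }
    return best; /* -1 when nothing lies within the radius */
}
```

Picking the emptiest source and the fullest nearby target keeps the number of migrated blocks low while grouping data of similar age.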
  2. 11 Sep, 2020 3 commits
    • f2fs: Use generic casefolding support · eca4873e
      Daniel Rosenberg authored
      This switches f2fs over to the generic support provided in
      the previous patch.
      
      Since casefolded dentries behave the same in ext4 and f2fs, we decrease
      the maintenance burden by unifying them, and any optimizations will
      immediately apply to both.
      Signed-off-by: Daniel Rosenberg <drosen@google.com>
      Reviewed-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: introduce inmem curseg · d0b9e42a
      Chao Yu authored
      The previous implementation of aligned pinfile allocation will:
      - allocate a new segment on the cold data log regardless of whether the
      last used segment is partially used, which makes IOs more random;
      - force concurrent cold data/GCed IO into the warm data area, which
      hurts hot/cold data separation;

      In this patch, we introduce a new type of log named 'inmem curseg'.
      It differs from a normal curseg as follows:
      - it reuses an existing segment type (CURSEG_XXX_NODE/DATA);
      - it only exists in memory; its segno, blkofs and summary will not be
       persisted into the checkpoint area;

      With this new feature, we can enhance the scalability of logs, and
      special allocators can be created for specific purposes:
      - pure lfs allocator for aligned pinfile allocation or file
      defragmentation
      - pure ssr allocator for later feature

      So let's update aligned pinfile allocation to use this new
      inmem curseg framework.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: support zone capacity less than zone size · de881df9
      Aravind Ramesh authored
      NVMe Zoned Namespace devices can have a zone-capacity smaller than the
      zone-size. Zone-capacity indicates the maximum number of sectors that are
      usable in a zone, beginning from the first sector of the zone. This makes
      the sectors between zone-capacity and zone-size unusable.
      This patch set tracks zone-size and zone-capacity in zoned devices and
      calculates the usable blocks per segment and usable segments per section.

      If zone-capacity is less than zone-size, mark only those segments which
      start before zone-capacity as free segments. All segments at and beyond
      zone-capacity are treated as permanently used segments. In cases where
      zone-capacity does not align with segment size, the last segment will start
      before zone-capacity and end beyond the zone-capacity of the zone. For
      such spanning segments, only sectors within the zone-capacity are used.

      During writes and GC, manage the usable segments in a section and usable
      blocks per segment. Segments which are beyond zone-capacity are never
      allocated and do not need to be garbage collected; only the segments
      which are before zone-capacity need to be garbage collected.
      For spanning segments, based on the number of usable blocks in that
      segment, write to blocks only up to zone-capacity.

      Zone-capacity is device specific and cannot be configured by the user.
      Since NVMe ZNS device zones are sequential-write-only, a block device
      with conventional zones or any normal block device is needed along with
      the ZNS device for the metadata operations of F2FS.
      
      A typical nvme-cli output of a zoned device shows zone start and capacity
      and write pointer as below:
      
      SLBA: 0x0     WP: 0x0     Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
      SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
      SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
      
      Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
      are in EMPTY state. For each zone, only zone start + 49MB is usable area,
      any lba/sector after 49MB cannot be read or written to, the drive will fail
      any attempts to read/write. So, the second zone starts at 64MB and is
      usable till 113MB (64 + 49) and the range between 113 and 128MB is
      again unusable. The next zone starts at 128MB, and so on.
      Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
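The usable-blocks arithmetic from this commit can be illustrated with the 64MB zone / 49MB capacity example above. The sketch below assumes 4 KiB blocks and 2 MiB segments (512 blocks per segment), and the demo_* names are hypothetical; it is not the f2fs code, only the calculation the message describes.

```c
#include <assert.h>

#define DEMO_BLKS_PER_SEG 512u /* assumption: 4 KiB blocks, 2 MiB segments */

/*
 * Usable blocks in the segment at index seg_in_zone, given the zone
 * capacity expressed in blocks.
 */
unsigned demo_usable_blks_in_seg(unsigned zone_cap_blks, unsigned seg_in_zone)
{
    unsigned seg_start = seg_in_zone * DEMO_BLKS_PER_SEG;

    if (seg_start >= zone_cap_blks)
        return 0;                     /* starts at or beyond zone capacity */
    if (zone_cap_blks - seg_start >= DEMO_BLKS_PER_SEG)
        return DEMO_BLKS_PER_SEG;     /* lies fully inside zone capacity */
    return zone_cap_blks - seg_start; /* spanning segment: partial use */
}

/* Number of segments that start before zone capacity (ceiling division). */
unsigned demo_usable_segs_in_zone(unsigned zone_cap_blks)
{
    return (zone_cap_blks + DEMO_BLKS_PER_SEG - 1) / DEMO_BLKS_PER_SEG;
}
```

With a 49MB capacity (12544 blocks of 4 KiB), 24 segments are fully usable, the 25th is a spanning segment with 256 usable blocks, and the remaining segments of the 64MB zone are never allocated.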
  3. 04 Aug, 2020 1 commit
  4. 24 Jul, 2020 1 commit
  5. 09 Jul, 2020 1 commit
  6. 08 Jul, 2020 1 commit
    • f2fs: avoid readahead race condition · 6b12367d
      Jaegeuk Kim authored
      If two readahead threads with the same offset enter readpages, every read
      IO is split and issued to the disk, which gives lower bandwidth.
      
      This patch tries to avoid redundant readahead calls.
      
      Fixes one build error reported by Randy.
      Fix build error when F2FS_FS_COMPRESSION is not set/enabled.
      This label is needed in either case.
      
      ../fs/f2fs/data.c: In function ‘f2fs_mpage_readpages’:
      ../fs/f2fs/data.c:2327:5: error: label ‘next_page’ used but not defined
           goto next_page;
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  7. 19 Jun, 2020 2 commits
  8. 09 Jun, 2020 1 commit
    • f2fs: don't return vmalloc() memory from f2fs_kmalloc() · 0b6d4ca0
      Eric Biggers authored
      kmalloc() returns kmalloc'ed memory, and kvmalloc() returns either
      kmalloc'ed or vmalloc'ed memory.  But the f2fs wrappers, f2fs_kmalloc()
      and f2fs_kvmalloc(), both return both kinds of memory.
      
      It's redundant to have two functions that do the same thing, and also
      breaking the standard naming convention is causing bugs since people
      assume it's safe to kfree() memory allocated by f2fs_kmalloc().  See
      e.g. the various allocations in fs/f2fs/compress.c.
      
      Fix this by making f2fs_kmalloc() just use kmalloc().  And to avoid
      re-introducing the allocation failures that the vmalloc fallback was
      intended to fix, convert the largest allocations to use f2fs_kvmalloc().
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  9. 19 May, 2020 2 commits
    • fscrypt: support test_dummy_encryption=v2 · ed318a6c
      Eric Biggers authored
      v1 encryption policies are deprecated in favor of v2, and some new
      features (e.g. encryption+casefolding) are only being added for v2.
      
      Therefore, the "test_dummy_encryption" mount option (which is used for
      encryption I/O testing with xfstests) needs to support v2 policies.
      
      To do this, extend its syntax to be "test_dummy_encryption=v1" or
      "test_dummy_encryption=v2".  The existing "test_dummy_encryption" (no
      argument) also continues to be accepted, to specify the default setting
      -- currently v1, but the next patch changes it to v2.
      
      To cleanly support both v1 and v2 while also making it easy to support
      specifying other encryption settings in the future (say, accepting
      "$contents_mode:$filenames_mode:v2"), make ext4 and f2fs maintain a
      pointer to the dummy fscrypt_context rather than using mount flags.
      
      To avoid concurrency issues, don't allow test_dummy_encryption to be set
      or changed during a remount.  (The former restriction is new, but
      xfstests doesn't run into it, so no one should notice.)
      
      Tested with 'gce-xfstests -c {ext4,f2fs}/encrypt -g auto'.  On ext4,
      there are two regressions, both of which are test bugs: ext4/023 and
      ext4/028 fail because they set an xattr and expect it to be stored
      inline, but the increase in size of the fscrypt_context from
      24 to 40 bytes causes this xattr to be spilled into an external block.
      
      Link: https://lore.kernel.org/r/20200512233251.118314-4-ebiggers@kernel.org
      Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Reviewed-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
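The extended option syntax described in this commit ("test_dummy_encryption", "test_dummy_encryption=v1", "test_dummy_encryption=v2") could be parsed along these lines. The function name and return convention are hypothetical and this is not the fscrypt code; the default version is passed in by the caller because, per the message, it is v1 here and changes to v2 in the next patch.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper. `arg` is the text after '=', or NULL when the option
 * was given without an argument. Returns the policy version (1 or 2), or -1
 * for an unrecognized value.
 */
int demo_parse_dummy_enc(const char *arg, int default_version)
{
    if (arg == NULL)
        return default_version; /* bare "test_dummy_encryption" */
    if (strcmp(arg, "v1") == 0)
        return 1;
    if (strcmp(arg, "v2") == 0)
        return 2;
    return -1;                  /* anything else is rejected */
}
```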
    • f2fs: fix checkpoint=disable:%u%% · 1ae18f71
      Jaegeuk Kim authored
      sbi->user_block_count is not yet available when the mount option is
      parsed, so the percentage must be converted after it has been read.
      
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
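The ordering problem in this fix can be sketched as a two-phase approach: remember the percentage at parse time, then convert it to a block count once the user block count is known. All names below are hypothetical and the conversion formula is an assumption for illustration; the commit message itself only states that the conversion must happen later.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical option storage; not the f2fs mount info structure. */
struct demo_opts {
    unsigned unusable_cap_perc; /* remembered at parse time */
    uint64_t unusable_cap;      /* filled in once the block count is known */
};

/* Phase 1: option parsing only records the percentage. */
void demo_parse_percent(struct demo_opts *o, unsigned percent)
{
    o->unusable_cap_perc = percent;
}

/* Phase 2: called after user_block_count has been read from disk. */
void demo_adjust_unusable_cap(struct demo_opts *o, uint64_t user_block_count)
{
    if (o->unusable_cap_perc == 100)
        o->unusable_cap = user_block_count;
    else
        o->unusable_cap = user_block_count / 100 * o->unusable_cap_perc;
}
```

Doing the multiplication at parse time would use a block count of zero and silently produce a zero cap.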
  10. 12 May, 2020 4 commits
  11. 08 May, 2020 1 commit
  12. 18 Apr, 2020 1 commit
  13. 04 Apr, 2020 1 commit
  14. 31 Mar, 2020 2 commits
    • f2fs: change default compression algorithm · 91faa534
      Chao Yu authored
      Use LZ4 as the default compression algorithm: compared to LZO, it shows
      almost the same compression ratio and much better decompression speed.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix NULL pointer dereference in f2fs_write_begin() · 62f63eea
      Chao Yu authored
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      RIP: 0010:f2fs_write_begin+0x823/0xb90 [f2fs]
      Call Trace:
       f2fs_quota_write+0x139/0x1d0 [f2fs]
       write_blk+0x36/0x80 [quota_tree]
       get_free_dqblk+0x42/0xa0 [quota_tree]
       do_insert_tree+0x235/0x4a0 [quota_tree]
       do_insert_tree+0x26e/0x4a0 [quota_tree]
       do_insert_tree+0x26e/0x4a0 [quota_tree]
       do_insert_tree+0x26e/0x4a0 [quota_tree]
       qtree_write_dquot+0x70/0x190 [quota_tree]
       v2_write_dquot+0x43/0x90 [quota_v2]
       dquot_acquire+0x77/0x100
       f2fs_dquot_acquire+0x2f/0x60 [f2fs]
       dqget+0x310/0x450
       dquot_transfer+0x7e/0x120
       f2fs_setattr+0x11a/0x4a0 [f2fs]
       notify_change+0x349/0x480
       chown_common+0x168/0x1c0
       do_fchownat+0xbc/0xf0
       __x64_sys_fchownat+0x20/0x30
       do_syscall_64+0x5f/0x220
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Pass the fsdata parameter to .write_{begin,end} in f2fs_quota_write(),
      so that if the quota file is a compressed one, we avoid the above NULL
      pointer dereference when updating quota content.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  15. 25 Mar, 2020 1 commit
  16. 23 Mar, 2020 1 commit
  17. 20 Mar, 2020 4 commits
    • f2fs: introduce DEFAULT_IO_TIMEOUT · 5df7731f
      Chao Yu authored
      As Geert Uytterhoeven reported:
      
      for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);
      
      On some platforms, HZ can be less than 50, then unexpected 0 timeout
      jiffies will be set in congestion_wait().
      
      This patch introduces a macro DEFAULT_IO_TIMEOUT that wraps the fixed
      value msecs_to_jiffies(20) and uses it instead of HZ/50 to avoid this issue.
      
      Quoted from Geert Uytterhoeven:
      
      "A timeout of HZ means 1 second.
      HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.
      
      If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
      as that takes care of the special cases, and never returns 0."
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
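The truncation problem quoted above is plain integer arithmetic and can be shown in userspace. The sketch below is a simplified model, not the kernel's msecs_to_jiffies() implementation: demo_ms_to_ticks() is a hypothetical helper that rounds up, which is the property the quote relies on (a non-zero delay never becomes 0 ticks).

```c
#include <assert.h>

/* The old expression: integer division truncates to 0 when hz < 50. */
unsigned long demo_hz_div_50(unsigned long hz)
{
    return hz / 50;
}

/*
 * Simplified model of a milliseconds-to-ticks conversion that rounds up,
 * so any non-zero ms value yields at least one tick.
 */
unsigned long demo_ms_to_ticks(unsigned long ms, unsigned long hz)
{
    return (ms * hz + 999) / 1000;
}
```

For example, with a tick rate of 24 the old expression gives 0 ticks, while the rounded-up conversion of 20 ms gives 1 tick; at tick rates of 100 and 1000 both forms agree (2 and 20 ticks).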
    • f2fs: clean up bggc mount option · bbbc34fd
      Chao Yu authored
      There are three states for background gc: on, off and sync. It is
      a little confusing to use combinations of test_opt(BG_GC) and
      test_opt(FORCE_FG_GC) to indicate the state of background gc.

      So let's remove the F2FS_MOUNT_BG_GC and F2FS_MOUNT_FORCE_FG_GC mount
      options, and add F2FS_OPTION().bggc_mode with the three states below
      to clean up the code and enhance the bggc mode's scalability.
      
      enum {
      	BGGC_MODE_ON,		/* background gc is on */
      	BGGC_MODE_OFF,		/* background gc is off */
      	BGGC_MODE_SYNC,		/*
      				 * background gc is on, migrating blocks
      				 * like foreground gc
      				 */
      };
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
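A rough sketch of why a single mode value reads more clearly than two flag bits. The enum mirrors the one in the commit message; the demo_* helpers are hypothetical and only illustrate mapping the three states (on, off, sync) to one value and testing it with a single comparison.

```c
#include <assert.h>
#include <string.h>

enum {
    BGGC_MODE_ON,   /* background gc is on */
    BGGC_MODE_OFF,  /* background gc is off */
    BGGC_MODE_SYNC, /* background gc is on, migrating blocks like foreground gc */
};

/* Hypothetical helper: map a state name to a mode; -1 if unknown. */
int demo_parse_bggc_mode(const char *name)
{
    if (strcmp(name, "on") == 0)
        return BGGC_MODE_ON;
    if (strcmp(name, "off") == 0)
        return BGGC_MODE_OFF;
    if (strcmp(name, "sync") == 0)
        return BGGC_MODE_SYNC;
    return -1;
}

/* One comparison replaces a combination of two flag tests. */
int demo_bggc_migrates_like_fg(int mode)
{
    return mode == BGGC_MODE_SYNC;
}
```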
    • f2fs: clean up lfs/adaptive mount option · b0332a0f
      Chao Yu authored
      This patch removes the F2FS_MOUNT_ADAPTIVE and F2FS_MOUNT_LFS mount
      options, and adds F2FS_OPTION.fs_mode with the two states below to
      indicate the filesystem mode.
      
      enum {
      	FS_MODE_ADAPTIVE,	/* use both lfs/ssr allocation */
      	FS_MODE_LFS,		/* use lfs allocation only */
      };
      
      It can enhance code readability and fs mode's scalability.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to show norecovery mount option · a9117eca
      Chao Yu authored
      Previously, the 'norecovery' mount option was shown as
      'disable_roll_forward'; fix it to show the original option name correctly.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  18. 11 Mar, 2020 2 commits
    • f2fs: fix inconsistent comments · 7a88ddb5
      Chao Yu authored
      Lack of maintenance on comments may mislead developers, fix them.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: cover last_disk_size update with spinlock · c10c9820
      Chao Yu authored
      This change solves the hung task issue below:
      
      INFO: task kworker/u16:1:58 blocked for more than 122 seconds.
            Not tainted 5.6.0-rc2-00590-g9983bdae4974e #11
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      kworker/u16:1   D    0    58      2 0x00000000
      Workqueue: writeback wb_workfn (flush-179:0)
      Backtrace:
       (__schedule) from [<c0913234>] (schedule+0x78/0xf4)
       (schedule) from [<c017ec74>] (rwsem_down_write_slowpath+0x24c/0x4c0)
       (rwsem_down_write_slowpath) from [<c0915f2c>] (down_write+0x6c/0x70)
       (down_write) from [<c0435b80>] (f2fs_write_single_data_page+0x608/0x7ac)
       (f2fs_write_single_data_page) from [<c0435fd8>] (f2fs_write_cache_pages+0x2b4/0x7c4)
       (f2fs_write_cache_pages) from [<c043682c>] (f2fs_write_data_pages+0x344/0x35c)
       (f2fs_write_data_pages) from [<c0267ee8>] (do_writepages+0x3c/0xd4)
       (do_writepages) from [<c0310cbc>] (__writeback_single_inode+0x44/0x454)
       (__writeback_single_inode) from [<c03112d0>] (writeback_sb_inodes+0x204/0x4b0)
       (writeback_sb_inodes) from [<c03115cc>] (__writeback_inodes_wb+0x50/0xe4)
       (__writeback_inodes_wb) from [<c03118f4>] (wb_writeback+0x294/0x338)
       (wb_writeback) from [<c0312dac>] (wb_workfn+0x35c/0x54c)
       (wb_workfn) from [<c014f2b8>] (process_one_work+0x214/0x544)
       (process_one_work) from [<c014f634>] (worker_thread+0x4c/0x574)
       (worker_thread) from [<c01564fc>] (kthread+0x144/0x170)
       (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
      Reported-and-tested-by: Ondřej Jirman <megi@xff.cz>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  19. 28 Feb, 2020 1 commit
    • f2fs: fix the panic in do_checkpoint() · bf22c3cc
      Sahitya Tummala authored
      There could be a scenario where f2fs_sync_meta_pages() will not
      ensure that all F2FS_DIRTY_META pages are submitted for IO. Thus,
      resulting in the below panic in do_checkpoint() -
      
      f2fs_bug_on(sbi, get_pages(sbi, F2FS_DIRTY_META) &&
      				!f2fs_cp_error(sbi));
      
      This can happen in a low-memory condition, where shrinker could
      also be doing the writepage operation (stack shown below)
      at the same time when checkpoint is running on another core.
      
      schedule
      down_write
      f2fs_submit_page_write -> by this time, this page in page cache is tagged
      			as PAGECACHE_TAG_WRITEBACK and PAGECACHE_TAG_DIRTY
      			is cleared, due to which f2fs_sync_meta_pages()
      			cannot sync this page in do_checkpoint() path.
      f2fs_do_write_meta_page
      __f2fs_write_meta_page
      f2fs_write_meta_page
      shrink_page_list
      shrink_inactive_list
      shrink_node_memcg
      shrink_node
      kswapd
      Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  20. 18 Jan, 2020 4 commits
    • f2fs: change to use rwsem for gc_mutex · fb24fea7
      Chao Yu authored
      A mutex lock won't serialize callers, so to avoid starving an unlucky
      caller, let's use an rwsem lock instead.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: code cleanup for f2fs_statfs_project() · bf2cbd3c
      Chengguang Xu authored
      Call min_not_zero() to simplify the complicated prjquota
      limit comparison in f2fs_statfs_project().
      Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
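The kernel's min_not_zero() is a macro; the function below is a userspace model of the same semantics, written as a sketch with a hypothetical name. It treats 0 as "no limit set", which is why it collapses the soft/hard limit comparison into one call.

```c
#include <assert.h>
#include <stdint.h>

/* Smaller of two values, ignoring zeros; returns 0 only when both are 0. */
uint64_t demo_min_not_zero(uint64_t a, uint64_t b)
{
    if (a == 0)
        return b;
    if (b == 0)
        return a;
    return a < b ? a : b;
}
```

With a soft limit of 20480 and a hard limit of 10240 the result is 10240; with only one of the two limits set, that limit is returned.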
    • f2fs: fix miscounted block limit in f2fs_statfs_project() · acdf2172
      Chengguang Xu authored
      statfs calculates Total/Used/Avail disk space in block unit,
      so we should translate soft/hard prjquota limit to block unit
      as well.

      The test results below show that the block/inode numbers of
      Total/Used/Avail from the df command are all correct after
      applying this patch.
      
      [root@localhost quota-tools]# ./repquota -P /dev/sdb1
      *** Report for project quotas on device /dev/sdb1
      Block grace time: 7days; Inode grace time: 7days
                    Block limits                File limits
      Project   used soft    hard  grace  used  soft  hard  grace
      -----------------------------------------------------------
      #0   --   4       0       0         1     0     0
      #101 --   0       0       0         2     0     0
      #102 --   0   10240       0         2    10     0
      #103 --   0       0   20480         2     0    20
      #104 --   0   10240   20480         2    10    20
      #105 --   0   20480   10240         2    20    10

      [root@localhost sdb1]# lsattr -p t{1,2,3,4,5}
        101 ----------------N-- t1/a1
        102 ----------------N-- t2/a2
        103 ----------------N-- t3/a3
        104 ----------------N-- t4/a4
        105 ----------------N-- t5/a5
      
      [root@localhost sdb1]# df -hi t{1,2,3,4,5}
      Filesystem     Inodes IUsed IFree IUse% Mounted on
      /dev/sdb1        2.4M    21  2.4M    1% /mnt/sdb1
      /dev/sdb1          10     2     8   20% /mnt/sdb1
      /dev/sdb1          20     2    18   10% /mnt/sdb1
      /dev/sdb1          10     2     8   20% /mnt/sdb1
      /dev/sdb1          10     2     8   20% /mnt/sdb1
      
      [root@localhost sdb1]# df -h t{1,2,3,4,5}
      Filesystem      Size  Used Avail Use% Mounted on
      /dev/sdb1        10G  489M  9.6G   5% /mnt/sdb1
      /dev/sdb1        10M     0   10M   0% /mnt/sdb1
      /dev/sdb1        20M     0   20M   0% /mnt/sdb1
      /dev/sdb1        10M     0   10M   0% /mnt/sdb1
      /dev/sdb1        10M     0   10M   0% /mnt/sdb1
      
      Fixes: 909110c0 ("f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project()")
      Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
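A simplified sketch of the unit translation this fix describes. It assumes the quota limit and usage are tracked in bytes and that the block size is 1 << blkbits; the struct and demo_* names are hypothetical, and the real f2fs_statfs_project() does more than this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the statfs result, counted in blocks. */
struct demo_statfs {
    uint64_t f_blocks; /* total blocks */
    uint64_t f_bfree;  /* free blocks */
};

/*
 * Apply a project quota limit given in bytes to statfs numbers that are
 * counted in blocks of (1 << blkbits) bytes.
 */
void demo_apply_limit(struct demo_statfs *buf, uint64_t limit_bytes,
                      uint64_t used_bytes, unsigned blkbits)
{
    uint64_t limit = limit_bytes >> blkbits; /* bytes -> blocks */
    uint64_t used = used_bytes >> blkbits;

    if (limit == 0)
        return; /* no limit set: keep the filesystem-wide numbers */
    buf->f_blocks = limit;
    buf->f_bfree = limit > used ? limit - used : 0;
}
```

Comparing a byte limit directly against block counts, as before the fix, would overstate the limit by a factor of the block size.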
    • f2fs: support data compression · 4c8ff709
      Chao Yu authored
      This patch adds compression support to f2fs.

      - A new term, cluster, is defined as the basic unit of compression. A file
      can be divided logically into multiple clusters. One cluster includes
      4 << n (n >= 0) logical pages, the compression size is also the cluster
      size, and each cluster can be compressed or not.

      - In the cluster metadata layout, one special flag indicates whether a
      cluster is compressed or normal. For a compressed cluster, the following
      metadata maps the cluster to [1, (4 << n) - 1] physical blocks, in which
      f2fs stores data including the compress header and compressed data.

      - In order to eliminate write amplification during overwrite, F2FS only
      supports compression on write-once files. Data can be compressed only when
      all logical blocks in the file are valid and the cluster compress ratio is
      lower than the specified threshold.

      - To enable compression on a regular inode, there are three ways:
      * chattr +c file
      * chattr +c dir; touch dir/file
      * mount w/ -o compress_extension=ext; touch file.ext
      
      Compress metadata layout:
                                   [Dnode Structure]
                   +-----------------------------------------------+
                   | cluster 1 | cluster 2 | ......... | cluster N |
                   +-----------------------------------------------+
                   .           .                       .           .
             .                       .                .                      .
        .         Compressed Cluster       .        .        Normal Cluster            .
      +----------+---------+---------+---------+  +---------+---------+---------+---------+
      |compr flag| block 1 | block 2 | block 3 |  | block 1 | block 2 | block 3 | block 4 |
      +----------+---------+---------+---------+  +---------+---------+---------+---------+
                 .                             .
               .                                           .
             .                                                           .
            +-------------+-------------+----------+----------------------------+
            | data length | data chksum | reserved |      compressed data       |
            +-------------+-------------+----------+----------------------------+
      
      Changelog:
      
      20190326:
      - fix error handling of read_end_io().
      - remove unneeded comments in f2fs_encrypt_one_page().
      
      20190327:
      - fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
      - don't jump into loop directly to avoid uninitialized variables.
      - add TODO tag in error path of f2fs_write_cache_pages().
      
      20190328:
      - fix wrong merge condition in f2fs_read_multi_pages().
      - check compressed file in f2fs_post_read_required().
      
      20190401
      - allow overwrite on non-compressed cluster.
      - check cluster meta before writing compressed data.
      
      20190402
      - don't preallocate blocks for compressed file.
      
      - add lz4 compress algorithm
      - process multiple post read works in one workqueue
        Now f2fs supports processing post read work in multiple workqueue,
        it shows low performance due to schedule overhead of multiple
        workqueue executing orderly.
      
      20190921
      - compress: support buffered overwrite
      C: compress cluster flag
      V: valid block address
      N: NEW_ADDR
      
      One cluster contains 4 blocks
      
       before overwrite   after overwrite
      
      - VVVV		->	CVNN
      - CVNN		->	VVVV
      
      - CVNN		->	CVNN
      - CVNN		->	CVVV
      
      - CVVV		->	CVNN
      - CVVV		->	CVVV
      
      20191029
      - add kconfig F2FS_FS_COMPRESSION to isolate compression related
      codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
      note that: will remove lzo backend if Jaegeuk agreed that too.
      - update codes according to Eric's comments.
      
      20191101
      - apply fixes from Jaegeuk
      
      20191113
      - apply fixes from Jaegeuk
      - split workqueue for fsverity
      
      20191216
      - apply fixes from Jaegeuk
      
      20200117
      - fix to avoid NULL pointer dereference
      
      [Jaegeuk Kim]
      - add tracepoint for f2fs_{,de}compress_pages()
      - fix many bugs and add some compression stats
      - fix overwrite/mmap bugs
      - address 32bit build error, reported by Geert.
      - bug fixes when handling errors and i_compressed_blocks
      
      Reported-by: <noreply@ellerman.id.au>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
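The cluster arithmetic from the description above (4 << n logical pages per cluster, a compressed cluster occupying between 1 and (4 << n) - 1 physical blocks) can be sketched as follows. The demo_* names, the 4096-byte page size and the header length passed in by the caller are assumptions for illustration; the ratio threshold mentioned in the message is not modelled.

```c
#include <assert.h>

#define DEMO_PAGE_SIZE 4096u /* assumption: 4 KiB pages/blocks */

/* Pages per cluster for a given n, following "4 << n (n >= 0)". */
unsigned demo_cluster_pages(unsigned n)
{
    return 4u << n;
}

/* Index of the cluster that a logical page index falls into. */
unsigned long demo_cluster_index(unsigned long page_idx, unsigned n)
{
    return page_idx / demo_cluster_pages(n);
}

/*
 * Physical blocks needed for the compress header plus compressed data.
 * Returns 0 when the result would not fit in (4 << n) - 1 blocks, i.e.
 * when compressing saves nothing and the cluster stays a normal one.
 */
unsigned demo_compressed_blocks(unsigned header_len, unsigned clen, unsigned n)
{
    unsigned need = (header_len + clen + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;

    return need <= demo_cluster_pages(n) - 1 ? need : 0;
}
```

The upper bound of (4 << n) - 1 blocks reflects that a compressed cluster must save at least one block to be worth storing in compressed form.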
  21. 16 Jan, 2020 3 commits
    • f2fs: declare nested quota_sem and remove unnecessary sems · 2c4e0c52
      Jaegeuk Kim authored
      1.
      f2fs_quota_sync
       -> down_read(&sbi->quota_sem)
       -> dquot_writeback_dquots
        -> f2fs_dquot_commit
         -> down_read(&sbi->quota_sem)
      
      2.
      f2fs_quota_sync
       -> down_read(&sbi->quota_sem)
        -> f2fs_write_data_pages
         -> f2fs_write_single_data_page
          -> down_write(&F2FS_I(inode)->i_sem)
      
      f2fs_mkdir
       -> f2fs_do_add_link
         -> down_write(&F2FS_I(inode)->i_sem)
         -> f2fs_init_inode_metadata
          -> f2fs_new_node_page
           -> dquot_alloc_inode
            -> f2fs_dquot_mark_dquot_dirty
             -> down_read(&sbi->quota_sem)
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: introduce private bioset · f543805f
      Chao Yu authored
      In low memory scenario, we can allocate multiple bios without
      submitting any of them.
      
      - f2fs_write_checkpoint()
       - block_operations()
        - f2fs_sync_node_pages()
         step 1) flush cold nodes, allocate new bio from mempool
         - bio_alloc()
          - mempool_alloc()
         step 2) flush hot nodes, allocate a bio from mempool
         - bio_alloc()
          - mempool_alloc()
         step 3) flush warm nodes, be stuck in below call path
         - bio_alloc()
          - mempool_alloc()
           - loop to wait mempool element release, as we only
             reserved memory for two bio allocation, however above
             allocated two bios may never be submitted.
      
      So we need to avoid using the default bioset. In this patch we introduce
      a private bioset, in which we enlarge the mempool element count to the
      total number of log headers, so that we have a large enough backup
      memory pool when allocating/holding multiple bios.
      Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: Check write pointer consistency of non-open zones · d508c94e
      Shin'ichiro Kawasaki authored
      To catch f2fs bugs in write pointer handling code for zoned block
      devices, check write pointers of non-open zones that current segments do
      not point to. Do this check at mount time, after the fsync data recovery
      and current segments' write pointer consistency fix. Or when fsync data
      recovery is disabled by mount option, do the check when there is no fsync
      data.
      
      Check two items by comparing write pointers with the valid block maps in
      SIT. The first item checks zones with no valid blocks. When there are no
      valid blocks in a zone, the write pointer should be at the start of the
      zone. If not, the next write operation to the zone will cause an unaligned
      write error. If the write pointer is not at the zone start, reset it to
      the zone start.

      The second item checks the write pointer position against the last
      valid block in the zone. It is unexpected for the last valid block
      position to be beyond the write pointer. In such a case, report it as a
      bug. No fix is required for such a zone, because the zone is not selected
      for the next write operation until the zone gets discarded.
      Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
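The two checks described in this commit can be expressed as a small decision function. This is a sketch under assumptions: the enum, the function name and the block-address parameters are hypothetical, and the write pointer is taken to be the address of the next block to write, so a valid block at the write pointer itself already counts as "beyond" it.

```c
#include <assert.h>
#include <stdint.h>

enum demo_wp_action {
    DEMO_WP_OK,    /* nothing to do */
    DEMO_WP_RESET, /* reset the write pointer to the zone start */
    DEMO_WP_BUG,   /* report: valid block beyond the write pointer */
};

/*
 * zone_start and wp are block addresses. last_valid is the address of the
 * last valid block in the zone and is ignored when valid_blocks == 0.
 */
enum demo_wp_action demo_check_zone_wp(uint64_t zone_start, uint64_t wp,
                                       unsigned valid_blocks,
                                       uint64_t last_valid)
{
    if (valid_blocks == 0)
        return wp == zone_start ? DEMO_WP_OK : DEMO_WP_RESET;
    if (last_valid >= wp)
        return DEMO_WP_BUG; /* only reported; no fix is applied */
    return DEMO_WP_OK;
}
```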