  1. 28 Jan 2021 · 1 commit
  2. 11 Dec 2020 · 1 commit
    • f2fs: fix shift-out-of-bounds in sanity_check_raw_super() · e584bbe8
      Chao Yu committed
      syzbot reported a bug that could cause a shift-out-of-bounds issue;
      fix it.
      
      Call Trace:
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x107/0x163 lib/dump_stack.c:120
       ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
       __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
       sanity_check_raw_super fs/f2fs/super.c:2812 [inline]
       read_raw_super_block fs/f2fs/super.c:3267 [inline]
       f2fs_fill_super.cold+0x16c9/0x16f6 fs/f2fs/super.c:3519
       mount_bdev+0x34d/0x410 fs/super.c:1366
       legacy_get_tree+0x105/0x220 fs/fs_context.c:592
       vfs_get_tree+0x89/0x2f0 fs/super.c:1496
       do_new_mount fs/namespace.c:2896 [inline]
       path_mount+0x12ae/0x1e70 fs/namespace.c:3227
       do_mount fs/namespace.c:3240 [inline]
       __do_sys_mount fs/namespace.c:3448 [inline]
       __se_sys_mount fs/namespace.c:3425 [inline]
       __x64_sys_mount+0x27f/0x300 fs/namespace.c:3425
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported-by: syzbot+ca9a785f8ac472085994@syzkaller.appspotmail.com
      Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
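      The message does not say which superblock field was shifted. As a general illustration only, here is a minimal user-space sketch of the kind of guard such a fix adds: validate a log2 value read from an untrusted on-disk superblock before shifting by it. In C, shifting by at least the width of the type is undefined behavior, and that is exactly what UBSAN reports as shift-out-of-bounds. The helper name safe_shl_u64() is hypothetical and is not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compute (count << log) only when the shift is
 * defined for a 64-bit result and no high bits would be lost. A corrupted
 * exponent is rejected instead of triggering undefined behavior. */
static bool safe_shl_u64(uint32_t count, uint32_t log, uint64_t *out)
{
	/* Shifting a 64-bit value by 64 or more bits is undefined in C. */
	if (log >= 64)
		return false;
	/* Reject results that would overflow 64 bits. */
	if (log > 0 && ((uint64_t)count >> (64 - log)) != 0)
		return false;
	*out = (uint64_t)count << log;
	return true;
}
```

      For example, safe_shl_u64(3, 9, &v) stores 1536, while a fuzzed exponent of 64 is rejected rather than shifted.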
  3. 09 Dec 2020 · 2 commits
  4. 03 Dec 2020 · 8 commits
  5. 02 Dec 2020 · 2 commits
  6. 30 Sep 2020 · 2 commits
  7. 29 Sep 2020 · 5 commits
    • f2fs: fix to do sanity check on segment/section count · 3a22e9ac
      Chao Yu committed
      As syzbot reported:
      
      BUG: KASAN: slab-out-of-bounds in init_min_max_mtime fs/f2fs/segment.c:4710 [inline]
      BUG: KASAN: slab-out-of-bounds in f2fs_build_segment_manager+0x9302/0xa6d0 fs/f2fs/segment.c:4792
      Read of size 8 at addr ffff8880a1b934a8 by task syz-executor682/6878
      
      CPU: 1 PID: 6878 Comm: syz-executor682 Not tainted 5.9.0-rc6-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x198/0x1fd lib/dump_stack.c:118
       print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
       __kasan_report mm/kasan/report.c:513 [inline]
       kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
       init_min_max_mtime fs/f2fs/segment.c:4710 [inline]
       f2fs_build_segment_manager+0x9302/0xa6d0 fs/f2fs/segment.c:4792
       f2fs_fill_super+0x381a/0x6e80 fs/f2fs/super.c:3633
       mount_bdev+0x32e/0x3f0 fs/super.c:1417
       legacy_get_tree+0x105/0x220 fs/fs_context.c:592
       vfs_get_tree+0x89/0x2f0 fs/super.c:1547
       do_new_mount fs/namespace.c:2875 [inline]
       path_mount+0x1387/0x20a0 fs/namespace.c:3192
       do_mount fs/namespace.c:3205 [inline]
       __do_sys_mount fs/namespace.c:3413 [inline]
       __se_sys_mount fs/namespace.c:3390 [inline]
       __x64_sys_mount+0x27f/0x300 fs/namespace.c:3390
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The root cause: if segs_per_sec is larger than one and the segment count
      in the last section is less than segs_per_sec, init_min_max_mtime()
      accesses sit_i->sentries[] out of bounds.
      
      Fix this by adding a sanity check on the consistency of the segment
      count, section count and segs_per_sec in sanity_check_raw_super().
      
      Reported-by: syzbot+481a3ffab50fed41dcc0@syzkaller.appspotmail.com
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
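      Here is a minimal user-space sketch of the failure and of one possible form of the check. It assumes a simplified struct raw_sb_model, which is hypothetical and not f2fs's struct f2fs_super_block. highest_index_read() models a loop that walks the main area one section at a time and reads segs_per_sec entries per section, loosely following the description of init_min_max_mtime() above. Requiring section_count × segs_per_sec == segment_count_main keeps that loop inside the array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of the superblock fields involved. */
struct raw_sb_model {
	uint32_t segment_count_main; /* segments in the main area */
	uint32_t section_count;      /* sections in the main area */
	uint32_t segs_per_sec;       /* segments per section */
};

/* Sketch of the check: accept the layout only if every section is full. */
static bool segs_secs_consistent(const struct raw_sb_model *sb)
{
	if (sb->segs_per_sec == 0 || sb->section_count == 0)
		return false;
	/* Widen before multiplying so the product cannot wrap. */
	return (uint64_t)sb->section_count * sb->segs_per_sec ==
	       sb->segment_count_main;
}

/* Highest per-segment index read by a loop that steps through main_segs
 * one section at a time and reads segs_per_sec entries per section.
 * Both arguments must be at least 1. */
static uint32_t highest_index_read(uint32_t main_segs, uint32_t segs_per_sec)
{
	uint32_t last_sec_start = (main_segs - 1) / segs_per_sec * segs_per_sec;

	return last_sec_start + segs_per_sec - 1;
}
```

      With 14 main segments and 4 segments per section, such a loop would read index 15 of a 14-entry array. The check rejects that layout.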
    • f2fs: fix wrong total_sections check and fsmeta check · f99ba9ad
      Wang Xiaojun committed
      The meta area is not included in the section_count computation,
      so total_sections must be at least 1 and cannot be greater than
      segment_count_main.
      
      The minimum number of meta segments is 8: SB + 2 × (CP + SIT + NAT) + SSA
      = 1 + 6 + 1.
      Signed-off-by: Wang Xiaojun <wangxiaojun11@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
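      The two constraints above can be written as a hypothetical user-space sketch. MIN_META_SEGMENTS, total_sections_ok() and meta_segments_ok() are illustrative names. This sketch does not show how the kernel actually counts meta segments, so meta_segments_ok() simply takes that count as an input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SB (1) + 2 x (CP + SIT + NAT) (6) + SSA (1) = 8, per the message above. */
enum { MIN_META_SEGMENTS = 1 + 2 * (1 + 1 + 1) + 1 };

/* Sections cover only the main area: at least one, and never more than
 * the number of main-area segments. */
static bool total_sections_ok(uint32_t total_sections,
			      uint32_t segment_count_main)
{
	return total_sections >= 1 && total_sections <= segment_count_main;
}

/* The meta area needs at least MIN_META_SEGMENTS segments. */
static bool meta_segments_ok(uint32_t meta_segments)
{
	return meta_segments >= MIN_META_SEGMENTS;
}
```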
    • f2fs: remove duplicated code in sanity_check_area_boundary · d89f5891
      Wang Xiaojun committed
      Use seg_end_blkaddr instead of "segment0_blkaddr + (segment_count <<
      log_blocks_per_seg)".
      Signed-off-by: Wang Xiaojun <wangxiaojun11@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
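      For reference, here is a sketch of the expression this commit replaces. It uses 64-bit arithmetic so the illustration cannot wrap. The function name is hypothetical; it only restates the arithmetic quoted in the message.

```c
#include <assert.h>
#include <stdint.h>

/* segment0_blkaddr + (segment_count << log_blocks_per_seg), widened to
 * 64 bits for this illustration. */
static uint64_t seg_end_blkaddr_of(uint32_t segment0_blkaddr,
				   uint32_t segment_count,
				   uint32_t log_blocks_per_seg)
{
	return (uint64_t)segment0_blkaddr +
	       ((uint64_t)segment_count << log_blocks_per_seg);
}
```

      With 4 KiB blocks and 2 MiB segments, log_blocks_per_seg is 9. So 100 segments starting at block 512 end at block 512 + 100 × 512 = 51712.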
    • f2fs: relocate blkzoned feature check · d0660122
      Chao Yu committed
      Move the blkzoned feature check into parse_options(), like the
      other feature checks.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: do sanity check on zoned block device path · 07eb1d69
      Chao Yu committed
      sbi->devs is initialized only if the image enables the multiple-device
      feature or the blkzoned feature. If the blkzoned feature flag is set by
      a fuzzer on a non-blkzoned device, we hit the panic below:
      
      get_zone_idx fs/f2fs/segment.c:4892 [inline]
      f2fs_usable_zone_blks_in_seg fs/f2fs/segment.c:4943 [inline]
      f2fs_usable_blks_in_seg+0x39b/0xa00 fs/f2fs/segment.c:4999
      Call Trace:
       check_block_count+0x69/0x4e0 fs/f2fs/segment.h:704
       build_sit_entries fs/f2fs/segment.c:4403 [inline]
       f2fs_build_segment_manager+0x51da/0xa370 fs/f2fs/segment.c:5100
       f2fs_fill_super+0x3880/0x6ff0 fs/f2fs/super.c:3684
       mount_bdev+0x32e/0x3f0 fs/super.c:1417
       legacy_get_tree+0x105/0x220 fs/fs_context.c:592
       vfs_get_tree+0x89/0x2f0 fs/super.c:1547
       do_new_mount fs/namespace.c:2896 [inline]
       path_mount+0x12ae/0x1e70 fs/namespace.c:3216
       do_mount fs/namespace.c:3229 [inline]
       __do_sys_mount fs/namespace.c:3437 [inline]
       __se_sys_mount fs/namespace.c:3414 [inline]
       __x64_sys_mount+0x27f/0x300 fs/namespace.c:3414
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
      
      Add a sanity check for inconsistency among the blkzoned flag, the device
      path and the device characteristics to avoid the above panic.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
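      Here is a hypothetical sketch of the consistency rule described above, reduced to three booleans:
      - whether the image sets the blkzoned flag;
      - whether a device path is recorded;
      - whether the device is actually zoned.
      The struct and function names are assumptions for illustration. A real check would derive these facts from the superblock and the block devices.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs to the check. */
struct zoned_inputs {
	bool feature_blkzoned;  /* blkzoned flag in the on-disk image */
	bool has_device_path;   /* a device path is recorded */
	bool device_is_zoned;   /* the backing device really is zoned */
};

/* Sketch: refuse to mount when the blkzoned flag is set but no recorded,
 * actually-zoned device backs it, so zone lookups never run against a
 * device list that was never set up. */
static bool zoned_config_consistent(const struct zoned_inputs *in)
{
	if (!in->feature_blkzoned)
		return true;
	return in->has_device_path && in->device_is_zoned;
}
```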
  8. 22 Sep 2020 · 2 commits
    • fscrypt: make fscrypt_set_test_dummy_encryption() take a 'const char *' · c8c868ab
      Eric Biggers committed
      fscrypt_set_test_dummy_encryption() requires that the optional argument
      to the test_dummy_encryption mount option be specified as a substring_t.
      That doesn't work well with filesystems that use the new mount API,
      since the new way of parsing mount options doesn't use substring_t.
      
      Make it take the argument as a 'const char *' instead.
      
      Instead of moving the match_strdup() into the callers in ext4 and f2fs,
      make them just use arg->from directly.  Since the pattern is
      "test_dummy_encryption=%s", the argument will be null-terminated.
      Acked-by: Jeff Layton <jlayton@kernel.org>
      Link: https://lore.kernel.org/r/20200917041136.178600-14-ebiggers@kernel.org
      Signed-off-by: Eric Biggers <ebiggers@google.com>
    • fscrypt: handle test_dummy_encryption in more logical way · ac4acb1f
      Eric Biggers committed
      The behavior of the test_dummy_encryption mount option is that when a
      new file (or directory or symlink) is created in an unencrypted
      directory, it's automatically encrypted using a dummy encryption policy.
      That's it; in particular, the encryption (or lack thereof) of existing
      files (or directories or symlinks) doesn't change.
      
      Unfortunately the implementation of test_dummy_encryption is a bit weird
      and confusing.  When test_dummy_encryption is enabled and a file is
      being created in an unencrypted directory, we set up an encryption key
      (->i_crypt_info) for the directory.  This isn't actually used to do any
      encryption, however, since the directory is still unencrypted!  Instead,
      ->i_crypt_info is only used for inheriting the encryption policy.
      
      One consequence of this is that the filesystem ends up providing a
      "dummy context" (policy + nonce) instead of a "dummy policy".  In
      commit ed318a6c ("fscrypt: support test_dummy_encryption=v2"), I
      mistakenly thought this was required.  However, actually the nonce only
      ends up being used to derive a key that is never used.
      
      Another consequence of this implementation is that it allows for
      'inode->i_crypt_info != NULL && !IS_ENCRYPTED(inode)', which is an edge
      case that can be forgotten about.  For example, currently
      FS_IOC_GET_ENCRYPTION_POLICY on an unencrypted directory may return the
      dummy encryption policy when the filesystem is mounted with
      test_dummy_encryption.  That seems like the wrong thing to do, since
      again, the directory itself is not actually encrypted.
      
      Therefore, switch to a more logical and maintainable implementation
      where the dummy encryption policy inheritance is done without setting up
      keys for unencrypted directories.  This involves:
      
      - Adding a function fscrypt_policy_to_inherit() which returns the
        encryption policy to inherit from a directory.  This can be a real
        policy, a dummy policy, or no policy.
      
      - Replacing struct fscrypt_dummy_context, ->get_dummy_context(), etc.
        with struct fscrypt_dummy_policy, ->get_dummy_policy(), etc.
      
      - Making fscrypt_fname_encrypted_size() take an fscrypt_policy instead
        of an inode.
      Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Acked-by: Jeff Layton <jlayton@kernel.org>
      Link: https://lore.kernel.org/r/20200917041136.178600-13-ebiggers@kernel.org
      Signed-off-by: Eric Biggers <ebiggers@google.com>
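      Here is a minimal sketch of the inheritance rule. It uses hypothetical stand-in types rather than the kernel's policy and inode structures. It mirrors the three outcomes listed for fscrypt_policy_to_inherit(): a real policy, the dummy policy, or none. It never sets up a key for an unencrypted directory. Error handling present in the real code is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's fscrypt types. */
struct policy_sketch { int version; };

struct dir_sketch {
	bool encrypted;                      /* IS_ENCRYPTED(dir) */
	const struct policy_sketch *policy;  /* the dir's own policy, if any */
};

/* An encrypted directory passes on its own policy. An unencrypted one
 * passes on the mount's dummy policy, which is NULL when
 * test_dummy_encryption is not set. No key is involved in either case. */
static const struct policy_sketch *
policy_to_inherit_sketch(const struct dir_sketch *dir,
			 const struct policy_sketch *dummy)
{
	if (dir->encrypted)
		return dir->policy;
	return dummy;
}
```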
  9. 19 Sep 2020 · 1 commit
  10. 12 Sep 2020 · 3 commits
    • f2fs: change i_compr_blocks of inode to atomic value · c2759eba
      Daeho Jeong committed
      writepages() can be concurrently invoked for the same file by different
      threads such as a thread fsyncing the file and a kworker kernel thread.
      So, changing i_compr_blocks without protection is racy, and we need to
      protect it by making it an atomic value. Plus, we don't need a 64-bit
      value for i_compr_blocks, so we use a plain atomic value rather than
      atomic64.
      Signed-off-by: Daeho Jeong <daehojeong@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
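      Here is a user-space analogue of the race. It assumes C11 <stdatomic.h> and POSIX threads in place of the kernel's atomic API. Two threads increment a shared counter, standing in for the fsync thread and the kworker that both update i_compr_blocks. With atomic_fetch_add() no update is lost. With a plain int and +=, the final count could come out short. All names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Shared counter standing in for i_compr_blocks (32 bits is enough). */
static atomic_int i_compr_blocks_sketch;

static void *writer(void *arg)
{
	int n = *(const int *)arg;

	for (int i = 0; i < n; i++)
		atomic_fetch_add(&i_compr_blocks_sketch, 1); /* no lost updates */
	return NULL;
}

/* Run two concurrent writers and return the final count. */
static int run_two_writers(int per_thread)
{
	pthread_t a, b;

	atomic_store(&i_compr_blocks_sketch, 0);
	pthread_create(&a, NULL, writer, &per_thread);
	pthread_create(&b, NULL, writer, &per_thread);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return atomic_load(&i_compr_blocks_sketch);
}
```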
    • f2fs: ignore compress mount option on image w/o compression feature · 69c0dd29
      Chao Yu committed
      This keeps the behavior consistent with passing the compress mount option
      to a kernel without the compression feature, so that mount does not fail
      in that case.
      Reported-by: Kyungmin Park <kyungmin.park@samsung.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: support age threshold based garbage collection · 093749e2
      Chao Yu committed
      There are several issues in the current background GC algorithm:
      - valid blocks are one of the key factors in the cost calculation, so
      if a segment has few valid blocks, the CB algorithm will still choose
      it as the victim even if it is young or located in a hot segment, which
      is not appropriate.
      - GCed data/node blocks go to existing logs regardless of whether the
      data already there has the same update frequency, which may mix hot
      and cold data again.
      - the GC allocator mainly uses LFS-type segments, which consumes free
      segments more quickly.
      
      This patch introduces a new algorithm named age threshold based
      garbage collection to solve the above issues. It has three main
      steps:
      
      1. select a source victim:
      - set an age threshold, and select candidates based on the threshold:
      e.g.
       0 means youngest, 100 means oldest; if we set the age threshold to 80,
       then select dirty segments whose age is in the range [80, 100] as
       candidates;
      - set a candidate_ratio threshold, and select candidates based on the
      ratio, so that we can shrink the candidates to the oldest segments;
      - select the target segment with the fewest valid blocks in order to
      migrate blocks at minimum cost;
      
      2. select a target victim:
      - select candidates based on the age threshold;
      - set a candidate_radius threshold, and search for candidates whose age
      is close to the source victim's; the search radius should be less than
      the radius threshold.
      - select the target segment with the most valid blocks in order to avoid
      migrating the current target segment.
      
      3. merge valid blocks from the source victim into the target victim with
      the SSR allocator.
      
      Test steps:
      - create 160 dirty segments:
       * half of them have 128 valid blocks per segment
       * the rest have 384 valid blocks per segment
      - run background GC
      
      Benefit: GC count and block movement count both decrease significantly:
      
      - Before:
        - Valid: 86
        - Dirty: 1
        - Prefree: 11
        - Free: 6001 (6001)
      
      GC calls: 162 (BG: 220)
        - data segments : 160 (160)
        - node segments : 2 (2)
      Try to move 41454 blocks (BG: 41454)
        - data blocks : 40960 (40960)
        - node blocks : 494 (494)
      
      IPU: 0 blocks
      SSR: 0 blocks in 0 segments
      LFS: 41364 blocks in 81 segments
      
      - After:
        - Valid: 87
        - Dirty: 0
        - Prefree: 4
        - Free: 6008 (6008)
      
      GC calls: 75 (BG: 76)
        - data segments : 74 (74)
        - node segments : 1 (1)
      Try to move 12813 blocks (BG: 12813)
        - data blocks : 12544 (12544)
        - node blocks : 269 (269)
      
      IPU: 0 blocks
      SSR: 12032 blocks in 77 segments
      LFS: 855 blocks in 2 segments
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
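      Here is a simplified sketch of steps 1 and 2. It assumes each segment is summarized by an age normalized to 0..100 and a valid-block count. pick_source() and pick_target() are hypothetical names. ratio_pct stands in for candidate_ratio and radius for candidate_radius. The patch's actual cost model and data structures differ, and step 3 (the SSR merge) is not modeled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-segment summary; age: 0 = youngest, 100 = oldest. */
struct seg_sketch {
	int age;
	int valid_blocks;
};

/* qsort() comparator: order ages from oldest to youngest. */
static int cmp_age_desc(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;

	return (y > x) - (y < x);
}

/* Step 1: among segments with age >= age_threshold, keep roughly the
 * oldest ratio_pct percent (0..100, ties included), then return the one
 * with the fewest valid blocks, or -1 if there is no candidate. */
static int pick_source(const struct seg_sketch *s, int n,
		       int age_threshold, int ratio_pct)
{
	int *ages = malloc(sizeof(*ages) * (size_t)(n > 0 ? n : 1));
	int cand = 0, keep, cutoff, best = -1;

	if (!ages)
		return -1;
	for (int i = 0; i < n; i++)
		if (s[i].age >= age_threshold)
			ages[cand++] = s[i].age;
	if (cand == 0) {
		free(ages);
		return -1;
	}
	keep = (cand * ratio_pct + 99) / 100;	/* round up */
	if (keep < 1)
		keep = 1;
	qsort(ages, (size_t)cand, sizeof(*ages), cmp_age_desc);
	cutoff = ages[keep - 1];		/* age of the keep-th oldest */
	free(ages);

	for (int i = 0; i < n; i++) {
		if (s[i].age < cutoff)		/* cutoff >= age_threshold */
			continue;
		if (best < 0 || s[i].valid_blocks < s[best].valid_blocks)
			best = i;
	}
	return best;
}

/* Step 2: among other segments with age >= age_threshold whose age differs
 * from the source's by less than radius, return the one with the most
 * valid blocks, or -1 if there is none. */
static int pick_target(const struct seg_sketch *s, int n, int source,
		       int age_threshold, int radius)
{
	int best = -1;

	for (int i = 0; i < n; i++) {
		if (i == source || s[i].age < age_threshold)
			continue;
		if (abs(s[i].age - s[source].age) >= radius)
			continue;
		if (best < 0 || s[i].valid_blocks > s[best].valid_blocks)
			best = i;
	}
	return best;
}
```

      Take segments {age, valid} = {90, 100}, {85, 50}, {95, 300}, {40, 10}, {88, 400} with an age threshold of 80.
      - The segment with only 10 valid blocks is never chosen as the source because it is young. This addresses the first issue listed above.
      - A 50% ratio narrows the candidates to ages 95 and 90.
      - Index 0 is picked as the source and index 4 as the target.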
  11. 11 Sep 2020 · 3 commits
    • f2fs: Use generic casefolding support · eca4873e
      Daniel Rosenberg committed
      This switches f2fs over to the generic support provided in
      the previous patch.
      
      Since casefolded dentries behave the same in ext4 and f2fs, we decrease
      the maintenance burden by unifying them, and any optimizations will
      immediately apply to both.
      Signed-off-by: Daniel Rosenberg <drosen@google.com>
      Reviewed-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: introduce inmem curseg · d0b9e42a
      Chao Yu committed
      The previous implementation of aligned pinfile allocation will:
      - allocate a new segment on the cold data log no matter whether the last
      used segment is partially used or not, which makes IOs more random;
      - force concurrent cold data/GCed IO into the warm data area, which can
      have a bad effect on hot/cold data separation;
      
      In this patch, we introduce a new type of log named 'inmem curseg';
      the differences from a normal curseg are:
      - it reuses the existing segment types (CURSEG_XXX_NODE/DATA);
      - it only exists in memory; its segno, blkofs and summary will not be
       persisted into the checkpoint area;
      
      With this new feature, we can improve the scalability of logs, and
      special allocators can be created for specific purposes:
      - a pure LFS allocator for aligned pinfile allocation or file
      defragmentation
      - a pure SSR allocator for a later feature
      
      So, let's update aligned pinfile allocation to use this new
      inmem curseg framework.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: support zone capacity less than zone size · de881df9
      Aravind Ramesh committed
      NVMe Zoned Namespace devices can have zone-capacity less than zone-size.
      Zone-capacity indicates the maximum number of sectors that are usable in
      a zone beginning from the first sector of the zone. This makes the
      sectors from zone-capacity up to zone-size unusable.
      This patch set tracks zone-size and zone-capacity in zoned devices and
      calculates the usable blocks per segment and usable segments per section.
      
      If zone-capacity is less than zone-size, mark only those segments which
      start before zone-capacity as free segments. All segments at and beyond
      zone-capacity are treated as permanently used segments. In cases where
      zone-capacity does not align with the segment size, the last segment will
      start before zone-capacity and end beyond the zone-capacity of the zone.
      For such spanning segments, only sectors within the zone-capacity are used.
      
      During writes and GC, manage the usable segments in a section and the
      usable blocks per segment. Segments which are beyond zone-capacity are
      never allocated and do not need to be garbage collected; only the
      segments which are before zone-capacity need to be garbage collected.
      For spanning segments, based on the number of usable blocks in that
      segment, write to blocks only up to zone-capacity.
      
      Zone-capacity is device specific and cannot be configured by the user.
      Since NVMe ZNS device zones are sequential-write-only, a block device
      with conventional zones or any normal block device is needed along with
      the ZNS device for the metadata operations of F2FS.
      
      A typical nvme-cli output of a zoned device shows zone start and capacity
      and write pointer as below:
      
      SLBA: 0x0     WP: 0x0     Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
      SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
      SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ
      
      Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones
      are in EMPTY state. For each zone, only zone start + 49MB is usable area,
      any lba/sector after 49MB cannot be read or written to, the drive will fail
      any attempts to read/write. So, the second zone starts at 64MB and is
      usable till 113MB (64 + 49) and the range between 113 and 128MB is
      again unusable. The next zone starts at 128MB, and so on.
      Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
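      Here is a worked sketch of the usable-block calculation, using the example geometry above: 64 MiB zones with 0x18800 sectors of capacity. It assumes 512-byte sectors, 4 KiB blocks, 2 MiB segments and one section per zone, and the helper names are hypothetical.
      - The capacity is 0x18800 × 512 bytes = 49 MiB = 12544 blocks, i.e. 24.5 segments.
      - Segments 0 to 23 are fully usable.
      - Segment 24 spans the capacity boundary and has 256 usable blocks.
      - Segments 25 to 31 are never allocated.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry for this sketch only. */
#define SECTOR_BYTES	512u
#define BLOCK_BYTES	4096u
#define BLOCKS_PER_SEG	512u	/* 2 MiB segments */

/* Number of blocks inside the zone's writable capacity. */
static uint64_t cap_blocks_of(uint64_t cap_sectors)
{
	return cap_sectors * SECTOR_BYTES / BLOCK_BYTES;
}

/* Usable blocks in segment seg_in_zone (0-based within its zone). Segments
 * starting at or beyond the capacity get 0; a segment spanning the
 * boundary is usable only up to the capacity. */
static uint32_t usable_blks_in_seg(uint64_t cap_sectors, uint32_t seg_in_zone)
{
	uint64_t cap_blocks = cap_blocks_of(cap_sectors);
	uint64_t seg_start = (uint64_t)seg_in_zone * BLOCKS_PER_SEG;

	if (seg_start >= cap_blocks)
		return 0;
	if (cap_blocks - seg_start >= BLOCKS_PER_SEG)
		return BLOCKS_PER_SEG;
	return (uint32_t)(cap_blocks - seg_start);
}

/* Usable segments per section: every segment that starts before the
 * capacity, capped at the number of segments in the zone. */
static uint32_t usable_segs_in_sec(uint64_t cap_sectors, uint32_t segs_per_zone)
{
	uint64_t segs = (cap_blocks_of(cap_sectors) + BLOCKS_PER_SEG - 1) /
			BLOCKS_PER_SEG;

	return segs < segs_per_zone ? (uint32_t)segs : segs_per_zone;
}
```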
  12. 04 Aug 2020 · 1 commit
  13. 24 Jul 2020 · 1 commit
  14. 09 Jul 2020 · 1 commit
  15. 08 Jul 2020 · 1 commit
    • f2fs: avoid readahead race condition · 6b12367d
      Jaegeuk Kim committed
      If two readahead threads with the same offset enter readpages, every read
      IO is split and issued to the disk, which lowers bandwidth.
      
      This patch tries to avoid redundant readahead calls.
      
      Fixes one build error reported by Randy.
      Fix build error when F2FS_FS_COMPRESSION is not set/enabled.
      This label is needed in either case.
      
      ../fs/f2fs/data.c: In function ‘f2fs_mpage_readpages’:
      ../fs/f2fs/data.c:2327:5: error: label ‘next_page’ used but not defined
           goto next_page;
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  16. 19 Jun 2020 · 2 commits
  17. 09 Jun 2020 · 1 commit
    • f2fs: don't return vmalloc() memory from f2fs_kmalloc() · 0b6d4ca0
      Eric Biggers committed
      kmalloc() returns kmalloc'ed memory, and kvmalloc() returns either
      kmalloc'ed or vmalloc'ed memory.  But the f2fs wrappers, f2fs_kmalloc()
      and f2fs_kvmalloc(), both return both kinds of memory.
      
      It's redundant to have two functions that do the same thing, and also
      breaking the standard naming convention is causing bugs since people
      assume it's safe to kfree() memory allocated by f2fs_kmalloc().  See
      e.g. the various allocations in fs/f2fs/compress.c.
      
      Fix this by making f2fs_kmalloc() just use kmalloc().  And to avoid
      re-introducing the allocation failures that the vmalloc fallback was
      intended to fix, convert the largest allocations to use f2fs_kvmalloc().
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  18. 19 May 2020 · 2 commits
    • fscrypt: support test_dummy_encryption=v2 · ed318a6c
      Eric Biggers committed
      v1 encryption policies are deprecated in favor of v2, and some new
      features (e.g. encryption+casefolding) are only being added for v2.
      
      Therefore, the "test_dummy_encryption" mount option (which is used for
      encryption I/O testing with xfstests) needs to support v2 policies.
      
      To do this, extend its syntax to be "test_dummy_encryption=v1" or
      "test_dummy_encryption=v2".  The existing "test_dummy_encryption" (no
      argument) also continues to be accepted, to specify the default setting
      -- currently v1, but the next patch changes it to v2.
      
      To cleanly support both v1 and v2 while also making it easy to support
      specifying other encryption settings in the future (say, accepting
      "$contents_mode:$filenames_mode:v2"), make ext4 and f2fs maintain a
      pointer to the dummy fscrypt_context rather than using mount flags.
      
      To avoid concurrency issues, don't allow test_dummy_encryption to be set
      or changed during a remount.  (The former restriction is new, but
      xfstests doesn't run into it, so no one should notice.)
      
      Tested with 'gce-xfstests -c {ext4,f2fs}/encrypt -g auto'.  On ext4,
      there are two regressions, both of which are test bugs: ext4/023 and
      ext4/028 fail because they set an xattr and expect it to be stored
      inline, but the increase in size of the fscrypt_context from
      24 to 40 bytes causes this xattr to be spilled into an external block.
      
      Link: https://lore.kernel.org/r/20200512233251.118314-4-ebiggers@kernel.org
      Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Reviewed-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
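      Here is a sketch of the option syntax described above. It takes the value as a NUL-terminated 'const char *', which is NULL when no '=' value is given. The enum and function names are hypothetical. The default of v1 reflects this patch only, since the message notes that the next patch changes it to v2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch only. */
enum dummy_policy_version { DUMMY_INVALID = -1, DUMMY_V1 = 1, DUMMY_V2 = 2 };

/* A bare "test_dummy_encryption" (arg == NULL) selects the default;
 * "=v1" and "=v2" select a version explicitly; anything else is rejected. */
static enum dummy_policy_version parse_dummy_arg(const char *arg)
{
	if (arg == NULL)
		return DUMMY_V1;
	if (strcmp(arg, "v1") == 0)
		return DUMMY_V1;
	if (strcmp(arg, "v2") == 0)
		return DUMMY_V2;
	return DUMMY_INVALID;
}
```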
    • f2fs: fix checkpoint=disable:%u%% · 1ae18f71
      Jaegeuk Kim committed
      When parsing the mount option, sbi->user_block_count is not yet known,
      so the conversion should be done after it has been read.
      
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
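      Here is a hypothetical sketch of the two-phase handling the fix implies. Option parsing can only record the percentage, and the block count is computed once user_block_count is known. The struct, field and function names are illustrative assumptions, not necessarily f2fs's.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical option state. */
struct ckpt_opts_sketch {
	uint32_t unusable_cap_perc; /* from "disable:<N>%" */
	uint64_t unusable_cap;      /* in blocks; resolved later */
};

/* Phase 1, at mount-option parsing time: record only the percentage. */
static int parse_ckpt_disable_perc(const char *arg, struct ckpt_opts_sketch *o)
{
	unsigned int perc;

	if (sscanf(arg, "disable:%u%%", &perc) != 1 || perc > 100)
		return -1;
	o->unusable_cap_perc = perc;
	o->unusable_cap = 0;
	return 0;
}

/* Phase 2, after user_block_count has been read: convert to blocks. */
static void resolve_unusable_cap(struct ckpt_opts_sketch *o,
				 uint64_t user_block_count)
{
	o->unusable_cap = user_block_count * o->unusable_cap_perc / 100;
}
```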
  19. 12 May 2020 · 1 commit