1. 09 Nov, 2021 1 commit
    • lib: zstd: Add kernel-specific API · cf30f6a5
      Committed by Nick Terrell
      This patch:
      - Moves `include/linux/zstd.h` -> `include/linux/zstd_lib.h`
      - Updates modified zstd headers to yearless copyright
      - Adds a new API in `include/linux/zstd.h` that is functionally
        equivalent to the in-use subset of the current API. Functions are
        renamed to avoid symbol collisions with zstd, to make it clear it is
        not the upstream zstd API, and to follow the kernel style guide.
      - Updates all callers to use the new API.
      
      There are no functional changes in this patch. Since there are no
      functional changes, I felt it was okay to update all the callers in a
      single patch. Once the API is approved, the callers are mechanically
      changed.
      
      This patch is preparing for the 3rd patch in this series, which updates
      zstd to version 1.4.10. Since the upstream zstd API is no longer exposed
      to callers, the update can happen transparently.
      Signed-off-by: Nick Terrell <terrelln@fb.com>
      Tested By: Paul Jones <paul@pauljones.id.au>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM/Clang v13.0.0 on x86-64
      Tested-by: Jean-Denis Girard <jd.girard@sysnux.pf>
  2. 21 Sep, 2021 1 commit
  3. 31 Aug, 2021 3 commits
  4. 24 Aug, 2021 2 commits
    • f2fs: introduce periodic iostat io latency traces · a4b68176
      Committed by Daeho Jeong
      Whenever we notice sluggishness on our machines, we are always
      curious about how well each type of I/O in the f2fs filesystem is
      handled. But it is hard to get this kind of real data. First, we
      need to reproduce the issue with a profiling tool such as blktrace
      turned on, but the issue does not recur easily. Second, the
      intervention of any tool slightly changes the overall timing of the
      issue, which sometimes makes it hard to figure out.
      
      So, I added a feature that prints I/O latency statistics as tracepoint
      events, the minimum needed to understand the filesystem's I/O-related
      behavior, under the F2FS_IOSTAT kernel config. With the "iostat_enable"
      sysfs node on, we can get these statistics periodically with the
      least overhead.
      
      [samples]
       f2fs_ckpt-254:1-507     [003] ....  2842.439683: f2fs_iostat_latency:
      dev = (254,11), iotype [peak lat.(ms)/avg lat.(ms)/count],
      rd_data [136/1/801], rd_node [136/1/1704], rd_meta [4/2/4],
      wr_sync_data [164/16/3331], wr_sync_node [152/3/648],
      wr_sync_meta [160/2/4243], wr_async_data [24/13/15],
      wr_async_node [0/0/0], wr_async_meta [0/0/0]
      
       f2fs_ckpt-254:1-507     [002] ....  2845.450514: f2fs_iostat_latency:
      dev = (254,11), iotype [peak lat.(ms)/avg lat.(ms)/count],
      rd_data [60/3/456], rd_node [60/3/1258], rd_meta [0/0/1],
      wr_sync_data [120/12/2285], wr_sync_node [88/5/428],
      wr_sync_meta [52/6/2990], wr_async_data [4/1/3],
      wr_async_node [0/0/0], wr_async_meta [0/0/0]
      Signed-off-by: Daeho Jeong <daehojeong@google.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
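The sample events above follow a regular `name [peak/avg/count]` layout, so they are easy to post-process. The following is a minimal sketch of such a parser, not part of the patch; the function name `parse_iostat_latency` and the regular expression are illustrative assumptions, and the input is the first sample from the commit message joined onto one line.

```python
import re

# Matches entries such as "rd_data [136/1/801]". The header
# "iotype [peak lat.(ms)/...]" is skipped because it contains no digits.
ENTRY_RE = re.compile(r"(\w+) \[(\d+)/(\d+)/(\d+)\]")


def parse_iostat_latency(event: str) -> dict:
    """Map each I/O type to a (peak_ms, avg_ms, count) tuple."""
    return {
        name: (int(peak), int(avg), int(count))
        for name, peak, avg, count in ENTRY_RE.findall(event)
    }


# First sample from the commit message, joined onto one line.
sample = (
    "dev = (254,11), iotype [peak lat.(ms)/avg lat.(ms)/count], "
    "rd_data [136/1/801], rd_node [136/1/1704], rd_meta [4/2/4], "
    "wr_sync_data [164/16/3331], wr_sync_node [152/3/648], "
    "wr_sync_meta [160/2/4243], wr_async_data [24/13/15], "
    "wr_async_node [0/0/0], wr_async_meta [0/0/0]"
)

stats = parse_iostat_latency(sample)
# I/O type with the highest peak latency in this sampling period.
worst = max(stats, key=lambda name: stats[name][0])
print(worst, stats[worst])
```

Keying the result by I/O type makes it straightforward to compare successive periods, for example to see which type's peak latency grows while an issue is in progress.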
    • f2fs: separate out iostat feature · 52118743
      Committed by Daeho Jeong
      Added the F2FS_IOSTAT config option to support getting I/O statistics
      through sysfs and printing periodic I/O statistics tracepoint events,
      and moved the I/O statistics code into separate files for better
      maintainability.
      Signed-off-by: Daeho Jeong <daehojeong@google.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      [Jaegeuk Kim: set default=y]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  5. 18 Aug, 2021 1 commit
  6. 06 Aug, 2021 1 commit
  7. 04 Aug, 2021 2 commits
    • f2fs: add sysfs node to control ra_pages for fadvise seq file · 0f6b56ec
      Committed by Daeho Jeong
      fadvise() currently lets the user double the readahead window with
      POSIX_FADV_SEQUENTIAL. But in some use cases that is not sufficient,
      and we need to meet the need in a restricted way. With this sysfs
      node, we can set the multiplier applied to the bdi device readahead
      for the POSIX_FADV_SEQUENTIAL advice option to a value between
      2 (default) and 256.
      Signed-off-by: Daeho Jeong <daehojeong@google.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
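The effect of the multiplier can be modelled in a few lines. This is a minimal sketch, assuming the sequential readahead window is simply the bdi readahead size times the multiplier; the names `seq_readahead_pages`, `MIN_RA_MUL`, and `MAX_RA_MUL` are hypothetical, the base value of 32 pages is only an illustration, and rejecting out-of-range values (rather than clamping them) is an assumption about how the sysfs node behaves.

```python
MIN_RA_MUL = 2    # default multiplier, per the commit message
MAX_RA_MUL = 256  # upper bound, per the commit message


def seq_readahead_pages(bdi_ra_pages: int, multiplier: int = MIN_RA_MUL) -> int:
    """Readahead window (in pages) for a POSIX_FADV_SEQUENTIAL file.

    Assumption: values outside [MIN_RA_MUL, MAX_RA_MUL] are rejected.
    """
    if not MIN_RA_MUL <= multiplier <= MAX_RA_MUL:
        raise ValueError(
            f"multiplier must be in [{MIN_RA_MUL}, {MAX_RA_MUL}], got {multiplier}"
        )
    return bdi_ra_pages * multiplier


# Illustrative base window of 32 pages.
print(seq_readahead_pages(32))       # default multiplier doubles the window
print(seq_readahead_pages(32, 16))   # a larger multiplier set by the user
```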
    • f2fs: introduce discard_unit mount option · 4f993264
      Committed by Chao Yu
      As James Z reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=213877
      
      [1.] One-line summary of the problem:
      Mount multiple SMR block devices exceed certain number cause system non-response
      
      [2.] Full description of the problem/report:
      Created some F2FS on SMR devices (mkfs.f2fs -m), then mounted in sequence. Each device is the same Model: HGST HSH721414AL (Size 14TB).
      Empirically, found that when the amount of SMR device * 1.5GB > System RAM, the system ran out of memory and hung. No dmesg output. For example, 24 SMR Disk need 24*1.5GB = 36GB. A system with 32GB RAM can only mount 21 devices, the 22nd device will be a reproducible cause of system hang.
      The number of SMR devices with other FS mounted on this system does not interfere with the result above.
      
      [3.] Keywords (i.e., modules, networking, kernel):
      F2FS, SMR, Memory
      
      [4.] Kernel information
      [4.1.] Kernel version (uname -a):
      Linux 5.13.4-200.fc34.x86_64 #1 SMP Tue Jul 20 20:27:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
      
      [4.2.] Kernel .config file:
      Default Fedora 34 with f2fs-tools-1.14.0-2.fc34.x86_64
      
      [5.] Most recent kernel version which did not have the bug:
      None
      
      [6.] Output of Oops.. message (if applicable) with symbolic information
           resolved (see Documentation/admin-guide/oops-tracing.rst)
      None
      
      [7.] A small shell script or example program which triggers the
           problem (if possible)
      mount /dev/sdX /mnt/0X
      
      [8.] Memory consumption
      
      With 24 * 14T SMR Block device with F2FS
      free -g
                    total        used        free      shared  buff/cache   available
      Mem:             46          36           0           0          10          10
      Swap:             0           0           0
      
      With 3 * 14T SMR Block device with F2FS
      free -g
                     total        used        free      shared  buff/cache   available
      Mem:               7           5           0           0           1           1
      Swap:              7           0           7
      
      The root cause is that there are three bitmaps:
      - cur_valid_map
      - ckpt_valid_map
      - discard_map
      and each of them costs ~500MB of memory. {cur, ckpt}_valid_map are
      necessary, but discard_map is optional, since this bitmap is only
      useful on mount points where small discard is enabled.
      
      For a blkzoned device such as an SMR or ZNS device, f2fs only issues
      discard for a section (zone) when all blocks of that section are
      invalid, so such devices do not need small discard functionality at all.
      
      This patch introduces a new mount option "discard_unit=block|segment|
      section" to support issuing discard with a different basic unit,
      aligned to block, segment, or section, so that the user can specify
      "discard_unit=segment" or "discard_unit=section" to disable small
      discard functionality.
      
      Note that this mount option cannot be changed by remount() because
      the related metadata needs to be initialized during mount().
      
      In order to save memory, let's use "discard_unit=section" for blkzoned
      devices by default.
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
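The per-bitmap cost quoted above can be checked with a short back-of-the-envelope calculation. This is a sketch under stated assumptions: one bit per block, a 4 KiB block size, and "14TB" read as 14 TiB; the function name `bitmap_bytes` is hypothetical. Under those assumptions one bitmap comes to 448 MiB, in the same ballpark as the "~500MB" in the commit message.

```python
BLOCK_SIZE = 4096  # bytes per block (assumed 4 KiB)


def bitmap_bytes(device_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Size of a bitmap that tracks one bit per block, rounded up."""
    blocks = -(-device_bytes // block_size)  # ceiling division
    return -(-blocks // 8)                   # 8 blocks per bitmap byte


device = 14 * 2**40  # "14TB" interpreted as 14 TiB (assumption)
one = bitmap_bytes(device)

print(f"one bitmap:          {one / 2**20:.0f} MiB")
print(f"all three bitmaps:   {3 * one / 2**30:.2f} GiB per device")
print(f"without discard_map: {2 * one / 2**30:.2f} GiB per device")
```

Dropping discard_map removes one of the three bitmaps, so the bitmap memory per mounted device falls by a third, which is the saving the new default for blkzoned devices is after.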
  8. 03 Aug, 2021 1 commit
  9. 20 Jul, 2021 1 commit
    • f2fs: quota: fix potential deadlock · 9de71ede
      Committed by Chao Yu
      xfstest generic/587 reports a deadlock issue as below:
      
      ======================================================
      WARNING: possible circular locking dependency detected
      5.14.0-rc1 #69 Not tainted
      ------------------------------------------------------
      repquota/8606 is trying to acquire lock:
      ffff888022ac9320 (&sb->s_type->i_mutex_key#18){+.+.}-{3:3}, at: f2fs_quota_sync+0x207/0x300 [f2fs]
      
      but task is already holding lock:
      ffff8880084bcde8 (&sbi->quota_sem){.+.+}-{3:3}, at: f2fs_quota_sync+0x59/0x300 [f2fs]
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #2 (&sbi->quota_sem){.+.+}-{3:3}:
             __lock_acquire+0x648/0x10b0
             lock_acquire+0x128/0x470
             down_read+0x3b/0x2a0
             f2fs_quota_sync+0x59/0x300 [f2fs]
             f2fs_quota_on+0x48/0x100 [f2fs]
             do_quotactl+0x5e3/0xb30
             __x64_sys_quotactl+0x23a/0x4e0
             do_syscall_64+0x3b/0x90
             entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      -> #1 (&sbi->cp_rwsem){++++}-{3:3}:
             __lock_acquire+0x648/0x10b0
             lock_acquire+0x128/0x470
             down_read+0x3b/0x2a0
             f2fs_unlink+0x353/0x670 [f2fs]
             vfs_unlink+0x1c7/0x380
             do_unlinkat+0x413/0x4b0
             __x64_sys_unlinkat+0x50/0xb0
             do_syscall_64+0x3b/0x90
             entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      -> #0 (&sb->s_type->i_mutex_key#18){+.+.}-{3:3}:
             check_prev_add+0xdc/0xb30
             validate_chain+0xa67/0xb20
             __lock_acquire+0x648/0x10b0
             lock_acquire+0x128/0x470
             down_write+0x39/0xc0
             f2fs_quota_sync+0x207/0x300 [f2fs]
             do_quotactl+0xaff/0xb30
             __x64_sys_quotactl+0x23a/0x4e0
             do_syscall_64+0x3b/0x90
             entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      other info that might help us debug this:
      
      Chain exists of:
        &sb->s_type->i_mutex_key#18 --> &sbi->cp_rwsem --> &sbi->quota_sem
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&sbi->quota_sem);
                                     lock(&sbi->cp_rwsem);
                                     lock(&sbi->quota_sem);
        lock(&sb->s_type->i_mutex_key#18);
      
       *** DEADLOCK ***
      
      3 locks held by repquota/8606:
       #0: ffff88801efac0e0 (&type->s_umount_key#53){++++}-{3:3}, at: user_get_super+0xd9/0x190
       #1: ffff8880084bc380 (&sbi->cp_rwsem){++++}-{3:3}, at: f2fs_quota_sync+0x3e/0x300 [f2fs]
       #2: ffff8880084bcde8 (&sbi->quota_sem){.+.+}-{3:3}, at: f2fs_quota_sync+0x59/0x300 [f2fs]
      
      stack backtrace:
      CPU: 6 PID: 8606 Comm: repquota Not tainted 5.14.0-rc1 #69
      Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      Call Trace:
       dump_stack_lvl+0xce/0x134
       dump_stack+0x17/0x20
       print_circular_bug.isra.0.cold+0x239/0x253
       check_noncircular+0x1be/0x1f0
       check_prev_add+0xdc/0xb30
       validate_chain+0xa67/0xb20
       __lock_acquire+0x648/0x10b0
       lock_acquire+0x128/0x470
       down_write+0x39/0xc0
       f2fs_quota_sync+0x207/0x300 [f2fs]
       do_quotactl+0xaff/0xb30
       __x64_sys_quotactl+0x23a/0x4e0
       do_syscall_64+0x3b/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f883b0b4efe
      
      The root cause is an ABBA deadlock between the inode lock and cp_rwsem.
      Reorder the locks in f2fs_quota_sync() as below to fix this issue:
      - lock inode
      - lock cp_rwsem
      - lock quota_sem
      
      Fixes: db6ec53b ("f2fs: add a rw_sem to cover quota flag changes")
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
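The fix relies on a general rule: if every path takes the locks in one global order, a circular wait cannot form. The sketch below illustrates that rule in user space and is not the kernel code. The three locks are plain mutexes standing in for the inode lock and the two rw-semaphores, and the helper names `ordered`, `hold`, `quota_sync`, and `unlink` are hypothetical.

```python
import threading
from contextlib import ExitStack

# Global lock order from the commit message: inode lock, cp_rwsem, quota_sem.
LOCK_ORDER = ("inode_lock", "cp_rwsem", "quota_sem")
# Plain mutexes as stand-ins; the kernel uses an inode lock and rw-semaphores.
LOCKS = {name: threading.Lock() for name in LOCK_ORDER}


def ordered(names):
    """Sort the requested lock names by the global order."""
    return sorted(names, key=LOCK_ORDER.index)


def hold(names):
    """Acquire the named locks in global order; release them on exit."""
    with ExitStack() as stack:
        for name in ordered(names):
            stack.enter_context(LOCKS[name])
        # Hand the acquired locks over to the caller's `with` block.
        return stack.pop_all()


def quota_sync(log):
    with hold({"quota_sem", "cp_rwsem", "inode_lock"}):
        log.append("quota_sync")


def unlink(log):
    with hold({"cp_rwsem", "inode_lock"}):
        log.append("unlink")


log = []
threads = [
    threading.Thread(target=fn, args=(log,)) for fn in (quota_sync, unlink) * 50
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(log), "operations completed without deadlock")
```

Because both paths go through `hold`, neither can take cp_rwsem before the inode lock, which is the inversion lockdep reported.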
  10. 13 Jul, 2021 1 commit
    • f2fs: Convert to using invalidate_lock · edc6d01b
      Committed by Jan Kara
      Use invalidate_lock instead of f2fs' private i_mmap_sem. The intended
      purpose is exactly the same. By this conversion we fix a long standing
      race between hole punching and read(2) / readahead(2) paths that can
      lead to stale page cache contents.
      
      CC: Jaegeuk Kim <jaegeuk@kernel.org>
      CC: Chao Yu <yuchao0@huawei.com>
      CC: linux-f2fs-devel@lists.sourceforge.net
      Acked-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
  11. 02 Jul, 2021 1 commit
    • f2fs: compress: add nocompress extensions support · 151b1982
      Committed by Fengnan Chang
      When we create a directory with compression enabled, every file
      written into the directory will be considered for compression. But
      sometimes we know in advance that a new file cannot meet the
      compression ratio requirement. We need a nocompress extension to skip
      those files and avoid the unnecessary page compression test.
      
      After adding nocompress_extension, the priority should be:
      dir_flag < comp_extension,nocompress_extension < comp_file_flag,
      no_comp_file_flag.
      
      Priority between FS_COMPR_FL, FS_NOCOMP_FL, and extensions:
         * compress_extension=so; nocompress_extension=zip; chattr +c dir;
           touch dir/foo.so; touch dir/bar.zip; touch dir/baz.txt; then foo.so
           and baz.txt should be compressed, bar.zip should be non-compressed.
           chattr +c dir/bar.zip can enable compression on bar.zip.
         * compress_extension=so; nocompress_extension=zip; chattr -c dir;
           touch dir/foo.so; touch dir/bar.zip; touch dir/baz.txt; then foo.so
           should be compressed, bar.zip and baz.txt should be non-compressed.
           chattr +c dir/bar.zip; chattr +c dir/baz.txt; can enable compression
           on bar.zip and baz.txt.
      Signed-off-by: Fengnan Chang <changfengnan@vivo.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
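The priority rules above can be written as a small decision function. This is a sketch of the described behaviour, not the f2fs implementation; the function name `should_compress` and its parameters are hypothetical, and checking the nocompress list before the compress list is an assumption that only matters if an extension appears in both.

```python
from typing import Optional, Set


def should_compress(
    filename: str,
    dir_compressed: bool,
    compress_ext: Set[str],
    nocompress_ext: Set[str],
    file_flag: Optional[bool] = None,
) -> bool:
    """Decide whether a file is compressed.

    Priority, lowest to highest: directory flag, extension lists,
    per-file flag (True for chattr +c, False for an explicit no-compress flag).
    """
    if file_flag is not None:          # per-file flag wins over everything
        return file_flag
    ext = filename.rsplit(".", 1)[-1] if "." in filename else ""
    if ext in nocompress_ext:          # extension lists override the directory
        return False
    if ext in compress_ext:
        return True
    return dir_compressed              # otherwise inherit from the directory


comp, nocomp = {"so"}, {"zip"}
for name in ("foo.so", "bar.zip", "baz.txt"):
    print(name, should_compress(name, True, comp, nocomp))
```

Calling `should_compress("bar.zip", True, comp, nocomp, file_flag=True)` models the `chattr +c dir/bar.zip` case from the first example, where the per-file flag overrides the nocompress extension.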
  12. 23 Jun, 2021 4 commits
  13. 26 May, 2021 1 commit
  14. 15 May, 2021 1 commit
    • f2fs: fix to avoid racing on fsync_entry_slab by multi filesystem instances · cad83c96
      Committed by Chao Yu
      As syzbot reported, there is a use-after-free issue during f2fs recovery:
      
      Use-after-free write at 0xffff88823bc16040 (in kfence-#10):
       kmem_cache_destroy+0x1f/0x120 mm/slab_common.c:486
       f2fs_recover_fsync_data+0x75b0/0x8380 fs/f2fs/recovery.c:869
       f2fs_fill_super+0x9393/0xa420 fs/f2fs/super.c:3945
       mount_bdev+0x26c/0x3a0 fs/super.c:1367
       legacy_get_tree+0xea/0x180 fs/fs_context.c:592
       vfs_get_tree+0x86/0x270 fs/super.c:1497
       do_new_mount fs/namespace.c:2905 [inline]
       path_mount+0x196f/0x2be0 fs/namespace.c:3235
       do_mount fs/namespace.c:3248 [inline]
       __do_sys_mount fs/namespace.c:3456 [inline]
       __se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3433
       do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The root cause is that multiple f2fs filesystem instances can race on
      accessing the global fsync_entry_slab pointer, resulting in a
      use-after-free of the slab cache. Fix this by initializing and
      destroying the slab cache only once, during the module init/exit
      procedure.
      
      Reported-by: syzbot+9d90dad32dd9727ed084@syzkaller.appspotmail.com
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  15. 11 Apr, 2021 1 commit
  16. 02 Apr, 2021 1 commit
  17. 31 Mar, 2021 2 commits
  18. 26 Mar, 2021 2 commits
  19. 13 Mar, 2021 2 commits
  20. 11 Mar, 2021 1 commit
  21. 13 Feb, 2021 1 commit
  22. 09 Feb, 2021 1 commit
    • f2fs: don't grab superblock freeze for flush/ckpt thread · d50dfc0c
      Committed by Jaegeuk Kim
      They are controlled by f2fs_freeze().
      
      This fixes xfstests/generic/068 which is stuck at
      
       task:f2fs_ckpt-252:3 state:D stack:    0 pid: 5761 ppid:     2 flags:0x00004000
       Call Trace:
        __schedule+0x44c/0x8a0
        schedule+0x4f/0xc0
        percpu_rwsem_wait+0xd8/0x140
        ? percpu_down_write+0xf0/0xf0
        __percpu_down_read+0x56/0x70
        issue_checkpoint_thread+0x12c/0x160 [f2fs]
        ? wait_woken+0x80/0x80
        kthread+0x114/0x150
        ? __checkpoint_and_complete_reqs+0x110/0x110 [f2fs]
        ? kthread_park+0x90/0x90
        ret_from_fork+0x22/0x30
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  23. 04 Feb, 2021 1 commit
    • f2fs: introduce checkpoint_merge mount option · 261eeb9c
      Committed by Daeho Jeong
      We've added new mount options, "checkpoint_merge" and "nocheckpoint_merge".
      "checkpoint_merge" creates a kernel daemon that merges concurrent
      checkpoint requests as much as possible to eliminate redundant checkpoint
      issues. Plus, we can eliminate the sluggishness caused by a slow checkpoint
      operation when the checkpoint is done in a process context in a cgroup
      with a low I/O budget and CPU shares. To improve this further, we set the
      default I/O priority of the kernel daemon to "3", one level higher
      than other kernel threads. The verification result below
      illustrates this.
      The basic idea has come from https://opensource.samsung.com.
      
      [Verification]
      Android Pixel Device(ARM64, 7GB RAM, 256GB UFS)
      Create two I/O cgroups (fg w/ weight 100, bg w/ weight 20)
      Set "strict_guarantees" to "1" in BFQ tunables
      
      In "fg" cgroup,
      - thread A => trigger 1000 checkpoint operations
        "for i in `seq 1 1000`; do touch test_dir1/file; fsync test_dir1;
         done"
      - thread B => generating async. I/O
        "fio --rw=write --numjobs=1 --bs=128k --runtime=3600 --time_based=1
             --filename=test_img --name=test"
      
      In "bg" cgroup,
      - thread C => trigger repeated checkpoint operations
        "echo $$ > /dev/blkio/bg/tasks; while true; do touch test_dir2/file;
         fsync test_dir2; done"
      
      We've measured thread A's execution time.
      
      [ w/o patch ]
      Elapsed Time: Avg. 68 seconds
      [ w/  patch ]
      Elapsed Time: Avg. 48 seconds
      Reported-by: kernel test robot <lkp@intel.com>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      [Jaegeuk Kim: fix the return value in f2fs_start_ckpt_thread, reported by Dan]
      Signed-off-by: Daeho Jeong <daehojeong@google.com>
      Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
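The merging idea can be illustrated with a small user-space model. This is a sketch only, not the f2fs daemon: the class `CheckpointMerger` and its methods are hypothetical, and I/O priority is not modelled. Callers queue a request and wait on an event, and each pass of the worker drains every pending request, performs a single checkpoint, and completes them all.

```python
import queue
import threading


class CheckpointMerger:
    """Batch concurrent checkpoint requests into one checkpoint per pass."""

    def __init__(self, do_checkpoint):
        self._do_checkpoint = do_checkpoint  # callable that performs the work
        self._pending = queue.Queue()

    def request(self) -> threading.Event:
        """Queue a checkpoint request; the event is set once it is covered."""
        done = threading.Event()
        self._pending.put(done)
        return done

    def run_once(self) -> int:
        """Drain all pending requests and serve them with one checkpoint.

        Returns the number of requests that were merged in this pass.
        """
        batch = []
        while True:
            try:
                batch.append(self._pending.get_nowait())
            except queue.Empty:
                break
        if batch:
            self._do_checkpoint()   # one checkpoint covers the whole batch
            for done in batch:
                done.set()          # wake every waiting requester
        return len(batch)


checkpoints = []
merger = CheckpointMerger(lambda: checkpoints.append("ckpt"))
waiters = [merger.request() for _ in range(5)]
merged = merger.run_once()
print(f"{merged} requests served by {len(checkpoints)} checkpoint(s)")
```

In a real daemon `run_once` would be called in a loop from a dedicated thread; it is kept as a single step here so the batching behaviour is easy to follow.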
  24. 02 Feb, 2021 1 commit
  25. 28 Jan, 2021 6 commits