1. 26 10月, 2017 7 次提交
    • J
      f2fs: limit # of inmemory pages · 57864ae5
      Jaegeuk Kim 提交于
      If some abnormal users try lots of atomic write operations, f2fs is able to
      produce pinned pages in the main memory which affects system performance.
      This patch limits that as 20% over total memory size, and if f2fs reaches
      to the limit, it will drop all the inmemory pages.
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      57864ae5
    • C
      f2fs: give up CP_TRIMMED_FLAG if it drops discards · cf5c759f
      Chao Yu 提交于
      In ->umount, once we drop remained discard entries, we should not
      set CP_TRIMMED_FLAG with another checkpoint.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      cf5c759f
    • C
      f2fs: trace f2fs_remove_discard · 2ec6f2ef
      Chao Yu 提交于
      This patch adds tracepoint to trace f2fs_remove_discard.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      2ec6f2ef
    • C
      f2fs: reduce cmd_lock coverage in __issue_discard_cmd · 33da62cf
      Chao Yu 提交于
      __submit_discard_cmd may lead long latency due to exhaustion of I/O
      request resource in block layer, so issuing all discard under cmd_lock
      may lead to hangtask, in order to avoid that, let's reduce it's coverage.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      33da62cf
    • C
      f2fs: split discard policy · 78997b56
      Chao Yu 提交于
      There are many different scenarios such as fstrim, umount, urgent or
      background where we will issue discards, actually, they need use
      different policy in aspect of io aware, discard granularity, delay
      interval and so on. But now they just share one common discard policy,
      so there will be race when changing policy in between these scenarios,
      the interference of changing discard policy will be very serious.
      
      This patch changes to split discard policy for different scenarios.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      78997b56
    • C
      f2fs: wrap discard policy · ecc9aa00
      Chao Yu 提交于
      This patch wraps scattered optional parameters into discard policy as
      below, later, with it we expect that we can adjust these parameters with
      proper strategy in different scenario.
      
      struct discard_policy {
      	unsigned int min_interval;	/* used for candidates exist */
      	unsigned int max_interval;	/* used for candidates not exist */
      	unsigned int max_requests;	/* # of discards issued per round */
      	unsigned int io_aware_gran;	/* minimum granularity discard not be aware of I/O */
      	bool io_aware;			/* issue discard in idle time */
      	bool sync;			/* submit discard with REQ_SYNC flag */
      };
      
      This patch doesn't change any logic of codes.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      ecc9aa00
    • C
      f2fs: support issuing/waiting discard in range · 8412663d
      Chao Yu 提交于
      Fstrim intends to trim invalid blocks of filesystem only with specified
      range and granularity, but actually, it will issue all previous cached
      discard commands which may be out-of-range and be with unmatched
      granularity, it's unneeded.
      
      In order to fix above issues, this patch introduces new helps to support
      to issue and wait discard in range and adds a new fstrim_list for tracking
      in-flight discard from ->fstrim.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      8412663d
  2. 11 10月, 2017 2 次提交
    • C
      f2fs: fix to flush multiple device in checkpoint · 1228b482
      Chao Yu 提交于
      If f2fs manages multiple devices, in checkpoint, we need to issue flush
      in those devices which contain dirty data/node in their cache before
      we write checkpoint region, otherwise, filesystem metadata could be
      corrupted if hitting SPO after checkpoint.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      1228b482
    • C
      f2fs: enhance multiple device flush · 39d787be
      Chao Yu 提交于
      When multiple device feature is enabled, during ->fsync we will issue
      flush in all devices to make sure node/data of the file being persisted
      into storage. But some flushes of device could be unneeded as file's
      data may be not writebacked into those devices. So this patch adds and
      manage bitmap per inode in global cache to indicate which device is
      dirty and it needs to issue flush during ->fsync, hence, we could improve
      performance of fsync in scenario of multiple device.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      39d787be
  3. 03 10月, 2017 1 次提交
    • C
      f2fs: fix potential panic during fstrim · 638164a2
      Chao Yu 提交于
      As Ju Hyung Park reported:
      
      "When 'fstrim' is called for manual trim, a BUG() can be triggered
      randomly with this patch.
      
      I'm seeing this issue on both x86 Desktop and arm64 Android phone.
      
      On x86 Desktop, this was caused during Ubuntu boot-up. I have a
      cronjob installed which calls 'fstrim -v /' during boot. On arm64
      Android, this was caused during GC looping with 1ms gc_min_sleep_time
      & gc_max_sleep_time."
      
      Root cause of this issue is that f2fs_wait_discard_bios can only be
      used by f2fs_put_super, because during put_super there must be no
      other referrers, so it can ignore discard entry's reference count
      when removing the entry, otherwise in other caller we will hit bug_on
      in __remove_discard_cmd as there may be other issuer added reference
      count in discard entry.
      
      Thread A				Thread B
      					- issue_discard_thread
      - f2fs_ioc_fitrim
       - f2fs_trim_fs
        - f2fs_wait_discard_bios
         - __issue_discard_cmd
          - __submit_discard_cmd
      					 - __wait_discard_cmd
      					  - dc->ref++
      					  - __wait_one_discard_bio
         - __wait_discard_cmd
          - __remove_discard_cmd
           - f2fs_bug_on(sbi, dc->ref)
      
      Fixes: 969d1b18Reported-by: NJu Hyung Park <qkrwngud825@gmail.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      638164a2
  4. 13 9月, 2017 1 次提交
  5. 12 9月, 2017 2 次提交
  6. 06 9月, 2017 4 次提交
  7. 30 8月, 2017 4 次提交
  8. 24 8月, 2017 1 次提交
    • C
      block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig 提交于
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different life time rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      74d46992
  9. 22 8月, 2017 2 次提交
    • C
      f2fs: introduce discard_granularity sysfs entry · 969d1b18
      Chao Yu 提交于
      Commit d618ebaf ("f2fs: enable small discard by default") enables
      f2fs to issue 4K size discard in real-time discard mode. However, issuing
      smaller discard may cost more lifetime but releasing less free space in
      flash device. Since f2fs has ability of separating hot/cold data and
      garbage collection, we can expect that small-sized invalid region would
      expand soon with OPU, deletion or garbage collection on valid datas, so
      it's better to delay or skip issuing smaller size discards, it could help
      to reduce overmuch consumption of IO bandwidth and lifetime of flash
      storage.
      
      This patch makes f2fs selectng 64K size as its default minimal
      granularity, and issue discard with the size which is not smaller than
      minimal granularity. Also it exposes discard granularity as sysfs entry
      for configuration in different scenario.
      
      Jaegeuk Kim:
       We must issue all the accumulated discard commands when fstrim is called.
       So, I've added pend_list_tag[] to indicate whether we should issue the
       commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
       P_TRIM is set once at a time, given fstrim trigger.
       In addition, issue_discard_thread is calling too much due to the number of
       discard commands remaining in the pending list. I added a timer to control
       it likewise gc_thread.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      969d1b18
    • C
      f2fs: retry to revoke atomic commit in -ENOMEM case · 7f2b4e8e
      Chao Yu 提交于
      During atomic committing, if we encounter -ENOMEM in revoke path, it's
      better to give a chance to retry revoking.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      7f2b4e8e
  10. 16 8月, 2017 1 次提交
  11. 10 8月, 2017 3 次提交
  12. 04 8月, 2017 1 次提交
  13. 01 8月, 2017 1 次提交
    • C
      f2fs: make background threads of f2fs being aware of freezing · dc6febb6
      Chao Yu 提交于
      When ->freeze_fs is called from lvm for doing snapshot, it needs to
      make sure there will be no more changes in filesystem's data, however,
      previously, background threads like GC thread wasn't aware of freezing,
      so in environment with active background threads, data of snapshot
      becomes unstable.
      
      This patch fixes this issue by adding sb_{start,end}_intwrite in
      below background threads:
      - GC thread
      - flush thread
      - discard thread
      
      Note that, don't use sb_start_intwrite() in gc_thread_func() due to:
      
      generic/241 reports below bug:
      
       ======================================================
       WARNING: possible circular locking dependency detected
       4.13.0-rc1+ #32 Tainted: G           O
       ------------------------------------------------------
       f2fs_gc-250:0/22186 is trying to acquire lock:
        (&sbi->gc_mutex){+.+...}, at: [<f8fa7f0b>] f2fs_sync_fs+0x7b/0x1b0 [f2fs]
      
       but task is already holding lock:
        (sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #2 (sb_internal#2){++++.-}:
              __lock_acquire+0x405/0x7b0
              lock_acquire+0xae/0x220
              __sb_start_write+0x11d/0x1f0
              f2fs_evict_inode+0x2d6/0x4e0 [f2fs]
              evict+0xa8/0x170
              iput+0x1fb/0x2c0
              f2fs_sync_inode_meta+0x3f/0xf0 [f2fs]
              write_checkpoint+0x1b1/0x750 [f2fs]
              f2fs_sync_fs+0x85/0x1b0 [f2fs]
              f2fs_do_sync_file.isra.24+0x137/0xa30 [f2fs]
              f2fs_sync_file+0x34/0x40 [f2fs]
              vfs_fsync_range+0x4a/0xa0
              do_fsync+0x3c/0x60
              SyS_fdatasync+0x15/0x20
              do_fast_syscall_32+0xa1/0x1b0
              entry_SYSENTER_32+0x4c/0x7b
      
       -> #1 (&sbi->cp_mutex){+.+...}:
              __lock_acquire+0x405/0x7b0
              lock_acquire+0xae/0x220
              __mutex_lock+0x4f/0x830
              mutex_lock_nested+0x25/0x30
              write_checkpoint+0x2f/0x750 [f2fs]
              f2fs_sync_fs+0x85/0x1b0 [f2fs]
              sync_filesystem+0x67/0x80
              generic_shutdown_super+0x27/0x100
              kill_block_super+0x22/0x50
              kill_f2fs_super+0x3a/0x40 [f2fs]
              deactivate_locked_super+0x3d/0x70
              deactivate_super+0x40/0x60
              cleanup_mnt+0x39/0x70
              __cleanup_mnt+0x10/0x20
              task_work_run+0x69/0x80
              exit_to_usermode_loop+0x57/0x92
              do_fast_syscall_32+0x18c/0x1b0
              entry_SYSENTER_32+0x4c/0x7b
      
       -> #0 (&sbi->gc_mutex){+.+...}:
              validate_chain.isra.36+0xc50/0xdb0
              __lock_acquire+0x405/0x7b0
              lock_acquire+0xae/0x220
              __mutex_lock+0x4f/0x830
              mutex_lock_nested+0x25/0x30
              f2fs_sync_fs+0x7b/0x1b0 [f2fs]
              f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
              gc_thread_func+0x302/0x4a0 [f2fs]
              kthread+0xe9/0x120
              ret_from_fork+0x19/0x24
      
       other info that might help us debug this:
      
       Chain exists of:
         &sbi->gc_mutex --> &sbi->cp_mutex --> sb_internal#2
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(sb_internal#2);
                                      lock(&sbi->cp_mutex);
                                      lock(sb_internal#2);
         lock(&sbi->gc_mutex);
      
        *** DEADLOCK ***
      
       1 lock held by f2fs_gc-250:0/22186:
        #0:  (sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
      
       stack backtrace:
       CPU: 2 PID: 22186 Comm: f2fs_gc-250:0 Tainted: G           O    4.13.0-rc1+ #32
       Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
       Call Trace:
        dump_stack+0x5f/0x92
        print_circular_bug+0x1b3/0x1bd
        validate_chain.isra.36+0xc50/0xdb0
        ? __this_cpu_preempt_check+0xf/0x20
        __lock_acquire+0x405/0x7b0
        lock_acquire+0xae/0x220
        ? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
        __mutex_lock+0x4f/0x830
        ? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
        mutex_lock_nested+0x25/0x30
        ? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
        f2fs_sync_fs+0x7b/0x1b0 [f2fs]
        f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
        gc_thread_func+0x302/0x4a0 [f2fs]
        ? preempt_schedule_common+0x2f/0x4d
        ? f2fs_gc+0x540/0x540 [f2fs]
        kthread+0xe9/0x120
        ? f2fs_gc+0x540/0x540 [f2fs]
        ? kthread_create_on_node+0x30/0x30
        ret_from_fork+0x19/0x24
      
      The deadlock occurs in below condition:
      GC Thread			Thread B
      - sb_start_intwrite
      				- f2fs_sync_file
      				 - f2fs_sync_fs
      				  - mutex_lock(&sbi->gc_mutex)
      				   - write_checkpoint
      				    - block_operations
      				     - f2fs_sync_inode_meta
      				      - iput
      				       - sb_start_intwrite
       - mutex_lock(&sbi->gc_mutex)
      
      Fix this by altering sb_start_intwrite to sb_start_write_trylock.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      dc6febb6
  14. 29 7月, 2017 1 次提交
  15. 08 7月, 2017 3 次提交
  16. 04 7月, 2017 5 次提交
  17. 09 6月, 2017 1 次提交