1. 01 Aug, 2017 1 commit
    • C
      f2fs: make background threads of f2fs being aware of freezing · dc6febb6
      Chao Yu committed
      When ->freeze_fs is called from LVM to take a snapshot, it must make
      sure that no further changes are made to the filesystem's data.
      Previously, however, background threads such as the GC thread weren't
      aware of freezing, so in an environment with active background threads,
      the snapshot data could become unstable.
      
      This patch fixes the issue by adding sb_{start,end}_intwrite to the
      following background threads:
      - GC thread
      - flush thread
      - discard thread
      
      Note: don't use sb_start_intwrite() in gc_thread_func(), for the reason below.
      
      generic/241 reports the following bug:
      
       ======================================================
       WARNING: possible circular locking dependency detected
       4.13.0-rc1+ #32 Tainted: G           O
       ------------------------------------------------------
       f2fs_gc-250:0/22186 is trying to acquire lock:
        (&sbi->gc_mutex){+.+...}, at: [<f8fa7f0b>] f2fs_sync_fs+0x7b/0x1b0 [f2fs]
      
       but task is already holding lock:
        (sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #2 (sb_internal#2){++++.-}:
              __lock_acquire+0x405/0x7b0
              lock_acquire+0xae/0x220
              __sb_start_write+0x11d/0x1f0
              f2fs_evict_inode+0x2d6/0x4e0 [f2fs]
              evict+0xa8/0x170
              iput+0x1fb/0x2c0
              f2fs_sync_inode_meta+0x3f/0xf0 [f2fs]
              write_checkpoint+0x1b1/0x750 [f2fs]
              f2fs_sync_fs+0x85/0x1b0 [f2fs]
              f2fs_do_sync_file.isra.24+0x137/0xa30 [f2fs]
              f2fs_sync_file+0x34/0x40 [f2fs]
              vfs_fsync_range+0x4a/0xa0
              do_fsync+0x3c/0x60
              SyS_fdatasync+0x15/0x20
              do_fast_syscall_32+0xa1/0x1b0
              entry_SYSENTER_32+0x4c/0x7b
      
       -> #1 (&sbi->cp_mutex){+.+...}:
              __lock_acquire+0x405/0x7b0
              lock_acquire+0xae/0x220
              __mutex_lock+0x4f/0x830
              mutex_lock_nested+0x25/0x30
              write_checkpoint+0x2f/0x750 [f2fs]
              f2fs_sync_fs+0x85/0x1b0 [f2fs]
              sync_filesystem+0x67/0x80
              generic_shutdown_super+0x27/0x100
              kill_block_super+0x22/0x50
              kill_f2fs_super+0x3a/0x40 [f2fs]
              deactivate_locked_super+0x3d/0x70
              deactivate_super+0x40/0x60
              cleanup_mnt+0x39/0x70
              __cleanup_mnt+0x10/0x20
              task_work_run+0x69/0x80
              exit_to_usermode_loop+0x57/0x92
              do_fast_syscall_32+0x18c/0x1b0
              entry_SYSENTER_32+0x4c/0x7b
      
       -> #0 (&sbi->gc_mutex){+.+...}:
              validate_chain.isra.36+0xc50/0xdb0
              __lock_acquire+0x405/0x7b0
              lock_acquire+0xae/0x220
              __mutex_lock+0x4f/0x830
              mutex_lock_nested+0x25/0x30
              f2fs_sync_fs+0x7b/0x1b0 [f2fs]
              f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
              gc_thread_func+0x302/0x4a0 [f2fs]
              kthread+0xe9/0x120
              ret_from_fork+0x19/0x24
      
       other info that might help us debug this:
      
       Chain exists of:
         &sbi->gc_mutex --> &sbi->cp_mutex --> sb_internal#2
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(sb_internal#2);
                                      lock(&sbi->cp_mutex);
                                      lock(sb_internal#2);
         lock(&sbi->gc_mutex);
      
        *** DEADLOCK ***
      
       1 lock held by f2fs_gc-250:0/22186:
        #0:  (sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
      
       stack backtrace:
       CPU: 2 PID: 22186 Comm: f2fs_gc-250:0 Tainted: G           O    4.13.0-rc1+ #32
       Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
       Call Trace:
        dump_stack+0x5f/0x92
        print_circular_bug+0x1b3/0x1bd
        validate_chain.isra.36+0xc50/0xdb0
        ? __this_cpu_preempt_check+0xf/0x20
        __lock_acquire+0x405/0x7b0
        lock_acquire+0xae/0x220
        ? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
        __mutex_lock+0x4f/0x830
        ? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
        mutex_lock_nested+0x25/0x30
        ? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
        f2fs_sync_fs+0x7b/0x1b0 [f2fs]
        f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
        gc_thread_func+0x302/0x4a0 [f2fs]
        ? preempt_schedule_common+0x2f/0x4d
        ? f2fs_gc+0x540/0x540 [f2fs]
        kthread+0xe9/0x120
        ? f2fs_gc+0x540/0x540 [f2fs]
        ? kthread_create_on_node+0x30/0x30
        ret_from_fork+0x19/0x24
      
      The deadlock occurs under the following condition:
      GC Thread			Thread B
      - sb_start_intwrite
      				- f2fs_sync_file
      				 - f2fs_sync_fs
      				  - mutex_lock(&sbi->gc_mutex)
      				   - write_checkpoint
      				    - block_operations
      				     - f2fs_sync_inode_meta
      				      - iput
      				       - sb_start_intwrite
       - mutex_lock(&sbi->gc_mutex)
      
      Fix this by replacing sb_start_intwrite with sb_start_write_trylock.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
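The trylock idea from the commit above can be sketched in userspace. This is only an illustrative model, not kernel code: `struct fake_sb` and every `fake_*` helper are hypothetical names, and a mutex-protected flag stands in for the superblock freeze protection. Unlike the real freeze path, which waits for writers to drain, `fake_freeze()` here simply refuses while a writer is active.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a superblock's freeze state. */
struct fake_sb {
	pthread_mutex_t lock; /* protects frozen and writers */
	bool frozen;          /* true between fake_freeze() and fake_thaw() */
	int writers;          /* writers currently inside a write section */
	int gc_rounds;        /* GC rounds that actually did work */
};

static void fake_sb_init(struct fake_sb *sb)
{
	pthread_mutex_init(&sb->lock, NULL);
	sb->frozen = false;
	sb->writers = 0;
	sb->gc_rounds = 0;
}

/* Models the trylock idea: never blocks, and fails while frozen. */
static bool fake_start_write_trylock(struct fake_sb *sb)
{
	bool ok;

	pthread_mutex_lock(&sb->lock);
	ok = !sb->frozen;
	if (ok)
		sb->writers++;
	pthread_mutex_unlock(&sb->lock);
	return ok;
}

static void fake_end_write(struct fake_sb *sb)
{
	pthread_mutex_lock(&sb->lock);
	assert(sb->writers > 0);
	sb->writers--;
	pthread_mutex_unlock(&sb->lock);
}

/* Simplification: returns false instead of waiting for active writers. */
static bool fake_freeze(struct fake_sb *sb)
{
	bool ok;

	pthread_mutex_lock(&sb->lock);
	ok = (sb->writers == 0);
	if (ok)
		sb->frozen = true;
	pthread_mutex_unlock(&sb->lock);
	return ok;
}

static void fake_thaw(struct fake_sb *sb)
{
	pthread_mutex_lock(&sb->lock);
	sb->frozen = false;
	pthread_mutex_unlock(&sb->lock);
}

/* One iteration of a GC-like loop: skip the round if the fs is frozen. */
static bool gc_round(struct fake_sb *sb)
{
	if (!fake_start_write_trylock(sb))
		return false;   /* frozen: do nothing, retry on the next wakeup */
	sb->gc_rounds++;        /* placeholder for the real GC work */
	fake_end_write(sb);
	return true;
}
```

Because the background thread never blocks on the freeze state, it cannot hold up a path that is itself waiting on a lock the thread will later need; it just skips a round and retries.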
  2. 29 Jul, 2017 3 commits
  3. 27 Jul, 2017 3 commits
  4. 18 Jul, 2017 2 commits
  5. 16 Jul, 2017 2 commits
  6. 09 Jul, 2017 1 commit
    • C
      f2fs: support plain user/group quota · 0abd675e
      Chao Yu committed
      This patch adds support for plain user/group quota.
      
      Change Note by Jaegeuk Kim.
      
      - Use the f2fs page cache for quota files so that garbage collection is taken
        into account. As a result, quota files do not tolerate sudden power cuts, so
        the user needs to run quotacheck.
      
      - setattr() calls dquot_transfer, which transfers inode->i_blocks.
        We can't reclaim that during f2fs_evict_inode(), so we need to count
        node blocks as well in order to match i_blocks with dquot's space.
      
        Note that Chao wrote a patch to count inode->i_blocks without the inode block.
        (f2fs: don't count inode block in in-memory inode.i_blocks)
      
      - In f2fs_remount, we need to switch to RW prior to dquot_resume.
      
      - Handle the fault_injection case during f2fs_quota_off_umount.
      
      - TODO: Project quota
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  7. 08 Jul, 2017 13 commits
    • J
      f2fs: avoid deadlock caused by lock order of page and lock_op · d29460e5
      Jaegeuk Kim committed
      - punch_hole
       - fill_zero
        - f2fs_lock_op
        - get_new_data_page
         - lock_page
      
      - f2fs_write_data_pages
       - lock_page
       - do_write_data_page
        - f2fs_lock_op
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
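The two call chains above take the same pair of locks in opposite order, which is the classic ABBA pattern. One common way to break such a cycle is to let the path that already holds the page lock only try the other lock and back off on failure. The sketch below shows that idea with pthread mutexes. It is an assumption that this resembles the actual patch: `op_lock`, `page_lock` and both `path_*` functions are hypothetical stand-ins, and plain mutexes simplify whatever lock types f2fs really uses.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for f2fs_lock_op and lock_page. */
static pthread_mutex_t op_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;

/* Path A (punch_hole order): op lock first, then the page lock. */
static int path_punch_hole(void)
{
	pthread_mutex_lock(&op_lock);
	pthread_mutex_lock(&page_lock);
	/* ... zero the page here ... */
	pthread_mutex_unlock(&page_lock);
	pthread_mutex_unlock(&op_lock);
	return 0;
}

/*
 * Path B (writeback order): the page lock is already held, so only *try*
 * the op lock. On failure, report -EAGAIN so the caller can retry later
 * instead of sleeping while holding the page lock.
 */
static int path_write_page(void)
{
	int ret = 0;

	pthread_mutex_lock(&page_lock);
	if (pthread_mutex_trylock(&op_lock) != 0) {
		ret = -EAGAIN;
	} else {
		/* ... write the page here ... */
		pthread_mutex_unlock(&op_lock);
	}
	pthread_mutex_unlock(&page_lock);
	return ret;
}
```

Path B never sleeps on `op_lock` while holding `page_lock`, so the two paths cannot end up waiting on each other.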
    • C
      f2fs: use spin_{,un}lock_irq{save,restore} · d1aa2453
      Chao Yu committed
      generic/361 reports the warning below. The cause: once someone has
      entered the critical region of sbi.cp_lock, if
      write_end_io.f2fs_stop_checkpoint is invoked from a triggered IRQ, we
      will encounter a deadlock.
      
      So this patch switches to spin_{,un}lock_irq{save,restore} to create a
      critical region with IRQs disabled, avoiding the potential deadlock.
      
       irq event stamp: 83391573
       loop: Write error at byte offset 438729728, length 1024.
       hardirqs last  enabled at (83391573): [<c1809752>] restore_all+0xf/0x65
       hardirqs last disabled at (83391572): [<c1809eac>] reschedule_interrupt+0x30/0x3c
       loop: Write error at byte offset 438860288, length 1536.
       softirqs last  enabled at (83389244): [<c180cc4e>] __do_softirq+0x1ae/0x476
       softirqs last disabled at (83389237): [<c101ca7c>] do_softirq_own_stack+0x2c/0x40
       loop: Write error at byte offset 438990848, length 2048.
       ================================
       WARNING: inconsistent lock state
       4.12.0-rc2+ #30 Tainted: G           O
       --------------------------------
       inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
       xfs_io/7959 [HC1[1]:SC0[0]:HE0:SE1] takes:
        (&(&sbi->cp_lock)->rlock){?.+...}, at: [<f96f96cc>] f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
       {HARDIRQ-ON-W} state was registered at:
         __lock_acquire+0x527/0x7b0
         lock_acquire+0xae/0x220
         _raw_spin_lock+0x42/0x50
         do_checkpoint+0x165/0x9e0 [f2fs]
         write_checkpoint+0x33f/0x740 [f2fs]
         __f2fs_sync_fs+0x92/0x1f0 [f2fs]
         f2fs_sync_fs+0x12/0x20 [f2fs]
         sync_filesystem+0x67/0x80
         generic_shutdown_super+0x27/0x100
         kill_block_super+0x22/0x50
         kill_f2fs_super+0x3a/0x40 [f2fs]
         deactivate_locked_super+0x3d/0x70
         deactivate_super+0x40/0x60
         cleanup_mnt+0x39/0x70
         __cleanup_mnt+0x10/0x20
         task_work_run+0x69/0x80
         exit_to_usermode_loop+0x57/0x85
         do_fast_syscall_32+0x18c/0x1b0
         entry_SYSENTER_32+0x4c/0x7b
       irq event stamp: 1957420
       hardirqs last  enabled at (1957419): [<c1808f37>] _raw_spin_unlock_irq+0x27/0x50
       hardirqs last disabled at (1957420): [<c1809f9c>] call_function_single_interrupt+0x30/0x3c
       softirqs last  enabled at (1953784): [<c180cc4e>] __do_softirq+0x1ae/0x476
       softirqs last disabled at (1953773): [<c101ca7c>] do_softirq_own_stack+0x2c/0x40
      
       other info that might help us debug this:
        Possible unsafe locking scenario:
      
              CPU0
              ----
         lock(&(&sbi->cp_lock)->rlock);
         <Interrupt>
           lock(&(&sbi->cp_lock)->rlock);
      
        *** DEADLOCK ***
      
       2 locks held by xfs_io/7959:
        #0:  (sb_writers#13){.+.+.+}, at: [<c11fd7ca>] vfs_write+0x16a/0x190
        #1:  (&sb->s_type->i_mutex_key#16){+.+.+.}, at: [<f96e33f5>] f2fs_file_write_iter+0x25/0x140 [f2fs]
      
       stack backtrace:
       CPU: 2 PID: 7959 Comm: xfs_io Tainted: G           O    4.12.0-rc2+ #30
       Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
       Call Trace:
        dump_stack+0x5f/0x92
        print_usage_bug+0x1d3/0x1dd
        ? check_usage_backwards+0xe0/0xe0
        mark_lock+0x23d/0x280
        __lock_acquire+0x699/0x7b0
        ? __this_cpu_preempt_check+0xf/0x20
        ? trace_hardirqs_off_caller+0x91/0xe0
        lock_acquire+0xae/0x220
        ? f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
        _raw_spin_lock+0x42/0x50
        ? f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
        f2fs_stop_checkpoint+0x1c/0x50 [f2fs]
        f2fs_write_end_io+0x147/0x150 [f2fs]
        bio_endio+0x7a/0x1e0
        blk_update_request+0xad/0x410
        blk_mq_end_request+0x16/0x60
        lo_complete_rq+0x3c/0x70
        __blk_mq_complete_request_remote+0x11/0x20
        flush_smp_call_function_queue+0x6d/0x120
        ? debug_smp_processor_id+0x12/0x20
        generic_smp_call_function_single_interrupt+0x12/0x30
        smp_call_function_single_interrupt+0x25/0x40
        call_function_single_interrupt+0x37/0x3c
       EIP: _raw_spin_unlock_irq+0x2d/0x50
       EFLAGS: 00000296 CPU: 2
       EAX: 00000001 EBX: d2ccc51c ECX: 00000001 EDX: c1aacebd
       ESI: 00000000 EDI: 00000000 EBP: c96c9d1c ESP: c96c9d18
        DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
        ? inherit_task_group.isra.98.part.99+0x6b/0xb0
        __add_to_page_cache_locked+0x1d4/0x290
        add_to_page_cache_lru+0x38/0xb0
        pagecache_get_page+0x8e/0x200
        f2fs_write_begin+0x96/0xf00 [f2fs]
        ? trace_hardirqs_on_caller+0xdd/0x1c0
        ? current_time+0x17/0x50
        ? trace_hardirqs_on+0xb/0x10
        generic_perform_write+0xa9/0x170
        __generic_file_write_iter+0x1a2/0x1f0
        ? f2fs_preallocate_blocks+0x137/0x160 [f2fs]
        f2fs_file_write_iter+0x6e/0x140 [f2fs]
        ? __lock_acquire+0x429/0x7b0
        __vfs_write+0xc1/0x140
        vfs_write+0x9b/0x190
        SyS_pwrite64+0x63/0xa0
        do_fast_syscall_32+0xa1/0x1b0
        entry_SYSENTER_32+0x4c/0x7b
       EIP: 0xb7786c61
       EFLAGS: 00000293 CPU: 2
       EAX: ffffffda EBX: 00000003 ECX: 08416000 EDX: 00001000
       ESI: 18b24000 EDI: 00000000 EBP: 00000003 ESP: bf9b36b0
        DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
      
      Fixes: aaec2b1d ("f2fs: introduce cp_lock to protect updating of ckpt_flags")
      Cc: stable@vger.kernel.org
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
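As a rough userspace analogy (not kernel code), a signal handler can play the role of the interrupt, and blocking the signal around the critical region can play the role of spin_lock_irqsave()/spin_unlock_irqrestore(). All names below are hypothetical, and the `raise_inside` parameter exists only to make the effect observable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t ckpt_flags;  /* stands in for the checkpoint flags */
static volatile sig_atomic_t in_critical; /* 1 while the "lock" is held */
static volatile sig_atomic_t reentered;   /* set if the handler ran inside the region */

/* Plays the role of the IRQ path that also wants to update the flags. */
static void on_sigusr1(int signo)
{
	(void)signo;
	if (in_critical)
		reentered = 1;
	ckpt_flags |= 0x2;
}

static int install_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigusr1;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGUSR1, &sa, NULL);
}

/* Update the flags with the "interrupt" masked for the whole region. */
static void set_flag_protected(int bit, int raise_inside)
{
	sigset_t block, saved;

	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);
	sigprocmask(SIG_BLOCK, &block, &saved);  /* like spin_lock_irqsave */
	in_critical = 1;
	if (raise_inside)
		raise(SIGUSR1);                  /* stays pending while blocked */
	ckpt_flags |= bit;
	in_critical = 0;
	sigprocmask(SIG_SETMASK, &saved, NULL);  /* like spin_unlock_irqrestore */
}
```

The handler runs only after the previous mask is restored, so it never observes the region half-updated. Saving and restoring the previous mask, rather than unconditionally unblocking, mirrors why the save/restore variants are safe to call from contexts where interrupts may already be disabled.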
    • J
      f2fs: relax migratepage for atomic written page · ff1048e7
      Jaegeuk Kim committed
      To avoid lock contention on atomically written pages, f2fs_migrate_page
      should return EBUSY when the mode is asynchronous. We expect the page to be
      released soon, once the transaction commits.
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: don't count inode block in in-memory inode.i_blocks · 000519f2
      Chao Yu committed
      Previously, we counted every block consumed by an inode (inode block,
      xattr block, index blocks, and data blocks) in i_blocks. Other generic
      filesystems don't count the inode block in i_blocks, so userspace
      applications or the quota system may see an incorrect block count from
      the i_blocks value in the inode.
      
      This patch changes inode.i_blocks to count all blocks except the inode
      block. For on-disk i_blocks, we keep counting the inode block for
      backward compatibility.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
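The relationship between the two counts described above can be written as a pair of conversions. This is a minimal sketch with hypothetical helper names. It works in whole filesystem blocks for simplicity, and the kernel's actual units and code may differ.

```c
#include <assert.h>
#include <stdint.h>

/* On-disk count includes the inode block; the in-memory count does not. */
static uint64_t mem_blocks_from_disk(uint64_t disk_blocks)
{
	/* An on-disk inode always accounts for at least its own block. */
	assert(disk_blocks >= 1);
	return disk_blocks - 1;
}

/* When writing the inode back, add the inode block again. */
static uint64_t disk_blocks_from_mem(uint64_t mem_blocks)
{
	return mem_blocks + 1;
}
```

For example, a file with 10 data blocks and no xattr or index blocks would report 10 in memory and store 11 on disk, so existing on-disk images keep their meaning.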
    • C
      Revert "f2fs: fix to clean previous mount option when remount_fs" · 6ac851ba
      Chao Yu committed
      Don't clear the old mount options before parsing the new ones during
      ->remount_fs, consistent with other generic filesystems.
      
      This reverts commit 26666c8a.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • S
      f2fs: do not set LOST_PINO for renamed dir · b855bf0e
      Sheng Yong committed
      After renaming a directory, fsck could detect an unmatched pino. The
      scenario can be reproduced as follows:
      
      	$ mkdir /bar/subbar /foo
      	$ rename /bar/subbar /foo
      
      Then fsck will report:
      [ASSERT] (__chk_dots_dentries:1182)  --> Bad inode number[0x3] for '..', parent parent ino is [0x4]
      
      Rename sets LOST_PINO for old_inode. However, the flag cannot be cleared,
      since dir is written back with CP. So, let's get rid of LOST_PINO for a
      renamed dir and fix the pino directly at the end of rename.
      Signed-off-by: Sheng Yong <shengyong1@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • S
      f2fs: do not set LOST_PINO for newly created dir · d58dfb75
      Sheng Yong committed
      Since directories are written back with the checkpoint, and fsyncing a
      directory always writes a CP, there is no need to set LOST_PINO
      after creating a directory.
      Signed-off-by: Sheng Yong <shengyong1@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: skip ->writepages for {mete,node}_inode during recovery · 0771fcc7
      Chao Yu committed
      Skip ->writepages ahead of ->writepage for {meta,node}_inode during
      recovery, so that the unneeded loop in ->writepages can be avoided.
      
      Moreover, check SBI_POR_DOING earlier while writing back pages.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: introduce __check_sit_bitmap · 6915ea9d
      Chao Yu committed
      Since the discard thread was introduced, discard commands can be issued
      concurrently with data allocation. This patch adds a new function that
      checks the SIT bitmap to ensure that the user data covered by an ongoing
      discard command is invalid.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
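The kind of check described above can be sketched as a scan over a validity bitmap. This is an assumption about the general shape, not the kernel function itself: the helper names, the flat bitmap layout and the LSB-first bit order are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bit nr of a byte-array bitmap, LSB-first. */
static bool bitmap_test(const uint8_t *map, size_t nr)
{
	return (map[nr / 8] >> (nr % 8)) & 1u;
}

/*
 * Return true if no block in [start, end) is marked valid, i.e. the
 * whole range is safe to discard. A set bit means "block holds valid data".
 */
static bool range_is_invalid(const uint8_t *valid_map, size_t start, size_t end)
{
	assert(start <= end);
	for (size_t i = start; i < end; i++) {
		if (bitmap_test(valid_map, i))
			return false; /* allocation raced with the discard */
	}
	return true;
}
```

A caller would run such a check on the range of an in-flight discard command and treat a `false` result as a bug, since it means freshly allocated data is about to be discarded.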
    • C
      f2fs: stop gc/discard thread in prior during umount · cce13252
      Chao Yu committed
      This patch resolves a kernel panic in xfstests/081, caused by the recent f2fs_bug_on:
      
       f2fs: add f2fs_bug_on in __remove_discard_cmd
      
      To fix this, we stop the gc/discard threads early in ->kill_sb in order to
      avoid a race between referencing and releasing among them.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: introduce reserved_blocks in sysfs · daeb433e
      Chao Yu committed
      This patch adds a new sysfs interface that controls the number of
      reserved blocks in the system, which cannot be used by the user. It
      lets the user adjust the over-provision ratio dynamically instead of
      changing it with mkfs.
      
      We expect this to help reserve more free space, relieving GC in both
      the filesystem and the flash device.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
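The effect of such a tunable on the space reported to users can be illustrated with simple arithmetic. The sketch below is hypothetical: `struct fake_sbi`, its field names and the rule for rejecting too-large values are assumptions for illustration, not the kernel's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical subset of per-filesystem accounting state. */
struct fake_sbi {
	uint64_t user_block_count;  /* blocks usable by the user in total */
	uint64_t valid_block_count; /* blocks currently holding valid data */
	uint64_t reserved_blocks;   /* tunable: blocks withheld from the user */
};

/* Assumed policy: refuse to reserve more than what is currently free. */
static int set_reserved_blocks(struct fake_sbi *sbi, uint64_t t)
{
	assert(sbi->valid_block_count <= sbi->user_block_count);
	if (t > sbi->user_block_count - sbi->valid_block_count)
		return -EINVAL;
	sbi->reserved_blocks = t;
	return 0;
}

/* Blocks the user may still allocate after honoring the reservation. */
static uint64_t avail_blocks(const struct fake_sbi *sbi)
{
	uint64_t free_blocks = sbi->user_block_count - sbi->valid_block_count;

	return free_blocks > sbi->reserved_blocks ?
	       free_blocks - sbi->reserved_blocks : 0;
}
```

With 1000 user blocks and 400 in use, reserving 100 blocks would leave 500 available to the user, while the withheld space stays free for garbage collection.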
    • Y
      f2fs: avoid redundant f2fs_flush after remount · d871cd04
      Yunlong Song committed
      create_flush_cmd_control creates a redundant issue_flush_thread after each
      remount with the flush_merge option.
      Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • J
      f2fs: report # of free inodes more precisely · 0cc091d0
      Jaegeuk Kim committed
      If the partition is small, we don't need to report the total # of inodes
      including hidden free nodes.
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  8. 04 Jul, 2017 15 commits