1. 11 Apr, 2015 24 commits
    • C
      f2fs: fix to truncate inline data past EOF · 0bfcfcca
      Chao Yu committed
      Previously, if an inode had inline data, we tried to invalidate part of the
      inline data in page #0 when truncating the inode size in
      truncate_partial_data_page(). We then set page #0 dirty so that the inode page
      could be synchronized with page #0 at ->writepage().

      But sometimes we fail to operate on page #0 in truncate_partial_data_page()
      for the following reasons:
      a) if offset is zero, we skip setting page #0 dirty.
      b) if page #0 is not uptodate, we fail to update it because it has no mapping
      data.

      So with the following operations, we read stale data that should have been
      truncated.

      1. write inline data to file
      2. sync first data page to inode page
      3. truncate file size to 0
      4. truncate file size to max_inline_size
      5. echo 1 > /proc/sys/vm/drop_caches
      6. read file --> get the original inline data, which remains in the inode page.

      This patch renames truncate_inline_data() to truncate_inline_inode() for code
      readability, then uses truncate_inline_inode() to truncate inline data in the
      inode page in truncate_blocks(), and truncates page #0 in
      truncate_partial_data_page() to fix this.

      v2:
       o partially truncate page #0 in truncate_partial_data_page() to avoid keeping
         old data in page #0.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
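The stale-data problem in the commit above can be sketched in userspace. This is only a toy model, not the f2fs code: `struct toy_inode`, `MAX_INLINE_SIZE` and both truncate helpers are hypothetical names invented for illustration. The point it shows is that shrinking a file must zero the inline bytes past the new EOF, otherwise a later size extension exposes them again.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the maximum inline data size. */
#define MAX_INLINE_SIZE 64

/* Toy model of an inode page that stores file data inline. */
struct toy_inode {
    size_t size;                                /* logical file size */
    unsigned char inline_data[MAX_INLINE_SIZE]; /* inline payload */
};

/* Buggy variant: only the size changes, old bytes stay in place. */
void toy_truncate_size_only(struct toy_inode *inode, size_t new_size)
{
    inode->size = new_size;
}

/* Fixed variant: when shrinking, zero everything past the new EOF. */
void toy_truncate_inline(struct toy_inode *inode, size_t new_size)
{
    if (new_size > MAX_INLINE_SIZE)
        new_size = MAX_INLINE_SIZE;   /* the toy model has no non-inline path */
    if (new_size < inode->size)
        memset(inode->inline_data + new_size, 0, MAX_INLINE_SIZE - new_size);
    inode->size = new_size;
}
```

With the buggy variant, truncating to 0 and then extending to `MAX_INLINE_SIZE` still returns the original bytes on read; with the fixed variant the extended region reads as zeros.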
    • C
      f2fs: fix reference leaks in f2fs_acl_create · 83dfe53c
      Chao Yu committed
      Our f2fs_acl_create is copied and modified from posix_acl_create to avoid a
      deadlock bug when the inline_dentry feature is enabled.

      posix_acl_create had reference leaks, which were fixed in
      commit fed0b588 ("posix_acl: fix reference leaks in posix_acl_create")
      by Omar Sandoval.
      https://lkml.org/lkml/2015/2/9/5
      
      Let's fix this issue in f2fs_acl_create too.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Reviewed-by: Changman Lee <cm224.lee@ssamsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: fix to calculate max length of contiguous free slots correctly · bda19076
      Chao Yu committed
      When looking up a name for creation, we try to record the level of the current
      dentry hash table if the current dentry block has enough contiguous slots to
      store the name of the new file that will be created later. This saves lookup
      time when adding a link into the parent dir.

      But currently, in find_target_dentry, the length of contiguous free slots is
      not calculated correctly. This occasionally leaves holes in the dentry block,
      which wastes dentry block space.

      Let's refactor the lookup flow for max slots as follows to fix this issue:
      a) increase max_len if current slot is free;
      b) update max_slots with max_len if max_len is larger than max_slots;
      c) reset max_len to zero if current slot is not free.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
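Steps a) to c) of the commit above amount to a single pass that tracks the current run of free slots and the longest run seen so far. The following is a minimal sketch under stated assumptions: the dentry bitmap is modeled as a plain `bool` array, and `max_free_run` is a hypothetical helper name, not the kernel's find_target_dentry.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return the length of the longest run of contiguous free slots.
 * used[i] is true when slot i is occupied.
 */
size_t max_free_run(const bool *used, size_t nslots)
{
    size_t max_slots = 0; /* longest free run seen so far */
    size_t max_len = 0;   /* length of the free run ending at slot i */

    for (size_t i = 0; i < nslots; i++) {
        if (!used[i]) {
            max_len++;                 /* a) slot is free: extend the run */
            if (max_len > max_slots)
                max_slots = max_len;   /* b) remember the longest run */
        } else {
            max_len = 0;               /* c) slot is taken: reset the run */
        }
    }
    return max_slots;
}
```

Updating `max_slots` inside the free branch means a run that reaches the end of the block is still counted, which is one way such a calculation can otherwise go wrong.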
    • W
      f2fs: fix unlocked nat set cache operation · 57ed1e95
      Wanpeng Li committed
      nm_i->nat_tree_lock is used to synchronize operations on both the nat entry
      cache tree and the nat set cache tree. However, it isn't held when flushing
      nat entries during checkpoint, which leads to a potential race. This patch
      fixes it by holding the lock during the gang lookup of the nat set cache and
      when deleting items from the nat set cache.
      Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: cleanup statement about max orphan inodes calc · e0150392
      Changman Lee committed
      Using a macro for each term makes the meaning easy to read.
      Signed-off-by: Changman Lee <cm224.lee@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • Y
      f2fs: remove unnecessary condition judgment · d9f46bb1
      Yuan Zhong committed
      Remove the unnecessary condition check, because
      'max_slots' has already been initialized to '0' at the beginning
      of the function, as follows:
      if (max_slots)
             *max_slots = 0;
      Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • Y
      f2fs: set the correct place of initializing *res_page · b1f73b79
      Yuan Zhong committed
      The function 'find_in_inline_dir()' takes 'res_page'
      as an argument, so we should initialize 'res_page' before
      calling this function.
      Signed-off-by: Yuan Zhong <yuan.mark.zhong@samsung.com>
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • W
      f2fs: reduce searching region of segmap when set free section · 7fd97019
      Wanpeng Li committed
      In __set_free, when freeing one segment, we check whether all segments in
      its section are free, in order to set the section to free status. But the
      searched region of the segmap runs from the start segno to the last segno
      of the main area, which is not necessary. So let's check only the segment
      bitmap of the target section.
      Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
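The narrowed search in the commit above can be illustrated as follows. This is a sketch, not the kernel's __set_free: the segment map is modeled as a `bool` array instead of a bitmap, and `section_is_free` with its parameters is a hypothetical helper. It scans only the segments that belong to the section containing `segno`, rather than everything up to the end of the main area.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true when every segment of the section that contains `segno`
 * is free. seg_in_use[i] is true when segment i is in use.
 */
bool section_is_free(const bool *seg_in_use, size_t total_segs,
                     size_t segs_per_sec, size_t segno)
{
    if (segs_per_sec == 0 || segno >= total_segs)
        return false;

    /* First and one-past-last segment of the target section only. */
    size_t start = (segno / segs_per_sec) * segs_per_sec;
    size_t end = start + segs_per_sec;
    if (end > total_segs)
        end = total_segs;

    for (size_t i = start; i < end; i++) {
        if (seg_in_use[i])
            return false;
    }
    return true;
}
```

The loop touches at most `segs_per_sec` entries regardless of how large the main area is.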
    • W
      f2fs: fix extent cache memory leak · fdf6c8be
      Wanpeng Li committed
      The extent tree/node slab cache is created during f2fs insmod;
      however, it isn't destroyed during f2fs rmmod. This patch fixes
      it by destroying the extent tree/node slab cache on f2fs rmmod.
      Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • J
      f2fs: relocate Kconfig from misc filesystems · d7196c5a
      Jaegeuk Kim committed
      f2fs has been shipped on many smartphone devices over the past couple of years.
      So it is worth relocating its Kconfig entry from misc filesystems to the main
      page so that developers can choose it more easily.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • J
      f2fs: report -ENOENT for unreached data indices · 76629165
      Jaegeuk Kim committed
      If an inode has inline_data, it should report -ENOENT when accessing an
      out-of-bound region.
      This is used by f2fs_fiemap, which treats -ENOENT as no error.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • J
      f2fs: clear append/update flags once fsync is done · cff28521
      Jaegeuk Kim committed
      When fsync is done through checkpoint, f2fs previously failed to clear the
      append and update flags. This patch clears them.

      This was originally caught by Changman Lee.
      Signed-off-by: Changman Lee <cm224.lee@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • J
      f2fs: avoid to trigger writepage during POR · d5669f7b
      Jaegeuk Kim committed
      This patch doesn't change the previous behavior, since
      f2fs_write_data_page bypasses writing the page during POR.

      The difference is that this patch avoids holding the writepages mutex.
      This avoids the following false warning, which can happen only
      when mount and shutdown are triggered at the same time.
      
       ======================================================
       [ INFO: possible circular locking dependency detected ]
       4.0.0-rc1+ #3 Tainted: G           O
       -------------------------------------------------------
       kworker/u8:0/2270 is trying to acquire lock:
        (&sbi->gc_mutex){+.+.+.}, at: [<ffffffffa02bdd33>] f2fs_balance_fs+0x73/0x90 [f2fs]
      
       but task is already holding lock:
        (&sbi->writepages){+.+...}, at: [<ffffffffa02b261b>] f2fs_write_data_pages+0xcb/0x3a0 [f2fs]
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #2 (&sbi->writepages){+.+...}:
              [<ffffffff810e2b11>] lock_acquire+0xe1/0x2f0
              [<ffffffff8185e1b3>] mutex_lock_nested+0x63/0x530
              [<ffffffffa02b261b>] f2fs_write_data_pages+0xcb/0x3a0 [f2fs]
              [<ffffffff811c38c1>] do_writepages+0x21/0x50
              [<ffffffff8126c5a6>] __writeback_single_inode+0x76/0xbf0
              [<ffffffff8126e23a>] writeback_single_inode+0xea/0x1c0
              [<ffffffff8126e425>] write_inode_now+0x95/0xa0
              [<ffffffff81259dab>] iput+0x20b/0x3f0
              [<ffffffffa02c1c8b>] recover_data.constprop.14+0x26b/0xa80 [f2fs]
              [<ffffffffa02c2776>] recover_fsync_data+0x2b6/0x5e0 [f2fs]
              [<ffffffffa02a9744>] f2fs_fill_super+0xb24/0xb90 [f2fs]
              [<ffffffff8123d7f4>] mount_bdev+0x1a4/0x1e0
              [<ffffffffa02a3c85>] f2fs_mount+0x15/0x20 [f2fs]
              [<ffffffff8123e159>] mount_fs+0x39/0x180
              [<ffffffff8125e51b>] vfs_kern_mount+0x6b/0x160
              [<ffffffff81261554>] do_mount+0x204/0xbe0
              [<ffffffff8126223b>] SyS_mount+0x8b/0xe0
              [<ffffffff81863e6d>] system_call_fastpath+0x16/0x1b
      
       -> #1 (&sbi->cp_mutex){+.+...}:
              [<ffffffff810e2b11>] lock_acquire+0xe1/0x2f0
              [<ffffffff8185e1b3>] mutex_lock_nested+0x63/0x530
              [<ffffffffa02acbf2>] write_checkpoint+0x42/0x1230 [f2fs]
              [<ffffffffa02a847d>] f2fs_sync_fs+0x9d/0x2a0 [f2fs]
              [<ffffffff81272f82>] sync_filesystem+0x82/0xb0
              [<ffffffff8123c214>] generic_shutdown_super+0x34/0x100
              [<ffffffff8123c5f7>] kill_block_super+0x27/0x70
              [<ffffffffa02a3c60>] kill_f2fs_super+0x20/0x30 [f2fs]
              [<ffffffff8123ca49>] deactivate_locked_super+0x49/0x80
              [<ffffffff8123d05e>] deactivate_super+0x4e/0x70
              [<ffffffff8125df63>] cleanup_mnt+0x43/0x90
              [<ffffffff8125e002>] __cleanup_mnt+0x12/0x20
              [<ffffffff810a82e4>] task_work_run+0xc4/0xf0
              [<ffffffff8101f0bd>] do_notify_resume+0x8d/0xa0
              [<ffffffff81864141>] int_signal+0x12/0x17
      
       -> #0 (&sbi->gc_mutex){+.+.+.}:
              [<ffffffff810e2866>] __lock_acquire+0x1ac6/0x1c90
              [<ffffffff810e2b11>] lock_acquire+0xe1/0x2f0
              [<ffffffff8185e1b3>] mutex_lock_nested+0x63/0x530
              [<ffffffffa02bdd33>] f2fs_balance_fs+0x73/0x90 [f2fs]
              [<ffffffffa02b5938>] f2fs_write_data_page+0x348/0x5b0 [f2fs]
              [<ffffffffa02af9da>] __f2fs_writepage+0x1a/0x50 [f2fs]
              [<ffffffff811c1b54>] write_cache_pages+0x274/0x6f0
              [<ffffffffa02b2630>] f2fs_write_data_pages+0xe0/0x3a0 [f2fs]
              [<ffffffff811c38c1>] do_writepages+0x21/0x50
              [<ffffffff8126c5a6>] __writeback_single_inode+0x76/0xbf0
              [<ffffffff8126d44a>] writeback_sb_inodes+0x32a/0x710
              [<ffffffff8126d8cf>] __writeback_inodes_wb+0x9f/0xd0
              [<ffffffff8126dcdb>] wb_writeback+0x3db/0x850
              [<ffffffff8126e848>] bdi_writeback_workfn+0x148/0x980
              [<ffffffff810a3782>] process_one_work+0x1e2/0x840
              [<ffffffff810a3f01>] worker_thread+0x121/0x460
              [<ffffffff810a9dc8>] kthread+0xf8/0x110
              [<ffffffff81863dbc>] ret_from_fork+0x7c/0xb0
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: add stat info for moved blocks by background gc · e1235983
      Changman Lee committed
      This patch is for examining the gc performance of f2fs in detail.
      Signed-off-by: Changman Lee <cm224.lee@samsung.com>
      [Jaegeuk Kim: fix build errors]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • C
      f2fs: fix to issue small discard in real-time mode discard · b28c3f94
      Chao Yu committed
      In f2fs, batch mode and real-time mode discard share functions and structures.
      For real-time mode discard, the shared function add_discard_addrs compares the
      uninitialized trim_minlen in struct cp_control with the length of contiguous
      free blocks to decide whether to skip discarding fragmented free space, so
      small discards are sometimes ignored. Fix it.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Reviewed-by: Changman Lee <cm224.lee@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
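The decision described in the commit above can be sketched like this. It is an illustration under assumptions, not the actual add_discard_addrs logic: `enum discard_mode` and `should_issue_discard` are hypothetical names. The idea is that the minimum-length threshold only makes sense in batch mode, where the caller supplies it, so real-time mode must not consult it at all.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mode selector for this sketch. */
enum discard_mode { DISCARD_REALTIME, DISCARD_BATCH };

/*
 * Decide whether a run of `run_len` contiguous free blocks should be
 * discarded. `trim_minlen` is only meaningful in batch mode.
 */
bool should_issue_discard(enum discard_mode mode, size_t run_len,
                          size_t trim_minlen)
{
    if (run_len == 0)
        return false;               /* nothing to discard */
    if (mode == DISCARD_BATCH && run_len < trim_minlen)
        return false;               /* batch mode skips short fragments */
    return true;                    /* real-time mode never filters by length */
}
```

Passing a garbage `trim_minlen` in real-time mode has no effect here, which is the property the fix is after.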
    • S
      f2fs: add cond_resched() to sync_dirty_dir_inodes() · 7ecebe5e
      Sebastian Andrzej Siewior committed
      In a preempt-off environment with a lot of FS activity (write/delete), I run
      into a CPU stall:
      
      | NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u2:2:59]
      | Modules linked in:
      | CPU: 0 PID: 59 Comm: kworker/u2:2 Tainted: G        W      3.19.0-00010-g10c11c51ffed #153
      | Workqueue: writeback bdi_writeback_workfn (flush-179:0)
      | task: df230000 ti: df23e000 task.ti: df23e000
      | PC is at __submit_merged_bio+0x6c/0x110
      | LR is at f2fs_submit_merged_bio+0x74/0x80
      …
      | [<c00085c4>] (gic_handle_irq) from [<c0012e84>] (__irq_svc+0x44/0x5c)
      | Exception stack(0xdf23fb48 to 0xdf23fb90)
      | fb40:                   deef3484 ffff0001 ffff0001 00000027 deef3484 00000000
      | fb60: deef3440 00000000 de426000 deef34ec deefc440 df23fbb4 df23fbb8 df23fb90
      | fb80: c02191f0 c0218fa0 60000013 ffffffff
      | [<c0012e84>] (__irq_svc) from [<c0218fa0>] (__submit_merged_bio+0x6c/0x110)
      | [<c0218fa0>] (__submit_merged_bio) from [<c02191f0>] (f2fs_submit_merged_bio+0x74/0x80)
      | [<c02191f0>] (f2fs_submit_merged_bio) from [<c021624c>] (sync_dirty_dir_inodes+0x70/0x78)
      | [<c021624c>] (sync_dirty_dir_inodes) from [<c0216358>] (write_checkpoint+0x104/0xc10)
      | [<c0216358>] (write_checkpoint) from [<c021231c>] (f2fs_sync_fs+0x80/0xbc)
      | [<c021231c>] (f2fs_sync_fs) from [<c0221eb8>] (f2fs_balance_fs_bg+0x4c/0x68)
      | [<c0221eb8>] (f2fs_balance_fs_bg) from [<c021e9b8>] (f2fs_write_node_pages+0x40/0x110)
      | [<c021e9b8>] (f2fs_write_node_pages) from [<c00de620>] (do_writepages+0x34/0x48)
      | [<c00de620>] (do_writepages) from [<c0145714>] (__writeback_single_inode+0x50/0x228)
      | [<c0145714>] (__writeback_single_inode) from [<c0146184>] (writeback_sb_inodes+0x1a8/0x378)
      | [<c0146184>] (writeback_sb_inodes) from [<c01463e4>] (__writeback_inodes_wb+0x90/0xc8)
      | [<c01463e4>] (__writeback_inodes_wb) from [<c01465f8>] (wb_writeback+0x1dc/0x28c)
      | [<c01465f8>] (wb_writeback) from [<c0146dd8>] (bdi_writeback_workfn+0x2ac/0x460)
      | [<c0146dd8>] (bdi_writeback_workfn) from [<c003c3fc>] (process_one_work+0x11c/0x3a4)
      | [<c003c3fc>] (process_one_work) from [<c003c844>] (worker_thread+0x17c/0x490)
      | [<c003c844>] (worker_thread) from [<c0041398>] (kthread+0xec/0x100)
      | [<c0041398>] (kthread) from [<c000ed10>] (ret_from_fork+0x14/0x24)
      
      As it turns out, the code loops in sync_dirty_dir_inodes() and waits for
      others to make progress but since it never leaves the CPU there is no
      progress made. At the time of this stall, there is also a rm process
      blocked:
      | rm              R running      0  1989   1774 0x00000000
      | [<c047c55c>] (__schedule) from [<c00486dc>] (__cond_resched+0x30/0x4c)
      | [<c00486dc>] (__cond_resched) from [<c047c8c8>] (_cond_resched+0x4c/0x54)
      | [<c047c8c8>] (_cond_resched) from [<c00e1aec>] (truncate_inode_pages_range+0x1f0/0x5e8)
      | [<c00e1aec>] (truncate_inode_pages_range) from [<c00e1fd8>] (truncate_inode_pages+0x28/0x30)
      | [<c00e1fd8>] (truncate_inode_pages) from [<c00e2148>] (truncate_inode_pages_final+0x60/0x64)
      | [<c00e2148>] (truncate_inode_pages_final) from [<c020c92c>] (f2fs_evict_inode+0x4c/0x268)
      | [<c020c92c>] (f2fs_evict_inode) from [<c0137214>] (evict+0x94/0x140)
      | [<c0137214>] (evict) from [<c01377e8>] (iput+0xc8/0x134)
      | [<c01377e8>] (iput) from [<c01333e4>] (d_delete+0x154/0x180)
      | [<c01333e4>] (d_delete) from [<c0129870>] (vfs_rmdir+0x114/0x12c)
      | [<c0129870>] (vfs_rmdir) from [<c012d644>] (do_rmdir+0x158/0x168)
      | [<c012d644>] (do_rmdir) from [<c012dd90>] (SyS_unlinkat+0x30/0x3c)
      | [<c012dd90>] (SyS_unlinkat) from [<c000ec40>] (ret_fast_syscall+0x0/0x4c)
      
      As explained by Jaegeuk Kim:
      |This inode is the directory (c.f., do_rmdir) causing a infinite loop on
      |sync_dirty_dir_inodes.
      |The sync_dirty_dir_inodes tries to flush dirty dentry pages, but if the
      |inode is under eviction, it submits bios and do it again until eviction
      |is finished.
      
      This patch adds a cond_resched() (as suggested by Jaegeuk) after a BIO
      is submitted so other threads can make progress.
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      [Jaegeuk Kim: change fs/f2fs to f2fs in subject as naming convention]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • W
      f2fs: fix max orphan inodes calculation · 14b42817
      Wanpeng Li committed
      cp_payload was introduced for the sit bitmap to support large volumes, and it
      sits just after the block of f2fs_checkpoint + nat bitmap, so the first segment
      should include F2FS_CP_PACKS + NR_CURSEG_TYPE + cp_payload + orphan blocks.
      However, the current max orphan inodes calculation doesn't consider cp_payload.
      This patch fixes it by subtracting cp_payload from the total blocks of the
      first segment when calculating max orphan inodes.
      Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
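The arithmetic in the commit above can be written out as a small helper. This is a sketch under assumptions: the numeric values of `F2FS_CP_PACKS`, `NR_CURSEG_TYPE` and the per-block orphan count `ORPHANS_PER_BLOCK` are assumed for illustration and are not stated in the commit message, and `max_orphan_inodes` is a hypothetical function name.

```c
#include <stdint.h>

/* Assumed values for illustration only; not taken from the commit message. */
#define F2FS_CP_PACKS      2    /* checkpoint pack blocks */
#define NR_CURSEG_TYPE     6    /* current segment summary blocks */
#define ORPHANS_PER_BLOCK  1020 /* orphan inode entries per block (assumed) */

/*
 * Blocks left for orphan inodes in the first segment after subtracting
 * the checkpoint packs, the summary blocks and the cp payload, multiplied
 * by the number of orphan entries per block.
 */
uint32_t max_orphan_inodes(uint32_t blocks_per_seg, uint32_t cp_payload)
{
    uint32_t reserved = F2FS_CP_PACKS + NR_CURSEG_TYPE + cp_payload;

    if (blocks_per_seg <= reserved)
        return 0; /* no room for orphan blocks */
    return (blocks_per_seg - reserved) * ORPHANS_PER_BLOCK;
}
```

Ignoring `cp_payload` would overstate the limit by `cp_payload * ORPHANS_PER_BLOCK` entries.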
    • W
      f2fs: don't need to collect dirty sit entries and flush journal when there's no dirty sit entries · 2b11a74b
      Wanpeng Li committed
      There is no need to collect dirty sit entries and flush the sit journal to
      sit entries when there are no dirty sit entries. This patch checks
      dirty_sentries earlier, just like flush_nat_entries does.
      Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • W
      f2fs: fix block_ops trace point · 2bda542d
      Wanpeng Li committed
      Block operations are used to flush all dirty node and dentry blocks in
      the page cache and suspend ordinary writing activities. However, some
      conditions, such as a cp error or a read-only mount, prevent block
      operations from being invoked. The current trace point prints the block_ops
      start prematurely, even if block_ops doesn't get a chance to execute.
      This patch fixes it by moving the block_ops trace point to just before
      block_ops.
      Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • J
      f2fs: check its block allocation to avoid producing wrong dirty pages · b7f204cc
      Jaegeuk Kim committed
      If a page is cached but its block was deallocated, gc and
      truncate_partial_data_page don't need to make the page dirty again.

      In that case, we need to check the block allocation every time instead
      of handing back the up-to-date page.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • J
      f2fs: clear page's up-to-date if block was deallocated · 2bca1e23
      Jaegeuk Kim committed
      If a page's on-disk block was deallocated, let's clear the up-to-date flag to
      avoid further access with wrong contents.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • W
      f2fs: fix the number of orphan inode blocks · 3c642985
      Wanpeng Li committed
      cp_pack_start_sum is calculated in do_checkpoint and is equal to
      cpu_to_le32(1 + cp_payload_blks + orphan_blocks). The number of
      orphan inode blocks is used by recover_orphan_inodes to read ahead
      meta pages and recover inodes. However, the current code forgets to
      subtract the number of cp payload blocks when calculating the number
      of orphan inode blocks. This patch fixes it.
      Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
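Since cp_pack_start_sum equals 1 + cp_payload_blks + orphan_blocks, the orphan block count is recovered by inverting that sum. A minimal sketch, assuming host-endian integers (the on-disk little-endian conversion is left out) and a hypothetical helper name `orphan_block_count`:

```c
#include <stdint.h>

/*
 * Invert cp_pack_start_sum = 1 + cp_payload_blks + orphan_blocks.
 * Values are assumed to be already converted to host byte order.
 */
uint32_t orphan_block_count(uint32_t cp_pack_start_sum,
                            uint32_t cp_payload_blks)
{
    uint32_t overhead = 1 + cp_payload_blks; /* checkpoint block + payload */

    if (cp_pack_start_sum < overhead)
        return 0; /* inconsistent input: report no orphan blocks */
    return cp_pack_start_sum - overhead;
}
```

Subtracting only the leading 1, as the pre-fix code effectively did, would count the payload blocks as orphan blocks.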
    • W
      f2fs: introduce macro __cp_payload · 55141486
      Wanpeng Li committed
      This patch introduces the macro __cp_payload.
      Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • J
      f2fs: support fs shutdown · 1abff93d
      Jaegeuk Kim committed
      This patch introduces a generic ioctl for fs shutdown, which was used by xfs.

      If this shutdown is triggered, the filesystem stops any further IOs according
      to the following options.
      
      1. FS_GOING_DOWN_FULLSYNC
       : this will flush all the data and dentry blocks, and do checkpoint before
         shutdown.
      
      2. FS_GOING_DOWN_METASYNC
       : this will do checkpoint before shutdown.
      
      3. FS_GOING_DOWN_NOSYNC
       : this will trigger shutdown as is.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
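A userspace caller of such a shutdown ioctl might look like the sketch below. Everything numeric here is an assumption and should be checked against the kernel headers of the target tree: the request encoding `_IOR('X', 125, uint32_t)` (modeled on the xfs "going down" ioctl) and the flag values 0, 1 and 2 are not given in the commit message, and `request_fs_shutdown` and `FS_IOC_SHUTDOWN_ASSUMED` are hypothetical names.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed request number and flag values; verify against kernel headers. */
#define FS_IOC_SHUTDOWN_ASSUMED  _IOR('X', 125, uint32_t)
#define FS_GOING_DOWN_FULLSYNC   0x0 /* flush data and dentry blocks, then checkpoint */
#define FS_GOING_DOWN_METASYNC   0x1 /* checkpoint only */
#define FS_GOING_DOWN_NOSYNC     0x2 /* shut down as is */

/*
 * Ask the filesystem mounted at `mount_point` to shut down with `mode`.
 * Returns 0 on success, -1 on failure with errno set.
 */
int request_fs_shutdown(const char *mount_point, uint32_t mode)
{
    int fd = open(mount_point, O_RDONLY);
    if (fd < 0)
        return -1;

    int ret = ioctl(fd, FS_IOC_SHUTDOWN_ASSUMED, &mode);
    int saved_errno = errno; /* close() may overwrite errno */

    close(fd);
    errno = saved_errno;
    return ret;
}
```

Calling `request_fs_shutdown("/mnt/f2fs", FS_GOING_DOWN_METASYNC)` (with `/mnt/f2fs` as a placeholder path) would request a checkpoint before shutdown. Such a call typically needs administrative privileges and stops further IO on that filesystem, so it should only be tried on a test mount.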
  2. 04 Mar, 2015 16 commits