1. 11 4月, 2015 11 次提交
    • C
      f2fs: add stat info for moved blocks by background gc · e1235983
      Changman Lee 提交于
      This patch is for looking into gc performance of f2fs in detail.
      Signed-off-by: NChangman Lee <cm224.lee@samsung.com>
      [Jaegeuk Kim: fix build errors]
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      e1235983
    • C
      f2fs: fix to issue small discard in real-time mode discard · b28c3f94
      Chao Yu 提交于
      Now in f2fs, we share functions and structures for batch mode and real-time mode
      discard. For real-time mode discard, in shared function add_discard_addrs, we
      will use uninitialized trim_minlen in struct cp_control to compare with length
      of contiguous free blocks to decide whether skipping discard fragmented freespace
      or not, this makes us ignore small discard sometimes. Fix it.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Reviewed-by : Changman Lee <cm224.lee@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      b28c3f94
    • S
      f2fs: add cond_resched() to sync_dirty_dir_inodes() · 7ecebe5e
      Sebastian Andrzej Siewior 提交于
      In a preempt-off enviroment a alot of FS activity (write/delete) I run
      into a CPU stall:
      
      | NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u2:2:59]
      | Modules linked in:
      | CPU: 0 PID: 59 Comm: kworker/u2:2 Tainted: G        W      3.19.0-00010-g10c11c51ffed #153
      | Workqueue: writeback bdi_writeback_workfn (flush-179:0)
      | task: df230000 ti: df23e000 task.ti: df23e000
      | PC is at __submit_merged_bio+0x6c/0x110
      | LR is at f2fs_submit_merged_bio+0x74/0x80
      …
      | [<c00085c4>] (gic_handle_irq) from [<c0012e84>] (__irq_svc+0x44/0x5c)
      | Exception stack(0xdf23fb48 to 0xdf23fb90)
      | fb40:                   deef3484 ffff0001 ffff0001 00000027 deef3484 00000000
      | fb60: deef3440 00000000 de426000 deef34ec deefc440 df23fbb4 df23fbb8 df23fb90
      | fb80: c02191f0 c0218fa0 60000013 ffffffff
      | [<c0012e84>] (__irq_svc) from [<c0218fa0>] (__submit_merged_bio+0x6c/0x110)
      | [<c0218fa0>] (__submit_merged_bio) from [<c02191f0>] (f2fs_submit_merged_bio+0x74/0x80)
      | [<c02191f0>] (f2fs_submit_merged_bio) from [<c021624c>] (sync_dirty_dir_inodes+0x70/0x78)
      | [<c021624c>] (sync_dirty_dir_inodes) from [<c0216358>] (write_checkpoint+0x104/0xc10)
      | [<c0216358>] (write_checkpoint) from [<c021231c>] (f2fs_sync_fs+0x80/0xbc)
      | [<c021231c>] (f2fs_sync_fs) from [<c0221eb8>] (f2fs_balance_fs_bg+0x4c/0x68)
      | [<c0221eb8>] (f2fs_balance_fs_bg) from [<c021e9b8>] (f2fs_write_node_pages+0x40/0x110)
      | [<c021e9b8>] (f2fs_write_node_pages) from [<c00de620>] (do_writepages+0x34/0x48)
      | [<c00de620>] (do_writepages) from [<c0145714>] (__writeback_single_inode+0x50/0x228)
      | [<c0145714>] (__writeback_single_inode) from [<c0146184>] (writeback_sb_inodes+0x1a8/0x378)
      | [<c0146184>] (writeback_sb_inodes) from [<c01463e4>] (__writeback_inodes_wb+0x90/0xc8)
      | [<c01463e4>] (__writeback_inodes_wb) from [<c01465f8>] (wb_writeback+0x1dc/0x28c)
      | [<c01465f8>] (wb_writeback) from [<c0146dd8>] (bdi_writeback_workfn+0x2ac/0x460)
      | [<c0146dd8>] (bdi_writeback_workfn) from [<c003c3fc>] (process_one_work+0x11c/0x3a4)
      | [<c003c3fc>] (process_one_work) from [<c003c844>] (worker_thread+0x17c/0x490)
      | [<c003c844>] (worker_thread) from [<c0041398>] (kthread+0xec/0x100)
      | [<c0041398>] (kthread) from [<c000ed10>] (ret_from_fork+0x14/0x24)
      
      As it turns out, the code loops in sync_dirty_dir_inodes() and waits for
      others to make progress but since it never leaves the CPU there is no
      progress made. At the time of this stall, there is also a rm process
      blocked:
      | rm              R running      0  1989   1774 0x00000000
      | [<c047c55c>] (__schedule) from [<c00486dc>] (__cond_resched+0x30/0x4c)
      | [<c00486dc>] (__cond_resched) from [<c047c8c8>] (_cond_resched+0x4c/0x54)
      | [<c047c8c8>] (_cond_resched) from [<c00e1aec>] (truncate_inode_pages_range+0x1f0/0x5e8)
      | [<c00e1aec>] (truncate_inode_pages_range) from [<c00e1fd8>] (truncate_inode_pages+0x28/0x30)
      | [<c00e1fd8>] (truncate_inode_pages) from [<c00e2148>] (truncate_inode_pages_final+0x60/0x64)
      | [<c00e2148>] (truncate_inode_pages_final) from [<c020c92c>] (f2fs_evict_inode+0x4c/0x268)
      | [<c020c92c>] (f2fs_evict_inode) from [<c0137214>] (evict+0x94/0x140)
      | [<c0137214>] (evict) from [<c01377e8>] (iput+0xc8/0x134)
      | [<c01377e8>] (iput) from [<c01333e4>] (d_delete+0x154/0x180)
      | [<c01333e4>] (d_delete) from [<c0129870>] (vfs_rmdir+0x114/0x12c)
      | [<c0129870>] (vfs_rmdir) from [<c012d644>] (do_rmdir+0x158/0x168)
      | [<c012d644>] (do_rmdir) from [<c012dd90>] (SyS_unlinkat+0x30/0x3c)
      | [<c012dd90>] (SyS_unlinkat) from [<c000ec40>] (ret_fast_syscall+0x0/0x4c)
      
      As explained by Jaegeuk Kim:
      |This inode is the directory (c.f., do_rmdir) causing a infinite loop on
      |sync_dirty_dir_inodes.
      |The sync_dirty_dir_inodes tries to flush dirty dentry pages, but if the
      |inode is under eviction, it submits bios and do it again until eviction
      |is finished.
      
      This patch adds a cond_resched() (as suggested by Jaegeuk) after a BIO
      is submitted so other thread can make progress.
      Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      [Jaegeuk Kim: change fs/f2fs to f2fs in subject as naming convention]
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      7ecebe5e
    • W
      f2fs: fix max orphan inodes calculation · 14b42817
      Wanpeng Li 提交于
      cp_payload is introduced for sit bitmap to support large volume, and it is
      just after the block of f2fs_checkpoint + nat bitmap, so the first segment
      should include F2FS_CP_PACKS + NR_CURSEG_TYPE + cp_payload + orphan blocks.
      However, current max orphan inodes calculation don't consider cp_payload,
      this patch fix it by reducing the number of cp_payload from total blocks of
      the first segment when calculate max orphan inodes.
      Signed-off-by: NWanpeng Li <wanpeng.li@linux.intel.com>
      Reviewed-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      14b42817
    • W
      f2fs: don't need to collect dirty sit entries and flush journal when there's no dirty sit entries · 2b11a74b
      Wanpeng Li 提交于
       Don't need to collect dirty sit entries and flush sit journal to sit
       entries when there's no dirty sit entries. This patch check dirty_sentries
       earlier just like flush_nat_entries.
      Signed-off-by: NWanpeng Li <wanpeng.li@linux.intel.com>
      Reviewed-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      2b11a74b
    • W
      f2fs: fix block_ops trace point · 2bda542d
      Wanpeng Li 提交于
      block operations is used to flush all dirty node and dentry blocks in
      the page cache and suspend ordinary writing activities, however, there
      are some facts such like cp error or mount read-only etc which lead to
      block operations can't be invoked. Current trace point print block_ops
      start premature even if block_ops doesn't have opportunity to execute.
      This patch fix it by move block_ops trace point just before block_ops.
      Signed-off-by: NWanpeng Li <wanpeng.li@linux.intel.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      2bda542d
    • J
      f2fs: check its block allocation to avoid producing wrong dirty pages · b7f204cc
      Jaegeuk Kim 提交于
      If a page is cached but its block was deallocated, we don't need to make
      the page dirty again by gc and truncate_partial_data_page.
      
      In that case, it needs to check its block allocation all the time instead
      of giving up-to-date page.
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      b7f204cc
    • J
      f2fs: clear page's up-to-date if block was deallocated · 2bca1e23
      Jaegeuk Kim 提交于
      If page's on-disk block was deallocated, let's remove up-to-date flag to avoid
      further access with wrong contents.
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      2bca1e23
    • W
      f2fs: fix the number of orphan inode blocks · 3c642985
      Wanpeng Li 提交于
      cp_pack_start_sum is calculated in do_checkpoint and is equal to
      cpu_to_le32(1 + cp_payload_blks + orphan_blocks). The number of
      orphan inode blocks is take advantage of by recover_orphan_inodes
      to readahead meta pages and recovery inodes. However, current codes
      forget to reduce the number of cp payload blocks when calculate
      the number of orphan inode blocks. This patch fix it.
      Signed-off-by: NWanpeng Li <wanpeng.li@linux.intel.com>
      Reviewed-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      3c642985
    • W
      f2fs: introduce macro __cp_payload · 55141486
      Wanpeng Li 提交于
      This patch introduce macro __cp_payload.
      Signed-off-by: NWanpeng Li <wanpeng.li@linux.intel.com>
      Reviewed-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      55141486
    • J
      f2fs: support fs shutdown · 1abff93d
      Jaegeuk Kim 提交于
      This patch introduces a generic ioctl for fs shutdown, which was used by xfs.
      
      If this shutdown is triggered, filesystem stops any further IOs according to the
      following options.
      
      1. FS_GOING_DOWN_FULLSYNC
       : this will flush all the data and dentry blocks, and do checkpoint before
         shutdown.
      
      2. FS_GOING_DOWN_METASYNC
       : this will do checkpoint before shutdown.
      
      3. FS_GOING_DOWN_NOSYNC
       : this will trigger shutdown as is.
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      1abff93d
  2. 04 3月, 2015 19 次提交
  3. 01 3月, 2015 1 次提交
    • R
      nilfs2: fix potential memory overrun on inode · 957ed60b
      Ryusuke Konishi 提交于
      Each inode of nilfs2 stores a root node of a b-tree, and it turned out to
      have a memory overrun issue:
      
      Each b-tree node of nilfs2 stores a set of key-value pairs and the number
      of them (in "bn_nchildren" member of nilfs_btree_node struct), as well as
      a few other "bn_*" members.
      
      Since the value of "bn_nchildren" is used for operations on the key-values
      within the b-tree node, it can cause memory access overrun if a large
      number is incorrectly set to "bn_nchildren".
      
      For instance, nilfs_btree_node_lookup() function determines the range of
      binary search with it, and too large "bn_nchildren" leads
      nilfs_btree_node_get_key() in that function to overrun.
      
      As for intermediate b-tree nodes, this is prevented by a sanity check
      performed when each node is read from a drive, however, no sanity check
      has been done for root nodes stored in inodes.
      
      This patch fixes the issue by adding missing sanity check against b-tree
      root nodes so that it's called when on-memory inodes are read from ifile,
      inode metadata file.
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      957ed60b
  4. 24 2月, 2015 2 次提交
  5. 23 2月, 2015 7 次提交
    • D
      xfs: ensure truncate forces zeroed blocks to disk · 5885ebda
      Dave Chinner 提交于
      A new fsync vs power fail test in xfstests indicated that XFS can
      have unreliable data consistency when doing extending truncates that
      require block zeroing. The blocks beyond EOF get zeroed in memory,
      but we never force those changes to disk before we run the
      transaction that extends the file size and exposes those blocks to
      userspace. This can result in the blocks not being correctly zeroed
      after a crash.
      
      Because in-memory behaviour is correct, tools like fsx don't pick up
      any coherency problems - it's not until the filesystem is shutdown
      or the system crashes after writing the truncate transaction to the
      journal but before the zeroed data in the page cache is flushed that
      the issue is exposed.
      
      Fix this by also flushing the dirty data in memory region between
      the old size and new size when we've found blocks that need zeroing
      in the truncate process.
      Reported-by: NLiu Bo <bo.li.liu@oracle.com>
      cc: <stable@vger.kernel.org>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      5885ebda
    • J
      xfs: Fix quota type in quota structures when reusing quota file · dfcc70a8
      Jan Kara 提交于
      For filesystems without separate project quota inode field in the
      superblock we just reuse project quota file for group quotas (and vice
      versa) if project quota file is allocated and we need group quota file.
      When we reuse the file, quota structures on disk suddenly have wrong
      type stored in d_flags though. Nobody really cares about this (although
      structure type reported to userspace was wrong as well) except
      that after commit 14bf61ff (quota: Switch ->get_dqblk() and
      ->set_dqblk() to use bytes as space units) assertion in
      xfs_qm_scall_getquota() started to trigger on xfs/106 test (apparently I
      was testing without XFS_DEBUG so I didn't notice when submitting the
      above commit).
      
      Fix the problem by properly resetting ddq->d_flags when running quotacheck
      for a quota file.
      
      CC: stable@vger.kernel.org
      Reported-by: NAl Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      dfcc70a8
    • A
      autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation · 0a280962
      Al Viro 提交于
      X-Coverup: just ask spender
      Cc: stable@vger.kernel.org
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      0a280962
    • A
      procfs: fix race between symlink removals and traversals · 7e0e953b
      Al Viro 提交于
      use_pde()/unuse_pde() in ->follow_link()/->put_link() resp.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      7e0e953b
    • A
      debugfs: leave freeing a symlink body until inode eviction · 0db59e59
      Al Viro 提交于
      As it is, we have debugfs_remove() racing with symlink traversals.
      Supply ->evict_inode() and do freeing there - inode will remain
      pinned until we are done with the symlink body.
      
      And rip the idiocy with checking if dentry is positive right after
      we'd verified debugfs_positive(), which is a stronger check...
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      0db59e59
    • K
      trylock_super(): replacement for grab_super_passive() · eb6ef3df
      Konstantin Khlebnikov 提交于
      I've noticed significant locking contention in memory reclaimer around
      sb_lock inside grab_super_passive(). Grab_super_passive() is called from
      two places: in icache/dcache shrinkers (function super_cache_scan) and
      from writeback (function __writeback_inodes_wb). Both are required for
      progress in memory allocator.
      
      Grab_super_passive() acquires sb_lock to increment sb->s_count and check
      sb->s_instances. It seems sb->s_umount locked for read is enough here:
      super-block deactivation always runs under sb->s_umount locked for write.
      Protecting super-block itself isn't a problem: in super_cache_scan() sb
      is protected by shrinker_rwsem: it cannot be freed if its slab shrinkers
      are still active. Inside writeback super-block comes from inode from bdi
      writeback list under wb->list_lock.
      
      This patch removes locking sb_lock and checks s_instances under s_umount:
      generic_shutdown_super() unlinks it under sb->s_umount locked for write.
      New variant is called trylock_super() and since it only locks semaphore,
      callers must call up_read(&sb->s_umount) instead of drop_super(sb) when
      they're done.
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      eb6ef3df
    • D
      fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions · 54f2a2f4
      David Howells 提交于
      Fanotify probably doesn't want to watch autodirs so make it use d_can_lookup()
      rather than d_is_dir() when checking a dir watch and give an error on fake
      directories.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      54f2a2f4