1. 23 Feb, 2016 14 commits
    • J
      f2fs: use wait_for_stable_page to avoid contention · fec1d657
      Jaegeuk Kim committed
      In write_begin, if storage supports stable_page, we don't need to wait for
      writeback to update its contents.
      This patch uses wait_for_stable_page instead of
      wait_on_page_writeback.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      fec1d657
    • C
      f2fs: enhance foreground GC · 718e53fa
      Chao Yu committed
      If a section is configured to consist of multiple segments, foreground GC
      performs garbage collection with the following approach:
      
      	for each segment in victim section
      		blk_start_plug
      		for each valid block in segment
      			write out by OPU method
      		submit bio cache   <---
      		blk_finish_plug   <---
      
      There are two issues:
      1) Most of the time, 'submit bio cache' prevents writes from the next
      segment from merging into the current bio buffer, so smaller bios are
      submitted.
      2) The block plug covers IO submission for only one segment, which reduces
      the opportunity to merge IOs across multiple segments within the plug.

      So refactor the code into the structure below to maximize the
      opportunity for merging IOs:
      
      	blk_start_plug
      	for each segment in victim section
      		for each valid block in segment
      			write out by OPU method
      	submit bio cache
      	blk_finish_plug
      
      Test method:
      1. mkfs.f2fs -s 8 /dev/sdX
      2. touch 32 files
      3. write 2M data into each file
      4. punch 1.5M data from offset 0 for each file
      5. trigger foreground gc through ioctl
      
      Before the patch, a total of 40 bios are submitted.
      f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
      f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65776, size = 122880
      f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66016, size = 122880
      f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66256, size = 122880
      f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66496, size = 32768
      ---- repeated 8 times

      After the patch, a total of 35 bios are submitted.
      f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
      ----repeat 34 times
      f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 73696, size = 16384
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      718e53fa
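The bio counts in the trace above can be checked with a little arithmetic. The sketch below is a simplified model, not f2fs code. It assumes 4 KiB blocks, that a bio holds at most 30 blocks (122880 / 4096, the largest size in the trace), and that all valid blocks are written contiguously so they keep merging until the bio cache is submitted. The function names are made up for illustration.

```python
import math

BLOCK_SIZE = 4096                        # bytes per block (assumed)
MAX_BIO_BLOCKS = 122880 // BLOCK_SIZE    # 30 blocks, largest bio seen in the trace


def bios_flush_per_segment(valid_blocks_per_segment):
    """Old structure: the bio cache is submitted after every segment."""
    return sum(math.ceil(n / MAX_BIO_BLOCKS) for n in valid_blocks_per_segment)


def bios_flush_per_section(valid_blocks_per_segment):
    """New structure: the bio cache is submitted once per victim section."""
    return math.ceil(sum(valid_blocks_per_segment) / MAX_BIO_BLOCKS)


# mkfs.f2fs -s 8 gives 8 segments per section. In the "before" trace each
# segment writes 4 * 122880 + 32768 bytes, i.e. 128 valid blocks.
valid = [(4 * 122880 + 32768) // BLOCK_SIZE] * 8

print(bios_flush_per_segment(valid))   # 40
print(bios_flush_per_section(valid))   # 35
```

Under these assumptions the model reproduces both numbers from the commit message: five bios per segment times eight segments before the patch, and 34 full bios plus one 16384-byte remainder after it.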
    • J
      f2fs: don't need to call set_page_dirty for io error · e3ef1876
      Jaegeuk Kim committed
      If end_io gets an error, we don't need to set the page as dirty, since we
      already set f2fs_stop_checkpoint which will not flush any data.
      
      This will resolve the following warning.
      
      ======================================================
      [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
      4.4.0+ #9 Tainted: G           O
      ------------------------------------------------------
      xfs_io/26773 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
       (&(&sbi->inode_lock[i])->rlock){+.+...}, at: [<ffffffffc025483f>] update_dirty_page+0x6f/0xd0 [f2fs]
      
      and this task is already holding:
       (&(&q->__queue_lock)->rlock){-.-.-.}, at: [<ffffffff81396ea2>] blk_queue_bio+0x422/0x490
      which would create a new lock dependency:
       (&(&q->__queue_lock)->rlock){-.-.-.} -> (&(&sbi->inode_lock[i])->rlock){+.+...}
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      e3ef1876
    • J
      f2fs: avoid needless sync_inode_page when reading inline_data · ae96e7bd
      Jaegeuk Kim committed
      In write_begin, if there is inline_data, f2fs loads it into the 0th data page.
      Since it's the read path, we don't need to sync its inode page.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      ae96e7bd
    • J
      f2fs: don't need to sync node page at every time · 52f80337
      Jaegeuk Kim committed
      In write_end, we don't need to sync the inode page every time.
      Instead, we can expect f2fs_write_inode to update it later.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      52f80337
    • J
      f2fs: avoid multiple node page writes due to inline_data · 2049d4fc
      Jaegeuk Kim committed
      The scenario is:
      1. create fully node blocks
      2. flush node blocks
      3. write inline_data for all the node blocks again
      4. flush node blocks redundantly
      
      So, this patch tries to flush inline_data when flushing node blocks.
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      2049d4fc
    • J
      f2fs: do f2fs_balance_fs when block is allocated · 3c082b7b
      Jaegeuk Kim committed
      We should consider data block allocation to trigger f2fs_balance_fs.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      3c082b7b
    • J
      f2fs: fix to overcome inline_data floods · 6e17bfbc
      Jaegeuk Kim committed
      The scenario is:
      1. create lots of node blocks
      2. sync
      3. write lots of inline_data
      -> got panic due to no free space
      
      In that case, we should flush node blocks when writing inline_data in #3,
      and trigger gc as well.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      6e17bfbc
    • J
      f2fs: use writepages->lock for WB_SYNC_ALL · 25c13551
      Jaegeuk Kim committed
      If multiple threads issue many writepages calls in the background, we don't
      need to serialize them to merge all the bios, since it's background writeback.
      In that case, it's better to run writepages concurrently.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      25c13551
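The locking policy described in the commit above (f2fs: use writepages->lock for WB_SYNC_ALL) can be modeled in user space. This is only a sketch of the idea, not the kernel implementation: `writepages_lock` and `writepages_section` are hypothetical names, and a Python `threading.Lock` stands in for the kernel mutex. The constants mirror the kernel's `WB_SYNC_NONE` and `WB_SYNC_ALL` writeback modes.

```python
import threading
from contextlib import contextmanager

WB_SYNC_NONE = 0   # background writeback
WB_SYNC_ALL = 1    # data-integrity writeback

# Hypothetical stand-in for the per-filesystem writepages mutex.
writepages_lock = threading.Lock()


@contextmanager
def writepages_section(sync_mode):
    """Serialize only WB_SYNC_ALL callers; background callers skip the lock."""
    locked = sync_mode == WB_SYNC_ALL
    if locked:
        writepages_lock.acquire()
    try:
        yield locked
    finally:
        if locked:
            writepages_lock.release()


with writepages_section(WB_SYNC_NONE) as took_lock:
    print(took_lock)   # False: background writeback runs without the lock
with writepages_section(WB_SYNC_ALL) as took_lock:
    print(took_lock)   # True: sync writeback is serialized
```

The trade-off in this model is that background callers give up bio merging across threads in exchange for running concurrently.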
    • J
      f2fs: remove needless condition check · b483fadf
      Jaegeuk Kim committed
      This patch removes a needless condition variable.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      b483fadf
    • C
      f2fs: correct search area in get_new_segment · 0ab14356
      Chao Yu committed
      get_new_segment starts from the current segment position and searches for
      a free segment among its right neighbors located in the same section.

      But previously our search area was set as [current segment, max segment],
      which means we had to search more bits in the free_segmap bitmap in the
      worst cases. So here we correct the search area to [current segment, last
      segment in section] to avoid unnecessary searching.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      0ab14356
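The corrected search area from the commit above (f2fs: correct search area in get_new_segment) can be illustrated with a small model. This is a sketch under assumptions, not the kernel code: the bitmap is a Python list where `True` means "in use", and `find_next_free` and `next_free_in_section` are hypothetical helpers. The first one mimics a next-zero-bit search that returns the end index when nothing is found.

```python
def find_next_free(free_segmap, start, end):
    """Return the first index in [start, end) that is free, or end if none."""
    for segno in range(start, end):
        if not free_segmap[segno]:
            return segno
    return end


def next_free_in_section(free_segmap, cur_segno, segs_per_sec):
    """Search right neighbors of cur_segno, bounded by its own section."""
    section_end = (cur_segno // segs_per_sec + 1) * segs_per_sec
    segno = find_next_free(free_segmap, cur_segno + 1, section_end)
    return segno if segno < section_end else None


# Two sections of four segments each; True marks a segment in use.
segmap = [True, True, True, True, False, False, False, False]
print(next_free_in_section(segmap, 1, 4))   # None
```

In this example the search stops at the section boundary instead of walking on into the next section's free segments, which is the unnecessary scanning the commit removes.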
    • C
      f2fs: export dirty_nats_ratio in sysfs · 2304cb0c
      Chao Yu committed
      This patch exports a new sysfs entry 'dirty_nats_ratio' to control the
      threshold of dirty nat entries. If the current ratio exceeds the configured
      threshold, a checkpoint is triggered in f2fs_balance_fs_bg to flush dirty nats.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      2304cb0c
    • C
      f2fs: flush dirty nat entries when exceeding threshold · 7d768d2c
      Chao Yu committed
      When testing f2fs with xfstests, generic/251 gets stuck for a long time.
      The test case uses the sequence below to obtain freshly released space on
      the device, in preparation for the following fstrim test.
      
      1. rm -rf /mnt/dir
      2. mkdir /mnt/dir/
      3. cp -axT `pwd`/ /mnt/dir/
      4. goto 1
      
      During the preparation step, all nat entries are cached in the nat cache.
      Most of them are dirty entries with an invalid blkaddr, which means the
      nodes related to these entries have been truncated, and they can be
      reused after the dirty entries have been checkpointed.

      However, no checkpoint was triggered, so nid allocators (e.g. mkdir,
      creat) run into a long journey of iterating over all NAT pages, looking
      for free nids in alloc_nid->build_free_nids.

      Here, in f2fs_balance_fs_bg, we give another chance to do a checkpoint
      that flushes nat entries so they can be reused in the free nid cache
      when the dirty entry count exceeds 10% of the max count.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      7d768d2c
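The threshold test described in the commit above (f2fs: flush dirty nat entries when exceeding threshold) amounts to a percentage comparison. The sketch below is an illustration only: the function name and parameters are hypothetical, the default of 10 percent comes from the commit message, and the strict "exceeds" comparison is an assumption about the boundary case.

```python
DEFAULT_DIRTY_NATS_RATIO = 10   # percent, taken from the commit message


def dirty_nats_exceed_threshold(dirty_count, max_count,
                                ratio=DEFAULT_DIRTY_NATS_RATIO):
    """True when dirty entries exceed `ratio` percent of max_count.

    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    return dirty_count * 100 > max_count * ratio


print(dirty_nats_exceed_threshold(100, 1000))   # False: exactly 10%
print(dirty_nats_exceed_threshold(101, 1000))   # True: just over 10%
```

A background balancing routine could call such a check and start a checkpoint when it returns true, so that truncated node ids return to the free nid cache instead of forcing a scan of every NAT page.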
    • C
      f2fs: relocate is_merged_page · 0fd785eb
      Chao Yu committed
      Operations in is_merged_page are related to the inner bio cache, so move
      it to data.c.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      0fd785eb
  2. 23 Jan, 2016 1 commit
    • A
      wrappers for ->i_mutex access · 5955102c
      Al Viro committed
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      5955102c
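The wrapper pattern in the commit above (wrappers for ->i_mutex access) can be shown with a user-space analogy. This is not kernel code: the `Inode` class is a toy, and a Python `threading.Lock` stands in for the kernel mutex. The wrapper names follow the `inode_foo(inode)` naming given in the commit message. `inode_lock_nested` is left out because its lockdep subclass argument has no counterpart here.

```python
import threading


class Inode:
    """Toy inode that owns a lock, standing in for inode->i_mutex."""

    def __init__(self):
        self.i_mutex = threading.Lock()


def inode_lock(inode):
    inode.i_mutex.acquire()


def inode_unlock(inode):
    inode.i_mutex.release()


def inode_trylock(inode):
    # Returns True if the lock was taken, False if it was already held.
    return inode.i_mutex.acquire(blocking=False)


def inode_is_locked(inode):
    return inode.i_mutex.locked()


ino = Inode()
inode_lock(ino)
print(inode_is_locked(ino))   # True
inode_unlock(ino)
print(inode_is_locked(ino))   # False
```

Because callers go through the wrappers and never touch `i_mutex` directly, the underlying lock type can later be swapped (the commit mentions a planned move to an rwsem) by changing only the wrapper bodies.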
  3. 15 Jan, 2016 1 commit
    • V
      kmemcg: account certain kmem allocations to memcg · 5d097056
      Vladimir Davydov committed
      Mark those kmem allocations that are known to be easily triggered from
      userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
      memcg.  For the list, see below:
      
       - threadinfo
       - task_struct
       - task_delay_info
       - pid
       - cred
       - mm_struct
       - vm_area_struct and vm_region (nommu)
       - anon_vma and anon_vma_chain
       - signal_struct
       - sighand_struct
       - fs_struct
       - files_struct
       - fdtable and fdtable->full_fds_bits
       - dentry and external_name
       - inode for all filesystems. This is the most tedious part, because
         most filesystems overwrite the alloc_inode method.
      
      The list is far from complete, so feel free to add more objects.
      Nevertheless, it should be close to "account everything" approach and
      keep most workloads within bounds.  Malevolent users will be able to
      breach the limit, but this was possible even with the former "account
      everything" approach (simply because it did not account everything in
      fact).
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5d097056
  4. 12 Jan, 2016 5 commits
  5. 09 Jan, 2016 9 commits
  6. 07 Jan, 2016 4 commits
  7. 04 Jan, 2016 1 commit
  8. 01 Jan, 2016 5 commits
    • C
      f2fs crypto: check CONFIG_F2FS_FS_XATTR for encrypted symlink · 3a9e6433
      Chao Yu committed
      Add the missing CONFIG_F2FS_FS_XATTR check for encrypted symlink inodes in
      order to avoid unneeded registration of ->{get,set,remove}xattr.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      3a9e6433
    • J
      f2fs: introduce zombie list for fast shrinking extent trees · 137d09f0
      Jaegeuk Kim committed
      This patch removes refcount, and instead, adds zombie_list to shrink directly
      without radix tree traversal.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      137d09f0
    • J
      f2fs: monitor zombie_tree count · c00ba554
      Jaegeuk Kim committed
      This patch adds an entry to show the number of zombie extent_trees.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      c00ba554
    • J
      f2fs: use IPU for fdatasync · c46a155b
      Jaegeuk Kim committed
      This patch fixes a missing IPU condition when fdatasync is called.
      With this patch, fdatasync is able to avoid additional node writes for recovery.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      c46a155b
    • J
      f2fs: write pending bios when cp_error is set · 8d4ea29b
      Jaegeuk Kim committed
      When testing ioc_shutdown, put_super can hang while waiting for pages
      under writeback, as follows.
      
      INFO: task umount:2723 blocked for more than 120 seconds.
            Tainted: G           O    4.4.0-rc3+ #8
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      umount          D ffff88000859f9d8     0  2723   2110 0x00000000
       ffff88000859f9d8 0000000000000000 0000000000000000 ffffffff81e11540
       ffff880078c225c0 ffff8800085a0000 ffff88007fc17440 7fffffffffffffff
       ffffffff818239f0 ffff88000859fb48 ffff88000859f9f0 ffffffff8182310c
      Call Trace:
       [<ffffffff818239f0>] ? bit_wait+0x50/0x50
       [<ffffffff8182310c>] schedule+0x3c/0x90
       [<ffffffff81827fb9>] schedule_timeout+0x2d9/0x430
       [<ffffffff810e0f8f>] ? mark_held_locks+0x6f/0xa0
       [<ffffffff8111614d>] ? ktime_get+0x7d/0x140
       [<ffffffff818239f0>] ? bit_wait+0x50/0x50
       [<ffffffff8106a655>] ? kvm_clock_get_cycles+0x25/0x30
       [<ffffffff8111617c>] ? ktime_get+0xac/0x140
       [<ffffffff818239f0>] ? bit_wait+0x50/0x50
       [<ffffffff81822564>] io_schedule_timeout+0xa4/0x110
       [<ffffffff81823a25>] bit_wait_io+0x35/0x50
       [<ffffffff818235bd>] __wait_on_bit+0x5d/0x90
       [<ffffffff811b9e8b>] wait_on_page_bit+0xcb/0xf0
       [<ffffffff810d5f90>] ? autoremove_wake_function+0x40/0x40
       [<ffffffff811cf84c>] truncate_inode_pages_range+0x4bc/0x840
       [<ffffffff811cfc3d>] truncate_inode_pages_final+0x4d/0x60
       [<ffffffffc023ced5>] f2fs_evict_inode+0x75/0x400 [f2fs]
       [<ffffffff812639bc>] evict+0xbc/0x190
       [<ffffffff81263d19>] iput+0x229/0x2c0
       [<ffffffffc0241885>] f2fs_put_super+0x105/0x1a0 [f2fs]
       [<ffffffff8124756a>] generic_shutdown_super+0x6a/0xf0
       [<ffffffff812478f7>] kill_block_super+0x27/0x70
       [<ffffffffc0241290>] kill_f2fs_super+0x20/0x30 [f2fs]
       [<ffffffff81247b03>] deactivate_locked_super+0x43/0x70
       [<ffffffff81247f4c>] deactivate_super+0x5c/0x60
       [<ffffffff81268d2f>] cleanup_mnt+0x3f/0x90
       [<ffffffff81268dc2>] __cleanup_mnt+0x12/0x20
       [<ffffffff810ac463>] task_work_run+0x73/0xa0
       [<ffffffff810032ac>] exit_to_usermode_loop+0xcc/0xd0
       [<ffffffff81003e7c>] syscall_return_slowpath+0xcc/0xe0
       [<ffffffff81829ea2>] int_ret_from_sys_call+0x25/0x9f
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      8d4ea29b