1. 23 9月, 2016 1 次提交
  2. 14 9月, 2016 1 次提交
    • J
      f2fs: fix to set PageUptodate in f2fs_write_end correctly · 649d7df2
      Jaegeuk Kim 提交于
      Previously, f2fs_write_begin sets PageUptodate all the time. But, when user
      tries to update the entire page (i.e., len == PAGE_SIZE), we need to consider
      that the page is able to be copied partially afterwards. In such the case,
      we will lose the remaing region in the page.
      
      This patch fixes this by setting PageUptodate in f2fs_write_end as given copied
      result. In the short copy case, it returns zero to let generic_perform_write
      retry copying user data again.
      
      As a result, f2fs_write_end() works:
         PageUptodate      len      copied    return   retry
      1. no                4096     4096      4096     false  -> return 4096
      2. no                4096     1024      0        true   -> goto #1 case
      3. yes               2048     2048      2048     false  -> return 2048
      4. yes               2048     1024      1024     false  -> return 1024
      Suggested-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      649d7df2
  3. 13 9月, 2016 1 次提交
  4. 08 9月, 2016 1 次提交
  5. 30 8月, 2016 3 次提交
  6. 19 8月, 2016 1 次提交
    • C
      Revert "f2fs: move i_size_write in f2fs_write_end" · 3024c9a1
      Chao Yu 提交于
      This reverts commit a2ee0a30.
      
      When testing with generic/032 of xfstest suit, failure message will be
      reported as below:
      
      generic/032 8s ... [failed, exit status 1] - output mismatch (see results/generic/032.out.bad)
          --- tests/generic/032.out	2015-01-11 16:52:27.643681072 +0800
          +++ results/generic/032.out.bad	2016-08-06 13:44:43.861330500 +0800
          @@ -1,5 +1,5 @@
           QA output created by 032
          -100 iterations
          -0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
          -*
          -0100000
          +1: [768..775]: unwritten
          +Unwritten extents found!
          ...
          (Run 'diff -u tests/generic/032.out results/generic/032.out.bad'  to see the entire diff)
      Ran: generic/032
      Failures: generic/032
      Failed 1 of 1 tests
      
      In write_end(), we should update i_size of inode before unlock page,
      otherwise, we will lose newly updated data in following race condition.
      
      Thread A			Thread B
      - write_end
       - unlock page
      				- writepages
      				 - lock_page
      				  - writepage
      				  if page is out-of-range of file size,
      				  we will skip writting the page.
       - update i_size
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      3024c9a1
  7. 05 8月, 2016 1 次提交
  8. 27 7月, 2016 1 次提交
    • M
      mm, memcg: use consistent gfp flags during readahead · 8a5c743e
      Michal Hocko 提交于
      Vladimir has noticed that we might declare memcg oom even during
      readahead because read_pages only uses GFP_KERNEL (with mapping_gfp
      restriction) while __do_page_cache_readahead uses
      page_cache_alloc_readahead which adds __GFP_NORETRY to prevent from
      OOMs.  This gfp mask discrepancy is really unfortunate and easily
      fixable.  Drop page_cache_alloc_readahead() which only has one user and
      outsource the gfp_mask logic into readahead_gfp_mask and propagate this
      mask from __do_page_cache_readahead down to read_pages.
      
      This alone would have only very limited impact as most filesystems are
      implementing ->readpages and the common implementation mpage_readpages
      does GFP_KERNEL (with mapping_gfp restriction) again.  We can tell it to
      use readahead_gfp_mask instead as this function is called only during
      readahead as well.  The same applies to read_cache_pages.
      
      ext4 has its own ext4_mpage_readpages but the path which has pages !=
      NULL can use the same gfp mask.  Btrfs, cifs, f2fs and orangefs are
      doing a very similar pattern to mpage_readpages so the same can be
      applied to them as well.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [mhocko@suse.com: restrict gfp mask in mpage_alloc]
        Link: http://lkml.kernel.org/r/20160610074223.GC32285@dhcp22.suse.cz
      Link: http://lkml.kernel.org/r/1465301556-26431-1-git-send-email-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Chris Mason <clm@fb.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Mike Marshall <hubcap@omnibond.com>
      Cc: Jaegeuk Kim <jaegeuk@kernel.org>
      Cc: Changman Lee <cm224.lee@samsung.com>
      Cc: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8a5c743e
  9. 26 7月, 2016 1 次提交
  10. 16 7月, 2016 3 次提交
    • J
      f2fs: use blk_plug in all the possible paths · 9dfa1baf
      Jaegeuk Kim 提交于
      This patch reverts 19a5f5e2 (f2fs: drop any block plugging),
      and adds blk_plug in write paths additionally.
      
      The main reason is that blk_start_plug can be used to wake up from low-power
      mode before submitting further bios.
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      9dfa1baf
    • C
      f2fs: fix to avoid data update racing between GC and DIO · 82e0a5aa
      Chao Yu 提交于
      Datas in file can be operated by GC and DIO simultaneously, so we will
      face race case as below:
      
      For write case:
      Thread A				Thread B
      - generic_file_direct_write
       - invalidate_inode_pages2_range
       - f2fs_direct_IO
        - do_blockdev_direct_IO
         - do_direct_IO
          - get_more_blocks
      					- f2fs_gc
      					 - do_garbage_collect
      					  - gc_data_segment
      					   - move_data_page
      					    - do_write_data_page
      					    migrate data block to new block address
         - dio_bio_submit
         update user data to old block address
      
      For read case:
      Thread A                                Thread B
      - generic_file_direct_write
       - invalidate_inode_pages2_range
       - f2fs_direct_IO
        - do_blockdev_direct_IO
         - do_direct_IO
          - get_more_blocks
      					- f2fs_balance_fs
      					 - f2fs_gc
      					  - do_garbage_collect
      					   - gc_data_segment
      					    - move_data_page
      					     - do_write_data_page
      					     migrate data block to new block address
      					  - write_checkpoint
      					   - do_checkpoint
      					    - clear_prefree_segments
      					     - f2fs_issue_discard
                                                   discard old block adress
         - dio_bio_submit
         update user buffer from obsolete block address
      
      In order to fix this, for one file, we should let DIO and GC getting exclusion
      against with each other.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      82e0a5aa
    • J
      f2fs: fix ERR_PTR returned by bio · 1d353eb7
      Jaegeuk Kim 提交于
      This is to fix wrong error pointer handling flow reported by Dan.
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NChao Yu <chao@kernel.org>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      1d353eb7
  11. 09 7月, 2016 5 次提交
  12. 07 7月, 2016 2 次提交
  13. 14 6月, 2016 1 次提交
  14. 09 6月, 2016 2 次提交
  15. 08 6月, 2016 2 次提交
  16. 03 6月, 2016 9 次提交
  17. 21 5月, 2016 1 次提交
  18. 19 5月, 2016 1 次提交
  19. 12 5月, 2016 2 次提交
    • C
      f2fs: fix deadlock when flush inline data · ab47036d
      Chao Yu 提交于
      Below backtrace info was reported by Yunlei He:
      
      Call Trace:
       [<ffffffff817a9395>] schedule+0x35/0x80
       [<ffffffff817abb7d>] rwsem_down_read_failed+0xed/0x130
       [<ffffffff813c12a8>] call_rwsem_down_read_failed+0x18/0x
       [<ffffffff817ab1d0>] down_read+0x20/0x30
       [<ffffffffa02a1a12>] f2fs_evict_inode+0x242/0x3a0 [f2fs]
       [<ffffffff81217057>] evict+0xc7/0x1a0
       [<ffffffff81217cd6>] iput+0x196/0x200
       [<ffffffff812134f9>] __dentry_kill+0x179/0x1e0
       [<ffffffff812136f9>] dput+0x199/0x1f0
       [<ffffffff811fe77b>] __fput+0x18b/0x220
       [<ffffffff811fe84e>] ____fput+0xe/0x10
       [<ffffffff81097427>] task_work_run+0x77/0x90
       [<ffffffff81074d62>] exit_to_usermode_loop+0x73/0xa2
       [<ffffffff81003b7a>] do_syscall_64+0xfa/0x110
       [<ffffffff817acf65>] entry_SYSCALL64_slow_path+0x25/0x25
      
      Call Trace:
       [<ffffffff817a9395>] schedule+0x35/0x80
       [<ffffffff81216dc3>] __wait_on_freeing_inode+0xa3/0xd0
       [<ffffffff810bc300>] ? autoremove_wake_function+0x40/0x4
       [<ffffffff8121771d>] find_inode_fast+0x7d/0xb0
       [<ffffffff8121794a>] ilookup+0x6a/0xd0
       [<ffffffffa02bc740>] sync_node_pages+0x210/0x650 [f2fs]
       [<ffffffff8122e690>] ? do_fsync+0x70/0x70
       [<ffffffffa02b085e>] block_operations+0x9e/0xf0 [f2fs]
       [<ffffffff8137b795>] ? bio_endio+0x55/0x60
       [<ffffffffa02b0942>] write_checkpoint+0x92/0xba0 [f2fs]
       [<ffffffff8117da57>] ? mempool_free_slab+0x17/0x20
       [<ffffffff8117de8b>] ? mempool_free+0x2b/0x80
       [<ffffffff8122e690>] ? do_fsync+0x70/0x70
       [<ffffffffa02a53e3>] f2fs_sync_fs+0x63/0xd0 [f2fs]
       [<ffffffff8129630f>] ? ext4_sync_fs+0xbf/0x190
       [<ffffffff8122e6b0>] sync_fs_one_sb+0x20/0x30
       [<ffffffff812002e9>] iterate_supers+0xb9/0x110
       [<ffffffff8122e7b5>] sys_sync+0x55/0x90
       [<ffffffff81003ae9>] do_syscall_64+0x69/0x110
       [<ffffffff817acf65>] entry_SYSCALL64_slow_path+0x25/0x25
      
      With following excuting serials, we will set inline_node in inode page
      after inode was unlinked, result in a deadloop described as below:
      1. open file
      2. write file
      3. unlink file
      4. write file
      5. close file
      
      Thread A				Thread B
       - dput
        - iput_final
         - inode->i_state |= I_FREEING
         - evict
          - f2fs_evict_inode
      					 - f2fs_sync_fs
      					  - write_checkpoint
      					   - block_operations
      					    - f2fs_lock_all (down_write(cp_rwsem))
           - f2fs_lock_op (down_read(cp_rwsem))
      					    - sync_node_pages
      					     - ilookup
      					      - find_inode_fast
      					       - __wait_on_freeing_inode
      					         (wait on I_FREEING clear)
      
      Here, we change to set inline_node flag only for linked inode for fixing.
      Reported-by: NYunlei He <heyunlei@huawei.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Tested-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Cc: stable@vger.kernel.org # v4.6
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      ab47036d
    • C
      f2fs: support in batch multi blocks preallocation · 46008c6d
      Chao Yu 提交于
      This patch introduces reserve_new_blocks to make preallocation of multi
      blocks as in batch operation, so it can avoid lots of redundant
      operation, result in better performance.
      
      In virtual machine, with rotational device:
      
      time fallocate -l 32G /mnt/f2fs/file
      
      Before:
      real	0m4.584s
      user	0m0.000s
      sys	0m4.580s
      
      After:
      real	0m0.292s
      user	0m0.000s
      sys	0m0.272s
      
      In x86, with SSD:
      
      time fallocate -l 500G $MNT/testfile
      
      Before : 24.758 s
      After  :  1.604 s
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix bugs and add performance numbers measured in x86.]
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      46008c6d
  20. 08 5月, 2016 1 次提交