1. 09 7月, 2016 2 次提交
    • C
      f2fs: fix to detect truncation prior rather than EIO during read · 1563ac75
      Chao Yu 提交于
      In procedure of synchonized read, after sending out the read request, reader
      will try to lock the page for waiting device to finish the read jobs and
      unlock the page, but meanwhile, truncater will race with reader, so after
      reader get lock of the page, it should check page's mapping to detect
      whether someone has truncated the page in advance, then reader has the
      chance to do the retry if truncation was done, otherwise read can be failed
      due to previous condition check.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      1563ac75
    • C
      f2fs: fix to avoid reading out encrypted data in page cache · 78682f79
      Chao Yu 提交于
      For encrypted inode, if user overwrites data of the inode, f2fs will read
      encrypted data into page cache, and then do the decryption.
      
      However reader can race with overwriter, and it will see encrypted data
      which has not been decrypted by overwriter yet. Fix it by moving decrypting
      work to background and keep page non-uptodated until data is decrypted.
      
      Thread A				Thread B
      - f2fs_file_write_iter
       - __generic_file_write_iter
        - generic_perform_write
         - f2fs_write_begin
          - f2fs_submit_page_bio
      					- generic_file_read_iter
      					 - do_generic_file_read
      					  - lock_page_killable
      					  - unlock_page
      					  - copy_page_to_iter
      					  hit the encrypted data in updated page
          - lock_page
          - fscrypt_decrypt_page
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      78682f79
  2. 07 7月, 2016 2 次提交
  3. 14 6月, 2016 1 次提交
  4. 09 6月, 2016 2 次提交
  5. 03 6月, 2016 9 次提交
  6. 21 5月, 2016 1 次提交
  7. 19 5月, 2016 1 次提交
  8. 12 5月, 2016 2 次提交
    • C
      f2fs: fix deadlock when flush inline data · ab47036d
      Chao Yu 提交于
      Below backtrace info was reported by Yunlei He:
      
      Call Trace:
       [<ffffffff817a9395>] schedule+0x35/0x80
       [<ffffffff817abb7d>] rwsem_down_read_failed+0xed/0x130
       [<ffffffff813c12a8>] call_rwsem_down_read_failed+0x18/0x
       [<ffffffff817ab1d0>] down_read+0x20/0x30
       [<ffffffffa02a1a12>] f2fs_evict_inode+0x242/0x3a0 [f2fs]
       [<ffffffff81217057>] evict+0xc7/0x1a0
       [<ffffffff81217cd6>] iput+0x196/0x200
       [<ffffffff812134f9>] __dentry_kill+0x179/0x1e0
       [<ffffffff812136f9>] dput+0x199/0x1f0
       [<ffffffff811fe77b>] __fput+0x18b/0x220
       [<ffffffff811fe84e>] ____fput+0xe/0x10
       [<ffffffff81097427>] task_work_run+0x77/0x90
       [<ffffffff81074d62>] exit_to_usermode_loop+0x73/0xa2
       [<ffffffff81003b7a>] do_syscall_64+0xfa/0x110
       [<ffffffff817acf65>] entry_SYSCALL64_slow_path+0x25/0x25
      
      Call Trace:
       [<ffffffff817a9395>] schedule+0x35/0x80
       [<ffffffff81216dc3>] __wait_on_freeing_inode+0xa3/0xd0
       [<ffffffff810bc300>] ? autoremove_wake_function+0x40/0x4
       [<ffffffff8121771d>] find_inode_fast+0x7d/0xb0
       [<ffffffff8121794a>] ilookup+0x6a/0xd0
       [<ffffffffa02bc740>] sync_node_pages+0x210/0x650 [f2fs]
       [<ffffffff8122e690>] ? do_fsync+0x70/0x70
       [<ffffffffa02b085e>] block_operations+0x9e/0xf0 [f2fs]
       [<ffffffff8137b795>] ? bio_endio+0x55/0x60
       [<ffffffffa02b0942>] write_checkpoint+0x92/0xba0 [f2fs]
       [<ffffffff8117da57>] ? mempool_free_slab+0x17/0x20
       [<ffffffff8117de8b>] ? mempool_free+0x2b/0x80
       [<ffffffff8122e690>] ? do_fsync+0x70/0x70
       [<ffffffffa02a53e3>] f2fs_sync_fs+0x63/0xd0 [f2fs]
       [<ffffffff8129630f>] ? ext4_sync_fs+0xbf/0x190
       [<ffffffff8122e6b0>] sync_fs_one_sb+0x20/0x30
       [<ffffffff812002e9>] iterate_supers+0xb9/0x110
       [<ffffffff8122e7b5>] sys_sync+0x55/0x90
       [<ffffffff81003ae9>] do_syscall_64+0x69/0x110
       [<ffffffff817acf65>] entry_SYSCALL64_slow_path+0x25/0x25
      
      With following excuting serials, we will set inline_node in inode page
      after inode was unlinked, result in a deadloop described as below:
      1. open file
      2. write file
      3. unlink file
      4. write file
      5. close file
      
      Thread A				Thread B
       - dput
        - iput_final
         - inode->i_state |= I_FREEING
         - evict
          - f2fs_evict_inode
      					 - f2fs_sync_fs
      					  - write_checkpoint
      					   - block_operations
      					    - f2fs_lock_all (down_write(cp_rwsem))
           - f2fs_lock_op (down_read(cp_rwsem))
      					    - sync_node_pages
      					     - ilookup
      					      - find_inode_fast
      					       - __wait_on_freeing_inode
      					         (wait on I_FREEING clear)
      
      Here, we change to set inline_node flag only for linked inode for fixing.
      Reported-by: NYunlei He <heyunlei@huawei.com>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Tested-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Cc: stable@vger.kernel.org # v4.6
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      ab47036d
    • C
      f2fs: support in batch multi blocks preallocation · 46008c6d
      Chao Yu 提交于
      This patch introduces reserve_new_blocks to make preallocation of multi
      blocks as in batch operation, so it can avoid lots of redundant
      operation, result in better performance.
      
      In virtual machine, with rotational device:
      
      time fallocate -l 32G /mnt/f2fs/file
      
      Before:
      real	0m4.584s
      user	0m0.000s
      sys	0m4.580s
      
      After:
      real	0m0.292s
      user	0m0.000s
      sys	0m0.272s
      
      In x86, with SSD:
      
      time fallocate -l 500G $MNT/testfile
      
      Before : 24.758 s
      After  :  1.604 s
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      [Jaegeuk Kim: fix bugs and add performance numbers measured in x86.]
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      46008c6d
  9. 08 5月, 2016 2 次提交
  10. 04 5月, 2016 1 次提交
  11. 02 5月, 2016 1 次提交
  12. 27 4月, 2016 2 次提交
  13. 15 4月, 2016 1 次提交
  14. 13 4月, 2016 1 次提交
  15. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  16. 18 3月, 2016 1 次提交
    • J
      fs crypto: move per-file encryption from f2fs tree to fs/crypto · 0b81d077
      Jaegeuk Kim 提交于
      This patch adds the renamed functions moved from the f2fs crypto files.
      
      1. definitions for per-file encryption used by ext4 and f2fs.
      
      2. crypto.c for encrypt/decrypt functions
       a. IO preparation:
        - fscrypt_get_ctx / fscrypt_release_ctx
       b. before IOs:
        - fscrypt_encrypt_page
        - fscrypt_decrypt_page
        - fscrypt_zeroout_range
       c. after IOs:
        - fscrypt_decrypt_bio_pages
        - fscrypt_pullback_bio_page
        - fscrypt_restore_control_page
      
      3. policy.c supporting context management.
       a. For ioctls:
        - fscrypt_process_policy
        - fscrypt_get_policy
       b. For context permission
        - fscrypt_has_permitted_context
        - fscrypt_inherit_context
      
      4. keyinfo.c to handle permissions
        - fscrypt_get_encryption_info
        - fscrypt_free_encryption_info
      
      5. fname.c to support filename encryption
       a. general wrapper functions
        - fscrypt_fname_disk_to_usr
        - fscrypt_fname_usr_to_disk
        - fscrypt_setup_filename
        - fscrypt_free_filename
      
       b. specific filename handling functions
        - fscrypt_fname_alloc_buffer
        - fscrypt_fname_free_buffer
      
      6. Makefile and Kconfig
      
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Signed-off-by: NMichael Halcrow <mhalcrow@google.com>
      Signed-off-by: NIldar Muslukhov <ildarm@google.com>
      Signed-off-by: NUday Savagaonkar <savagaon@google.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      0b81d077
  17. 27 2月, 2016 2 次提交
  18. 23 2月, 2016 8 次提交
    • C
      f2fs: trace old block address for CoWed page · 7a9d7548
      Chao Yu 提交于
      This patch enables to trace old block address of CoWed page for better
      debugging.
      
      f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f0, oldaddr = 0xfe8ab, newaddr = 0xfee90 rw = WRITE_SYNC, type = NODE
      f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f8, oldaddr = 0xfe8b0, newaddr = 0xfee91 rw = WRITE_SYNC, type = NODE
      f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4fa, oldaddr = 0xfe8ae, newaddr = 0xfee92 rw = WRITE_SYNC, type = NODE
      
      f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x96, oldaddr = 0xf049b, newaddr = 0x2bbe rw = WRITE, type = DATA
      f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x97, oldaddr = 0xf049c, newaddr = 0x2bbf rw = WRITE, type = DATA
      f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x98, oldaddr = 0xf049d, newaddr = 0x2bc0 rw = WRITE, type = DATA
      
      f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x47, oldaddr = 0xffffffff, newaddr = 0xf2631 rw = WRITE, type = DATA
      f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x48, oldaddr = 0xffffffff, newaddr = 0xf2632 rw = WRITE, type = DATA
      f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x49, oldaddr = 0xffffffff, newaddr = 0xf2633 rw = WRITE, type = DATA
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      7a9d7548
    • C
      f2fs: support revoking atomic written pages · 28bc106b
      Chao Yu 提交于
      f2fs support atomic write with following semantics:
      1. open db file
      2. ioctl start atomic write
      3. (write db file) * n
      4. ioctl commit atomic write
      5. close db file
      
      With this flow we can avoid file becoming corrupted when abnormal power
      cut, because we hold data of transaction in referenced pages linked in
      inmem_pages list of inode, but without setting them dirty, so these data
      won't be persisted unless we commit them in step 4.
      
      But we should still hold journal db file in memory by using volatile
      write, because our semantics of 'atomic write support' is incomplete, in
      step 4, we could fail to submit all dirty data of transaction, once
      partial dirty data was committed in storage, then after a checkpoint &
      abnormal power-cut, db file will be corrupted forever.
      
      So this patch tries to improve atomic write flow by adding a revoking flow,
      once inner error occurs in committing, this gives another chance to try to
      revoke these partial submitted data of current transaction, it makes
      committing operation more like aotmical one.
      
      If we're not lucky, once revoking operation was failed, EAGAIN will be
      reported to user for suggesting doing the recovery with held journal file,
      or retrying current transaction again.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      28bc106b
    • J
      f2fs crypto: f2fs_page_crypto() doesn't need a encryption context · ce855a3b
      Jaegeuk Kim 提交于
      This patch adopts:
      	ext4 crypto: ext4_page_crypto() doesn't need a encryption context
      
      Since ext4_page_crypto() doesn't need an encryption context (at least
      not any more), this allows us to simplify a number function signature
      and also allows us to avoid needing to allocate a context in
      ext4_block_write_begin().  It also means we no longer need a separate
      ext4_decrypt_one() function.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      ce855a3b
    • J
      f2fs: preallocate blocks for buffered aio writes · 24b84912
      Jaegeuk Kim 提交于
      This patch preallocates data blocks for buffered aio writes.
      With this patch, we can avoid redundant locking and unlocking of node pages
      given consecutive aio request.
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      24b84912
    • J
      f2fs: move dio preallocation into f2fs_file_write_iter · b439b103
      Jaegeuk Kim 提交于
      This patch moves preallocation code for direct IOs into f2fs_file_write_iter.
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      b439b103
    • Y
      f2fs: fix missing skip pages info · d31c7c3f
      Yunlei He 提交于
      fix missing skip pages info in f2fs_writepages trace event.
      Signed-off-by: NYunlei He <heyunlei@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      d31c7c3f
    • C
      f2fs: introduce f2fs_submit_merged_bio_cond · 0c3a5797
      Chao Yu 提交于
      f2fs use single bio buffer per type data (META/NODE/DATA) for caching
      writes locating in continuous block address as many as possible, after
      submitting, these writes may be still cached in bio buffer, so we have
      to flush cached writes in bio buffer by calling f2fs_submit_merged_bio.
      
      Unfortunately, in the scenario of high concurrency, bio buffer could be
      flushed by someone else before we submit it as below reasons:
      a) there is no space in bio buffer.
      b) add a request of different type (SYNC, ASYNC).
      c) add a discontinuous block address.
      
      For this condition, f2fs_submit_merged_bio will be devastating, because
      it could break the following merging of writes in bio buffer, split one
      big bio into two smaller one.
      
      This patch introduces f2fs_submit_merged_bio_cond which can do a
      conditional submitting with bio buffer, before submitting it will judge
      whether:
       - page in DATA type bio buffer is matching with specified page;
       - page in DATA type bio buffer is belong to specified inode;
       - page in NODE type bio buffer is belong to specified inode;
      If there is no eligible page in bio buffer, we will skip submitting step,
      result in gaining more chance to merge consecutive block IOs in bio cache.
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      0c3a5797
    • C
      f2fs: speed up handling holes in fiemap · da85985c
      Chao Yu 提交于
      This patch makes f2fs_map_blocks supporting returning next potential
      page offset which skips hole region in indirect tree of inode, and
      use it to speed up fiemap in handling big hole case.
      
      Test method:
      xfs_io -f /mnt/f2fs/file  -c "pwrite 1099511627776 4096"
      time xfs_io -f /mnt/f2fs/file -c "fiemap -v"
      
      Before:
      time xfs_io -f /mnt/f2fs/file -c "fiemap -v"
      /mnt/f2fs/file:
       EXT: FILE-OFFSET              BLOCK-RANGE      TOTAL FLAGS
         0: [0..2147483647]:         hole             2147483648
         1: [2147483648..2147483655]: 81920..81927         8   0x1
      
      real    3m3.518s
      user    0m0.000s
      sys     3m3.456s
      
      After:
      time xfs_io -f /mnt/f2fs/file -c "fiemap -v"
      /mnt/f2fs/file:
       EXT: FILE-OFFSET              BLOCK-RANGE      TOTAL FLAGS
         0: [0..2147483647]:         hole             2147483648
         1: [2147483648..2147483655]: 81920..81927         8   0x1
      
      real    0m0.008s
      user    0m0.000s
      sys     0m0.008s
      Signed-off-by: NChao Yu <chao2.yu@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      da85985c