1. 10 Oct, 2015 1 commit
  2. 25 Aug, 2015 2 commits
    • f2fs: fix to release inode correctly · 13ec7297
      Committed by Chao Yu
      In the following call stack, if every attempt to truncate the inode page
      in remove_inode_page fails, we eventually add the previously allocated nid
      to the free nid cache, even though this nid still has NID_NEW status and
      NEW_ADDR in its blkaddr pointer:
      
       - f2fs_create
        - f2fs_add_link
         - __f2fs_add_link
          - init_inode_metadata
           - new_inode_page
            - new_node_page
             - set_node_addr(, NEW_ADDR)
           - f2fs_init_acl   failed
           - remove_inode_page  failed
        - handle_failed_inode
         - remove_inode_page  failed
         - iput
          - f2fs_evict_inode
           - remove_inode_page  failed
           - alloc_nid_failed   cache a nid with valid blkaddr: NEW_ADDR
      
      This may not only leak the resources of the previous inode, but may also
      cause incorrect use of the stale blkaddr stored in the node entry of that
      nid when the nid is reused by others.
      
      This patch adds the inode to the orphan list if we fail to truncate it,
      so that we get a second chance to release it in the orphan recovery flow.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
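The fallback described in this commit can be pictured with a toy user-space model. This is only a sketch under stated assumptions: `toy_inode`, `toy_remove_inode_page` and `toy_evict_inode` are hypothetical names that do not exist in f2fs, and the real patch operates on kernel structures. It shows only the decision: return the nid to the free cache when truncation succeeded, otherwise park the inode on the orphan list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states for illustration; not the kernel's definitions. */
enum toy_nid_state { TOY_NID_NEW, TOY_NID_FREE, TOY_NID_ORPHAN };

struct toy_inode {
    unsigned int nid;
    bool truncate_fails;            /* simulates remove_inode_page failing */
    enum toy_nid_state state;
};

/* Stand-in for remove_inode_page(): 0 on success, -1 on failure. */
static int toy_remove_inode_page(const struct toy_inode *ino)
{
    return ino->truncate_fails ? -1 : 0;
}

/* Eviction path: never recycle a nid whose inode page was not truncated. */
void toy_evict_inode(struct toy_inode *ino)
{
    assert(ino != NULL);
    if (toy_remove_inode_page(ino) == 0)
        ino->state = TOY_NID_FREE;      /* safe to reuse the nid */
    else
        ino->state = TOY_NID_ORPHAN;    /* retry later in orphan recovery */
}
```

The point of the design is that a nid still pointing at NEW_ADDR never reaches the free nid cache.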
    • f2fs: atomically set inode->i_flags · 6a678857
      Committed by Zhang Zhen
      Set inode->i_flags atomically, following commit 5f16f322 ("ext4:
      atomically set inode->i_flags in ext4_set_inode_flags()").
      Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
      Reviewed-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  3. 05 Aug, 2015 5 commits
  4. 02 Jun, 2015 1 commit
    • f2fs crypto: use per-inode tfm structure · 26bf3dc7
      Committed by Jaegeuk Kim
      This patch applies the following ext4 patch:
      
        ext4 crypto: use per-inode tfm structure
      
      As suggested by Herbert Xu, we shouldn't allocate a new tfm each time
      we read or write a page.  Instead we can use a single tfm hanging off
      the inode's crypt_info structure for all of our encryption needs for
      that inode, since the tfm can be used by multiple crypto requests in
      parallel.
      
      Also use cmpxchg() to avoid races that could result in crypt_info
      structure getting doubly allocated or doubly freed.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
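The cmpxchg() idea mentioned in this commit can be sketched in user space with C11 atomics. This is an illustration under assumptions, not the kernel code: `toy_inode`, `crypt_info`, `get_crypt_info` and `put_crypt_info` are hypothetical names, and `atomic_compare_exchange_strong` stands in for the kernel's cmpxchg(). The pointer is installed only if it is still NULL, so a racing allocator frees its own copy instead of overwriting the winner, and teardown swaps the pointer out so it is freed at most once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct crypt_info { int key_id; };          /* hypothetical payload */

struct toy_inode {
    _Atomic(struct crypt_info *) ci;        /* shared per-inode pointer */
};

/* Return the inode's crypt_info, allocating it at most once. */
struct crypt_info *get_crypt_info(struct toy_inode *inode, int key_id)
{
    assert(inode != NULL);
    struct crypt_info *cur = atomic_load(&inode->ci);
    if (cur)
        return cur;

    struct crypt_info *fresh = malloc(sizeof(*fresh));
    if (!fresh)
        return NULL;
    fresh->key_id = key_id;

    struct crypt_info *expected = NULL;
    if (atomic_compare_exchange_strong(&inode->ci, &expected, fresh))
        return fresh;       /* we won the race and installed our copy */

    free(fresh);            /* someone else installed theirs first */
    return expected;        /* compare-exchange stored the winner here */
}

/* Detach and free the crypt_info; safe to call more than once. */
void put_crypt_info(struct toy_inode *inode)
{
    assert(inode != NULL);
    struct crypt_info *old = atomic_exchange(&inode->ci, NULL);
    free(old);              /* free(NULL) is a no-op */
}
```

A caller simply uses `get_crypt_info(&inode, id)` on every read or write path; only the first call allocates.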
  5. 29 May, 2015 2 commits
  6. 11 Apr, 2015 4 commits
  7. 04 Mar, 2015 2 commits
    • f2fs: enable rb-tree extent cache · 1dcc336b
      Committed by Chao Yu
      This patch enables the rb-tree based extent cache in f2fs.
      
      When mounted with "-o extent_cache", f2fs tries to add as many recently
      accessed page-block mappings as possible to the rb-tree based extent cache,
      instead of keeping only the original single extent info cache.
      
      This way, f2fs gets a more effective cache between the dnode page cache and
      the disk. It provides a high hit ratio with less memory when dnode page cache
      pages are reclaimed in a low-memory environment.
      
      Storage: SanDisk SD card, 64 GB
      1.append write file (offset: 0, size: 128M);
      2.overwrite file (offset: 2M, size: 1M);
      3.overwrite file (offset: 4M, size: 1M);
      ...
      4.overwrite file (offset: 48M, size: 1M);
      ...
      5.overwrite file (offset: 112M, size: 1M);
      6.sync
      7.echo 3 > /proc/sys/vm/drop_caches
      8.read file (size:128M, unit: 4k, count: 32768)
      (time dd if=/mnt/f2fs/128m bs=4k count=32768)
      
      Extent Hit Ratio:
      		before		patched
      Hit Ratio	121 / 1071	1071 / 1071
      
      Performance:
      		before		patched
      real    	0m37.051s	0m35.556s
      user    	0m0.040s	0m0.026s
      sys     	0m2.990s	0m2.251s
      
      Memory Cost:
      		before		patched
      Tree Count:	0		1 (size: 24 bytes)
      Node Count:	0		45 (size: 1440 bytes)
      
      v3:
       o retest and give more details of the test results.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
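To make "page-block mappings" concrete, here is a minimal sketch of how an extent cache answers a lookup. It deliberately swaps the rb-tree for a sorted array with binary search, which gives the same lookup result with far less code. The struct layout and the names `toy_extent` and `extent_lookup` are assumptions for illustration, not the f2fs definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One extent: file pages [fofs, fofs + len) map to blocks starting at blkaddr. */
struct toy_extent {
    uint32_t fofs;      /* first file page offset covered */
    uint32_t blkaddr;   /* block address of that first page */
    uint32_t len;       /* number of contiguous pages */
};

/*
 * Look up a page offset in an array sorted by fofs with no overlaps.
 * Returns 1 and stores the block address on a hit, 0 on a miss.
 */
int extent_lookup(const struct toy_extent *ext, size_t n,
                  uint32_t pgofs, uint32_t *blkaddr)
{
    assert(blkaddr != NULL);
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (pgofs < ext[mid].fofs) {
            hi = mid;                       /* target lies to the left */
        } else if (pgofs - ext[mid].fofs >= ext[mid].len) {
            lo = mid + 1;                   /* target lies to the right */
        } else {
            *blkaddr = ext[mid].blkaddr + (pgofs - ext[mid].fofs);
            return 1;                       /* hit: no dnode page needed */
        }
    }
    return 0;                               /* miss: fall back to the dnode */
}
```

A fragmented file, such as the one produced by the overwrite steps in the test above, needs many extents; a single-extent cache can only remember one of them, which is consistent with the low "before" hit ratio reported.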
    • f2fs: move ext_lock out of struct extent_info · 0c872e2d
      Committed by Chao Yu
      Move ext_lock out of struct extent_info, so that the following patches can
      pass variables of type struct extent_info as parameters carrying pure data.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  8. 10 Jan, 2015 2 commits
    • f2fs: get rid of kzalloc in __recover_inline_status · 9e5ba77f
      Committed by Chao Yu
      We use kzalloc to allocate memory in __recover_inline_status, and compare
      this all-zero memory against the inline data content of the inode page.
      This is inefficient and unnecessary; let's check the inline data content
      directly.
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      [Jaegeuk Kim: make the code more neat]
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: change atomic and volatile write policies · 1e84371f
      Committed by Jaegeuk Kim
      This patch adds two new ioctls to release in-memory pages grabbed by atomic
      writes.
       o f2fs_ioc_abort_volatile_write
        - If the transaction failed, all the grabbed pages and data should be written.
       o f2fs_ioc_release_volatile_write
        - This is to enhance the performance of PERSIST mode in sqlite.
      
      To avoid huge memory consumption that can cause OOM, this patch changes
      volatile writes to use normal dirty pages, whose flushing to disk is blocked
      as long as the system is not under memory pressure.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  9. 09 Dec, 2014 1 commit
  10. 05 Nov, 2014 1 commit
    • f2fs: revisit inline_data to avoid data races and potential bugs · b3d208f9
      Committed by Jaegeuk Kim
      This patch simplifies inline_data usage with the following rules.
      1. inline_data is set during file creation.
      2. If a write request falls outside the inline_data range,
       f2fs converts that inode permanently.
      3. No case converts a non-inline_data inode to inline_data.
      4. The inline_data flag should be changed under the inode page lock.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  11. 04 Nov, 2014 3 commits
  12. 08 Oct, 2014 1 commit
    • f2fs: support volatile operations for transient data · 02a1335f
      Committed by Jaegeuk Kim
      This patch adds support for volatile writes which keep data pages in memory
      until f2fs_evict_inode is called by iput.
      
      For instance, we can use this feature for the sqlite database as follows.
      While supporting atomic writes for main database file, we can keep its journal
      data temporarily in the page cache by the following sequence.
      
      1. open
       -> ioctl(F2FS_IOC_START_VOLATILE_WRITE);
      2. writes
       : keep all the data in the page cache.
      3. flush to the database file with atomic writes
        a. ioctl(F2FS_IOC_START_ATOMIC_WRITE);
        b. writes
        c. ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE);
      4. close
       -> drop the cached data
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  13. 07 Oct, 2014 1 commit
    • f2fs: support atomic writes · 88b88a66
      Committed by Jaegeuk Kim
      This patch introduces a very limited functionality for atomic write support.
      In order to support atomic write, this patch adds two ioctls:
       o F2FS_IOC_START_ATOMIC_WRITE
       o F2FS_IOC_COMMIT_ATOMIC_WRITE
      
      The database engine should be aware of the following sequence.
      1. open
       -> ioctl(F2FS_IOC_START_ATOMIC_WRITE);
      2. writes
        : all the written data will be treated as atomic pages.
      3. commit
       -> ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE);
        : this flushes all the data blocks to the disk, which will be shown all or
        nothing by f2fs recovery procedure.
      4. repeat from #2.
      
      The IO patterns should be:
      
        ,- START_ATOMIC_WRITE                  ,- COMMIT_ATOMIC_WRITE
       CP | D D D D D D | FSYNC | D D D D | FSYNC ...
                            `- COMMIT_ATOMIC_WRITE
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
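From user space, the start/write/commit sequence above might look like the sketch below. The helper name `atomic_update` is hypothetical. The ioctl numbers are defined locally as an assumption about the f2fs header values (magic 0xf5, commands 1 and 2), so check them against the headers of the kernel you target. Error handling is deliberately minimal: on a failed or short write the function returns without committing and leaves cleanup to the caller.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed values; prefer the definitions shipped with your kernel headers. */
#ifndef F2FS_IOCTL_MAGIC
#define F2FS_IOCTL_MAGIC 0xf5
#endif
#ifndef F2FS_IOC_START_ATOMIC_WRITE
#define F2FS_IOC_START_ATOMIC_WRITE  _IO(F2FS_IOCTL_MAGIC, 1)
#endif
#ifndef F2FS_IOC_COMMIT_ATOMIC_WRITE
#define F2FS_IOC_COMMIT_ATOMIC_WRITE _IO(F2FS_IOCTL_MAGIC, 2)
#endif

/*
 * Write len bytes at offset off as one atomic unit.
 * Returns 0 on success, -1 on any failure (errno is left set by the
 * failing call).
 */
int atomic_update(int fd, const void *buf, size_t len, off_t off)
{
    assert(buf != NULL || len == 0);

    /* Step 1: mark subsequent writes on this file as atomic pages. */
    if (ioctl(fd, F2FS_IOC_START_ATOMIC_WRITE) < 0)
        return -1;

    /* Step 2: the write itself; data stays in memory until commit. */
    ssize_t n = pwrite(fd, buf, len, off);
    if (n < 0 || (size_t)n != len)
        return -1;

    /* Step 3: flush everything so recovery sees all of it or none. */
    if (ioctl(fd, F2FS_IOC_COMMIT_ATOMIC_WRITE) < 0)
        return -1;
    return 0;
}
```

On a file system that does not implement these ioctls the first call fails, so the helper is safe to probe with.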
  14. 01 Oct, 2014 1 commit
  15. 16 Sep, 2014 1 commit
  16. 10 Sep, 2014 1 commit
  17. 04 Sep, 2014 1 commit
  18. 05 Aug, 2014 1 commit
  19. 29 Jul, 2014 1 commit
  20. 25 Jul, 2014 1 commit
    • f2fs: avoid use invalid mapping of node_inode when evict meta inode · dbf20cb2
      Committed by Chao Yu
      Andrey Tsyvarev reported:
      "Using memory error detector reveals the following use-after-free error
      in 3.15.0:
      
      AddressSanitizer: heap-use-after-free in f2fs_evict_inode
      Read of size 8 by thread T22279:
        [<ffffffffa02d8702>] f2fs_evict_inode+0x102/0x2e0 [f2fs]
        [<ffffffff812359af>] evict+0x15f/0x290
        [<     inlined    >] iput+0x196/0x280 iput_final
        [<ffffffff812369a6>] iput+0x196/0x280
        [<ffffffffa02dc416>] f2fs_put_super+0xd6/0x170 [f2fs]
        [<ffffffff81210095>] generic_shutdown_super+0xc5/0x1b0
        [<ffffffff812105fd>] kill_block_super+0x4d/0xb0
        [<ffffffff81210a86>] deactivate_locked_super+0x66/0x80
        [<ffffffff81211c98>] deactivate_super+0x68/0x80
        [<ffffffff8123cc88>] mntput_no_expire+0x198/0x250
        [<     inlined    >] SyS_umount+0xe9/0x1a0 SYSC_umount
        [<ffffffff8123f1c9>] SyS_umount+0xe9/0x1a0
        [<ffffffff81cc8df9>] system_call_fastpath+0x16/0x1b
      
      Freed by thread T3:
        [<ffffffffa02dc337>] f2fs_i_callback+0x27/0x30 [f2fs]
        [<     inlined    >] rcu_process_callbacks+0x2d6/0x930 __rcu_reclaim
        [<     inlined    >] rcu_process_callbacks+0x2d6/0x930 rcu_do_batch
        [<     inlined    >] rcu_process_callbacks+0x2d6/0x930 invoke_rcu_callbacks
        [<     inlined    >] rcu_process_callbacks+0x2d6/0x930 __rcu_process_callbacks
        [<ffffffff810fd266>] rcu_process_callbacks+0x2d6/0x930
        [<ffffffff8107cce2>] __do_softirq+0x142/0x380
        [<ffffffff8107cf50>] run_ksoftirqd+0x30/0x50
        [<ffffffff810b2a87>] smpboot_thread_fn+0x197/0x280
        [<ffffffff810a8238>] kthread+0x148/0x160
        [<ffffffff81cc8d4c>] ret_from_fork+0x7c/0xb0
      
      Allocated by thread T22276:
        [<ffffffffa02dc7dd>] f2fs_alloc_inode+0x2d/0x170 [f2fs]
        [<ffffffff81235e2a>] iget_locked+0x10a/0x230
        [<ffffffffa02d7495>] f2fs_iget+0x35/0xa80 [f2fs]
        [<ffffffffa02e2393>] f2fs_fill_super+0xb53/0xff0 [f2fs]
        [<ffffffff81211bce>] mount_bdev+0x1de/0x240
        [<ffffffffa02dbce0>] f2fs_mount+0x10/0x20 [f2fs]
        [<ffffffff81212a85>] mount_fs+0x55/0x220
        [<ffffffff8123c026>] vfs_kern_mount+0x66/0x200
        [<     inlined    >] do_mount+0x2b4/0x1120 do_new_mount
        [<ffffffff812400d4>] do_mount+0x2b4/0x1120
        [<     inlined    >] SyS_mount+0xb2/0x110 SYSC_mount
        [<ffffffff812414a2>] SyS_mount+0xb2/0x110
        [<ffffffff81cc8df9>] system_call_fastpath+0x16/0x1b
      
      The buggy address ffff8800587866c8 is located 48 bytes inside
        of 680-byte region [ffff880058786698, ffff880058786940)
      
      Memory state around the buggy address:
        ffff880058786100: ffffffff ffffffff ffffffff ffffffff
        ffff880058786200: ffffffff ffffffff ffffffrr rrrrrrrr
        ffff880058786300: rrrrrrrr rrffffff ffffffff ffffffff
        ffff880058786400: ffffffff ffffffff ffffffff ffffffff
        ffff880058786500: ffffffff ffffffff ffffffff fffffffr
       >ffff880058786600: rrrrrrrr rrrrrrrr rrrfffff ffffffff
                                                      ^
        ffff880058786700: ffffffff ffffffff ffffffff ffffffff
        ffff880058786800: ffffffff ffffffff ffffffff ffffffff
        ffff880058786900: ffffffff rrrrrrrr rrrrrrrr rrrr....
        ffff880058786a00: ........ ........ ........ ........
        ffff880058786b00: ........ ........ ........ ........
      Legend:
        f - 8 freed bytes
        r - 8 redzone bytes
        . - 8 allocated bytes
        x=1..7 - x allocated bytes + (8-x) redzone bytes
      
      Investigation shows, that f2fs_evict_inode, when called for
      'meta_inode', uses invalidate_mapping_pages() for 'node_inode'.
      But 'node_inode' is deleted before 'meta_inode' in f2fs_put_super via
      iput().
      
      It seems that in common usage scenario this use-after-free is benign,
      because 'node_inode' remains partially valid data even after
      kmem_cache_free().
      But things may change if, while 'meta_inode' is evicted in one f2fs
      filesystem, another (mounted) f2fs filesystem requests inode from cache,
      and formerly 'node_inode' of the first filesystem is returned."
      
      Nids for both meta_inode and node_inode are reserved, so it's not necessary
      for us to invalidate pages which will never be allocated.
      To fix this issue, let's skip needlessly invalidating pages for
      {meta,node}_inode in f2fs_evict_inode.
      Reported-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
      Tested-by: Andrey Tsyvarev <tsyvarev@ispras.ru>
      Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Signed-off-by: Chao Yu <chao2.yu@samsung.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  21. 09 Jul, 2014 1 commit
  22. 07 May, 2014 2 commits
  23. 04 Apr, 2014 1 commit
    • mm + fs: store shadow entries in page cache · 91b0abe3
      Committed by Johannes Weiner
      Reclaim will be leaving shadow entries in the page cache radix tree upon
      evicting the real page.  As those pages are found from the LRU, an
      iput() can lead to the inode being freed concurrently.  At this point,
      reclaim must no longer install shadow pages because the inode freeing
      code needs to ensure the page tree is really empty.
      
      Add an address_space flag, AS_EXITING, that the inode freeing code sets
      under the tree lock before doing the final truncate.  Reclaim will check
      for this flag before installing shadow pages.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  24. 18 Mar, 2014 1 commit
  25. 27 Feb, 2014 1 commit
    • f2fs: introduce large directory support · 38431545
      Committed by Jaegeuk Kim
      This patch introduces an i_dir_level field to support large directories.
      
      Previously, f2fs maintains multi-level hash tables to find a dentry quickly
      among the many child dentries in a directory, and the hash tables form the
      tree structure shown below.
      
      In Documentation/filesystems/f2fs.txt,
      
      ----------------------
      A : bucket
      B : block
      N : MAX_DIR_HASH_DEPTH
      ----------------------
      
      level #0   | A(2B)
                 |
      level #1   | A(2B) - A(2B)
                 |
      level #2   | A(2B) - A(2B) - A(2B) - A(2B)
           .     |   .       .       .       .
      level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
           .     |   .       .       .       .
      level #N   | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
      
      But if we can guess that a directory will hold a large number of child files,
      we don't need to traverse the tree from level #0 to #N all the time.
      Since the lower level tables contain a relatively small number of dentries,
      the miss ratio of the target dentry is likely to be high.
      
      In order to avoid that, we can configure the hash tables sparsely from level #0
      like this.
      
      level #0   | A(2B) - A(2B) - A(2B) - A(2B)
      
      level #1   | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
           .     |   .       .       .       .
      level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
           .     |   .       .       .       .
      level #N   | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
      
      With this structure, we can skip the ineffective tree searches in lower level
      hash tables.
      
      This patch adds just a facility for this by introducing i_dir_level in
      f2fs_inode.
      Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
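The effect of i_dir_level on the two diagrams above can be sketched as a pair of small functions. This is an illustration, not the patch itself: the constant values (a maximum depth of 63 and a bucket cap of 2^30) are assumptions modelled on the f2fs documentation, and the function names are chosen here for readability. With dir_level = 0 the bucket count doubles per level starting from one bucket; a non-zero dir_level shifts that curve so level #0 already starts wide.

```c
#include <assert.h>

/* Assumed constants; verify against the f2fs sources you are reading. */
#define MAX_DIR_HASH_DEPTH 63
#define MAX_DIR_BUCKETS    (1u << ((MAX_DIR_HASH_DEPTH / 2) - 1))

/* Number of buckets in hash level `level` for a directory with `dir_level`. */
unsigned int dir_buckets(unsigned int level, unsigned int dir_level)
{
    assert(level < MAX_DIR_HASH_DEPTH);
    if (level + dir_level < MAX_DIR_HASH_DEPTH / 2)
        return 1u << (level + dir_level);   /* doubles with each level */
    return MAX_DIR_BUCKETS;                 /* capped in the deep levels */
}

/* Number of blocks per bucket: 2 in the upper half, 4 in the lower half. */
unsigned int bucket_blocks(unsigned int level)
{
    assert(level < MAX_DIR_HASH_DEPTH);
    return level < MAX_DIR_HASH_DEPTH / 2 ? 2 : 4;
}
```

For example, `dir_buckets(0, 2)` gives the four buckets drawn at level #0 in the second diagram, whereas `dir_buckets(0, 0)` gives the single bucket of the first.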
  26. 17 Feb, 2014 1 commit