1. 16 1月, 2014 2 次提交
    • C
      f2fs: missing REQ_META and REQ_PRIO when sync_meta_pages(META_FLUSH) · c434cbc0
      Changman Lee 提交于
      Doing sync_meta_pages with META_FLUSH when checkpoint, we overide rw
      using WRITE_FLUSH_FUA. At this time, we also should set
      REQ_META|REQ_PRIO.
      Signed-off-by: NChangman Lee <cm224.lee@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      c434cbc0
    • J
      f2fs: avoid f2fs_balance_fs call during pageout · c33ec326
      Jaegeuk Kim 提交于
      This patch should resolve the following bug.
      
      =========================================================
      [ INFO: possible irq lock inversion dependency detected ]
      3.13.0-rc5.f2fs+ #6 Not tainted
      ---------------------------------------------------------
      kswapd0/41 just changed the state of lock:
       (&sbi->gc_mutex){+.+.-.}, at: [<ffffffffa030503e>] f2fs_balance_fs+0xae/0xd0 [f2fs]
      but this lock took another, RECLAIM_FS-READ-unsafe lock in the past:
       (&sbi->cp_rwsem){++++.?}
      
      and interrupts could create inverse lock ordering between them.
      
      other info that might help us debug this:
      Chain exists of:
        &sbi->gc_mutex --> &sbi->cp_mutex --> &sbi->cp_rwsem
      
       Possible interrupt unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&sbi->cp_rwsem);
                                     local_irq_disable();
                                     lock(&sbi->gc_mutex);
                                     lock(&sbi->cp_mutex);
        <Interrupt>
          lock(&sbi->gc_mutex);
      
       *** DEADLOCK ***
      
      This bug is due to the f2fs_balance_fs call in f2fs_write_data_page.
      If f2fs_write_data_page is triggered by wbc->for_reclaim via kswapd, it should
      not call f2fs_balance_fs which tries to get a mutex grabbed by original syscall
      flow.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      c33ec326
  2. 14 1月, 2014 1 次提交
  3. 06 1月, 2014 3 次提交
    • J
      f2fs: handle errors correctly during f2fs_reserve_block · a8865372
      Jaegeuk Kim 提交于
      The get_dnode_of_data nullifies inode and node page when error is occurred.
      
      There are two cases that passes inode page into get_dnode_of_data().
      
      1. make_empty_dir()
          -> get_new_data_page()
            -> f2fs_reserve_block(ipage)
      	-> get_dnode_of_data()
      
      2. f2fs_convert_inline_data()
          -> __f2fs_convert_inline_data()
            -> f2fs_reserve_block(ipage)
      	-> get_dnode_of_data()
      
      This patch adds correct error handling codes when get_dnode_of_data() returns
      an error.
      
      At first, f2fs_reserve_block() calls f2fs_put_dnode() whenever reserve_new_block
      returns an error.
      So, the rule of f2fs_reserve_block() is to nullify inode page when there is any
      error internally.
      
      Finally, two callers of f2fs_reserve_block() should call f2fs_put_dnode()
      appropriately if they got an error since successful f2fs_reserve_block().
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      a8865372
    • J
      f2fs: refactor f2fs_convert_inline_data · 9e09fc85
      Jaegeuk Kim 提交于
      Change log from v1:
       o handle NULL pointer of grab_cache_page_write_begin() pointed by Chao Yu.
      
      This patch refactors f2fs_convert_inline_data to check a couple of conditions
      internally for deciding whether it needs to convert inline_data or not.
      
      So, the new f2fs_convert_inline_data initially checks:
      1) f2fs_has_inline_data(), and
      2) the data size to be changed.
      
      If the inode has inline_data but the size to fill is less than MAX_INLINE_DATA,
      then we don't need to convert the inline_data with data allocation.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      9e09fc85
    • J
      f2fs: call f2fs_put_page at the error case · 26f466f4
      Jaegeuk Kim 提交于
      In f2fs_write_begin(), if f2fs_conver_inline_data() returns an error like
      -ENOSPC, f2fs should call f2fs_put_page().
      Otherwise, it is remained as a locked page, resulting in the following bug.
      
      [<ffffffff8114657e>] sleep_on_page+0xe/0x20
      [<ffffffff81146567>] __lock_page+0x67/0x70
      [<ffffffff81157d08>] truncate_inode_pages_range+0x368/0x5d0
      [<ffffffff81157ff5>] truncate_inode_pages+0x15/0x20
      [<ffffffff8115804b>] truncate_pagecache+0x4b/0x70
      [<ffffffff81158082>] truncate_setsize+0x12/0x20
      [<ffffffffa02a1842>] f2fs_setattr+0x72/0x270 [f2fs]
      [<ffffffff811cdae3>] notify_change+0x213/0x400
      [<ffffffff811ab376>] do_truncate+0x66/0xa0
      [<ffffffff811ab541>] vfs_truncate+0x191/0x1b0
      [<ffffffff811ab5bc>] do_sys_truncate+0x5c/0xa0
      [<ffffffff811ab78e>] SyS_truncate+0xe/0x10
      [<ffffffff81756052>] system_call_fastpath+0x16/0x1b
      [<ffffffffffffffff>] 0xffffffffffffffff
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      26f466f4
  4. 26 12月, 2013 3 次提交
    • H
      f2fs: handle inline data operations · 9ffe0fb5
      Huajun Li 提交于
      Hook inline data read/write, truncate, fallocate, setattr, etc.
      
      Files need meet following 2 requirement to inline:
       1) file size is not greater than MAX_INLINE_DATA;
       2) file doesn't pre-allocate data blocks by fallocate().
      
      FI_INLINE_DATA will not be set while creating a new regular inode because
      most of the files are bigger than ~3.4K. Set FI_INLINE_DATA only when
      data is submitted to block layer, ranther than set it while creating a new
      inode, this also avoids converting data from inline to normal data block
      and vice versa.
      
      While writting inline data to inode block, the first data block should be
      released if the file has a block indexed by i_addr[0].
      
      On the other hand, when a file operation is appied to a file with inline
      data, we need to test if this file can remain inline by doing this
      operation, otherwise it should be convert into normal file by reserving
      a new data block, copying inline data to this new block and clear
      FI_INLINE_DATA flag. Because reserve a new data block here will make use
      of i_addr[0], if we save inline data in i_addr[0..872], then the first
      4 bytes would be overwriten. This problem can be avoided simply by
      not using i_addr[0] for inline data.
      Signed-off-by: NHuajun Li <huajun.li@intel.com>
      Signed-off-by: NHaicheng Li <haicheng.li@linux.intel.com>
      Signed-off-by: NWeihong Xu <weihong.xu@intel.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      9ffe0fb5
    • J
      f2fs: check the blocksize before calling generic_direct_IO path · 944fcfc1
      Jaegeuk Kim 提交于
      The f2fs supports 4KB block size. If user requests dwrite with under 4KB data,
      it allocates a new 4KB data block.
      However, f2fs doesn't add zero data into the untouched data area inside the
      newly allocated data block.
      
      This incurs an error during the xfstest #263 test as follow.
      
      263 12s ... [failed, exit status 1] - output mismatch (see 263.out.bad)
      	--- 263.out	2013-03-09 03:37:15.043967603 +0900
      	+++ 263.out.bad	2013-12-27 04:20:39.230203114 +0900
      	@@ -1,3 +1,976 @@
      	QA output created by 263
      	fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z
      	-fsx -N 10000 -o 128000 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z
      	+fsx -N 10000 -o 8192 -l 500000 -r PSIZE -t BSIZE -w BSIZE -Z
      	+truncating to largest ever: 0x12a00
      	+truncating to largest ever: 0x75400
      	+fallocating to largest ever: 0x79cbf
      	...
      	(Run 'diff -u 263.out 263.out.bad' to see the entire diff)
      	Ran: 263
      	Failures: 263
      	Failed 1 of 1 tests
      
      It turns out that, when the test tries to write 2KB data with dio, the new dio
      path allocates 4KB data block without filling zero data inside the remained 2KB
      area. Finally, the output file contains a garbage data for that region.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      944fcfc1
    • J
      f2fs: should put the dnode when NEW_ADDR is detected · 1ec79083
      Jaegeuk Kim 提交于
      When get_dnode_of_data() in get_data_block() returns a successful dnode, we
      should put the dnode.
      But, previously, if its data block address is equal to NEW_ADDR, we didn't do
      that, resulting in a deadlock condition.
      So, this patch splits original error conditions with this case, and then calls
      f2fs_put_dnode before finishing the function.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      1ec79083
  5. 23 12月, 2013 19 次提交
  6. 29 10月, 2013 1 次提交
  7. 25 10月, 2013 2 次提交
  8. 07 10月, 2013 1 次提交
    • G
      f2fs: use rw_sem instead of fs_lock(locks mutex) · e479556b
      Gu Zheng 提交于
      The fs_locks is used to block other ops(ex, recovery) when doing checkpoint.
      And each other operate routine(besides checkpoint) needs to acquire a fs_lock,
      there is a terrible problem here, if these are too many concurrency threads acquiring
      fs_lock, so that they will block each other and may lead to some performance problem,
      but this is not the phenomenon we want to see.
      Though there are some optimization patches introduced to enhance the usage of fs_lock,
      but the thorough solution is using a *rw_sem* to replace the fs_lock.
      Checkpoint routine takes write_sem, and other ops take read_sem, so that we can block
      other ops(ex, recovery) when doing checkpoint, and other ops will not disturb each other,
      this can avoid the problem described above completely.
      Because of the weakness of rw_sem, the above change may introduce a potential problem
      that the checkpoint thread might get starved if other threads are intensively locking
      the read semaphore for I/O.(Pointed out by Xu Jin)
      In order to avoid this, a wait_list is introduced, the appending read semaphore ops
      will be dropped into the wait_list if checkpoint thread is waiting for write semaphore,
      and will be waked up when checkpoint thread gives up write semaphore.
      Thanks to Kim's previous review and test, and will be very glad to see other guys'
      performance tests about this patch.
      
      V2:
        -fix the potential starvation problem.
        -use more suitable func name suggested by Xu Jin.
      Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
      [Jaegeuk Kim: adjust minor coding standard]
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      e479556b
  9. 26 8月, 2013 1 次提交
    • J
      f2fs: reserve the xattr space dynamically · de93653f
      Jaegeuk Kim 提交于
      This patch enables the number of direct pointers inside on-disk inode block to
      be changed dynamically according to the size of inline xattr space.
      
      The number of direct pointers, ADDRS_PER_INODE, can be changed only if the file
      has inline xattr flag.
      
      The number of direct pointers that will be used by inline xattrs is defined as
      F2FS_INLINE_XATTR_ADDRS.
      Current patch assigns F2FS_INLINE_XATTR_ADDRS to 0 temporarily.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      de93653f
  10. 20 8月, 2013 1 次提交
    • J
      f2fs: fix wrong BUG_ON condition · d59ff4df
      Jaegeuk Kim 提交于
      This patch removes a false-alaramed BUG_ON.
      The previous BUG_ON condition didn't cover the following true scenario.
      
      In f2fs_add_link, 1) get_new_data_page gives an uptodate page successfully,
      and then, 2) init_inode_metadata returns -ENOSPC.
      At this moment, a new clean data page is remained in the page cache, but its
      block address still indicates NEW_ADDR.
      After then, even if sync is called, this clean data page cannot be written to
      the disk due to the clean state.
      
      So this means that get_lock_data_page should make a new empty page when its
      block address is NEW_ADDR and its page is not uptodated.
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      d59ff4df
  11. 12 8月, 2013 1 次提交
  12. 06 8月, 2013 2 次提交
    • J
      f2fs: fix a deadlock in fsync · a569469e
      Jin Xu 提交于
      This patch fixes a deadlock bug that occurs quite often when there are
      concurrent write and fsync on a same file.
      
      Following is the simplified call trace when tasks get hung.
      
      fsync thread:
      - f2fs_sync_file
       ...
       - f2fs_write_data_pages
       ...
        - update_extent_cache
        ...
         - update_inode
          - wait_on_page_writeback
      
      bdi writeback thread
      - __writeback_single_inode
       - f2fs_write_data_pages
        - mutex_lock(sbi->writepages)
      
      The deadlock happens when the fsync thread waits on a inode page that has
      been added to the f2fs' cached bio sbi->bio[NODE], and unfortunately,
      no one else could be able to submit the cached bio to block layer for
      writeback. This is because the fsync thread already hold a sbi->fs_lock and
      the sbi->writepages lock, causing the bdi thread being blocked when attempt
      to write data pages for the same inode. At the same time, f2fs_gc thread
      does not notice the situation and could not help. Even the sync syscall
      gets blocked.
      
      To fix it, we could submit the cached bio first before waiting on a inode page
      that is being written back.
      Signed-off-by: NJin Xu <jinuxstyle@gmail.com>
      [Jaegeuk Kim: add more cases to use f2fs_wait_on_page_writeback]
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      a569469e
    • N
      f2fs: remove redundant code from f2fs_write_begin · df273efc
      Namjae Jeon 提交于
      This code is being used for nobh_write_end() function.
      But since now f2fs_write_end function is added so
      there is no need for this code.
      Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: NPankaj Kumar <pankaj.km@samsung.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
      df273efc
  13. 31 7月, 2013 1 次提交
  14. 30 7月, 2013 2 次提交