- 24 9月, 2014 6 次提交
-
-
由 Jaegeuk Kim 提交于
If same data is updated multiple times, we don't need to redo whole the operations. Let's just update the lastest one. Reviewed-by: NChao Yu <chao2.yu@samsung.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Jaegeuk Kim 提交于
In f2fs_sync_file, if there is no written appended writes, it skips to write its node blocks. But, if there is up-to-date inode page, we should write it to update its metadata during the roll-forward recovery. Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Jaegeuk Kim 提交于
We can summarize the roll forward recovery scenarios as follows. [Term] F: fsync_mark, D: dentry_mark 1. inode(x) | CP | inode(x) | dnode(F) -> Update the latest inode(x). 2. inode(x) | CP | inode(F) | dnode(F) -> No problem. 3. inode(x) | CP | dnode(F) | inode(x) -> Recover to the latest dnode(F), and drop the last inode(x) 4. inode(x) | CP | dnode(F) | inode(F) -> No problem. 5. CP | inode(x) | dnode(F) -> The inode(DF) was missing. Should drop this dnode(F). 6. CP | inode(DF) | dnode(F) -> No problem. 7. CP | dnode(F) | inode(DF) -> If f2fs_iget fails, then goto next to find inode(DF). 8. CP | dnode(F) | inode(x) -> If f2fs_iget fails, then goto next to find inode(DF). But it will fail due to no inode(DF). So, this patch adds some missing points such as #1, #5, #7, and #8. Signed-off-by: NHuang Ying <ying.huang@intel.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Jaegeuk Kim 提交于
This patch revisited whole the recovery information during the f2fs_sync_file. In this patch, there are three information to make a decision. a) IS_CHECKPOINTED, /* is it checkpointed before? */ b) HAS_FSYNCED_INODE, /* is the inode fsynced before? */ c) HAS_LAST_FSYNC, /* has the latest node fsync mark? */ And, the scenarios for our rule are based on: [Term] F: fsync_mark, D: dentry_mark 1. inode(x) | CP | inode(x) | dnode(F) 2. inode(x) | CP | inode(F) | dnode(F) 3. inode(x) | CP | dnode(F) | inode(x) | inode(F) 4. inode(x) | CP | dnode(F) | inode(F) 5. CP | inode(x) | dnode(F) | inode(DF) 6. CP | inode(DF) | dnode(F) 7. CP | dnode(F) | inode(DF) 8. CP | dnode(F) | inode(x) | inode(DF) For example, #3, the three conditions should be changed as follows. inode(x) | CP | dnode(F) | inode(x) | inode(F) a) x o o o o b) x x x x o c) x o o x o If f2fs_sync_file stops ------^, it should write inode(F) --------------^ So, the need_inode_block_update should return true, since c) get_nat_flag(e, HAS_LAST_FSYNC), is false. For example, #8, CP | alloc | dnode(F) | inode(x) | inode(DF) a) o x x x x b) x x x o c) o o x o If f2fs_sync_file stops -------^, it should write inode(DF) --------------^ Note that, the roll-forward policy should follow this rule, which means, if there are any missing blocks, we doesn't need to recover that inode. Signed-off-by: NHuang Ying <ying.huang@intel.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Jaegeuk Kim 提交于
This patch introduces a flag in the nat entry structure to merge various information such as checkpointed and fsync_done marks. Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Jaegeuk Kim 提交于
Previously, all the dnode pages should be read during the roll-forward recovery. Even worsely, whole the chain was traversed twice. This patch removes that redundant and costly read operations by using page cache of meta_inode and readahead function as well. Reviewed-by: NChao Yu <chao2.yu@samsung.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
- 16 9月, 2014 5 次提交
-
-
由 Jaegeuk Kim 提交于
If the inode is same and its data index are needed to truncate, we can fall into double lock for its inode page via get_dnode_of_data. Error case is like this. 1. write data 1, 2, 3, 4, 5 in inode #4. 2. write data 100, 102, 103, 104, 105 in dnode #6 of inode #4. 3. sync 4. update data 100->106 in dnode #6. 5. fsync inode #4. 6. power-cut -> Then, 1. go back to #3's checkpoint 2. in do_recover_data, get_dnode_of_data() gets inode #4. 3. detect 100->106 in dnode #6. 4. check_index_in_prev_nodes tries to truncate 100 in dnode #6. 5. to trigger truncate_hole, get_dnode_of_data should grab inode #4. 6. detect *kernel hang* This patch should resolve that bug. Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Huang Ying 提交于
The nm_i->fcnt checking is executed before spin_lock, so if another thread delete the last free_nid from the list, the wrong nid may be gotten. So fix the race condition by moving the nm_i->fnct checking into spin_lock. Signed-off-by: NHuang, Ying <ying.huang@intel.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Huang Ying 提交于
Now, if there is no free nid in nm_i->free_nid_list, 0 may be saved into next_free_nid of checkpoint, this may cause useless scanning for next mount. nm_i->next_scan_nid should be a better default value than 0. Signed-off-by: NHuang, Ying <ying.huang@intel.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Jaegeuk Kim 提交于
If user wrote F2FS_IPU_FSYNC:4 in /sys/fs/f2fs/ipu_policy, f2fs_sync_file only starts to try in-place-updates. And, if the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it keeps out-of-order manner. Otherwise, it triggers in-place-updates. This may be used by storage showing very high random write performance. For example, it can be used when, Seq. writes (Data) + wait + Seq. writes (Node) is pretty much slower than, Rand. writes (Data) Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Jaegeuk Kim 提交于
Previously f2fs only counts dirty dentry pages, but there is no reason not to expand the scope. This patch changes the names on the management of dirty pages and to count dirty pages in each inode info as well. Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
- 11 9月, 2014 1 次提交
-
-
由 Jaegeuk Kim 提交于
This patch is to remove lengthy name by adding a new variable. Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
- 10 9月, 2014 10 次提交
-
-
由 Jaegeuk Kim 提交于
If application throws negative value of lseek with SEEK_DATA|SEEK_HOLE, previous f2fs went into BUG_ON in get_dnode_of_data, which was reported by Tommi Rantala. He could make a simple code to detect this having: lseek(fd, -17595150933902LL, SEEK_DATA); This patch should resolve that bug. Reported-by: NTommi Rentala <tt.rantala@gmail.com> [Jaegeuk Kim: relocate the condition as suggested by Chao] Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Huang Ying 提交于
In gc_node_segment, if node page gc is run concurrently with node page writeback, and check_valid_map and get_node_page run after page locked and before cur_valid_map is updated as below, it is possible for the page to be written twice unnecessarily. sync_node_pages try_lock_page ... check_valid_map f2fs_write_node_page ... write_node_page do_write_page allocate_data_block ... refresh_sit_entry /* update cur_valid_map */ ... ... unlock_page get_node_page ... set_page_dirty ... f2fs_put_page unlock_page This can be solved via calling check_valid_map after get_node_page again. Signed-off-by: NHuang, Ying <ying.huang@intel.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Gu Zheng 提交于
We use flush cmd control to collect many flush cmds, and flush them together. In this case, we use two list to manage the flush cmds (collect and dispatch), and one spin lock is used to protect this. In fact, the lock-less list(llist) is very suitable to this case, and we use simplify this routine. - v2: -use llist_for_each_entry_safe to fix possible use-after-free issue. -remove the unused field from struct flush_cmd. Thanks for Yu's suggestion. - Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Chao Yu 提交于
In commit aec71382 ("f2fs: refactor flush_nat_entries codes for reducing NAT writes"), we descripte the issue as below: "Although building NAT journal in cursum reduce the read/write work for NAT block, but previous design leave us lower performance when write checkpoint frequently for these cases: 1. if journal in cursum has already full, it's a bit of waste that we flush all nat entries to page for persistence, but not to cache any entries. 2. if journal in cursum is not full, we fill nat entries to journal util journal is full, then flush the left dirty entries to disk without merge journaled entries, so these journaled entries may be flushed to disk at next checkpoint but lost chance to flushed last time." Actually, we have the same problem in using SIT journal area. In this patch, firstly we will update sit journal with dirty entries as many as possible. Secondly if there is no space in sit journal, we will remove all entries in journal and walk through the whole dirty entry bitmap of sit, accounting dirty sit entries located in same SIT block to sit entry set. All entry sets are linked to list sit_entry_set in sm_info, sorted ascending order by count of entries in set. Later we flush entries in set which have fewest entries into journal as many as we can, and then flush dense set with merged entries to disk. In this way we can use sit journal area more effectively, also we will reduce SIT update, result in gaining in performance and saving lifetime of flash device. In my testing environment, it shows this patch can help to reduce SIT block update obviously. virtual machine + hard disk: fsstress -p 20 -n 400 -l 5 sit page num cp count sit pages/cp based 2006.50 1349.75 1.486 patched 1566.25 1463.25 1.070 Our latency of merging op is small when handling a great number of dirty SIT entries in flush_sit_entries: latency(ns) dirty sit count 36038 2151 49168 2123 37174 2232 Signed-off-by: NChao Yu <chao2.yu@samsung.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Chao Yu 提交于
sit_i in macro SIT_BLOCK_OFFSET/START_SEGNO is not used, remove it. Signed-off-by: NChao Yu <chao2.yu@samsung.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Jaegeuk Kim 提交于
If the roll-forward recovery was failed, we'd better conduct fsck.f2fs. Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Jaegeuk Kim 提交于
This patch adds to handle corner buggy cases for fsck.f2fs. Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Jaegeuk Kim 提交于
This patch replaces BUG cases with f2fs_bug_on to remain fsck.f2fs information. And it implements some void functions to initiate fsck.f2fs too. Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Jaegeuk Kim 提交于
If any f2fs_bug_on is triggered, fsck.f2fs is needed. Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Jaegeuk Kim 提交于
This patch adds sbi->need_fsck to conduct fsck.f2fs later. This flag can only be removed by fsck.f2fs. Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
- 04 9月, 2014 1 次提交
-
-
由 Jaegeuk Kim 提交于
This patch adds three inline functions to clean up dirty casting codes. Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
- 02 9月, 2014 1 次提交
-
-
由 Chao Yu 提交于
As the race condition on the inode cache, following scenario can appear: [Thread a] [Thread b] ->f2fs_mkdir ->f2fs_add_link ->__f2fs_add_link ->init_inode_metadata failed here ->gc_thread_func ->f2fs_gc ->do_garbage_collect ->gc_data_segment ->f2fs_iget ->iget_locked ->wait_on_inode ->unlock_new_inode ->move_data_page ->make_bad_inode ->iput When we fail in create/symlink/mkdir/mknod/tmpfile, the new allocated inode should be set as bad to avoid being accessed by other thread. But in above scenario, it allows f2fs to access the invalid inode before this inode was set as bad. This patch fix the potential problem, and this issue was found by code review. change log from v1: o Add condition judgment in gc_data_segment() suggested by Changman Lee. o use iget_failed to simplify code. Signed-off-by: NChao Yu <chao2.yu@samsung.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
- 29 8月, 2014 2 次提交
-
-
由 Jaegeuk Kim 提交于
The dentry name type is unsigned char *. If we don't match this type, some character codes can be changed by signed bit. Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Dan Carpenter 提交于
We can make the code a bit simpler because we know that "!retry" is zero. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
- 26 8月, 2014 1 次提交
-
-
由 Jaegeuk Kim 提交于
This verifies to truncate any allocated blocks, offset[0], by inline_data. Not figured out, but for making sure. Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
- 23 8月, 2014 1 次提交
-
-
由 Chao Yu 提交于
This patch introduces DEF_NIDS_PER_INODE/GET_ORPHAN_BLOCKS/F2FS_CP_PACKS macro instead of numbers in code for readability. change log from v1: o fix typo pointed out by Jaegeuk Kim. Signed-off-by: NChao Yu <chao2.yu@samsung.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
- 22 8月, 2014 12 次提交
-
-
由 Chao Yu 提交于
This patch introduce need_do_checkpoint() to include numerous judgment condition for readability. Signed-off-by: NChao Yu <chao2.yu@samsung.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Chao Yu 提交于
Theoretically, our total inodes number is the same as total node number, but there are three node ids are reserved in f2fs, they are 0, 1 (node nid), and 2 (meta nid), and they should never be used by user, so our total/free inode number calculated in ->statfs is wrong. This patch indroduces F2FS_RESERVED_NODE_NUM and then fixes this issue by recalculating total/free inode number with the macro. Signed-off-by: NChao Yu <chao2.yu@samsung.com> Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Jaegeuk Kim 提交于
Refer the following patch. commit 7177a9c4 Author: Miklos Szeredi <mszeredi@suse.cz> Date: Wed Jul 23 15:15:30 2014 +0200 fs: call rename2 if exists Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Jaegeuk Kim 提交于
This patch checks inline_data one more time under the inode page lock whether its inline_data is converted or not. Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Jaegeuk Kim 提交于
I think we need to let the dirty node pages remain in the page cache instead of rewriting them in their places. So, after done with successful recovery, write_checkpoint will flush all of them through the normal write path. Through this, we can avoid potential error cases in terms of block allocation. Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Jaegeuk Kim 提交于
The init_inode_metadata calls truncate_blocks when error is occurred. The callers holds f2fs_lock_op, so we should not call it again in truncate_blocks. Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Jaegeuk Kim 提交于
Any checkpoint should not be done during the core roll-forward procedure. Especially, it includes error cases too. Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Jaegeuk Kim 提交于
This patch adds WARN_ON when f2fs_bug_on is disable to see kernel messages. Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Jaegeuk Kim 提交于
There are two rules when EIO is occurred. 1. don't write any checkpoint data to preserve the previous checkpoint 2. don't lose the cached dentry/node/meta pages So, at first, this patch adds set_page_dirty in f2fs_write_end_io's failure. Then, writing checkpoint/dentry/node blocks is not allowed. Note that, for the data pages, we can't just throw away by redirtying them. Otherwise, kworker can fall into infinite loop to flush them. (Ref. xfstests/019) Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Jaegeuk Kim 提交于
It needs to check s_dirty under cp_mutex, since s_dirty is reset under that mutex. And previous condition was not correct, since we can omit doing checkpoint when checkpoint was done followed by all the node pages were written back. Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Jaegeuk Kim 提交于
This patch fixes missing unlock_page when a node page is redirtied out. Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-
由 Jaegeuk Kim 提交于
This patch adds f2fs_cp_error for readability. Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
-