- 22 10月, 2013 1 次提交
-
-
由 Gu Zheng 提交于
Introduce the unfailed version of kmem_cache_alloc named f2fs_kmem_cache_alloc to hide the retry routine and make the code a bit cleaner. v2: Fix the wrong use of 'retry' tag pointed out by Gao feng. Use more neat code to remove redundant tag suggested by Haicheng Li. Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
- 18 10月, 2013 2 次提交
-
-
由 Gu Zheng 提交于
Previously, do_checkpoint() will call congestion_wait() for waiting the pages (previous submitted node/meta/data pages) to be written back. Because congestion_wait() will set a regular period (e.g. HZ / 50 ) for waiting, and no additional wake up mechanism was introduced if IO ends up before regular period costed. Yuan Zhong found there is a situation that after the pages have been written back, but the checkpoint thread still wait for congestion_wait to exit. So here we store checkpoint task into f2fs_sb when doing checkpoint, it'll wait for IO completes if there's IO going on, and in the end IO path, wake up checkpoint task when IO ends up. Thanks to Yuan Zhong's pre work about this problem. Reported-by: NYuan Zhong <yuan.mark.zhong@samsung.com> Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
由 Jaegeuk Kim 提交于
This patch removes the logic previously introduced to address the starvation on cp_rwsem. One potential there-in bug is that we should cover the wait.list with spin_lock, but the previous code broke this rule. And, actually current rwsem handles this starvation issue reasonably, so that we didn't need to do this before neither. Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
- 07 10月, 2013 1 次提交
-
-
由 Gu Zheng 提交于
The fs_locks is used to block other ops(ex, recovery) when doing checkpoint. And each other operate routine(besides checkpoint) needs to acquire a fs_lock, there is a terrible problem here, if these are too many concurrency threads acquiring fs_lock, so that they will block each other and may lead to some performance problem, but this is not the phenomenon we want to see. Though there are some optimization patches introduced to enhance the usage of fs_lock, but the thorough solution is using a *rw_sem* to replace the fs_lock. Checkpoint routine takes write_sem, and other ops take read_sem, so that we can block other ops(ex, recovery) when doing checkpoint, and other ops will not disturb each other, this can avoid the problem described above completely. Because of the weakness of rw_sem, the above change may introduce a potential problem that the checkpoint thread might get starved if other threads are intensively locking the read semaphore for I/O.(Pointed out by Xu Jin) In order to avoid this, a wait_list is introduced, the appending read semaphore ops will be dropped into the wait_list if checkpoint thread is waiting for write semaphore, and will be waked up when checkpoint thread gives up write semaphore. Thanks to Kim's previous review and test, and will be very glad to see other guys' performance tests about this patch. V2: -fix the potential starvation problem. -use more suitable func name suggested by Xu Jin. Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com> [Jaegeuk Kim: adjust minor coding standard] Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
- 24 9月, 2013 1 次提交
-
-
由 Yu Chao 提交于
There is a performance problem: when all sbi->fs_lock are holded, then all the following threads may get the same next_lock value from sbi->next_lock_num in function mutex_lock_op, and wait for the same lock(fs_lock[next_lock]), it may cause performance reduce. So we move the sbi->next_lock_num++ before getting lock, this will average the following threads if all sbi->fs_lock are holded. v1-->v2: Drop the needless spin_lock as Jaegeuk suggested. Suggested-by: NJaegeuk Kim <jaegeuk.kim@samsung.com> Signed-off-by: NYu Chao <chao2.yu@samsung.com> Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
- 26 8月, 2013 4 次提交
-
-
由 Jaegeuk Kim 提交于
0. modified inode structure -------------------------------------- metadata (e.g., i_mtime, i_ctime, etc) -------------------------------------- direct pointers [0 ~ 873] inline xattrs (200 bytes by default) indirect pointers [0 ~ 4] -------------------------------------- node footer -------------------------------------- 1. setxattr flow - read_all_xattrs copies all the xattrs from inline and xattr node block. - handle xattr entries - write_all_xattrs copies modified xattrs into inline and xattr node block. 2. getxattr flow - read_all_xattrs copies all the xattrs from inline and xattr node block. - check target entries 3. Usage # mount -t f2fs -o inline_xattr $DEV $MNT Once mounted with the inline_xattr option, f2fs marks all the newly created files to reserve an amount of inline xattr space explicitly inside the inode block. Without the mount option, f2fs will not touch any existing files and newly created files as well. Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
由 Jaegeuk Kim 提交于
The truncate_xattr_node function will be used by inline xattr. Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
由 Jaegeuk Kim 提交于
This patch enables the number of direct pointers inside on-disk inode block to be changed dynamically according to the size of inline xattr space. The number of direct pointers, ADDRS_PER_INODE, can be changed only if the file has inline xattr flag. The number of direct pointers that will be used by inline xattrs is defined as F2FS_INLINE_XATTR_ADDRS. Current patch assigns F2FS_INLINE_XATTR_ADDRS to 0 temporarily. Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
由 Jaegeuk Kim 提交于
This patch adds basic inode flags for inline xattrs, F2FS_INLINE_XATTR, and add a mount option, inline_xattr, which is enabled when xattr is set. If the mount option is enabled, all the files are marked with the inline_xattrs flag. Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
- 09 8月, 2013 3 次提交
-
-
由 Jaegeuk Kim 提交于
This patch introduces a new inline function, cur_cp_version, to reduce redundant codes. Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
由 Jaegeuk Kim 提交于
Previously xattr node blocks are stored to the COLD_NODE log, which means that our roll-forward mechanism doesn't recover the xattr node blocks at all. Only the direct node blocks in the WARM_NODE log can be recovered. So, let's resolve the issue simply by conducting checkpoint during fsync when a file has a modified xattr node block. This approach is able to degrade the performance, but normally the checkpoint overhead is shown at the initial fsync call after the xattr entry changes. Once the checkpoint is done, no additional overhead would be occurred. Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
由 Jaegeuk Kim 提交于
This patch fixes the use of XATTR_NODE_OFFSET. o The offset should not use several MSB bits which are used by marking node blocks. o IS_DNODE should handle XATTR_NODE_OFFSET to avoid potential abnormality during the fsync call. Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
- 08 8月, 2013 1 次提交
-
-
由 Jaegeuk Kim 提交于
This patch should resolve the following error reported by kbuild test robot. All error/warnings: In file included from fs/f2fs/dir.c:13:0: >> fs/f2fs/f2fs.h:435:17: error: field 's_kobj' has incomplete type struct kobject s_kobj; The failure was caused by missing the kobject header file in dir.c. So, this patch move the header file to the right location, f2fs.h. CC: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
- 06 8月, 2013 2 次提交
-
-
由 Jin Xu 提交于
This patch fixes a deadlock bug that occurs quite often when there are concurrent write and fsync on a same file. Following is the simplified call trace when tasks get hung. fsync thread: - f2fs_sync_file ... - f2fs_write_data_pages ... - update_extent_cache ... - update_inode - wait_on_page_writeback bdi writeback thread - __writeback_single_inode - f2fs_write_data_pages - mutex_lock(sbi->writepages) The deadlock happens when the fsync thread waits on a inode page that has been added to the f2fs' cached bio sbi->bio[NODE], and unfortunately, no one else could be able to submit the cached bio to block layer for writeback. This is because the fsync thread already hold a sbi->fs_lock and the sbi->writepages lock, causing the bdi thread being blocked when attempt to write data pages for the same inode. At the same time, f2fs_gc thread does not notice the situation and could not help. Even the sync syscall gets blocked. To fix it, we could submit the cached bio first before waiting on a inode page that is being written back. Signed-off-by: NJin Xu <jinuxstyle@gmail.com> [Jaegeuk Kim: add more cases to use f2fs_wait_on_page_writeback] Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
由 Namjae Jeon 提交于
Add sysfs entries to control the timing parameters for f2fs gc thread. Various Sysfs options introduced are: gc_min_sleep_time: Min Sleep time for GC in ms gc_max_sleep_time: Max Sleep time for GC in ms gc_no_gc_sleep_time: Default Sleep time for GC in ms Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com> Signed-off-by: NPankaj Kumar <pankaj.km@samsung.com> Reviewed-by: NGu Zheng <guz.fnst@cn.fujitsu.com> [Jaegeuk Kim: fix an umount bug and some minor changes] Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
- 30 7月, 2013 5 次提交
-
-
由 Jaegeuk Kim 提交于
This patch fixes mishandling of the sbi->n_orphans variable. If users request lots of f2fs_unlink(), check_orphan_space() could be contended. In such the case, sbi->n_orphans can be read incorrectly so that f2fs_unlink() would fall into the wrong state which results in the failure of add_orphan_inode(). So, let's increment sbi->n_orphans virtually prior to the actual orphan inode stuffs. After that, let's release sbi->n_orphans by calling release_orphan_inode or remove_orphan_inode. Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
由 Jaegeuk Kim 提交于
The error is reproducible by: 0. mkfs.f2fs /dev/sdb1 & mount 1. touch test1 2. touch test2 3. mv test1 test2 4. umount 5. dumpt.f2fs -i 4 /dev/sdb1 After this, when we retrieve the inode->i_name of test2 by dump.f2fs, we get test1 instead of test2. This is because f2fs didn't update the file name during the f2fs_rename. So, this patch fixes that. Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
由 Gu Zheng 提交于
Introduce help function F2FS_NODE() to simplify the conversion of node_page to f2fs_node. Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
由 Gu Zheng 提交于
Add a help func F2FS_STAT() to get the f2fs_stat_info. Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
由 Jaegeuk Kim 提交于
You can monitor valid block counts of whole segments in: /proc/fs/f2fs/sdb1/segment_info. Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
- 02 7月, 2013 1 次提交
-
-
由 Jaegeuk Kim 提交于
While calculating CRC for the checkpoint block, we use __u32, but when storing the crc value to the disk, we use __le32. Let's fix the inconsistency. Reported-and-Tested-by: NOded Gabbay <ogabbay@advaoptical.com> Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
- 14 6月, 2013 3 次提交
-
-
由 Jaegeuk Kim 提交于
If a file is linked, f2fs loose its parent inode number so that fsync calls for the linked file should do checkpoint all the time. But, if we can recover its parent inode number after the checkpoint, we can adjust roll-forward mechanism for the further fsync calls, which is able to improve the fsync performance significatly. Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
由 Haicheng Li 提交于
It's used only locally and could be static. Signed-off-by: NHaicheng Li <haicheng.li@linux.intel.com> Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
由 Jaegeuk Kim 提交于
If update_inode is called, we don't need to do write_inode. So, let's use a *dirty* flag for each inode. Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
- 12 6月, 2013 1 次提交
-
-
由 Jaegeuk Kim 提交于
If new dentry block is allocated and its i_size is updated, we should update its inode block together in order to sync i_size and its block allocation. Otherwise, we can loose additional dentry block due to the unconsistent i_size. Errorneous Scenario ------------------- In the recovery routine, - recovery_dentry | - __f2fs_add_link | | - get_new_data_page | | | - i_size_write(new_i_size) | | | - mark_inode_dirty_sync(dir) | | - update_parent_metadata | | | - mark_inode_dirty(dir) | - write_checkpoint - sync_dirty_dir_inodes - filemap_flush(dentry_blocks) - f2fs_write_data_page - skip to write the last dentry block due to index < i_size In the above flow, new_i_size is not updated to its inode block so that the last dentry block will be lost accordingly. Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
- 11 6月, 2013 2 次提交
-
-
由 Jaegeuk Kim 提交于
Basically an inode manages the number of allocated blocks with inode->i_blocks which is represented in a unit of sectors, not file system blocks. But, f2fs has used i_blocks in a unit of file system blocks, and f2fs_getattr translates it to the number of sectors when fstat is called. However, previously f2fs_file_inode_operations only has this, so this patch adds it to all the types of inode_operations. Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
由 Jaegeuk Kim 提交于
This patch adds the support of security labels for f2fs, which will be used by Linus Security Models (LSMs). Quote from http://en.wikipedia.org/wiki/Linux_Security_Modules: "Linux Security Modules (LSM) is a framework that allows the Linux kernel to support a variety of computer security models while avoiding favoritism toward any single security implementation. The framework is licensed under the terms of the GNU General Public License and is standard part of the Linux kernel since Linux 2.6. AppArmor, SELinux, Smack and TOMOYO Linux are the currently accepted modules in the official kernel.". Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
- 07 6月, 2013 1 次提交
-
-
由 Jaegeuk Kim 提交于
It is possible that iput is skipped after iget during the recovery. In recover_dentry(), dir = f2fs_iget(); ... if (de && inode->i_ino == le32_to_cpu(de->ino)) goto out; In this case, this dir is not able to be added in dirty_dir_inode_list. The actual linking is done only when set_page_dirty() is called. So let's add this newly got inode into the list explicitly, and put it at the end of the recovery routine. Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
- 28 5月, 2013 8 次提交
-
-
由 Namjae Jeon 提交于
Some, counters are needed only for the statistical information while debugging. So, those can be controlled using CONFIG_F2FS_STAT_FS, pushing the usage for few variables under this flag. Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com> Signed-off-by: NAmit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
由 Jaegeuk Kim 提交于
The on-disk block address is defined as __le32, but in-memory block address, block_t, does as u64. Let's synchronize them to 32 bits. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
由 Jaegeuk Kim 提交于
This patch fixes the following deadlock bug during the recovery. INFO: task mount:1322 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. mount D ffffffff81125870 0 1322 1266 0x00000000 ffff8801207e39d8 0000000000000046 ffff88012ab1dee0 0000000000000046 ffff8801207e3a08 ffff880115903f40 ffff8801207e3fd8 ffff8801207e3fd8 ffff8801207e3fd8 ffff880115903f40 ffff8801207e39d8 ffff88012fc94520 Call Trace: [<ffffffff81125870>] ? __lock_page+0x70/0x70 [<ffffffff816a92d9>] schedule+0x29/0x70 [<ffffffff816a93af>] io_schedule+0x8f/0xd0 [<ffffffff8112587e>] sleep_on_page+0xe/0x20 [<ffffffff816a649a>] __wait_on_bit_lock+0x5a/0xc0 [<ffffffff81125867>] __lock_page+0x67/0x70 [<ffffffff8106c7b0>] ? autoremove_wake_function+0x40/0x40 [<ffffffff81126857>] find_lock_page+0x67/0x80 [<ffffffff8112698f>] find_or_create_page+0x3f/0xb0 [<ffffffffa03901a8>] ? sync_inode_page+0xa8/0xd0 [f2fs] [<ffffffffa038fdf7>] get_node_page+0x67/0x180 [f2fs] [<ffffffffa039818b>] recover_fsync_data+0xacb/0xff0 [f2fs] [<ffffffff816aaa1e>] ? _raw_spin_unlock+0x3e/0x40 [<ffffffffa0389634>] f2fs_fill_super+0x7d4/0x850 [f2fs] [<ffffffff81184cf9>] mount_bdev+0x1c9/0x210 [<ffffffffa0388e60>] ? validate_superblock+0x180/0x180 [f2fs] [<ffffffffa0387635>] f2fs_mount+0x15/0x20 [f2fs] [<ffffffff81185a13>] mount_fs+0x43/0x1b0 [<ffffffff81145ba0>] ? __alloc_percpu+0x10/0x20 [<ffffffff811a0796>] vfs_kern_mount+0x76/0x120 [<ffffffff811a2cb7>] do_mount+0x237/0xa10 [<ffffffff81140b9b>] ? strndup_user+0x5b/0x80 [<ffffffff811a3520>] SyS_mount+0x90/0xe0 [<ffffffff816b3502>] system_call_fastpath+0x16/0x1b The bug is triggered when check_index_in_prev_nodes tries to get the direct node page by calling get_node_page. At this point, if the direct node page is already locked by get_dnode_of_data, its caller, we got a deadlock condition. This patch adds additional condition check for the reuse of locked direct node pages prior to the get_node_page call. Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
由 Jaegeuk Kim 提交于
Introduce a simple macro function for readability. Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
由 Peter Zijlstra 提交于
Majianpeng reported a lockdep splat for f2fs. It turns out mutex_lock_all() acquires an array of locks (in global/local lock style). Any such operation is always serialized using cp_mutex, therefore there is no fs_lock[] lock-order issue; tell lockdep about this using the mutex_lock_nest_lock() primitive. Reported-by: Nmajianpeng <majianpeng@gmail.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
由 Jaegeuk Kim 提交于
I found a bug when testing power-off-recovery as follows. [Bug Scenario] 1. create a file 2. fsync the file 3. reboot w/o any sync 4. try to recover the file - found its fsync mark - found its dentry mark : try to recover its dentry - get its file name - get its parent inode number : here we got zero value The reason why we get the wrong parent inode number is that we didn't synchronize the inode page with its newly created inode information perfectly. Especially, previous f2fs stores fi->i_pino and writes it to the cached node page in a wrong order, which incurs the zero-valued i_pino during the recovery. So, this patch modifies the creation flow to fix the synchronization order of inode page with its inode. Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
由 Jaegeuk Kim 提交于
This patch is for passing a locked node page to get_dnode_of_data. Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
由 Jaegeuk Kim 提交于
During the dentry recovery routine, recover_inode() triggers __f2fs_add_link with its directory inode. In the following scenario, a bug is captured. 1. dir = f2fs_iget(pino) 2. __f2fs_add_link(dir, name) 3. iput(dir) -> f2fs_evict_inode() faces with BUG_ON(atomic_read(fi->dirty_dents)) Kernel BUG at ffffffffa01c0676 [verbose debug info unavailable] [<ffffffffa01c0676>] f2fs_evict_inode+0x276/0x300 [f2fs] Call Trace: [<ffffffff8118ea00>] evict+0xb0/0x1b0 [<ffffffff8118f1c5>] iput+0x105/0x190 [<ffffffffa01d2dac>] recover_fsync_data+0x3bc/0x1070 [f2fs] [<ffffffff81692e8a>] ? io_schedule+0xaa/0xd0 [<ffffffff81690acb>] ? __wait_on_bit_lock+0x7b/0xc0 [<ffffffff8111a0e7>] ? __lock_page+0x67/0x70 [<ffffffff81165e21>] ? kmem_cache_alloc+0x31/0x140 [<ffffffff8118a502>] ? __d_instantiate+0x92/0xf0 [<ffffffff812a949b>] ? security_d_instantiate+0x1b/0x30 [<ffffffff8118a5b4>] ? d_instantiate+0x54/0x70 This means that we should flush all the dentry pages between iget and iput(). But, during the recovery routine, it is unallowed due to consistency, so we have to wait the whole recovery process. And then, write_checkpoint flushes all the dirty dentry blocks, and nicely we can put the stale dir inodes from the dirty_dir_inode_list. Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
- 29 4月, 2013 1 次提交
-
-
由 Jaegeuk Kim 提交于
In order to avoid build_free_nid lock contention, let's change the order of function calls as follows. At first, check whether there is enough free nids. - If available, just get a free nid with spin_lock without any overhead. - Otherwise, conduct build_free_nids. : scan nat pages, journal nat entries, and nat cache entries. We should consider carefullly not to serve free nids intermediately made by build_free_nids. We can get stable free nids only after build_free_nids is done. Reviewed-by: NNamjae Jeon <namjae.jeon@samsung.com> Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
- 26 4月, 2013 1 次提交
-
-
由 Jaegeuk Kim 提交于
Previously, background GC submits many 4KB read requests to load victim blocks and/or its (i)node blocks. ... f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb61, blkaddr = 0x3b964ed f2fs_gc : block_rq_complete: 8,16 R () 499854968 + 8 [0] f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb6f, blkaddr = 0x3b964ee f2fs_gc : block_rq_complete: 8,16 R () 499854976 + 8 [0] f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb79, blkaddr = 0x3b964ef f2fs_gc : block_rq_complete: 8,16 R () 499854984 + 8 [0] ... However, by the fact that many IOs are sequential, we can give a chance to merge the IOs by IO scheduler. In order to do that, let's use blk_plug. ... f2fs_gc : f2fs_iget: ino = 143 f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c6, blkaddr = 0x2e6ee f2fs_gc : f2fs_iget: ino = 143 f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c7, blkaddr = 0x2e6ef <idle> : block_rq_complete: 8,16 R () 1519616 + 8 [0] <idle> : block_rq_complete: 8,16 R () 1519848 + 8 [0] <idle> : block_rq_complete: 8,16 R () 1520432 + 96 [0] <idle> : block_rq_complete: 8,16 R () 1520536 + 104 [0] <idle> : block_rq_complete: 8,16 R () 1521008 + 112 [0] <idle> : block_rq_complete: 8,16 R () 1521440 + 152 [0] <idle> : block_rq_complete: 8,16 R () 1521688 + 144 [0] <idle> : block_rq_complete: 8,16 R () 1522128 + 192 [0] <idle> : block_rq_complete: 8,16 R () 1523256 + 328 [0] ... Note that this issue should be addressed in checkpoint, and some readahead flows too. Reviewed-by: NNamjae Jeon <namjae.jeon@samsung.com> Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
- 09 4月, 2013 1 次提交
-
-
由 Jaegeuk Kim 提交于
In the previous version, f2fs uses global locks according to the usage types, such as directory operations, block allocation, block write, and so on. Reference the following lock types in f2fs.h. enum lock_type { RENAME, /* for renaming operations */ DENTRY_OPS, /* for directory operations */ DATA_WRITE, /* for data write */ DATA_NEW, /* for data allocation */ DATA_TRUNC, /* for data truncate */ NODE_NEW, /* for node allocation */ NODE_TRUNC, /* for node truncate */ NODE_WRITE, /* for node write */ NR_LOCK_TYPE, }; In that case, we lose the performance under the multi-threading environment, since every types of operations must be conducted one at a time. In order to address the problem, let's share the locks globally with a mutex array regardless of any types. So, let users grab a mutex and perform their jobs in parallel as much as possbile. For this, I propose a new global lock scheme as follows. 0. Data structure - f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS] - f2fs_sb_info -> node_write 1. mutex_lock_op(sbi) - try to get an avaiable lock from the array. - returns the index of the gottern lock variable. 2. mutex_unlock_op(sbi, index of the lock) - unlock the given index of the lock. 3. mutex_lock_all(sbi) - grab all the locks in the array before the checkpoint. 4. mutex_unlock_all(sbi) - release all the locks in the array after checkpoint. 5. block_operations() - call mutex_lock_all() - sync_dirty_dir_inodes() - grab node_write - sync_node_pages() Note that, the pairs of mutex_lock_op()/mutex_unlock_op() and mutex_lock_all()/mutex_unlock_all() should be used together. Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-
- 03 4月, 2013 1 次提交
-
-
由 Jaegeuk Kim 提交于
This patch removes a bitmap for victim segments selected by foreground GC, and modifies the other bitmap for victim segments selected by background GC. 1) foreground GC bitmap : We don't need to manage this, since we just only one previous victim section number instead of the whole victim history. The f2fs uses the victim section number in order not to allocate currently GC'ed section to current active logs. 2) background GC bitmap : This bitmap is used to avoid selecting victims repeatedly by background GCs. In addition, the victims are able to be selected by foreground GCs, since there is no need to read victim blocks during foreground GCs. By the fact that the foreground GC reclaims segments in a section unit, it'd be better to manage this bitmap based on the section granularity. Reviewed-by: NNamjae Jeon <namjae.jeon@samsung.com> Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
-