Commits · fb5566da9181d33ecdd9892e44f90320e7d4cc9f · openeuler / Kernel

08 Jan, 2014 1 commit

f2fs: improve write performance under frequent fsync calls · fb5566da

Authored by Jaegeuk Kim on Jan 08, 2014

When considering a bunch of data writes with very frequent fsync calls, we
can expect the following performance regression.

N: Node IO, D: Data IO, IO scheduler: cfq

Issue    pending IOs
	 D1 D2 D3 D4
 D1         D2 D3 D4 N1
 D2            D3 D4 N1 N2
 N1            D3 D4 N2 D1
 --> N1 can be selected by cfq because N and D have the same priority.
     Then D3 and D4 would be delayed, resulting in performance degradation.

So, when processing the fsync call, it is better to give higher priority to data
IOs than to node IOs by assigning WRITE_SYNC to data IOs and WRITE to node IOs.
This patch improves the random write performance with frequent fsync calls by up
to 10%.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

fb5566da
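
The priority choice described in this commit can be sketched in user-space C. This is a minimal illustration, not the f2fs code: the enum and function names are hypothetical stand-ins, and it reads the message as giving data IOs the synchronous request type (WRITE_SYNC) and node IOs the plain one (WRITE).

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel's WRITE and WRITE_SYNC request types. */
enum demo_rw { DEMO_WRITE, DEMO_WRITE_SYNC };

/* Kind of block being written back on behalf of fsync. */
enum demo_io { DEMO_DATA_IO, DEMO_NODE_IO };

/*
 * Pick the request type for an IO issued during fsync.
 * Data IOs are marked synchronous so that a scheduler which favors
 * synchronous requests serves them ahead of node IOs.
 */
static enum demo_rw demo_fsync_rw(enum demo_io io)
{
	assert(io == DEMO_DATA_IO || io == DEMO_NODE_IO);
	return io == DEMO_DATA_IO ? DEMO_WRITE_SYNC : DEMO_WRITE;
}
```

With this mapping, D3 and D4 in the table above would no longer compete with N1 at equal priority.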

06 Jan, 2014 3 commits

f2fs: add inline_data recovery routine · 1e1bb4ba

Authored by Jaegeuk Kim on Dec 26, 2013

This patch adds an inline_data recovery routine with the following policy.

[prev.] [next] of inline_data flag
   o       o  -> recover inline_data
   o       x  -> remove inline_data, and then recover data blocks
   x       o  -> remove inline_data, and then recover inline_data
   x       x  -> recover data blocks
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

1e1bb4ba
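
The four-row policy table above maps directly onto a small decision function. The sketch below is illustrative only: the flag and function names are invented for this example and do not come from the f2fs sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative action flags; a returned value may combine several of them. */
enum {
	DEMO_REMOVE_INLINE  = 1 << 0, /* drop the existing inline_data first */
	DEMO_RECOVER_INLINE = 1 << 1, /* recover inline_data */
	DEMO_RECOVER_BLOCKS = 1 << 2, /* recover regular data blocks */
};

/*
 * Encode the [prev.] / [next] inline_data flag table:
 *   o o -> recover inline_data
 *   o x -> remove inline_data, then recover data blocks
 *   x o -> remove inline_data, then recover inline_data
 *   x x -> recover data blocks
 */
static int demo_inline_recovery_policy(bool prev_inline, bool next_inline)
{
	if (prev_inline && next_inline)
		return DEMO_RECOVER_INLINE;
	if (prev_inline && !next_inline)
		return DEMO_REMOVE_INLINE | DEMO_RECOVER_BLOCKS;
	if (!prev_inline && next_inline)
		return DEMO_REMOVE_INLINE | DEMO_RECOVER_INLINE;
	return DEMO_RECOVER_BLOCKS;
}
```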

f2fs: refactor f2fs_convert_inline_data · 9e09fc85

Authored by Jaegeuk Kim on Dec 27, 2013

Change log from v1:
 o handle NULL pointer of grab_cache_page_write_begin() pointed out by Chao Yu.

This patch refactors f2fs_convert_inline_data to check a couple of conditions
internally for deciding whether it needs to convert inline_data or not.

So, the new f2fs_convert_inline_data initially checks:
1) f2fs_has_inline_data(), and
2) the data size to be changed.

If the inode has inline_data but the size to fill is less than MAX_INLINE_DATA,
then we don't need to convert the inline_data with data allocation.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

9e09fc85
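
The two internal checks can be sketched as a single predicate. This is a hedged illustration with hypothetical names; the limit is passed in as a parameter rather than taken from the kernel's MAX_INLINE_DATA, and treating a size exactly equal to the limit as "convert" follows the wording "less than" in the message and is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Decide whether inline data has to be converted to a regular data block.
 *  1) An inode without inline data never needs conversion.
 *  2) An inode with inline data needs conversion only when the size to
 *     fill is not less than the inline limit.
 */
static bool demo_needs_inline_conversion(bool has_inline_data,
					 size_t size_to_fill,
					 size_t max_inline_data)
{
	if (!has_inline_data)
		return false;
	return size_to_fill >= max_inline_data;
}
```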

f2fs: convert inline_data for punch_hole · 8230a0a4

Authored by Jaegeuk Kim on Dec 27, 2013

In punch_hole(), let's convert inline_data all the time for simplicity and
to avoid potential deadlock conditions.
It is not a big deal to do this.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

8230a0a4

27 Dec, 2013 1 commit

f2fs: don't need to get f2fs_lock_op for the inline_data test · f185ff97

Authored by Jaegeuk Kim on Dec 27, 2013

This patch moves the inline_data check ahead of the f2fs_lock_op() call
in truncate_blocks(), since getting the lock is unnecessary.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f185ff97

26 Dec, 2013 1 commit

f2fs: handle inline data operations · 9ffe0fb5

Authored by Huajun Li on Nov 10, 2013

Hook inline data read/write, truncate, fallocate, setattr, etc.

Files need to meet the following two requirements to be inlined:
 1) the file size is not greater than MAX_INLINE_DATA;
 2) the file doesn't pre-allocate data blocks by fallocate().

FI_INLINE_DATA will not be set while creating a new regular inode because
most of the files are bigger than ~3.4K. Set FI_INLINE_DATA only when
data is submitted to the block layer, rather than while creating a new
inode; this also avoids converting data from inline to normal data block
and vice versa.

While writing inline data to the inode block, the first data block should be
released if the file has a block indexed by i_addr[0].

On the other hand, when a file operation is applied to a file with inline
data, we need to test whether this file can remain inline after this
operation; otherwise it should be converted into a normal file by reserving
a new data block, copying the inline data to this new block and clearing the
FI_INLINE_DATA flag. Because reserving a new data block here will make use
of i_addr[0], if we save inline data in i_addr[0..872], then the first
4 bytes would be overwritten. This problem can be avoided simply by
not using i_addr[0] for inline data.
Signed-off-by: Huajun Li <huajun.li@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Weihong Xu <weihong.xu@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

9ffe0fb5
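
The "~3.4K" figure follows from the layout described above: 873 address slots (i_addr[0..872]) of 4 bytes each, with i_addr[0] left unused. The sketch below works through that arithmetic and the two inlining requirements; all names are hypothetical, and the slot count is taken from the message rather than from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Slots i_addr[0..872], as quoted in the commit message. */
#define DEMO_INLINE_SLOTS 873

/* Bytes usable for inline data when the first slot is reserved. */
static size_t demo_max_inline_bytes(size_t slots)
{
	assert(slots >= 1);
	return (slots - 1) * sizeof(uint32_t);
}

/* The two requirements: small enough, and no fallocate() pre-allocation. */
static bool demo_may_inline(size_t file_size, bool preallocated)
{
	return file_size <= demo_max_inline_bytes(DEMO_INLINE_SLOTS) &&
	       !preallocated;
}
```

Under these assumptions the limit comes out at 872 * 4 = 3488 bytes, which matches the "~3.4K" in the message.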

23 Dec, 2013 3 commits

f2fs: add unlikely() macro for compiler more aggressively · 6bacf52f

Authored by Jaegeuk Kim on Dec 06, 2013

This patch adds the unlikely() macro to most of the code.
The basic rule is to add that when:
- checking unusual errors,
- checking page mappings,
- and the other unlikely conditions.

Change log from v1:
 - Don't add unlikely for the NULL test and error test: advised by Andi Kleen.

Cc: Chao Yu <chao2.yu@samsung.com>
Cc: Andi Kleen <andi@firstfloor.org>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

6bacf52f

f2fs: remove unneeded code in punch_hole · a66c7b2f

Authored by Chao Yu on Nov 22, 2013

Because the FALLOC_FL_PUNCH_HOLE flag must be ORed with FALLOC_FL_KEEP_SIZE
in fallocate, we can remove the useless 'keep size' branch, which
will never be executed in punch_hole.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Fan Li <fanofcode.li@samsung.com>
[Jaegeuk Kim: remove an unnecessary parameter together]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

a66c7b2f
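
The flag rule that makes the 'keep size' branch unreachable can be illustrated with a small validity check. This is a user-space sketch, not the kernel's fallocate path: the constants are local definitions that mirror the Linux flag values as I understand them, and the function name is hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Local copies of the flag bits, defined here so the sketch is portable. */
#define DEMO_FALLOC_FL_KEEP_SIZE  0x01
#define DEMO_FALLOC_FL_PUNCH_HOLE 0x02

/*
 * Reject a punch-hole request that does not also ask to keep the size.
 * Once such requests are filtered out before reaching the filesystem,
 * punch_hole never sees a mode without KEEP_SIZE.
 */
static int demo_check_fallocate_mode(int mode)
{
	if ((mode & DEMO_FALLOC_FL_PUNCH_HOLE) &&
	    !(mode & DEMO_FALLOC_FL_KEEP_SIZE))
		return -EOPNOTSUPP;
	return 0;
}
```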

f2fs: add a new function: f2fs_reserve_block() · b600965c

Authored by Huajun Li on Nov 10, 2013

Add the function f2fs_reserve_block() to easily reserve new blocks, and
use it to clean up more code.
Signed-off-by: Huajun Li <huajun.li@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Weihong Xu <weihong.xu@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

b600965c

31 Oct, 2013 1 commit

f2fs: avoid to wait all the node blocks during fsync · cfe58f9d

Authored by Jaegeuk Kim on Oct 31, 2013

Previously, f2fs_sync_file() waited for all the node blocks to be written.
But we don't need to do that; we only need to wait for the inode-related node
blocks.

This patch adds wait_on_node_pages_writeback(), which waits for the
inode-related node blocks that are under writeback.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

cfe58f9d

29 Oct, 2013 1 commit

f2fs: add an option to avoid unnecessary BUG_ONs · 5d56b671

Authored by Jaegeuk Kim on Oct 29, 2013

If you want to remove unnecessary BUG_ONs, you can just turn off F2FS_CHECK_FS
in your kernel config.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

5d56b671

25 Oct, 2013 1 commit

f2fs: add tracepoint for vm_page_mkwrite · e943a10d

Authored by Jaegeuk Kim on Oct 25, 2013

This patch adds a tracepoint for f2fs_vm_page_mkwrite.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

e943a10d

07 Oct, 2013 1 commit

f2fs: use rw_sem instead of fs_lock(locks mutex) · e479556b

Authored by Gu Zheng on Sep 27, 2013

The fs_locks are used to block other ops (e.g., recovery) when doing checkpoint.
Every other operation routine (besides checkpoint) needs to acquire an fs_lock.
This is a serious problem: if too many concurrent threads are acquiring
fs_lock, they block each other, which may lead to performance problems.
Though some optimization patches were introduced to improve the usage of fs_lock,
the thorough solution is to use a *rw_sem* to replace the fs_lock.
The checkpoint routine takes write_sem, and other ops take read_sem, so that we can
block other ops (e.g., recovery) when doing checkpoint, and other ops will not
disturb each other. This avoids the problem described above completely.
Because of the weakness of rw_sem, the above change may introduce a potential problem:
the checkpoint thread might get starved if other threads are intensively locking
the read semaphore for I/O. (Pointed out by Xu Jin)
In order to avoid this, a wait_list is introduced: subsequent read semaphore ops
are put into the wait_list if the checkpoint thread is waiting for the write semaphore,
and are woken up when the checkpoint thread releases the write semaphore.
Thanks to Kim for the previous review and test; we would be very glad to see others'
performance tests of this patch.

V2:
  -fix the potential starvation problem.
  -use more suitable func name suggested by Xu Jin.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
[Jaegeuk Kim: adjust minor coding standard]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

e479556b

26 Aug, 2013 1 commit

f2fs: reserve the xattr space dynamically · de93653f

Authored by Jaegeuk Kim on Aug 12, 2013

This patch enables the number of direct pointers inside the on-disk inode block
to be changed dynamically according to the size of the inline xattr space.

The number of direct pointers, ADDRS_PER_INODE, can be changed only if the file
has the inline xattr flag.

The number of direct pointers that will be used by inline xattrs is defined as
F2FS_INLINE_XATTR_ADDRS.
The current patch sets F2FS_INLINE_XATTR_ADDRS to 0 temporarily.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

de93653f
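
The dynamic pointer count can be sketched as follows. This is an illustration under stated assumptions: the names are hypothetical, the default of 923 direct pointers is assumed for the example rather than quoted from this commit, and the number of slots reserved for inline xattrs is a parameter (the message says the patch itself sets it to 0 for now).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed default number of direct pointers in an inode block. */
#define DEMO_DEF_ADDRS_PER_INODE 923u

/*
 * Number of direct pointers available to an inode: the default, minus the
 * slots reserved for inline xattrs when the inode has the inline xattr flag.
 */
static unsigned int demo_addrs_per_inode(bool has_inline_xattr,
					 unsigned int inline_xattr_addrs)
{
	assert(inline_xattr_addrs <= DEMO_DEF_ADDRS_PER_INODE);
	if (has_inline_xattr)
		return DEMO_DEF_ADDRS_PER_INODE - inline_xattr_addrs;
	return DEMO_DEF_ADDRS_PER_INODE;
}
```

With the reservation set to 0, as in this patch, the result equals the default whether or not the flag is set.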

09 Aug, 2013 2 commits

f2fs: introduce cur_cp_version function to reduce code size · d71b5564

Authored by Jaegeuk Kim on Aug 09, 2013

This patch introduces a new inline function, cur_cp_version, to reduce redundant
code.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

d71b5564

f2fs: fix inconsistency between xattr node blocks and its inode · e518ff81

Authored by Jaegeuk Kim on Aug 09, 2013

Previously, xattr node blocks were stored in the COLD_NODE log, which means that
our roll-forward mechanism doesn't recover the xattr node blocks at all.
Only the direct node blocks in the WARM_NODE log can be recovered.

So, let's resolve the issue simply by conducting a checkpoint during fsync when a
file has a modified xattr node block.

This approach can degrade the performance, but normally the checkpoint
overhead appears only at the first fsync call after the xattr entry changes.
Once the checkpoint is done, no additional overhead occurs.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

e518ff81

30 Jul, 2013 3 commits

f2fs: fix i_name during f2fs_sync_file · f0947e5c

Authored by Jaegeuk Kim on Jul 22, 2013

Similar to the i_pino fix, i_name should also be fixed when i_nlink is 1.

The erroneous scenario is like this.

1. touch test1
2. link test1 test2
3. unlink test2
4. fsync test1

After this, i_name should be test1.

CC: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f0947e5c

f2fs: introduce help function F2FS_NODE() · 45590710

Authored by Gu Zheng on Jul 15, 2013

Introduce the helper function F2FS_NODE() to simplify the conversion of node_page
to f2fs_node.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

45590710

f2fs: recover date requested by fdatasync · e5d2385e

Authored by Jaegeuk Kim on Jul 03, 2013

In order to support SQLite, which uses fdatasync instead of fsync, we should
guarantee that the data requested by fdatasync can be recovered after
sudden-power-off.

So, let's remove the fdatasync condition in f2fs_sync_file.
Otherwise, we cannot restore the data after sudden-power-off because no
fsync-marked node blocks exist.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

e5d2385e

14 Jun, 2013 4 commits

f2fs: recover wrong pino after checkpoint during fsync · 354a3399

Authored by Jaegeuk Kim on Jun 14, 2013

If a file is linked, f2fs loses its parent inode number, so fsync calls
for the linked file must do a checkpoint all the time.
But if we can recover its parent inode number after the checkpoint, we can
use the roll-forward mechanism for further fsync calls, which can
improve the fsync performance significantly.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

354a3399

f2fs: avoid freqeunt write_inode calls · b3783873

Authored by Jaegeuk Kim on Jun 10, 2013

If update_inode is called, we don't need to do write_inode.
So, let's use a *dirty* flag for each inode.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

b3783873

f2fs: optimise the truncate_data_blocks_range() range · d7cc950b

Authored by Namjae Jeon on Jun 08, 2013

The function truncate_data_blocks_range() decrements the valid
block count of the inode via dec_valid_block_count(). Since this
function updates the i_blocks field of the inode, we can update this
field once we have calculated the total number of blocks
to be freed.

Therefore we can decrement valid blocks outside of the for loop.

	if (nr_free) {
+		dec_valid_block_count(sbi, dn->inode, nr_free);
 		set_page_dirty(dn->node_page);
 		sync_inode_page(dn);
 	}

'nr_free' tells the total number of blocks freed. So, we can
just directly pass this value to dec_valid_block_count() and update
the i_blocks.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

d7cc950b
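
The pattern described above (count inside the loop, decrement once outside it) can be shown with a self-contained user-space sketch. The structure and function names below are hypothetical simplifications; only the idea of passing nr_free to a single decrement call comes from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NULL_ADDR 0u /* marks an unallocated block slot */

/* Minimal stand-in for the per-inode block accounting. */
struct demo_inode {
	uint64_t valid_blocks;
};

static void demo_dec_valid_block_count(struct demo_inode *inode,
				       unsigned int count)
{
	assert(inode->valid_blocks >= count);
	inode->valid_blocks -= count;
}

/* Free every allocated slot in addrs[0..count) and return how many were freed. */
static unsigned int demo_truncate_range(struct demo_inode *inode,
					uint32_t *addrs, size_t count)
{
	unsigned int nr_free = 0;

	for (size_t i = 0; i < count; i++) {
		if (addrs[i] == DEMO_NULL_ADDR)
			continue;
		addrs[i] = DEMO_NULL_ADDR;
		nr_free++; /* only count here; no accounting call per block */
	}

	/* One accounting update for the whole range. */
	if (nr_free)
		demo_dec_valid_block_count(inode, nr_free);
	return nr_free;
}
```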

f2fs: use the F2FS specific flags in f2fs_ioctl() · 6a3e8ef0

Authored by Namjae Jeon on Jun 08, 2013

The f2fs_ioctl() function uses generic flags.
Since F2FS-specific flags are defined, let's use
those flags.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

6a3e8ef0

11 Jun, 2013 1 commit

f2fs: fix i_blocks translation on various types of files · 2d4d9fb5

Authored by Jaegeuk Kim on Jun 07, 2013

Basically an inode manages the number of allocated blocks with inode->i_blocks
which is represented in a unit of sectors, not file system blocks.
But, f2fs has used i_blocks in a unit of file system blocks, and f2fs_getattr
translates it to the number of sectors when fstat is called.

However, previously only f2fs_file_inode_operations had this translation, so this
patch adds it to all the types of inode_operations.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

2d4d9fb5
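
The unit translation mentioned above is a shift by the difference between the block-size and sector-size exponents. A minimal sketch, assuming 512-byte sectors and using hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SECTOR_SHIFT 9u /* 512-byte sectors */

/*
 * Convert a count of file system blocks into a count of 512-byte sectors.
 * log_blocksize is log2 of the block size in bytes (12 for 4KB blocks).
 */
static uint64_t demo_blocks_to_sectors(uint64_t blocks,
				       unsigned int log_blocksize)
{
	assert(log_blocksize >= DEMO_SECTOR_SHIFT && log_blocksize < 64);
	return blocks << (log_blocksize - DEMO_SECTOR_SHIFT);
}
```

For 4KB blocks the shift is 3, so each file system block is reported as 8 sectors.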

28 May, 2013 4 commits

f2fs: reuse the locked dnode page and its inode · b292dcab

Authored by Jaegeuk Kim on May 22, 2013

This patch fixes the following deadlock bug during the recovery.

INFO: task mount:1322 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mount           D ffffffff81125870     0  1322   1266 0x00000000
 ffff8801207e39d8 0000000000000046 ffff88012ab1dee0 0000000000000046
 ffff8801207e3a08 ffff880115903f40 ffff8801207e3fd8 ffff8801207e3fd8
 ffff8801207e3fd8 ffff880115903f40 ffff8801207e39d8 ffff88012fc94520
Call Trace:
[<ffffffff81125870>] ? __lock_page+0x70/0x70
[<ffffffff816a92d9>] schedule+0x29/0x70
[<ffffffff816a93af>] io_schedule+0x8f/0xd0
[<ffffffff8112587e>] sleep_on_page+0xe/0x20
[<ffffffff816a649a>] __wait_on_bit_lock+0x5a/0xc0
[<ffffffff81125867>] __lock_page+0x67/0x70
[<ffffffff8106c7b0>] ? autoremove_wake_function+0x40/0x40
[<ffffffff81126857>] find_lock_page+0x67/0x80
[<ffffffff8112698f>] find_or_create_page+0x3f/0xb0
[<ffffffffa03901a8>] ? sync_inode_page+0xa8/0xd0 [f2fs]
[<ffffffffa038fdf7>] get_node_page+0x67/0x180 [f2fs]
[<ffffffffa039818b>] recover_fsync_data+0xacb/0xff0 [f2fs]
[<ffffffff816aaa1e>] ? _raw_spin_unlock+0x3e/0x40
[<ffffffffa0389634>] f2fs_fill_super+0x7d4/0x850 [f2fs]
[<ffffffff81184cf9>] mount_bdev+0x1c9/0x210
[<ffffffffa0388e60>] ? validate_superblock+0x180/0x180 [f2fs]
[<ffffffffa0387635>] f2fs_mount+0x15/0x20 [f2fs]
[<ffffffff81185a13>] mount_fs+0x43/0x1b0
[<ffffffff81145ba0>] ? __alloc_percpu+0x10/0x20
[<ffffffff811a0796>] vfs_kern_mount+0x76/0x120
[<ffffffff811a2cb7>] do_mount+0x237/0xa10
[<ffffffff81140b9b>] ? strndup_user+0x5b/0x80
[<ffffffff811a3520>] SyS_mount+0x90/0xe0
[<ffffffff816b3502>] system_call_fastpath+0x16/0x1b

The bug is triggered when check_index_in_prev_nodes tries to get the direct
node page by calling get_node_page.
At this point, if the direct node page is already locked by get_dnode_of_data,
its caller, we get a deadlock condition.

This patch adds an additional condition check for the reuse of locked direct node
pages prior to the get_node_page call.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

b292dcab

f2fs: add f2fs_readonly() · 77888c1e

Authored by Jaegeuk Kim on May 20, 2013

Introduce a simple macro function for readability.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

77888c1e

f2fs: reorganize f2fs_vm_page_mkwrite · 9851e6e1

Authored by Namjae Jeon on Apr 28, 2013

A few things can be changed in the default mkwrite function:
1) Call file_update_time at the start, before acquiring any lock.
2) The condition page_offset(page) >= i_size_read(inode) should be
 changed to page_offset(page) > i_size_read(inode).
3) Move wait_on_page_writeback.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

9851e6e1

f2fs: change get_new_data_page to pass a locked node page · 64aa7ed9

Authored by Jaegeuk Kim on May 20, 2013

This patch is for passing a locked node page to get_dnode_of_data.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

64aa7ed9

29 Apr, 2013 1 commit

f2fs: check truncation of mapping after lock_page · afcb7ca0

Authored by Jaegeuk Kim on Apr 26, 2013

We call lock_page when we need to update a page after readpage.
Between grabbing and locking the page, the page can be truncated by another thread.
So, after lock_page, we should check whether the page was truncated.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

afcb7ca0

26 Apr, 2013 1 commit

f2fs: give a chance to merge IOs by IO scheduler · c718379b

Authored by Jaegeuk Kim on Apr 24, 2013

Previously, background GC submitted many 4KB read requests to load victim blocks
and/or their (i)node blocks.

...
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb61, blkaddr = 0x3b964ed
f2fs_gc : block_rq_complete: 8,16 R () 499854968 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb6f, blkaddr = 0x3b964ee
f2fs_gc : block_rq_complete: 8,16 R () 499854976 + 8 [0]
f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb79, blkaddr = 0x3b964ef
f2fs_gc : block_rq_complete: 8,16 R () 499854984 + 8 [0]
...

However, since many of the IOs are sequential, we can give the IO scheduler a
chance to merge them.
In order to do that, let's use blk_plug.

...
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c6, blkaddr = 0x2e6ee
f2fs_gc : f2fs_iget: ino = 143
f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c7, blkaddr = 0x2e6ef
<idle> : block_rq_complete: 8,16 R () 1519616 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1519848 + 8 [0]
<idle> : block_rq_complete: 8,16 R () 1520432 + 96 [0]
<idle> : block_rq_complete: 8,16 R () 1520536 + 104 [0]
<idle> : block_rq_complete: 8,16 R () 1521008 + 112 [0]
<idle> : block_rq_complete: 8,16 R () 1521440 + 152 [0]
<idle> : block_rq_complete: 8,16 R () 1521688 + 144 [0]
<idle> : block_rq_complete: 8,16 R () 1522128 + 192 [0]
<idle> : block_rq_complete: 8,16 R () 1523256 + 328 [0]
...

Note that this issue should be addressed in checkpoint and some readahead
flows too.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

c718379b
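
The benefit of merging can be illustrated in user space. The sketch below is not blk_plug and not scheduler code: it is a hypothetical helper that coalesces requests which are back-to-back on disk, which is the effect visible in the second trace. The sector numbers used in the usage note are the ones from the first trace above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A read request: starting sector and length in sectors. */
struct demo_req {
	uint64_t sector;
	uint32_t nr_sectors;
};

/*
 * Merge requests that are contiguous on disk, in place.
 * The input must be sorted by starting sector.
 * Returns the number of requests left after merging.
 */
static size_t demo_merge_sequential(struct demo_req *reqs, size_t count)
{
	size_t out = 0;

	if (count == 0)
		return 0;

	for (size_t i = 1; i < count; i++) {
		if (reqs[out].sector + reqs[out].nr_sectors == reqs[i].sector)
			reqs[out].nr_sectors += reqs[i].nr_sectors; /* extend */
		else
			reqs[++out] = reqs[i]; /* gap: start a new request */
	}
	return out + 1;
}
```

Applied to the three 8-sector reads at 499854968, 499854976 and 499854984, this yields a single 24-sector request instead of three 4KB ones.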

23 Apr, 2013 3 commits

f2fs: add tracepoints to debug the block allocation · c01e2853

Authored by Namjae Jeon on Apr 23, 2013

Add tracepoints to debug the block allocation & fallocate.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: enhance information]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

c01e2853

f2fs: add tracepoints for truncate operation · 51dd6249

Authored by Namjae Jeon on Apr 20, 2013

Add tracepoints for tracing the truncate operations
like truncate node/data blocks, f2fs_truncate etc.

Tracepoints are added at the entry and exit of each operation
to trace its success or failure.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: combine and modify the tracepoint structures]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

51dd6249

f2fs: add tracepoints for sync & inode operations · a2a4a7e4

Authored by Namjae Jeon on Apr 20, 2013

Add tracepoints in f2fs for tracing the syncing
operations like filesystem sync, file sync enter/exit.
It will help to trace the code under debugging scenarios.

Also add tracepoints for tracing the various inode operations
like building an inode, eviction of an inode, link/unlink of
inodes.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
[Jaegeuk: combine and modify the tracepoint structures]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

a2a4a7e4

10 Apr, 2013 1 commit

f2fs: use mnt_want_write_file() in ioctl · bdaec334

Authored by Al Viro on Mar 20, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

bdaec334

09 Apr, 2013 2 commits

f2fs: introduce a new global lock scheme · 39936837

Authored by Jaegeuk Kim on Nov 22, 2012

In the previous version, f2fs used global locks according to the usage types,
such as directory operations, block allocation, block write, and so on.

See the following lock types in f2fs.h.
enum lock_type {
	RENAME,		/* for renaming operations */
	DENTRY_OPS,	/* for directory operations */
	DATA_WRITE,	/* for data write */
	DATA_NEW,	/* for data allocation */
	DATA_TRUNC,	/* for data truncate */
	NODE_NEW,	/* for node allocation */
	NODE_TRUNC,	/* for node truncate */
	NODE_WRITE,	/* for node write */
	NR_LOCK_TYPE,
};

In that case, we lose performance in a multi-threaded environment,
since every type of operation must be conducted one at a time.

In order to address the problem, let's share the locks globally with a mutex
array regardless of type.
So, let users grab a mutex and perform their jobs in parallel as much as
possible.

For this, I propose a new global lock scheme as follows.

0. Data structure
 - f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS]
 - f2fs_sb_info -> node_write

1. mutex_lock_op(sbi)
 - try to get an available lock from the array.
 - returns the index of the acquired lock variable.

2. mutex_unlock_op(sbi, index of the lock)
 - unlock the given index of the lock.

3. mutex_lock_all(sbi)
 - grab all the locks in the array before the checkpoint.

4. mutex_unlock_all(sbi)
 - release all the locks in the array after checkpoint.

5. block_operations()
 - call mutex_lock_all()
 - sync_dirty_dir_inodes()
 - grab node_write
 - sync_node_pages()

Note that,
 the pairs of mutex_lock_op()/mutex_unlock_op() and
 mutex_lock_all()/mutex_unlock_all() should be used together.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

39936837

f2fs: move f2fs_balance_fs from truncate to punch_hole · 1127a3d4

Authored by Jason Hrycay on Apr 08, 2013

Move the f2fs_balance_fs out of the truncate_hole function and only
perform that in punch_hole use case.  The commit:

  ed60b1644e7f7e5dd67d21caf7e4425dff05dad0

intended to do this but moved it into truncate_hole to cover more
cases.  However, a deadlock scenario is possible when deleting an inode
entry under specific conditions:

 f2fs_delete_entry()
     mutex_lock_op(sbi, DENTRY_OPS);
     truncate_hole()
         f2fs_balance_fs()
             mutex_lock(&sbi->gc_mutex);
             f2fs_gc()
                 write_checkpoint()
                     block_operations()
                         mutex_lock_op(sbi, DENTRY_OPS);

Let's move it into the punch_hole case to cover the original intent of
avoiding it during fallocate's expand_inode_data case.

Change-Id: I29f8ea1056b0b88b70ba8652d901b6e8431bb27e
Signed-off-by: Jason Hrycay <jason.hrycay@motorola.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

1127a3d4

27 Mar, 2013 2 commits

f2fs: fix to give correct parent inode number for roll forward · 953a3e27

Authored by Jaegeuk Kim on Mar 21, 2013

When we recover fsync'ed data during power-off recovery, we should guarantee
that the parent inode number is correct for each direct inode block.

So, let's make the following rules.

- The fsync should do a checkpoint for all the inodes that have experienced hard
links.

- So, only normal files can be recovered by roll-forward.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

953a3e27

f2fs: do not skip writing file meta during fsync · 0ff153a2

Authored by Jaegeuk Kim on Mar 20, 2013

This patch removes the data_version check flow during the fsync call.
The original purpose of data_version was to avoid writing inode
pages redundantly when fsync is called repeatedly.
However, since a user can modify file metadata and then call fsync, we should not
skip the fsync procedure.
So, let's remove this condition check and hope that the user triggers fsync in the
right manner.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

0ff153a2

20 Mar, 2013 1 commit

f2fs: fix to call WRITE_FLUSH at the end of fsync · ae51fb31

Authored by Jaegeuk Kim on Mar 16, 2013

The fsync call should end after flushing the in-device caches.
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

ae51fb31

18 Mar, 2013 1 commit

f2fs: introduce readahead mode of node pages · 266e97a8

Authored by Jaegeuk Kim on Feb 26, 2013

Previously, f2fs read several node pages ahead when get_dnode_of_data was called
with the RDONLY_NODE flag.
And, this flag is set by the following functions.
- get_data_block_ro
- get_lock_data_page
- do_write_data_page
- truncate_blocks
- truncate_hole

However, this readahead mechanism was initially introduced for use by
get_data_block_ro to enhance the sequential read performance.

So, let's clarify all the cases with the additional modes as follows.

enum {
	ALLOC_NODE,	/* allocate a new node page if needed */
	LOOKUP_NODE,	/* look up a node without readahead */
	LOOKUP_NODE_RA,	/*
			 * look up a node with readahead called
			 * by get_data_block_ro.
			 */
};
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>

266e97a8

openeuler / Kernel synced successfully more than 1 year ago
