Commits · c52e1b10b175bef84f1681946b4a438cc4c84147 · openeuler / raspberrypi-kernel

24 Sep, 2014 6 commits

f2fs: remove redundant operation during roll-forward recovery · c52e1b10

Authored by Jaegeuk Kim on Sep 11, 2014

If the same data is updated multiple times, we don't need to redo all of the
operations.
Let's just update the latest one.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: do not skip latest inode information · 19c9c466

Authored by Jaegeuk Kim on Sep 10, 2014

In f2fs_sync_file, if there are no appended writes, it skips
writing its node blocks.
But if there is an up-to-date inode page, we should write it to update
its metadata during the roll-forward recovery.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix roll-forward missing scenarios · 441ac5cb

Authored by Jaegeuk Kim on Sep 15, 2014

We can summarize the roll forward recovery scenarios as follows.

[Term] F: fsync_mark, D: dentry_mark

1. inode(x) | CP | inode(x) | dnode(F)
-> Update the latest inode(x).

2. inode(x) | CP | inode(F) | dnode(F)
-> No problem.

3. inode(x) | CP | dnode(F) | inode(x)
-> Recover to the latest dnode(F), and drop the last inode(x)

4. inode(x) | CP | dnode(F) | inode(F)
-> No problem.

5. CP | inode(x) | dnode(F)
-> The inode(DF) was missing. Should drop this dnode(F).

6. CP | inode(DF) | dnode(F)
-> No problem.

7. CP | dnode(F) | inode(DF)
-> If f2fs_iget fails, then goto next to find inode(DF).

8. CP | dnode(F) | inode(x)
-> If f2fs_iget fails, then goto next to find inode(DF).
   But it will fail due to no inode(DF).

So, this patch adds some missing points such as #1, #5, #7, and #8.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix conditions to remain recovery information in f2fs_sync_file · 88bd02c9

Authored by Jaegeuk Kim on Sep 15, 2014

This patch revisits all of the recovery information used during f2fs_sync_file.

There are three pieces of information used to make a decision.

a) IS_CHECKPOINTED,	/* is it checkpointed before? */
b) HAS_FSYNCED_INODE,	/* is the inode fsynced before? */
c) HAS_LAST_FSYNC,	/* has the latest node fsync mark? */

And, the scenarios for our rule are based on:

[Term] F: fsync_mark, D: dentry_mark

1. inode(x) | CP | inode(x) | dnode(F)
2. inode(x) | CP | inode(F) | dnode(F)
3. inode(x) | CP | dnode(F) | inode(x) | inode(F)
4. inode(x) | CP | dnode(F) | inode(F)
5. CP | inode(x) | dnode(F) | inode(DF)
6. CP | inode(DF) | dnode(F)
7. CP | dnode(F) | inode(DF)
8. CP | dnode(F) | inode(x) | inode(DF)

For example, #3, the three conditions should be changed as follows.

   inode(x) | CP | dnode(F) | inode(x) | inode(F)
a)    x       o      o          o          o
b)    x       x      x          x          o
c)    x       o      o          x          o

If f2fs_sync_file stops   ------^,
 it should write inode(F)    --------------^

So, the need_inode_block_update should return true, since
 c) get_nat_flag(e, HAS_LAST_FSYNC), is false.

For example, #8,
      CP | alloc | dnode(F) | inode(x) | inode(DF)
a)    o      x        x          x          x
b)    x               x          x          o
c)    o               o          x          o

If f2fs_sync_file stops   -------^,
 it should write inode(DF)    --------------^

Note that the roll-forward policy should follow this rule, which means that
if there are any missing blocks, we don't need to recover that inode.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
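
The decision described above can be sketched in userspace C. This is only one predicate that agrees with the two worked examples (#3 and #8); the struct and the function name are hypothetical stand-ins, and the exact combination of the three conditions is an assumption rather than something the commit message spells out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-inode state kept in a nat entry. */
struct nat_state {
	bool is_checkpointed;   /* a) IS_CHECKPOINTED */
	bool has_fsynced_inode; /* b) HAS_FSYNCED_INODE */
	bool has_last_fsync;    /* c) HAS_LAST_FSYNC */
};

/*
 * Assumed rule: the inode block may be skipped only when the latest node
 * carries an fsync mark and the inode was either checkpointed or fsynced
 * before. In every other state the inode block must be written.
 */
static bool need_inode_block_update_sketch(const struct nat_state *s)
{
	if (s->has_last_fsync && (s->is_checkpointed || s->has_fsynced_inode))
		return false;
	return true;
}
```

With the column where f2fs_sync_file stops in example #3 (a = o, b = x, c = x), the sketch returns true, which matches the statement that inode(F) should be written.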

f2fs: introduce a flag to represent each nat entry information · 7ef35e3b

Authored by Jaegeuk Kim on Sep 15, 2014

This patch introduces a flag in the nat entry structure to merge various
information such as checkpointed and fsync_done marks.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
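
A minimal sketch of packing several marks into one flag byte follows. The struct name, the bit order, and the set_nat_flag helper are assumptions made for illustration; only get_nat_flag and the three mark names appear in the neighbouring commit messages.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit positions; the real layout may differ. */
enum nat_flag_bit {
	IS_CHECKPOINTED,
	HAS_FSYNCED_INODE,
	HAS_LAST_FSYNC,
};

/* Hypothetical reduced nat entry holding only the merged flag byte. */
struct nat_entry_sketch {
	unsigned char flag;
};

/* Set or clear one mark without touching the others. */
static void set_nat_flag(struct nat_entry_sketch *ne, unsigned int type, bool set)
{
	unsigned char mask = (unsigned char)(1u << type);

	if (set)
		ne->flag |= mask;
	else
		ne->flag &= (unsigned char)~mask;
}

/* Test one mark. */
static bool get_nat_flag(const struct nat_entry_sketch *ne, unsigned int type)
{
	return (ne->flag & (1u << type)) != 0;
}
```

Keeping the marks in a single byte lets a new mark be added without growing the entry, at the cost of going through helpers instead of plain fields.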

f2fs: use meta_inode cache to improve roll-forward speed · 4c521f49

Authored by Jaegeuk Kim on Sep 11, 2014

Previously, all the dnode pages had to be read during the roll-forward recovery.
Even worse, the whole chain was traversed twice.
This patch removes those redundant and costly read operations by using the page cache
of meta_inode together with the readahead function.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

16 Sep, 2014 5 commits

f2fs: fix double lock for inode page during roll-foward recovery · 60979115

Authored by Jaegeuk Kim on Sep 13, 2014

If the inode is the same and its data indices need to be truncated, we can fall into
a double lock on its inode page via get_dnode_of_data.

The error case looks like this.

1. write data 1, 2, 3, 4, 5 in inode #4.
2. write data 100, 102, 103, 104, 105 in dnode #6 of inode #4.
3. sync
4. update data 100->106 in dnode #6.
5. fsync inode #4.
6. power-cut

-> Then,
1. go back to #3's checkpoint
2. in do_recover_data, get_dnode_of_data() gets inode #4.
3. detect 100->106 in dnode #6.
4. check_index_in_prev_nodes tries to truncate 100 in dnode #6.
5. to trigger truncate_hole, get_dnode_of_data should grab inode #4.
6. detect *kernel hang*

This patch should resolve that bug.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix a race condition in next_free_nid · c6e48930

Authored by Huang Ying on Sep 12, 2014

The nm_i->fcnt check is executed before spin_lock, so if another
thread deletes the last free_nid from the list, the wrong nid may be
returned.  Fix the race condition by moving the nm_i->fcnt check
inside the spin_lock.
Signed-off-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use nm_i->next_scan_nid as default for next_free_nid · 77041823

Authored by Huang Ying on Sep 12, 2014

Currently, if there is no free nid in nm_i->free_nid_list, 0 may be saved
into next_free_nid of the checkpoint, which may cause useless scanning on the
next mount.  nm_i->next_scan_nid is a better default value than
0.
Signed-off-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
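
The two next_free_nid changes above can be sketched together in userspace C. A pthread mutex stands in for the kernel spinlock, and the struct, the fixed-size array, and the function names are hypothetical. The point is only that the emptiness check happens while the lock is held, and that the caller starts from next_scan_nid rather than 0.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_FREE 8

/* Hypothetical, much reduced node manager. */
struct nid_mgr {
	pthread_mutex_t lock;             /* stands in for the free nid list lock */
	unsigned int free_nids[MAX_FREE]; /* stands in for free_nid_list */
	int fcnt;                         /* number of valid entries */
	unsigned int next_scan_nid;
};

static void nid_mgr_init(struct nid_mgr *nm, unsigned int next_scan_nid)
{
	pthread_mutex_init(&nm->lock, NULL);
	nm->fcnt = 0;
	nm->next_scan_nid = next_scan_nid;
}

/* Check fcnt under the lock so a concurrent removal cannot slip in between. */
static bool peek_free_nid(struct nid_mgr *nm, unsigned int *nid)
{
	bool found = false;

	pthread_mutex_lock(&nm->lock);
	if (nm->fcnt > 0) {
		*nid = nm->free_nids[0];
		found = true;
	}
	pthread_mutex_unlock(&nm->lock);
	return found;
}

/* Value to store in the checkpoint: a free nid if any, else next_scan_nid. */
static unsigned int next_free_nid_for_checkpoint(struct nid_mgr *nm)
{
	unsigned int nid = nm->next_scan_nid; /* default instead of 0 */

	peek_free_nid(nm, &nid);
	return nid;
}
```

When the list is empty the caller's default is left untouched, so the checkpoint records a useful scan position instead of 0.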

f2fs: give an option to enable in-place-updates during fsync to users · c1ce1b02

Authored by Jaegeuk Kim on Sep 10, 2014

If the user writes F2FS_IPU_FSYNC:4 to /sys/fs/f2fs/ipu_policy, in-place updates
are attempted only from f2fs_sync_file.
If the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it
keeps the out-of-order manner. Otherwise, it triggers in-place updates.

This may be used by storage showing very high random write performance.

For example, it can be used when,

Seq. writes (Data) + wait + Seq. writes (Node)

is pretty much slower than,

Rand. writes (Data)
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
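
The threshold rule described above can be written as a small predicate. This is a sketch under assumptions: the function name is hypothetical, the policy is modelled as a plain value equal to 4, and "over" is read as strictly greater than min_fsync_blocks.

```c
#include <assert.h>
#include <stdbool.h>

#define F2FS_IPU_FSYNC 4 /* value the commit message says to write to ipu_policy */

/*
 * Returns true when an fsync should try in-place updates under the
 * F2FS_IPU_FSYNC policy. Other policies are outside this sketch and
 * simply yield false here.
 */
static bool fsync_tries_ipu(unsigned int ipu_policy,
			    unsigned long dirty_pages,
			    unsigned long min_fsync_blocks)
{
	if (ipu_policy != F2FS_IPU_FSYNC)
		return false;
	/* More dirty pages than the threshold: keep the regular write path. */
	return dirty_pages <= min_fsync_blocks;
}
```

For example, with a threshold of 8, an fsync of 3 dirty pages would take the in-place path, while an fsync of 20 dirty pages would not.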

f2fs: expand counting dirty pages in the inode page cache · a7ffdbe2

Authored by Jaegeuk Kim on Sep 12, 2014

Previously f2fs only counted dirty dentry pages, but there is no reason not to
expand the scope.

This patch renames the helpers that manage dirty pages and also counts
dirty pages in each inode info.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

11 Sep, 2014 1 commit

f2fs: remove lengthy inode->i_ino · 2403c155

Authored by Jaegeuk Kim on Sep 10, 2014

This patch removes the lengthy name by adding a new variable.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

10 Sep, 2014 10 commits

f2fs: fix negative value for lseek offset · 0b4c5afd

Authored by Jaegeuk Kim on Sep 08, 2014

If an application passes a negative offset to lseek with SEEK_DATA or SEEK_HOLE,
f2fs previously hit a BUG_ON in get_dnode_of_data, as reported
by Tommi Rantala.

He wrote a simple reproducer containing:
	lseek(fd, -17595150933902LL, SEEK_DATA);

This patch should resolve that bug.
Reported-by: Tommi Rantala <tt.rantala@gmail.com>
[Jaegeuk Kim: relocate the condition as suggested by Chao]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
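
The reproducer line above can be wrapped in a small userspace probe. This is a hedged sketch: the helper name is hypothetical, a 64-bit off_t is assumed, and the fallback definition of SEEK_DATA uses the Linux value for C libraries that hide it unless _GNU_SOURCE is defined. The probe only reports whether the call failed; it does not assume a particular errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3 /* assumption: Linux value, used only if the libc hides it */
#endif

/*
 * Issue the negative-offset seek from the report on an open descriptor.
 * Returns 0 if the seek unexpectedly succeeds, otherwise the errno value.
 */
static int seek_data_negative(int fd)
{
	off_t pos;

	errno = 0;
	pos = lseek(fd, (off_t)-17595150933902LL, SEEK_DATA);
	if (pos == (off_t)-1)
		return errno ? errno : -1;
	return 0;
}
```

On a file system that validates the offset, the helper returns a non-zero error code instead of the kernel hitting an assertion.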

f2fs: avoid node page to be written twice in gc_node_segment · 9a01b56b

Authored by Huang Ying on Sep 07, 2014

In gc_node_segment, if node page gc is run concurrently with node page
writeback, and check_valid_map and get_node_page run after page locked
and before cur_valid_map is updated as below, it is possible for the
page to be written twice unnecessarily.

			sync_node_pages
			  try_lock_page
			  ...
check_valid_map		  f2fs_write_node_page
			    ...
			    write_node_page
			      do_write_page
			        allocate_data_block
				  ...
				  refresh_sit_entry /* update cur_valid_map */
				  ...
			    ...
			    unlock_page
get_node_page
...
set_page_dirty
...
f2fs_put_page
  unlock_page

This can be solved via calling check_valid_map after get_node_page again.
Signed-off-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use lock-less list(llist) to simplify the flush cmd management · 721bd4d5

Authored by Gu Zheng on Sep 05, 2014

We use flush cmd control to collect many flush cmds and flush them
together. For this, we use two lists to manage the flush cmds
(collect and dispatch), and one spin lock to protect them.
In fact, the lock-less list (llist) is very suitable for this case,
and we use it to simplify this routine.

-
v2:
-use llist_for_each_entry_safe to fix possible use-after-free issue.
-remove the unused field from struct flush_cmd.
Thanks for Yu's suggestion.
-
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
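
The collect-and-dispatch pattern above can be illustrated with C11 atomics in userspace. This is not the kernel llist API; the types and function names are hypothetical. Producers push with a compare-and-swap loop, the dispatcher detaches the whole batch with one atomic exchange, and the walk reads each next pointer before completing the node, which is the concern behind the "safe" iteration mentioned in the v2 notes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical flush command; 'next' links it into the lock-less list. */
struct flush_cmd {
	struct flush_cmd *next;
	int ret;
	int done;
};

struct flush_queue {
	_Atomic(struct flush_cmd *) head;
};

static void queue_init(struct flush_queue *q)
{
	atomic_init(&q->head, (struct flush_cmd *)NULL);
}

/* Producer side: push one command without taking a lock. */
static void queue_push(struct flush_queue *q, struct flush_cmd *cmd)
{
	struct flush_cmd *old = atomic_load(&q->head);

	do {
		cmd->next = old;
	} while (!atomic_compare_exchange_weak(&q->head, &old, cmd));
}

/* Dispatcher side: detach every queued command in one step. */
static struct flush_cmd *queue_take_all(struct flush_queue *q)
{
	return atomic_exchange(&q->head, (struct flush_cmd *)NULL);
}

/* Complete a detached batch; returns how many commands were completed. */
static int complete_all(struct flush_cmd *batch, int ret)
{
	int n = 0;

	while (batch) {
		/* Read next first: the waiter may free cmd once done is set. */
		struct flush_cmd *next = batch->next;

		batch->ret = ret;
		batch->done = 1;
		batch = next;
		n++;
	}
	return n;
}
```

Note that a batch detached this way is in reverse push order, so a dispatcher that cares about ordering has to reverse it first.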

f2fs: refactor flush_sit_entries codes for reducing SIT writes · 184a5cd2

Authored by Chao Yu on Sep 04, 2014

In commit aec71382 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we described the issue as below:

"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
   nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
   journal is full, then flush the left dirty entries to disk without merge
   journaled entries, so these journaled entries may be flushed to disk at next
   checkpoint but lost chance to flushed last time."

Actually, we have the same problem in using SIT journal area.

In this patch, firstly we will update sit journal with dirty entries as many as
possible. Secondly if there is no space in sit journal, we will remove all
entries in journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in same SIT block to sit entry set. All
entry sets are linked to list sit_entry_set in sm_info, sorted ascending order
by count of entries in set. Later we flush entries in set which have fewest
entries into journal as many as we can, and then flush dense set with merged
entries to disk.

In this way we can use the SIT journal area more effectively and reduce
SIT updates, which improves performance and extends the lifetime of the flash
device.

In my testing environment, this patch noticeably reduces SIT block
updates.

virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
		sit page num	cp count	sit pages/cp
based		2006.50		1349.75		1.486
patched		1566.25		1463.25		1.070

Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns)	dirty sit count
36038		2151
49168		2123
37174		2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
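
The second phase described above (sparse sets go to the journal, dense sets go to disk) can be sketched as a planning function. Everything here is a simplified model: the struct and function names are hypothetical, each set stands for the dirty entries of one SIT block, and journal space is counted in entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of one entry set: dirty entries sharing a SIT block. */
struct sit_entry_set_sketch {
	unsigned int start_segno;
	unsigned int entry_cnt;
};

/* Order sets by ascending entry count. */
static int cmp_by_count(const void *a, const void *b)
{
	const struct sit_entry_set_sketch *x = a;
	const struct sit_entry_set_sketch *y = b;

	return (x->entry_cnt > y->entry_cnt) - (x->entry_cnt < y->entry_cnt);
}

/*
 * Sort the sets, put the sparsest ones into the journal while they fit,
 * and count one SIT block write for every remaining (denser) set.
 */
static size_t plan_sit_flush(struct sit_entry_set_sketch *sets, size_t nsets,
			     unsigned int journal_space)
{
	size_t block_writes = 0;
	int to_journal = 1;
	size_t i;

	qsort(sets, nsets, sizeof(*sets), cmp_by_count);
	for (i = 0; i < nsets; i++) {
		/* Once a set does not fit, all later (larger) sets go to disk. */
		if (to_journal && sets[i].entry_cnt > journal_space)
			to_journal = 0;
		if (to_journal)
			journal_space -= sets[i].entry_cnt;
		else
			block_writes++;
	}
	return block_writes;
}
```

Journaling the sparse sets first saves the most block writes per journal slot, because a block holding a single dirty entry costs as much to write as a block holding many.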

f2fs: remove unneeded sit_i in macro SIT_BLOCK_OFFSET/START_SEGNO · d3a14afd

Authored by Chao Yu on Sep 04, 2014

sit_i in macro SIT_BLOCK_OFFSET/START_SEGNO is not used, so remove it.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: need fsck.f2fs if the recovery was failed · b0c44f05

Authored by Jaegeuk Kim on Sep 02, 2014

If the roll-forward recovery fails, we'd better run fsck.f2fs.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: handle bug cases by letting fsck.f2fs initiate · ec325b52

Authored by Jaegeuk Kim on Sep 02, 2014

This patch handles buggy corner cases by flagging them for fsck.f2fs.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add BUG cases to initiate fsck.f2fs · 05796763

Authored by Jaegeuk Kim on Sep 02, 2014

This patch replaces BUG cases with f2fs_bug_on to retain fsck.f2fs information.
It also implements some void functions to initiate fsck.f2fs.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: need fsck.f2fs when f2fs_bug_on is triggered · 9850cf4a

Authored by Jaegeuk Kim on Sep 02, 2014

If any f2fs_bug_on is triggered, fsck.f2fs is needed.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
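
The idea of warning and recording the need for fsck.f2fs, rather than stopping outright, can be sketched as a userspace macro. The macro and struct names are hypothetical, fprintf stands in for the kernel's WARN_ON, and the need_fsck field mirrors the sbi->need_fsck flag named in a neighbouring commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical reduced superblock info carrying only the fsck flag. */
struct sb_info_sketch {
	bool need_fsck;
};

/* On a failed consistency check: print a warning and remember it for fsck. */
#define bug_on_sketch(sbi, condition)					\
	do {								\
		if (condition) {					\
			fprintf(stderr, "WARNING: %s\n", #condition);	\
			(sbi)->need_fsck = true;			\
		}							\
	} while (0)
```

A caller would write `bug_on_sketch(&sbi, count < 0);` and continue running, leaving the repair decision to the offline tool.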

f2fs: retain inconsistency information to initiate fsck.f2fs · 2ae4c673

Authored by Jaegeuk Kim on Sep 02, 2014

This patch adds sbi->need_fsck to conduct fsck.f2fs later.
This flag can only be cleared by fsck.f2fs.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

04 Sep, 2014 1 commit

f2fs: introduce F2FS_I_SB, F2FS_M_SB, and F2FS_P_SB · 4081363f

Authored by Jaegeuk Kim on Sep 02, 2014

This patch adds three inline functions to clean up messy casting code.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

02 Sep, 2014 1 commit

f2fs: reposition unlock_new_inode to prevent accessing invalid inode · b73e5282

Authored by Chao Yu on Aug 30, 2014

Due to a race condition on the inode cache, the following scenario can occur:
[Thread a]				[Thread b]
					->f2fs_mkdir
					  ->f2fs_add_link
					    ->__f2fs_add_link
					      ->init_inode_metadata failed here
->gc_thread_func
  ->f2fs_gc
    ->do_garbage_collect
      ->gc_data_segment
        ->f2fs_iget
          ->iget_locked
            ->wait_on_inode
					  ->unlock_new_inode
        ->move_data_page
					  ->make_bad_inode
					  ->iput

When we fail in create/symlink/mkdir/mknod/tmpfile, the newly allocated inode
should be marked as bad to avoid being accessed by another thread. But in the above
scenario, f2fs is allowed to access the invalid inode before this inode has been
marked as bad.
This patch fixes the potential problem. The issue was found by code review.

change log from v1:
 o Add condition judgment in gc_data_segment() suggested by Changman Lee.
 o use iget_failed to simplify code.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

29 Aug, 2014 2 commits

f2fs: fix wrong casting for dentry name · 3304b564

Authored by Jaegeuk Kim on Aug 29, 2014

The dentry name type is unsigned char *.
If we don't match this type, some character codes can be changed by the sign bit.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
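
A small illustration of why the pointer type matters for name bytes above 0x7f. The two helpers are hypothetical and only show how the same byte widens to different integers depending on the signedness of the pointed-to type, assuming a two's complement machine.

```c
#include <assert.h>

/* Read the first name byte through a signed char pointer. */
static int first_byte_as_signed(const void *name)
{
	return *(const signed char *)name;
}

/* Read the first name byte through an unsigned char pointer. */
static int first_byte_as_unsigned(const void *name)
{
	return *(const unsigned char *)name;
}
```

For a name starting with the byte 0xE4 (the first byte of many UTF-8 encoded CJK characters), the unsigned read yields 228 while the signed read yields -28, so any hash or comparison that widens the byte would see different values.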

f2fs: simplify by using a literal · 922cedbd

Authored by Dan Carpenter on Aug 28, 2014

We can make the code a bit simpler because we know that "!retry" is
zero.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

26 Aug, 2014 1 commit

f2fs: truncate stale block for inline_data · c2e69583

Authored by Jaegeuk Kim on Aug 25, 2014

This makes sure that any allocated block at offset[0] is truncated for inline_data.
The root cause has not been identified; this is a precaution.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

23 Aug, 2014 1 commit

f2fs: use macro for code readability · b5b82205

Authored by Chao Yu on Aug 22, 2014

This patch introduces the DEF_NIDS_PER_INODE/GET_ORPHAN_BLOCKS/F2FS_CP_PACKS macros
in place of bare numbers in the code for readability.

change log from v1:
 o fix typo pointed out by Jaegeuk Kim.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

22 Aug, 2014 12 commits

f2fs: introduce need_do_checkpoint for readability · 9d1589ef

Authored by Chao Yu on Aug 20, 2014

This patch introduces need_do_checkpoint() to gather the numerous conditions
in one place for readability.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix incorrect calculation with total/free inode num · c200b1aa

Authored by Chao Yu on Aug 20, 2014

Theoretically, our total inode count is the same as the total node count, but
three node ids are reserved in f2fs: 0, 1 (node nid), and 2
(meta nid). They should never be used by the user, so the total/free inode
numbers calculated in ->statfs are wrong.

This patch introduces F2FS_RESERVED_NODE_NUM and then fixes this issue by
recalculating the total/free inode numbers with the macro.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
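
The corrected arithmetic can be sketched as follows. The macro value of 3 is inferred from the three reserved ids listed above, and the struct, the function, and its parameters are hypothetical; the sketch assumes the total node count is at least the reserved count plus the valid inode count.

```c
#include <assert.h>

#define F2FS_RESERVED_NODE_NUM 3 /* assumed: nids 0, 1 (node nid), 2 (meta nid) */

/* Hypothetical pair of values reported to statfs. */
struct inode_counts {
	unsigned long long total;
	unsigned long long free;
};

/* Exclude the reserved node ids before deriving the free inode count. */
static struct inode_counts statfs_inode_counts(unsigned long long total_node_count,
					       unsigned long long valid_inode_count)
{
	struct inode_counts c;

	c.total = total_node_count - F2FS_RESERVED_NODE_NUM;
	c.free = c.total - valid_inode_count;
	return c;
}
```

For 1000 node ids and 10 inodes in use, this reports 997 total and 987 free inodes instead of 1000 and 990.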

f2fs: remove rename and use rename2 · 04859dba

Authored by Jaegeuk Kim on Aug 19, 2014

Refer to the following patch.

commit 7177a9c4
Author: Miklos Szeredi <mszeredi@suse.cz>
Date:   Wed Jul 23 15:15:30 2014 +0200

    fs: call rename2 if exists
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: skip if inline_data was converted already · ec4e7af4

Authored by Jaegeuk Kim on Aug 18, 2014

This patch checks once more, under the inode page lock, whether
the inline_data has already been converted.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove rewrite_node_page · 202095a7

Authored by Jaegeuk Kim on Aug 15, 2014

I think we need to let the dirty node pages remain in the page cache instead
of rewriting them in their places.
So, after a successful recovery, write_checkpoint will flush all of them
through the normal write path.
Through this, we can avoid potential error cases in terms of block allocation.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid double lock in truncate_blocks · 764aa3e9

Authored by Jaegeuk Kim on Aug 14, 2014

init_inode_metadata calls truncate_blocks when an error occurs.
The callers hold f2fs_lock_op, so we should not take it again in
truncate_blocks.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: prevent checkpoint during roll-forward · 14f4e690

Authored by Jaegeuk Kim on Aug 13, 2014

No checkpoint should be done during the core roll-forward procedure.
In particular, this includes error cases too.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add WARN_ON in f2fs_bug_on · b3fe0a0d

Authored by Jaegeuk Kim on Aug 13, 2014

This patch adds WARN_ON when f2fs_bug_on is disabled, so that kernel messages are still visible.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: handle EIO not to break fs consistency · cf779cab

Authored by Jaegeuk Kim on Aug 11, 2014

There are two rules when EIO occurs.
1. don't write any checkpoint data to preserve the previous checkpoint
2. don't lose the cached dentry/node/meta pages

So, at first, this patch adds set_page_dirty in f2fs_write_end_io's failure.
Then, writing checkpoint/dentry/node blocks is not allowed.

Note that, for the data pages, we can't just throw away by redirtying them.
Otherwise, kworker can fall into infinite loop to flush them.
(Ref. xfstests/019)
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: check s_dirty under cp_mutex · 8501017e

Authored by Jaegeuk Kim on Aug 11, 2014

We need to check s_dirty under cp_mutex, since s_dirty is reset under that
mutex.
The previous condition was also incorrect, since we could skip a checkpoint
when a checkpoint had been done and all the node pages were then written back.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: unlock_page when node page is redirtied out · 52746519

Authored by Jaegeuk Kim on Aug 11, 2014

This patch fixes a missing unlock_page when a node page is redirtied out.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce f2fs_cp_error for readability · 1e968fdf

Authored by Jaegeuk Kim on Aug 11, 2014

This patch adds f2fs_cp_error for readability.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>