Commits · 88b88a66797159949cec32eaab12b4968f6fae2d · openanolis / cloud-kernel

07 Oct, 2014 1 commit

Committed by Jaegeuk Kim on Oct 06, 2014

This patch introduces a very limited functionality for atomic write support.
In order to support atomic write, this patch adds two ioctls:
 o F2FS_IOC_START_ATOMIC_WRITE
 o F2FS_IOC_COMMIT_ATOMIC_WRITE

The database engine should be aware of the following sequence.
1. open
 -> ioctl(F2FS_IOC_START_ATOMIC_WRITE);
2. writes
  : all the written data will be treated as atomic pages.
3. commit
 -> ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE);
  : this flushes all the data blocks to the disk, which the f2fs recovery
  procedure will expose as all or nothing.
4. repeat from #2.

The IO patterns should be:

  ,- START_ATOMIC_WRITE                  ,- COMMIT_ATOMIC_WRITE
 CP | D D D D D D | FSYNC | D D D D | FSYNC ...
                      `- COMMIT_ATOMIC_WRITE
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

88b88a66
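
From userspace, the sequence described in this commit might look like the following minimal sketch. The ioctl numbers are an assumption: they mirror the values that f2fs headers use (magic `0xf5`, commands 1 and 2) and are defined locally so the snippet builds without those headers. `f2fs_atomic_update` is a hypothetical helper name, and a real caller would retry short writes instead of failing.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed to match the f2fs definitions; verify against your kernel headers. */
#define F2FS_IOCTL_MAGIC             0xf5
#define F2FS_IOC_START_ATOMIC_WRITE  _IO(F2FS_IOCTL_MAGIC, 1)
#define F2FS_IOC_COMMIT_ATOMIC_WRITE _IO(F2FS_IOCTL_MAGIC, 2)

/*
 * Hypothetical helper: write len bytes at offset off as one atomic unit.
 * Returns 0 on success or a negative errno value on failure.
 */
int f2fs_atomic_update(int fd, const void *buf, size_t len, off_t off)
{
	/* Step 1: put the open file into atomic-write mode. */
	if (ioctl(fd, F2FS_IOC_START_ATOMIC_WRITE) < 0)
		return -errno;

	/* Step 2: pages written from here on are treated as atomic pages. */
	ssize_t n = pwrite(fd, buf, len, off);
	if (n < 0)
		return -errno;
	if ((size_t)n != len)
		return -EIO; /* short write: a real caller would retry */

	/* Step 3: commit; recovery then exposes all of the data or none. */
	if (ioctl(fd, F2FS_IOC_COMMIT_ATOMIC_WRITE) < 0)
		return -errno;
	return 0;
}
```

On a file system without these ioctls the first call simply fails, so the helper returns a negative errno and nothing is written.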

06 Oct, 2014 1 commit

f2fs: remove unused return value · 120c2cba

Committed by Jaegeuk Kim on Oct 03, 2014

Don't return a value that no caller uses.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

120c2cba

01 Oct, 2014 7 commits

f2fs: clean up f2fs_ioctl functions · 52656e6c

Committed by Jaegeuk Kim on Sep 24, 2014

This patch cleans up f2fs_ioctl functions for better readability.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

52656e6c

f2fs: potential shift wrapping buf in f2fs_trim_fs() · 8a21984d

Committed by Dan Carpenter on Sep 25, 2014

My static checker complains that segment is a u64 but only the lower 31
bits can be used before we hit a shift wrapping bug.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

8a21984d
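
The class of bug named in this commit can be illustrated with a generic sketch. This is not the code from f2fs_trim_fs(); both function names are hypothetical. A shift evaluated in a 32-bit type silently drops the high bits, while widening before the shift keeps them.

```c
#include <stdint.h>

/* Wraps: the shift is evaluated in 32 bits, so high bits are lost. */
uint64_t start_block_narrow(uint32_t segno, unsigned int log_blocks_per_seg)
{
	return segno << log_blocks_per_seg;
}

/* Safe: widen to 64 bits first, then shift. */
uint64_t start_block_wide(uint32_t segno, unsigned int log_blocks_per_seg)
{
	return (uint64_t)segno << log_blocks_per_seg;
}
```

With `segno = 1 << 23` and a shift of 9, the narrow version yields 0 while the wide version yields 2^32.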

f2fs: call f2fs_unlock_op after error was handled · 44c16156

Committed by Jaegeuk Kim on Sep 25, 2014

This patch relocates f2fs_unlock_op in every directory operation so that it is
called after any error has been processed.
Otherwise, the checkpoint can be entered with valid node ids without their
dentries when -ENOSPC occurs.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

44c16156

f2fs: check the use of macros on block counts and addresses · 7cd8558b

Committed by Jaegeuk Kim on Sep 23, 2014

This patch cleans up the existing and new macros for readability.

The rule is as follows.

         ,-----------------------------------------> MAX_BLKADDR -,
         |  ,------------- TOTAL_BLKS ----------------------------,
         |  |                                                     |
         |  ,- seg0_blkaddr   ,----- sit/nat/ssa/main blkaddress  |
block    |  | (SEG0_BLKADDR)  | | | |   (e.g., MAIN_BLKADDR)      |
address  0..x................ a b c d .............................
            |                                                     |
global seg# 0...................... m .............................
            |                       |                             |
            |                       `------- MAIN_SEGS -----------'
            `-------------- TOTAL_SEGS ---------------------------'
                                    |                             |
 seg#                               0..........xx..................

= Note =
 o GET_SEGNO_FROM_SEG0 : blk address -> global segno
 o GET_SEGNO           : blk address -> segno
 o START_BLOCK         : segno -> starting block address
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

7cd8558b
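
The three conversions listed in the note can be sketched as plain functions. This is an illustration under assumptions: `struct layout` and the function names are hypothetical stand-ins, and the kernel macros read the same values from its own structures.

```c
#include <stdint.h>

/* Hypothetical, simplified layout description. */
struct layout {
	uint32_t seg0_blkaddr;       /* block address of global segment 0 */
	uint32_t main_blkaddr;       /* first block of the main area */
	uint32_t log_blocks_per_seg; /* log2(blocks per segment) */
};

/* blk address -> global segno (counted from seg0_blkaddr) */
static inline uint32_t segno_from_seg0(const struct layout *l, uint32_t blkaddr)
{
	return (blkaddr - l->seg0_blkaddr) >> l->log_blocks_per_seg;
}

/* blk address -> segno (counted from the start of the main area) */
static inline uint32_t segno_of(const struct layout *l, uint32_t blkaddr)
{
	return segno_from_seg0(l, blkaddr) - segno_from_seg0(l, l->main_blkaddr);
}

/* segno -> starting block address of that main-area segment */
static inline uint32_t start_block(const struct layout *l, uint32_t segno)
{
	return l->main_blkaddr + (segno << l->log_blocks_per_seg);
}
```

For example, with `seg0_blkaddr = 512`, `main_blkaddr = 5632` and 512 blocks per segment, block 7175 lies in global segment 13, which is main-area segment 3 and starts at block 7168.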

f2fs: refactor flush_nat_entries to remove costly reorganizing ops · 309cc2b6

Committed by Jaegeuk Kim on Sep 22, 2014

Previously, f2fs tried to reorganize the dirty nat entries into multiple sets
according to their nid ranges. This can improve the flushing of nat pages; however,
if there are a lot of cached nat entries, it becomes a bottleneck.

This patch introduces a new set management flow by removing the dirty nat list and
adding a series of set operations when a nat entry becomes dirty.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

309cc2b6

f2fs: introduce FITRIM in f2fs_ioctl · 4b2fecc8

Committed by Jaegeuk Kim on Sep 20, 2014

This patch introduces FITRIM in f2fs_ioctl.
In this case, f2fs will issue as many small discards and prefree discards as
possible for the given area.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

4b2fecc8

f2fs: introduce cp_control structure · 75ab4cb8

Committed by Jaegeuk Kim on Sep 20, 2014

This patch adds a new data structure to control checkpoint parameters.
Currently, it represents the reason for the checkpoint, such as is_umount or normal
sync.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

75ab4cb8

24 Sep, 2014 15 commits

f2fs: use more free segments until SSR is activated · 95dd8973

Committed by Jaegeuk Kim on Sep 17, 2014

Previously, f2fs activated SSR once the # of free segments reached the # of
overprovisioned segments.
In this case, SSR starts to use dirty segments only, so that the overprovisioned
space cannot be selected for new data.
This means that we have no chance to utilize the overprovisioned space at all.

This patch fixes that by allowing LFS allocations until the # of free segments
reaches the last threshold, the reserved space.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

95dd8973

f2fs: change the ipu_policy option to enable combinations · 9b5f136f

Committed by Jaegeuk Kim on Sep 16, 2014

This patch changes the ipu_policy setting to use any combination of orthogonal policies.
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

9b5f136f
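
Combining orthogonal policies is typically done with one bit per policy. The sketch below shows that idea only. The enum names and bit positions are assumptions for illustration and are not taken from the patch.

```c
#include <stdbool.h>

/* Illustrative policy bits; the names and positions are assumptions. */
enum ipu_bit {
	IPU_FORCE,
	IPU_SSR,
	IPU_UTIL,
	IPU_SSR_UTIL,
	IPU_FSYNC,
};

/* True if the given policy bit is enabled in the combined mask. */
static inline bool ipu_enabled(unsigned int policy, enum ipu_bit bit)
{
	return (policy & (1u << bit)) != 0;
}
```

A mask such as `(1u << IPU_SSR) | (1u << IPU_FSYNC)` then enables two policies at once, and each one can be tested independently.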

f2fs: fix to search whole dirty segmap when get_victim · 210f41bc

Committed by Chao Yu on Sep 15, 2014

In ->get_victim we get the max_search value from dirty_i->nr_dirty without
the protection of seglist_lock; after that, nr_dirty can be increased or decreased
before we hold seglist_lock.
Then in the main loop we attempt to traverse all dirty sections once to find a
victim section, but it is not accurate to use max_search as the total loop count,
because we might miss several sections or check sections redundantly
if nr_dirty was increased or decreased in the meantime.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

210f41bc

f2fs: fix to clean previous mount option when remount_fs · 26666c8a

Committed by Chao Yu on Sep 15, 2014

The mount manual describes remount as below:

"mount -o remount,rw /dev/foo /dir
After  this call all old mount options are replaced and arbitrary stuff from
fstab is ignored, except the loop= option which is internally generated and
maintained by the mount command."

Previously f2fs did not clear old mount options in remount_fs, so we had no
chance of disabling a previous option (e.g. flush_merge). Fix it.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

26666c8a

f2fs: skip punching hole in special condition · 14cecc5c

Committed by Chao Yu on Sep 15, 2014

Punching a hole in a directory is currently not supported in f2fs, so let's limit the file
type in punch_hole().

In addition, in punch_hole, if the offset exceeds the file size, we should skip
punching the hole.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

14cecc5c

f2fs: support large sector size · 55cf9cb6

Committed by Chao Yu on Sep 15, 2014

The block size in f2fs is 4096 bytes, so theoretically f2fs can support devices with
sectors of up to 4096 bytes. But f2fs currently supports only 512-byte sectors, so
a block device such as zRAM, which uses the page cache as its block storage space, will
fail to mount because its sector size does not match the sector
size supported by f2fs.

This patch adds support for large sector sizes in f2fs, so that block devices with a sector
size of 512/1024/2048/4096 bytes are supported.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

55cf9cb6
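
The supported range stated in this commit (512, 1024, 2048 or 4096 bytes) amounts to a power-of-two check bounded by the block size. A minimal sketch follows. The function and macro names are hypothetical, and this is not the mount-time check from the patch.

```c
#include <stdbool.h>

#define MIN_SECTOR_SIZE  512u  /* smallest sector size listed in the commit */
#define BLOCK_SIZE_BYTES 4096u /* f2fs block size stated in the commit */

/* True for power-of-two sector sizes between 512 and 4096 bytes. */
bool sector_size_supported(unsigned int sector_size)
{
	bool power_of_two = sector_size != 0 &&
			    (sector_size & (sector_size - 1)) == 0;

	return power_of_two &&
	       sector_size >= MIN_SECTOR_SIZE &&
	       sector_size <= BLOCK_SIZE_BYTES;
}
```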

f2fs: fix to truncate blocks past EOF in ->setattr · 09db6a2e

Committed by Chao Yu on Sep 15, 2014

By using FALLOC_FL_KEEP_SIZE in ->fallocate of f2fs, we can fallocate blocks past
EOF without changing the i_size of the inode. These blocks past EOF will not be
truncated in ->setattr, as we truncate them only when the file size changes.

We should give setattr() a chance to truncate blocks beyond the file size.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

09db6a2e

f2fs: update i_size when __allocate_data_block · 976e4c50

Committed by Jaegeuk Kim on Sep 15, 2014

f2fs_direct_IO uses __allocate_data_block, but inside the allocation path
we should update i_size at the time it changes so that the inode page is updated.
Otherwise, we can get a wrong i_size after roll-forward recovery.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

976e4c50

f2fs: use MAX_BIO_BLOCKS(sbi) · 90a893c7

Committed by Jaegeuk Kim on Sep 22, 2014

This patch cleans up a simple macro.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

90a893c7

f2fs: remove redundant operation during roll-forward recovery · c52e1b10

Committed by Jaegeuk Kim on Sep 11, 2014

If the same data is updated multiple times, we don't need to redo all of the
operations.
Let's just update the latest one.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

c52e1b10

f2fs: do not skip latest inode information · 19c9c466

Committed by Jaegeuk Kim on Sep 10, 2014

In f2fs_sync_file, if there are no appended writes, it skips
writing its node blocks.
But if there is an up-to-date inode page, we should write it to update
its metadata during the roll-forward recovery.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

19c9c466

f2fs: fix roll-forward missing scenarios · 441ac5cb

Committed by Jaegeuk Kim on Sep 15, 2014

We can summarize the roll forward recovery scenarios as follows.

[Term] F: fsync_mark, D: dentry_mark

1. inode(x) | CP | inode(x) | dnode(F)
-> Update the latest inode(x).

2. inode(x) | CP | inode(F) | dnode(F)
-> No problem.

3. inode(x) | CP | dnode(F) | inode(x)
-> Recover to the latest dnode(F), and drop the last inode(x)

4. inode(x) | CP | dnode(F) | inode(F)
-> No problem.

5. CP | inode(x) | dnode(F)
-> The inode(DF) was missing. Should drop this dnode(F).

6. CP | inode(DF) | dnode(F)
-> No problem.

7. CP | dnode(F) | inode(DF)
-> If f2fs_iget fails, then goto next to find inode(DF).

8. CP | dnode(F) | inode(x)
-> If f2fs_iget fails, then goto next to find inode(DF).
   But it will fail due to no inode(DF).

So, this patch adds some missing points such as #1, #5, #7, and #8.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

441ac5cb

f2fs: fix conditions to remain recovery information in f2fs_sync_file · 88bd02c9

Committed by Jaegeuk Kim on Sep 15, 2014

This patch revisits all of the recovery information used in f2fs_sync_file.

This patch uses three pieces of information to make a decision.

a) IS_CHECKPOINTED,	/* is it checkpointed before? */
b) HAS_FSYNCED_INODE,	/* is the inode fsynced before? */
c) HAS_LAST_FSYNC,	/* has the latest node fsync mark? */

And, the scenarios for our rule are based on:

[Term] F: fsync_mark, D: dentry_mark

1. inode(x) | CP | inode(x) | dnode(F)
2. inode(x) | CP | inode(F) | dnode(F)
3. inode(x) | CP | dnode(F) | inode(x) | inode(F)
4. inode(x) | CP | dnode(F) | inode(F)
5. CP | inode(x) | dnode(F) | inode(DF)
6. CP | inode(DF) | dnode(F)
7. CP | dnode(F) | inode(DF)
8. CP | dnode(F) | inode(x) | inode(DF)

For example, #3, the three conditions should be changed as follows.

   inode(x) | CP | dnode(F) | inode(x) | inode(F)
a)    x       o      o          o          o
b)    x       x      x          x          o
c)    x       o      o          x          o

If f2fs_sync_file stops   ------^,
 it should write inode(F)    --------------^

So, the need_inode_block_update should return true, since
 c) get_nat_flag(e, HAS_LAST_FSYNC), is false.

For example, #8,
      CP | alloc | dnode(F) | inode(x) | inode(DF)
a)    o      x        x          x          x
b)    x               x          x          o
c)    o               o          x          o

If f2fs_sync_file stops   -------^,
 it should write inode(DF)    --------------^

Note that the roll-forward policy should follow this rule, which means that
if there are any missing blocks, we don't need to recover that inode.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

88bd02c9

f2fs: introduce a flag to represent each nat entry information · 7ef35e3b

Committed by Jaegeuk Kim on Sep 15, 2014

This patch introduces a flag in the nat entry structure to merge various
information such as checkpointed and fsync_done marks.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

7ef35e3b
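
Merging several marks into one flag field usually means one bit per mark plus a pair of accessors. The sketch below uses the flag names listed in commit 88bd02c9 and the `get_nat_flag` name that its message mentions. The struct, the bit positions and `set_nat_flag` are assumptions made for illustration.

```c
#include <stdbool.h>

/* Bit positions; the names come from the commit message of 88bd02c9. */
enum {
	IS_CHECKPOINTED,   /* is it checkpointed before? */
	HAS_FSYNCED_INODE, /* is the inode fsynced before? */
	HAS_LAST_FSYNC,    /* has the latest node fsync mark? */
};

/* Hypothetical, reduced stand-in for the kernel's nat entry. */
struct nat_entry_sketch {
	unsigned char flag;
};

/* Set or clear one mark without touching the others. */
static inline void set_nat_flag(struct nat_entry_sketch *e,
				unsigned int type, bool set)
{
	unsigned char mask = (unsigned char)(1u << type);

	if (set)
		e->flag |= mask;
	else
		e->flag &= (unsigned char)~mask;
}

/* Read one mark. */
static inline bool get_nat_flag(const struct nat_entry_sketch *e,
				unsigned int type)
{
	return (e->flag & (1u << type)) != 0;
}
```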

f2fs: use meta_inode cache to improve roll-forward speed · 4c521f49

Committed by Jaegeuk Kim on Sep 11, 2014

Previously, all the dnode pages had to be read during the roll-forward recovery.
Even worse, the whole chain was traversed twice.
This patch removes those redundant and costly read operations by using the page cache
of meta_inode as well as the readahead function.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

4c521f49

16 Sep, 2014 5 commits

f2fs: fix double lock for inode page during roll-foward recovery · 60979115

Committed by Jaegeuk Kim on Sep 13, 2014

If the inode is the same and its data index needs to be truncated, we can fall into
a double lock on its inode page via get_dnode_of_data.

The error case is like this.

1. write data 1, 2, 3, 4, 5 in inode #4.
2. write data 100, 102, 103, 104, 105 in dnode #6 of inode #4.
3. sync
4. update data 100->106 in dnode #6.
5. fsync inode #4.
6. power-cut

-> Then,
1. go back to #3's checkpoint
2. in do_recover_data, get_dnode_of_data() gets inode #4.
3. detect 100->106 in dnode #6.
4. check_index_in_prev_nodes tries to truncate 100 in dnode #6.
5. to trigger truncate_hole, get_dnode_of_data should grab inode #4.
6. detect *kernel hang*

This patch should resolve that bug.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

60979115

f2fs: fix a race condition in next_free_nid · c6e48930

Committed by Huang Ying on Sep 12, 2014

The nm_i->fcnt check is executed before spin_lock, so if another
thread deletes the last free_nid from the list, the wrong nid may be
returned.  Fix the race condition by moving the nm_i->fcnt check
inside spin_lock.
Signed-off-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

c6e48930
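
The pattern behind this fix, checking the counter only while holding the lock, can be sketched in userspace with C11 atomics. Everything here is a hypothetical stand-in: `struct nid_pool`, the tiny spin lock and `take_free_nid` are not the kernel code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the free nid bookkeeping. */
struct nid_pool {
	atomic_flag lock;  /* minimal spin lock */
	unsigned int fcnt; /* number of free nids left */
	uint32_t nids[8];  /* free nids, used as a stack */
};

static void pool_lock(struct nid_pool *p)
{
	while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void pool_unlock(struct nid_pool *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Returns true and stores a nid in *out, or false if none is free. */
bool take_free_nid(struct nid_pool *p, uint32_t *out)
{
	bool ok = false;

	pool_lock(p);
	/* The emptiness check happens under the lock, so it cannot go stale. */
	if (p->fcnt > 0) {
		*out = p->nids[--p->fcnt];
		ok = true;
	}
	pool_unlock(p);
	return ok;
}
```

If the check were done before `pool_lock`, another thread could take the last entry in between, and the caller would read a slot that is no longer valid.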

f2fs: use nm_i->next_scan_nid as default for next_free_nid · 77041823

Committed by Huang Ying on Sep 12, 2014

Now, if there is no free nid in nm_i->free_nid_list, 0 may be saved
into next_free_nid of the checkpoint, which may cause useless scanning on the
next mount.  nm_i->next_scan_nid should be a better default value than
0.
Signed-off-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

77041823

f2fs: give an option to enable in-place-updates during fsync to users · c1ce1b02

Committed by Jaegeuk Kim on Sep 10, 2014

If the user writes F2FS_IPU_FSYNC:4 to /sys/fs/f2fs/ipu_policy, in-place updates
are attempted only from f2fs_sync_file.
If the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it
keeps the out-of-order manner. Otherwise, it triggers in-place updates.

This may be used by storage showing very high random write performance.

For example, it can be used when,

Seq. writes (Data) + wait + Seq. writes (Node)

is pretty much slower than,

Rand. writes (Data)
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

c1ce1b02
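
The decision described in this commit reduces to a small predicate. The sketch below assumes the policy value 4 named in the message and a plain dirty-page count. `fsync_wants_ipu` is a hypothetical name, not a kernel function.

```c
#include <stdbool.h>

#define F2FS_IPU_FSYNC 4 /* policy value named in the commit message */

/*
 * Decide whether fsync should update in place: only when the fsync policy
 * is selected and the number of dirty pages does not exceed the threshold.
 */
bool fsync_wants_ipu(unsigned int policy, unsigned int dirty_pages,
		     unsigned int min_fsync_blocks)
{
	if (policy != F2FS_IPU_FSYNC)
		return false;
	return dirty_pages <= min_fsync_blocks;
}
```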

f2fs: expand counting dirty pages in the inode page cache · a7ffdbe2

Committed by Jaegeuk Kim on Sep 12, 2014

Previously f2fs counted only dirty dentry pages, but there is no reason not to
expand the scope.

This patch renames the dirty page management helpers and counts
dirty pages in each inode info as well.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

a7ffdbe2

11 Sep, 2014 1 commit

f2fs: remove lengthy inode->i_ino · 2403c155

Committed by Jaegeuk Kim on Sep 10, 2014

This patch removes the lengthy name by adding a new variable.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

2403c155

10 Sep, 2014 10 commits

f2fs: fix negative value for lseek offset · 0b4c5afd

Committed by Jaegeuk Kim on Sep 08, 2014

If an application passes a negative offset to lseek with SEEK_DATA|SEEK_HOLE,
f2fs previously hit a BUG_ON in get_dnode_of_data, as reported
by Tommi Rantala.

He provided a simple reproducer:
	lseek(fd, -17595150933902LL, SEEK_DATA);

This patch should resolve that bug.
Reported-by: Tommi Rantala <tt.rantala@gmail.com>
[Jaegeuk Kim: relocate the condition as suggested by Chao]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

0b4c5afd
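
The reported call can be wrapped in a small probe that turns the result into an errno value. This is a sketch under assumptions: `probe_negative_seek_data` is a hypothetical helper, a 64-bit `off_t` is assumed, and the commit message does not say which errno a fixed kernel returns.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes SEEK_DATA in glibc */
#endif
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3 /* Linux value; fallback if the libc headers hide it */
#endif

/*
 * Issue the reported call on fd. Returns 0 if lseek unexpectedly succeeds,
 * otherwise the errno value set by the failed call.
 */
int probe_negative_seek_data(int fd)
{
	off_t pos = lseek(fd, (off_t)-17595150933902LL, SEEK_DATA);

	return pos == (off_t)-1 ? errno : 0;
}
```

On a kernel with the fix, the call is expected to fail cleanly with an error instead of triggering the BUG_ON.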

f2fs: avoid node page to be written twice in gc_node_segment · 9a01b56b

Committed by Huang Ying on Sep 07, 2014

In gc_node_segment, if node page gc runs concurrently with node page
writeback, and check_valid_map and get_node_page run after the page is locked
and before cur_valid_map is updated as below, it is possible for the
page to be written twice unnecessarily.

			sync_node_pages
			  try_lock_page
			  ...
check_valid_map		  f2fs_write_node_page
			    ...
			    write_node_page
			      do_write_page
			        allocate_data_block
				  ...
				  refresh_sit_entry /* update cur_valid_map */
				  ...
			    ...
			    unlock_page
get_node_page
...
set_page_dirty
...
f2fs_put_page
  unlock_page

This can be solved by calling check_valid_map again after get_node_page.
Signed-off-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

9a01b56b

f2fs: use lock-less list(llist) to simplify the flush cmd management · 721bd4d5

Committed by Gu Zheng on Sep 05, 2014

We use flush cmd control to collect many flush cmds and flush them
together. For this, we use two lists to manage the flush cmds
(collect and dispatch), and one spin lock to protect them.
In fact, the lock-less list (llist) is very suitable for this case,
and we use it to simplify this routine.

-
v2:
-use llist_for_each_entry_safe to fix possible use-after-free issue.
-remove the unused field from struct flush_cmd.
Thanks for Yu's suggestion.
-
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

721bd4d5
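
The collect-and-dispatch idea behind a lock-less list can be sketched in userspace with C11 atomics: producers push with a compare-and-swap loop, and the consumer detaches the whole list with one exchange. The types and function names below are hypothetical. The kernel's llist API provides comparable operations (`llist_add`, `llist_del_all`).

```c
#include <stdatomic.h>
#include <stddef.h>

struct lnode {
	struct lnode *next;
	int id;
};

struct lhead {
	_Atomic(struct lnode *) first;
};

/* Producer side: push one node with a compare-and-swap loop. */
void lpush(struct lhead *h, struct lnode *n)
{
	struct lnode *old = atomic_load_explicit(&h->first, memory_order_relaxed);

	do {
		n->next = old; /* old is refreshed whenever the CAS fails */
	} while (!atomic_compare_exchange_weak_explicit(&h->first, &old, n,
							memory_order_release,
							memory_order_relaxed));
}

/* Consumer side: detach the whole list in one atomic exchange. */
struct lnode *ltake_all(struct lhead *h)
{
	return atomic_exchange_explicit(&h->first, (struct lnode *)NULL,
					memory_order_acquire);
}
```

The detached list is in newest-first order, so a consumer that needs submission order has to reverse it before dispatching.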

f2fs: refactor flush_sit_entries codes for reducing SIT writes · 184a5cd2

Committed by Chao Yu on Sep 04, 2014

In commit aec71382 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we described the issue as below:

"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
   nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
   journal is full, then flush the left dirty entries to disk without merge
   journaled entries, so these journaled entries may be flushed to disk at next
   checkpoint but lost chance to flushed last time."

Actually, we have the same problem when using the SIT journal area.

In this patch, we first update the sit journal with as many dirty entries as
possible. Second, if there is no space in the sit journal, we remove all
entries from the journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in the same SIT block to a sit entry set. All
entry sets are linked to the list sit_entry_set in sm_info, sorted in ascending order
by the count of entries in each set. Later we flush the entries of the sets that have the
fewest entries into the journal, as many as we can, and then flush the dense sets with merged
entries to disk.

In this way we can use the sit journal area more effectively and also reduce
SIT updates, which results in better performance and a longer lifetime of the flash
device.

In my testing environment, this patch clearly helps to reduce SIT block
updates.

virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
		sit page num	cp count	sit pages/cp
based		2006.50		1349.75		1.486
patched		1566.25		1463.25		1.070

Our latency of merging op is small when handling a great number of dirty SIT
entries in flush_sit_entries:
latency(ns)	dirty sit count
36038		2151
49168		2123
37174		2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

184a5cd2

f2fs: remove unneeded sit_i in macro SIT_BLOCK_OFFSET/START_SEGNO · d3a14afd

Committed by Chao Yu on Sep 04, 2014

sit_i in macro SIT_BLOCK_OFFSET/START_SEGNO is not used, so remove it.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

d3a14afd

f2fs: need fsck.f2fs if the recovery was failed · b0c44f05

Committed by Jaegeuk Kim on Sep 02, 2014

If the roll-forward recovery failed, we had better run fsck.f2fs.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

b0c44f05

f2fs: handle bug cases by letting fsck.f2fs initiate · ec325b52

Committed by Jaegeuk Kim on Sep 02, 2014

This patch adds handling of buggy corner cases for fsck.f2fs.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

ec325b52

f2fs: add BUG cases to initiate fsck.f2fs · 05796763

Committed by Jaegeuk Kim on Sep 02, 2014

This patch replaces BUG cases with f2fs_bug_on to retain fsck.f2fs information.
It also implements some void functions to initiate fsck.f2fs.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

05796763

f2fs: need fsck.f2fs when f2fs_bug_on is triggered · 9850cf4a

Committed by Jaegeuk Kim on Sep 02, 2014

If any f2fs_bug_on is triggered, fsck.f2fs is needed.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

9850cf4a

f2fs: retain inconsistency information to initiate fsck.f2fs · 2ae4c673

Committed by Jaegeuk Kim on Sep 02, 2014

This patch adds sbi->need_fsck to conduct fsck.f2fs later.
This flag can only be removed by fsck.f2fs.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

2ae4c673

openanolis / cloud-kernel synced successfully more than 1 year ago