Commits · d260081ccf37f57b74396ec48f415f27d1b01b13 · openanolis / cloud-kernel

24 Feb, 2017 1 commit

f2fs: change recovery policy of xattr node block · d260081c

Committed by Chao Yu on Feb 08, 2017

Currently, if we call fsync after updating the xattr data that belongs to the
file, f2fs needs to trigger a checkpoint to keep the xattr data consistent. But
this policy causes low performance, as a checkpoint blocks most foreground
operations and causes unneeded and unrelated IOs around the checkpoint.

This patch reuses the regular file recovery policy for the xattr node block:
we now write an xattr node block tagged with the fsync flag to the warm
area instead of the cold area, and during recovery we search the warm node
chain for fsynced xattr blocks and recover them.

So, for the application IO pattern below, performance can be improved
noticeably:
- touch file
- create/update/delete xattr entry in file
- fsync file

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

23 Feb, 2017 2 commits

f2fs: check in-memory nat version bitmap · 599a09b2

Committed by Chao Yu on Jan 07, 2017

This patch adds a mirror of the nat version bitmap and uses it to detect
in-memory bitmap corruption, which may be caused by bit transitions in
cache or by memory overflow.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
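To illustrate the mirror idea described in this commit message, here is a minimal user-space sketch, not the kernel code. The struct, the function names, and the bitmap size are hypothetical; the only point is that every update touches both copies, so a later comparison can reveal a copy that was corrupted behind the filesystem's back.

```c
#include <stdbool.h>
#include <string.h>

#define BITMAP_BYTES 64 /* hypothetical size, for illustration only */

/* Hypothetical container: a working bitmap plus its mirror. */
struct nat_bitmap {
	unsigned char bits[BITMAP_BYTES];   /* working copy */
	unsigned char mirror[BITMAP_BYTES]; /* shadow copy, updated in lockstep */
};

static void bitmap_init(struct nat_bitmap *b)
{
	memset(b, 0, sizeof(*b));
}

/* Flip bit nr in both copies; the caller must keep nr < BITMAP_BYTES * 8. */
static void bitmap_change_bit(struct nat_bitmap *b, unsigned int nr)
{
	unsigned char mask = (unsigned char)(1u << (7 - (nr & 7)));

	b->bits[nr >> 3] ^= mask;
	b->mirror[nr >> 3] ^= mask;
}

/* A mismatch means one copy changed without going through the helper. */
static bool bitmap_is_consistent(const struct nat_bitmap *b)
{
	return memcmp(b->bits, b->mirror, BITMAP_BYTES) == 0;
}
```

A caller would run `bitmap_is_consistent()` at a convenient checkpoint and treat a mismatch as a sign of memory corruption.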

f2fs: clean up with list_{first, last}_entry · 939afa94

Committed by Chao Yu on Jan 07, 2017

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

24 Nov, 2016 2 commits

f2fs: split free nid list · b8559dc2

Committed by Chao Yu on Oct 12, 2016

During free nid allocation, in order to do preallocation, we tag a free
nid entry as allocated but still leave it in the free nid list, so other
allocators that want to grab free nids must traverse the free nid
list to look one up. This becomes an overhead when many threads allocate
free nids intensively.

This patch splits the free nid list into two lists, {free,alloc}_nid_list, to
keep free nids and preallocated free nids separate. After that, the traversal
latency is gone. It also splits nid_cnt to keep separate statistics.

Additionally, introduce __insert_nid_to_list and __remove_nid_from_list for
cleanup.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: modify f2fs_bug_on to avoid needless branches]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix sparse warnings · 0c0b471e

Committed by Eric Biggers on Oct 11, 2016

f2fs contained a number of endianness conversion bugs.

Also, one function should have been 'static'.

Found with sparse by running 'make C=2 CF=-D__CHECK_ENDIAN__ fs/f2fs/'

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

01 Oct, 2016 2 commits

f2fs: introduce cp_lock to protect updating of ckpt_flags · aaec2b1d

Committed by Chao Yu on Sep 20, 2016

This patch introduces a spinlock to protect updates of the ckpt_flags
field in struct f2fs_checkpoint, which avoids incorrect updates under a race
condition.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: add __is_set_ckpt_flags likewise __set_ckpt_flags]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use crc and cp version to determine roll-forward recovery · a468f0ef

Committed by Jaegeuk Kim on Sep 19, 2016

Previously, we used cp_version only to detect recoverable dnodes.
In order to avoid matching a stale cp_version left in garbage blocks, we needed
to truncate the next dnode during checkpoint, resulting in an additional discard
or data write.
If we can distinguish this by using the crc in addition to cp_version, we can
remove this overhead.

There is a backward compatibility concern, since this changes the node_footer
layout. So, this patch introduces a new checkpoint flag, CP_CRC_RECOVERY_FLAG, to
detect the new layout. The new layout is activated only when this flag is set.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
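The check described in this commit message can be sketched roughly as follows. This is a user-space illustration under stated assumptions, not the kernel implementation: the commit message does not spell out the new node_footer layout, so the sketch assumes the crc is folded into the upper 32 bits of a 64-bit version value. The struct `ckpt_info`, the helper names, and the numeric value of the flag are also assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define CP_CRC_RECOVERY_FLAG 0x00000040u /* numeric value assumed for this sketch */

/* Hypothetical, simplified view of the checkpoint fields used here. */
struct ckpt_info {
	uint64_t cp_version; /* checkpoint version; assumed to fit in the low 32 bits */
	uint32_t crc;        /* checksum of the checkpoint */
	uint32_t flags;      /* checkpoint flags */
};

/* Value a node footer must carry to count as written after this checkpoint. */
static uint64_t expected_node_cpver(const struct ckpt_info *c)
{
	uint64_t ver = c->cp_version;

	/* New layout only when the flag is set: mix the crc into the high bits. */
	if (c->flags & CP_CRC_RECOVERY_FLAG)
		ver |= (uint64_t)c->crc << 32;
	return ver;
}

static bool is_recoverable_dnode(const struct ckpt_info *c, uint64_t node_cpver)
{
	return expected_node_cpver(c) == node_cpver;
}
```

With the flag set, a stale block that happens to carry the same cp_version but not the matching crc is rejected, which is what makes truncating the next dnode unnecessary.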

07 Jul, 2016 1 commit

f2fs: produce more nids and reduce readahead nats · ad4edb83

Committed by Jaegeuk Kim on Jun 16, 2016

The readahead nat pages are likely to be reclaimed quickly, so we had better
gather more free nids in advance.

Also, let's keep as many free nids as possible.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

08 Jun, 2016 2 commits

f2fs: control not to exceed # of cached nat entries · e589c2c4

Committed by Jaegeuk Kim on Jun 02, 2016

This is to avoid cache entry management overhead, including the radix tree.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix wrong percentage · 29710bcf

Committed by Jaegeuk Kim on Jun 02, 2016

This should be 1%, 10MB / 1GB.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

23 Feb, 2016 4 commits

f2fs: use wait_for_stable_page to avoid contention · fec1d657

Committed by Jaegeuk Kim on Jan 20, 2016

In write_begin, if storage supports stable_page, we don't need to wait for
writeback to update its contents.
This patch uses wait_for_stable_page instead of
wait_on_page_writeback.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid multiple node page writes due to inline_data · 2049d4fc

Committed by Jaegeuk Kim on Jan 25, 2016

The scenario is:
1. create fully node blocks
2. flush node blocks
3. write inline_data for all the node blocks again
4. flush node blocks redundantly

So, this patch tries to flush inline_data when flushing node blocks.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: export dirty_nats_ratio in sysfs · 2304cb0c

Committed by Chao Yu on Jan 18, 2016

This patch exports a new sysfs entry 'dirty_nats_ratio' to control the threshold
of dirty nat entries. If the current ratio exceeds the configured threshold,
a checkpoint will be triggered in f2fs_balance_fs_bg to flush dirty nats.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: flush dirty nat entries when exceeding threshold · 7d768d2c

Committed by Chao Yu on Jan 18, 2016

When testing f2fs with xfstest, generic/251 gets stuck for a long time.
The case uses the sequence below to obtain freshly released space on the device,
in order to prepare for the following fstrim test.

1. rm -rf /mnt/dir
2. mkdir /mnt/dir/
3. cp -axT `pwd`/ /mnt/dir/
4. goto 1

During the preparation step, all nat entries are cached in the nat cache.
Most of them are dirty entries with an invalid blkaddr, which means the
nodes related to these entries have been truncated, and they could
be reused after the dirty entries have been checkpointed.

However, no checkpoint was triggered, so nid allocators
(e.g. mkdir, creat) run into a long journey of iterating over all NAT
pages, looking for free nids in alloc_nid->build_free_nids.

Here, in f2fs_balance_fs_bg, we give another chance to do a checkpoint
to flush nat entries, so they can be reused in the free nid cache, when the
dirty entry count exceeds 10% of the max count.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
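The threshold test behind these two commits can be sketched as below. This is a simplified stand-alone illustration; the helper name and its parameters are hypothetical, and only the 10% default comes from the commit message.

```c
#include <stdbool.h>

#define DEF_DIRTY_NAT_RATIO 10 /* percent; default taken from the commit message */

/*
 * Returns true when the dirty entry count exceeds ratio_percent of the
 * maximum count, i.e. when a background checkpoint would be worthwhile.
 * The multiplication is widened to avoid overflow on large counts.
 */
static bool excess_dirty_nats(unsigned long dirty_cnt, unsigned long max_cnt,
			      unsigned int ratio_percent)
{
	unsigned long long limit =
		(unsigned long long)max_cnt * ratio_percent / 100;

	return dirty_cnt > limit;
}
```

For example, with a maximum of 100 entries and the default ratio, 11 dirty entries trigger the flush while 10 do not.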

09 Jan, 2016 1 commit

f2fs: avoid unnecessary f2fs_balance_fs calls · 12719ae1

Committed by Jaegeuk Kim on Jan 07, 2016

Only when a node page is newly dirtied do we need to check whether to run
f2fs_gc.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

05 Dec, 2015 1 commit

f2fs: use sbi->blocks_per_seg to avoid unnecessary calculation · 3519e3f9

Committed by Chao Yu on Dec 01, 2015

Use sbi->blocks_per_seg directly to avoid unnecessary calculation when using
1 << sbi->log_blocks_per_seg.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

13 Oct, 2015 2 commits

f2fs: export ra_nid_pages to sysfs · ea1a29a0

Committed by Chao Yu on Oct 12, 2015

After finishing building the free nid cache, we try to asynchronously
read ahead 4 more pages for the next reload; the count of
readahead nid pages is fixed.

In some cases, such as SMR drives, reading a small, fixed number of sectors
each time we trigger RA may be inefficient, since we face
high seek overhead. So we had better let the user configure this
parameter from sysfs for specific workloads.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

Revert "f2fs: do not skip dentry block writes" · a1257023

Committed by Jaegeuk Kim on Oct 08, 2015

The periodic checkpoint can resolve the previous issue.
So, now we can use this again to improve the reported performance regression:

https://lkml.org/lkml/2015/10/8/20

This reverts commit 15bec0ff5a9ba6d203178fa8772259df6207942a.

10 Oct, 2015 1 commit

f2fs: do not skip dentry block writes · 90b803e6

Committed by Jaegeuk Kim on Sep 25, 2015

Previously, we skipped dentry block writes when wbc is SYNC_NONE with no memory
pressure and the number of dirty pages is pretty small.

But we didn't skip normal data writes, so the skip has little impact
on overall performance.
Moreover, by skipping some data writes, kworker falls into an infinite loop
trying to write blocks when many dir inodes have only one dentry block.

So, this patch removes the skipping of data writes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

29 May, 2015 1 commit

f2fs: move existing definitions into f2fs.h · b5492af7

Committed by Jaegeuk Kim on Apr 20, 2015

This patch moves some inode-related definitions from node.h to f2fs.h to
add new features.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

04 Mar, 2015 1 commit

f2fs: introduce infra macro and data structure of rb-tree extent cache · 13054c54

Committed by Chao Yu on Feb 05, 2015

Introduce infra macros and data structures for the rb-tree based extent cache:

Macros:
 * EXT_TREE_VEC_SIZE: vector size for gang lookup in the extent tree.
 * F2FS_MIN_EXTENT_LEN: minimum length of an extent managed in the cache.
 * EXTENT_CACHE_SHRINK_NUMBER: number of extents in the cache to be shrunk.

Basic data structures for extent cache:
 * struct extent_tree: extent tree entry per inode.
 * struct extent_node: extent info node linked in extent tree.

Besides, add new extent-cache-related fields to f2fs_sb_info.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

10 Jan, 2015 4 commits

f2fs: free radix_tree_nodes used by nat_set entries · 7aed0d45

Committed by Jaegeuk Kim on Jan 07, 2015

In the normal case, the radix_tree_nodes are freed successfully.
But, when cp_error was detected, we should destroy them forcefully.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix missing cold bit during recovery · 09eb483e

Committed by Jaegeuk Kim on Dec 23, 2014

In do_recover_data, we find and update previous node pages after updating
their new block addresses.
After that, when we call fill_node_footer without reset field, we erase its
cold bit, so this new cold node block is written to the wrong log area.
This patch fixes this so that the old flag is not lost.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: merge two uchar variable in struct node_info to reduce memory cost · 5c27f4ee

Committed by Chao Yu on Dec 18, 2014

This patch moves one member of struct nat_entry, _flag_, to struct node_info,
so that _version_ in struct node_info and _flag_, both of unsigned char type,
are merged into one 32-bit space in register/memory. The size of nat_entry is
thus reduced from 28 bytes to 24 bytes (on a 64-bit machine, from 40
bytes to 32 bytes), and the slab memory used by f2fs is reduced accordingly.

changes from v2:
 o update description of memory usage gain for 64-bit machine suggested by
   Changman Lee.
changes from v1:
 o introduce inline copy_node_info() to copy valid data from node info suggested
   by Jaegeuk Kim, it can avoid bug.

Reviewed-by: Changman Lee <cm224.lee@samsung.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
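The size reduction comes from struct padding, which a small stand-alone sketch can make visible. The layouts below are assumptions reconstructed from the commit message (only `flag` and `version` are named there; the other fields and the `_old`/`_new` suffixes are hypothetical), so the exact byte counts depend on the ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the kernel's doubly linked list head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Assumed layout before the patch: flag sits outside node_info. */
struct node_info_old {
	uint32_t nid, ino, blk_addr;
	unsigned char version;      /* followed by 3 bytes of padding */
};

struct nat_entry_old {
	struct list_head list;
	unsigned char flag;         /* followed by 3 bytes of padding */
	struct node_info_old ni;
};

/* Assumed layout after the patch: flag shares version's padded word. */
struct node_info_new {
	uint32_t nid, ino, blk_addr;
	unsigned char version;
	unsigned char flag;
};

struct nat_entry_new {
	struct list_head list;
	struct node_info_new ni;
};

/* Bytes saved per cached nat entry on the current ABI. */
static size_t nat_entry_saving(void)
{
	return sizeof(struct nat_entry_old) - sizeof(struct nat_entry_new);
}
```

On common 64-bit ABIs this should show 40 bytes shrinking to 32, matching the figures quoted in the message.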

f2fs: change atomic and volatile write policies · 1e84371f

Committed by Jaegeuk Kim on Dec 09, 2014

This patch adds two new ioctls to release in-memory pages grabbed by atomic
writes.
 o f2fs_ioc_abort_volatile_write
  - If transaction was failed, all the grabbed pages and data should be written.
 o f2fs_ioc_release_volatile_write
  - This is to enhance the performance of PERSIST mode in sqlite.

In order to avoid huge memory consumption, which causes OOM, this patch changes
volatile writes to use normal dirty pages, but blocks flushing them to disk
as long as the system is not under memory pressure.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

07 Nov, 2014 1 commit

f2fs: control the memory footprint used by ino entries · e5e7ea3c

Committed by Jaegeuk Kim on Nov 06, 2014

This patch adds control over the memory footprint used by ino entries.
This is done on a best-effort basis, not strictly.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

04 Nov, 2014 1 commit

f2fs: introduce f2fs_change_bit to simplify the change bit logic · c6ac4c0e

Committed by Gu Zheng on Oct 20, 2014

Introduce f2fs_change_bit to simplify the change bit logic in
function set_to_next_nat{sit}.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
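A "change bit" helper of the kind this commit introduces simply toggles one bit in a byte-array bitmap. The sketch below is a user-space illustration with a hypothetical name, `toggle_bit`; it assumes bits are numbered most-significant-bit first within each byte, which the commit message does not state.

```c
/*
 * Toggle bit nr in the bitmap at addr.
 * Assumption of this sketch: bit 0 is the most significant bit of byte 0.
 */
static inline void toggle_bit(unsigned int nr, unsigned char *addr)
{
	addr[nr >> 3] ^= (unsigned char)(1u << (7 - (nr & 0x07)));
}
```

Calling the helper twice with the same `nr` restores the original bitmap, which is what makes it suitable for flipping between two alternating copies of a block.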

06 Oct, 2014 1 commit

f2fs: remove unused return value · 120c2cba

Committed by Jaegeuk Kim on Oct 03, 2014

Don't return a value that is never used.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

01 Oct, 2014 1 commit

f2fs: refactor flush_nat_entries to remove costly reorganizing ops · 309cc2b6

Committed by Jaegeuk Kim on Sep 22, 2014

Previously, f2fs tried to reorganize the dirty nat entries into multiple sets
according to their nid ranges. This can improve the flushing of nat pages;
however, if there are a lot of cached nat entries, it becomes a bottleneck.

This patch introduces a new set management flow by removing the dirty nat list
and adding a series of set operations when a nat entry becomes dirty.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

24 Sep, 2014 2 commits

f2fs: fix conditions to remain recovery information in f2fs_sync_file · 88bd02c9

Committed by Jaegeuk Kim on Sep 15, 2014

This patch revisits all the recovery information used during f2fs_sync_file.

In this patch, three pieces of information are used to make a decision.

a) IS_CHECKPOINTED,	/* is it checkpointed before? */
b) HAS_FSYNCED_INODE,	/* is the inode fsynced before? */
c) HAS_LAST_FSYNC,	/* has the latest node fsync mark? */

And, the scenarios for our rule are based on:

[Term] F: fsync_mark, D: dentry_mark

1. inode(x) | CP | inode(x) | dnode(F)
2. inode(x) | CP | inode(F) | dnode(F)
3. inode(x) | CP | dnode(F) | inode(x) | inode(F)
4. inode(x) | CP | dnode(F) | inode(F)
5. CP | inode(x) | dnode(F) | inode(DF)
6. CP | inode(DF) | dnode(F)
7. CP | dnode(F) | inode(DF)
8. CP | dnode(F) | inode(x) | inode(DF)

For example, #3, the three conditions should be changed as follows.

   inode(x) | CP | dnode(F) | inode(x) | inode(F)
a)    x       o      o          o          o
b)    x       x      x          x          o
c)    x       o      o          x          o

If f2fs_sync_file stops   ------^,
 it should write inode(F)    --------------^

So, the need_inode_block_update should return true, since
 c) get_nat_flag(e, HAS_LAST_FSYNC), is false.

For example, #8,
      CP | alloc | dnode(F) | inode(x) | inode(DF)
a)    o      x        x          x          x
b)    x               x          x          o
c)    o               o          x          o

If f2fs_sync_file stops   -------^,
 it should write inode(DF)    --------------^

Note that the roll-forward policy should follow this rule, which means that
if there are any missing blocks, we don't need to recover that inode.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
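The decision in examples #3 and #8 can be sketched as a small predicate over the three flags. This is one plausible way to combine them, written as a user-space illustration; the commit message only states the outcome for the two examples, so the exact combination and the helper `get_flag` are assumptions of this sketch.

```c
#include <stdbool.h>

/* Bit positions for the three pieces of recovery information. */
enum {
	IS_CHECKPOINTED,   /* is it checkpointed before? */
	HAS_FSYNCED_INODE, /* is the inode fsynced before? */
	HAS_LAST_FSYNC,    /* has the latest node fsync mark? */
};

static bool get_flag(unsigned char flags, int bit)
{
	return (flags & (1u << bit)) != 0;
}

/*
 * The inode block can be skipped only if the latest node already carries
 * the fsync mark and the inode is known to recovery (either through a
 * checkpoint or through an earlier fsynced inode block).
 */
static bool need_inode_block_update(unsigned char flags)
{
	if (get_flag(flags, HAS_LAST_FSYNC) &&
	    (get_flag(flags, IS_CHECKPOINTED) ||
	     get_flag(flags, HAS_FSYNCED_INODE)))
		return false;
	return true;
}
```

At the stopping point of example #3 the state is a) set, b) clear, c) clear, so the predicate returns true and inode(F) is written; once all three are set it returns false.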

f2fs: introduce a flag to represent each nat entry information · 7ef35e3b

Committed by Jaegeuk Kim on Sep 15, 2014

This patch introduces a flag in the nat entry structure to merge various
information such as checkpointed and fsync_done marks.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

16 Sep, 2014 1 commit

f2fs: fix a race condition in next_free_nid · c6e48930

Committed by Huang Ying on Sep 12, 2014

The nm_i->fcnt check is executed before spin_lock, so if another
thread deletes the last free_nid from the list, the wrong nid may be
returned.  So fix the race condition by moving the nm_i->fcnt check
inside the spin_lock.

Signed-off-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

04 Sep, 2014 1 commit

f2fs: introduce F2FS_I_SB, F2FS_M_SB, and F2FS_P_SB · 4081363f

Committed by Jaegeuk Kim on Sep 02, 2014

This patch adds three inline functions to clean up dirty casting code.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

10 Jul, 2014 1 commit

f2fs: refactor flush_nat_entries codes for reducing NAT writes · aec71382

Committed by Chao Yu on Jun 24, 2014

Although building the NAT journal in cursum reduces the read/write work for NAT
blocks, the previous design gives lower performance when checkpoints are
written frequently, in these cases:
1. if the journal in cursum is already full, it is a waste to flush all
   nat entries to pages for persistence without caching any entries.
2. if the journal in cursum is not full, we fill nat entries into the journal
   until the journal is full, then flush the remaining dirty entries to disk
   without merging the journaled entries, so these journaled entries may be
   flushed to disk at the next checkpoint, having lost the chance to be
   flushed last time.

In this patch we merge dirty entries located in the same NAT block into a nat
entry set, and link all sets into a list, sorted in ascending order by each
set's entry count. Later we flush entries in sparse sets into the journal, as
many as we can, and then flush the merged entries to disk. In this way we not
only gain performance but also extend the lifetime of the flash device.

In my testing environment, this patch noticeably reduces NAT block
writes. In the hard disk test case, the time cost of fsstress is stably
reduced by about 5%.

1. virtual machine + hard disk:
fsstress -p 20 -n 200 -l 5
		node num	cp count	nodes/cp
based		4599.6		1803.0		2.551
patched		2714.6		1829.6		1.483

2. virtual machine + 32g micro SD card:
fsstress -p 20 -n 200 -l 1 -w -f chown=0 -f creat=4 -f dwrite=0
-f fdatasync=4 -f fsync=4 -f link=0 -f mkdir=4 -f mknod=4 -f rename=5
-f rmdir=5 -f symlink=0 -f truncate=4 -f unlink=5 -f write=0 -S

		node num	cp count	nodes/cp
based		84.5		43.7		1.933
patched		49.2		40.0		1.23

Our latency of merging op shows not bad when handling extreme case like:
merging a great number of dirty nats:
latency(ns)	dirty nat count
3089219		24922
5129423		27422
4000250		24523

change log from v1:
 o fix wrong logic in add_nat_entry when grabbing a new nat entry set.
 o switch to creating the slab cache in create_node_manager_caches.
 o use GFP_ATOMIC instead of GFP_NOFS to avoid potential long latency.

change log from v2:
 o make comment position more appropriate suggested by Jaegeuk Kim.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
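The grouping and ordering step described in this commit message can be sketched in user space as follows. This is an illustration, not the kernel code: it uses a fixed array and `qsort` instead of kernel lists, and the names `nat_set`, `build_sorted_sets`, `MAX_SETS`, and the per-block entry count of 455 are assumptions made for the example.

```c
#include <stdlib.h>

#define NAT_ENTRY_PER_BLOCK 455 /* entries per NAT block; assumed for this sketch */
#define MAX_SETS 16             /* hypothetical cap to keep the sketch simple */

/* One set per NAT block, counting the dirty entries that fall into it. */
struct nat_set {
	unsigned int set_no; /* index of the NAT block */
	unsigned int count;  /* dirty entries in that block */
};

static int cmp_count(const void *a, const void *b)
{
	const struct nat_set *x = a, *y = b;

	return (x->count > y->count) - (x->count < y->count);
}

/*
 * Group dirty nids by NAT block, then sort the sets in ascending order of
 * entry count so the sparsest sets come first (journal candidates).
 * Returns the number of sets, or -1 if more than MAX_SETS are needed.
 */
static int build_sorted_sets(const unsigned int *nids, int n,
			     struct nat_set *sets)
{
	int nsets = 0;

	for (int i = 0; i < n; i++) {
		unsigned int set_no = nids[i] / NAT_ENTRY_PER_BLOCK;
		int j;

		for (j = 0; j < nsets; j++)
			if (sets[j].set_no == set_no)
				break;
		if (j == nsets) {
			if (nsets == MAX_SETS)
				return -1;
			sets[nsets].set_no = set_no;
			sets[nsets].count = 0;
			nsets++;
		}
		sets[j].count++;
	}
	qsort(sets, (size_t)nsets, sizeof(*sets), cmp_count);
	return nsets;
}
```

Walking the result from the front, a flusher would place the sparse sets into the journal while space remains and write the denser sets to their NAT blocks, so each block is written once with all of its dirty entries.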

07 May, 2014 4 commits

f2fs: fix checkpatch warning · 8b376249

Committed by Zhang Zhen on May 04, 2014

Fix the following checkpatch warning:
WARNING: do {} while (0) macros should not be semicolon terminated

Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
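The reason behind this checkpatch warning is easiest to see with an if/else. The macro and function below are hypothetical examples, not taken from f2fs: the macro body deliberately ends without a semicolon so that the caller's own `;` completes the statement.

```c
/* Correct form: no trailing semicolon after while (0). */
#define COUNT_UP(counter)	\
do {				\
	(counter)++;		\
} while (0)

static int bump_or_reset(int flag, int n)
{
	if (flag)
		COUNT_UP(n);	/* the caller supplies the ';' */
	else
		n = 0;
	return n;
}
```

If the macro definition ended in `while (0);`, the expansion would produce two statements before `else`, and this function would no longer compile.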

f2fs: split grab_cache_page and wait_on_page_writeback for node pages · 54b591df

Committed by Jaegeuk Kim on Apr 29, 2014

This patch splits grab_cache_page_write_begin into grab_cache_page and
wait_on_page_writeback for node pages.

This patch intends to improve the latency of getting node pages by avoiding
unnecessary wait_on_page_writeback calls.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: adjust free mem size to flush dentry blocks · 6fb03f3a

Committed by Jaegeuk Kim on Apr 16, 2014

If many dirty dentry blocks are cached but the flush condition is not reached,
we may fall into a livelock in balance_dirty_pages.
So, let's take the memory size into account for the condition.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: introduce raw_nat_from_node_info() to simplfy codes · 94dac22e

Committed by Chao Yu on Apr 17, 2014

This patch introduces raw_nat_from_node_info() to simplify some code, and also
uses the existing function node_info_from_raw_nat() to do the same job.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

20 Mar, 2014 2 commits

f2fs: skip unnecessary node writes during fsync · 479f40c4

Committed by Jaegeuk Kim on Mar 20, 2014

If multiple redundant fsync calls are triggered, we don't need to write the
node pages with the fsync mark over and over.

So, this patch adds FI_NEED_FSYNC to track whether the latest node block is
written with the fsync mark or not.
If the mark was set, a new fsync doesn't need to write a node block.
Otherwise, we should write a new node block with the mark for roll-forward
recovery.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f2fs: remove unnecessary threshold · a5f42010

Committed by Jaegeuk Kim on Mar 19, 2014

The NM_WOUT_THRESHOLD is now obsolete since f2fs starts to control on a basis
of the memory footprint.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

openanolis / cloud-kernel · synced successfully more than 1 year ago
