提交 · 8074bb515014d281a6f5f1218648aa3abd9c22ab · openeuler / Kernel

27 2月, 2016 1 次提交

f2fs: fix to avoid deadlock when merging inline data · 19c7377b

由 Chao Yu 提交于 2月 26, 2016

When testing with fsstress, kworker and user threads were both blocked:

INFO: task kworker/u16:1:16580 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u16:1   D ffff8803f2595390     0 16580      2 0x00000000
Workqueue: writeback bdi_writeback_workfn (flush-251:0)
 ffff8802730e5760 0000000000000046 ffff880274729fc0 0000000000012440
 ffff8802730e5fd8 ffff8802730e4010 0000000000012440 0000000000012440
 ffff8802730e5fd8 0000000000012440 ffff880274729fc0 ffff88026eb50000
Call Trace:
 [<ffffffff816fe9d9>] schedule+0x29/0x70
 [<ffffffff816ff895>] rwsem_down_read_failed+0xa5/0xf9
 [<ffffffff81378584>] call_rwsem_down_read_failed+0x14/0x30
 [<ffffffffa0694feb>] f2fs_write_data_page+0x31b/0x420 [f2fs]
 [<ffffffffa0690f1a>] __f2fs_writepage+0x1a/0x50 [f2fs]
 [<ffffffffa06922a0>] f2fs_write_data_pages+0xe0/0x290 [f2fs]
 [<ffffffff811473b3>] do_writepages+0x23/0x40
 [<ffffffff811cc3ee>] __writeback_single_inode+0x4e/0x250
 [<ffffffff811cd4f1>] writeback_sb_inodes+0x2c1/0x470
 [<ffffffff811cd73e>] __writeback_inodes_wb+0x9e/0xd0
 [<ffffffff811cda0b>] wb_writeback+0x1fb/0x2d0
 [<ffffffff811cdb7c>] wb_do_writeback+0x9c/0x220
 [<ffffffff811ce232>] bdi_writeback_workfn+0x72/0x1c0
 [<ffffffff8106b74e>] process_one_work+0x1de/0x5b0
 [<ffffffff8106e78f>] worker_thread+0x11f/0x3e0
 [<ffffffff810750ce>] kthread+0xde/0xf0
 [<ffffffff817093f8>] ret_from_fork+0x58/0x90

fsstress thread stack:
 [<ffffffff81139f0e>] sleep_on_page+0xe/0x20
 [<ffffffff81139ef7>] __lock_page+0x67/0x70
 [<ffffffff8113b100>] find_lock_page+0x50/0x80
 [<ffffffff8113b24f>] find_or_create_page+0x3f/0xb0
 [<ffffffffa06983a9>] sync_node_pages+0x259/0x810 [f2fs]
 [<ffffffffa068d874>] write_checkpoint+0x1a4/0xce0 [f2fs]
 [<ffffffffa0686b0c>] f2fs_sync_fs+0x7c/0xd0 [f2fs]
 [<ffffffffa067c813>] f2fs_sync_file+0x143/0x5f0 [f2fs]
 [<ffffffff811d301b>] vfs_fsync_range+0x2b/0x40
 [<ffffffff811d304c>] vfs_fsync+0x1c/0x20
 [<ffffffff811d3291>] do_fsync+0x41/0x70
 [<ffffffff811d32d3>] SyS_fdatasync+0x13/0x20
 [<ffffffff817094a2>] system_call_fastpath+0x16/0x1b
 [<ffffffffffffffff>] 0xffffffffffffffff

The reason of this issue is:
CPU0:					CPU1:
 - f2fs_write_data_pages
					 - f2fs_sync_fs
					  - write_checkpoint
					   - block_operations
					    - f2fs_lock_all
					     - down_write(sbi->cp_rwsem)
  - lock_page(page)
  - f2fs_write_data_page
					    - sync_node_pages
					     - flush_inline_data
					      - pagecache_get_page(page, GFP_LOCK)
   - f2fs_lock_op
    - down_read(sbi->cp_rwsem)

This patch alters to use trylock_page in flush_inline_data to fix this ABBA
deadlock issue.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

19c7377b

26 2月, 2016 1 次提交

f2fs: fix incorrect upper bound when iterating inode mapping tree · 80dd9c0e

由 Chao Yu 提交于 2月 24, 2016

1. Inode mapping tree can index page in range of [0, ULONG_MAX], however,
in some places, f2fs only search or iterate page in ragne of [0, LONG_MAX],
result in miss hitting in page cache.

2. filemap_fdatawait_range accepts range parameters in unit of bytes, so
the max range it covers should be [0, LLONG_MAX], if we use [0, LONG_MAX]
as range for waiting on writeback, big number of pages will not be covered.

This patch corrects above two issues.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

80dd9c0e

23 2月, 2016 13 次提交

f2fs: trace old block address for CoWed page · 7a9d7548

由 Chao Yu 提交于 2月 22, 2016

This patch enables to trace old block address of CoWed page for better
debugging.

f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f0, oldaddr = 0xfe8ab, newaddr = 0xfee90 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f8, oldaddr = 0xfe8b0, newaddr = 0xfee91 rw = WRITE_SYNC, type = NODE
f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4fa, oldaddr = 0xfe8ae, newaddr = 0xfee92 rw = WRITE_SYNC, type = NODE

f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x96, oldaddr = 0xf049b, newaddr = 0x2bbe rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x97, oldaddr = 0xf049c, newaddr = 0x2bbf rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x98, oldaddr = 0xf049d, newaddr = 0x2bc0 rw = WRITE, type = DATA

f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x47, oldaddr = 0xffffffff, newaddr = 0xf2631 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x48, oldaddr = 0xffffffff, newaddr = 0xf2632 rw = WRITE, type = DATA
f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x49, oldaddr = 0xffffffff, newaddr = 0xf2633 rw = WRITE, type = DATA
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

7a9d7548

f2fs: try to flush inode after merging inline data · 9a4cbc9e

由 Chao Yu 提交于 2月 22, 2016

When flushing node pages, if current node page is an inline inode page, we
will try to merge inline data from data page into inline inode page, then
skip flushing current node page, it will decrease the number of nodes to
be flushed in batch in this round, which may lead to worse performance.

This patch gives a chance to flush just merged inline inode pages for
performance.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

9a4cbc9e

f2fs: reorder nat cache lock in cache_nat_entry · 1515aef0

由 Chao Yu 提交于 2月 19, 2016

When lookuping nat entry in cache_nat_entry, if we fail to hit nat cache,
we try to load nat entries a) from journal of current segment cache or b)
from NAT pages for updating, during the process, write lock of
nat_tree_lock will be held to avoid inconsistent condition in between
nid cache and nat cache caused by racing among nat entry shrinker,
checkpointer, nat entry updater.

But this way may cause low efficient when updating nat cache, because it
serializes accessing in journal cache or reading NAT pages.

Here, we reorder lock and update flow as below to enhance accessing
concurrency:

 - get_node_info
  - down_read(nat_tree_lock)
  - lookup nat cache --- hit -> unlock & return
  - lookup journal cache --- hit -> unlock & goto update
  - up_read(nat_tree_lock)
update:
  - down_write(nat_tree_lock)
  - cache_nat_entry
   - lookup nat cache --- nohit -> update
  - up_write(nat_tree_lock)
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

1515aef0

f2fs: split journal cache from curseg cache · b7ad7512

由 Chao Yu 提交于 2月 19, 2016

In curseg cache, f2fs caches two different parts:
 - datas of current summay block, i.e. summary entries, footer info.
 - journal info, i.e. sparse nat/sit entries or io stat info.

With this approach, 1) it may cause higher lock contention when we access
or update both of the parts of cache since we use the same mutex lock
curseg_mutex to protect the cache. 2) current summary block with last
journal info will be writebacked into device as a normal summary block
when flushing, however, we treat journal info as valid one only in current
summary, so most normal summary blocks contain junk journal data, it wastes
remaining space of summary block.

So, in order to fix above issues, we split curseg cache into two parts:
a) current summary block, protected by original mutex lock curseg_mutex
b) journal cache, protected by newly introduced r/w semaphore journal_rwsem

When loading curseg cache during ->mount, we store summary info and
journal info into different caches; When doing checkpoint, we combine
datas of two cache into current summary block for persisting.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

b7ad7512

f2fs: introduce f2fs_journal struct to wrap journal info · dfc08a12

由 Chao Yu 提交于 2月 14, 2016

Introduce a new structure f2fs_journal to wrap journal info in struct
f2fs_summary_block for readability.

struct f2fs_journal {
	union {
		__le16 n_nats;
		__le16 n_sits;
	};
	union {
		struct nat_journal nat_j;
		struct sit_journal sit_j;
		struct f2fs_extra_info info;
	};
} __packed;

struct f2fs_summary_block {
	struct f2fs_summary entries[ENTRIES_IN_SUM];
	struct f2fs_journal journal;
	struct summary_footer footer;
} __packed;
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

dfc08a12

f2fs: fix missing skip pages info · d31c7c3f

由 Yunlei He 提交于 2月 04, 2016

fix missing skip pages info in f2fs_writepages trace event.
Signed-off-by: NYunlei He <heyunlei@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

d31c7c3f

f2fs: introduce f2fs_submit_merged_bio_cond · 0c3a5797

由 Chao Yu 提交于 1月 18, 2016

f2fs use single bio buffer per type data (META/NODE/DATA) for caching
writes locating in continuous block address as many as possible, after
submitting, these writes may be still cached in bio buffer, so we have
to flush cached writes in bio buffer by calling f2fs_submit_merged_bio.

Unfortunately, in the scenario of high concurrency, bio buffer could be
flushed by someone else before we submit it as below reasons:
a) there is no space in bio buffer.
b) add a request of different type (SYNC, ASYNC).
c) add a discontinuous block address.

For this condition, f2fs_submit_merged_bio will be devastating, because
it could break the following merging of writes in bio buffer, split one
big bio into two smaller one.

This patch introduces f2fs_submit_merged_bio_cond which can do a
conditional submitting with bio buffer, before submitting it will judge
whether:
 - page in DATA type bio buffer is matching with specified page;
 - page in DATA type bio buffer is belong to specified inode;
 - page in NODE type bio buffer is belong to specified inode;
If there is no eligible page in bio buffer, we will skip submitting step,
result in gaining more chance to merge consecutive block IOs in bio cache.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

0c3a5797

f2fs: wait on page's writeback in writepages path · fa3d2bdf

由 Jaegeuk Kim 提交于 1月 28, 2016

Likewise f2fs_write_cache_pages, let's do for node and meta pages too.
Especially, for node blocks, we should do this before marking its fsync
and dentry flags.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

fa3d2bdf

f2fs: introduce get_next_page_offset to speed up SEEK_DATA · 3cf45747

由 Chao Yu 提交于 1月 26, 2016

When seeking data in ->llseek, if we encounter a big hole which covers
several dnode pages, we will try to seek data from index of page which
is the first page of next dnode page, at most we could skip searching
(ADDRS_PER_BLOCK - 1) pages.

However it's still not efficient, because if our indirect/double-indirect
pointer are NULL, there are no dnode page locate in the tree indirect/
double-indirect pointer point to, it's not necessary to search the whole
region.

This patch introduces get_next_page_offset to calculate next page offset
based on current searching level and max searching level returned from
get_dnode_of_data, with this, we could skip searching the entire area
indirect or double-indirect node block is not exist.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

3cf45747

f2fs: remove unneeded pointer conversion · 81ca7350

由 Chao Yu 提交于 1月 26, 2016

There are redundant pointer conversion in following call stack:
 - at position a, inode was been converted to f2fs_file_info.
 - at position b, f2fs_file_info was been converted to inode again.

 - truncate_blocks(inode,..)
  - fi = F2FS_I(inode)		---a
  - ADDRS_PER_PAGE(node_page, fi)
   - addrs_per_inode(fi)
    - inode = &fi->vfs_inode	---b
    - f2fs_has_inline_xattr(inode)
     - fi = F2FS_I(inode)
     - is_inode_flag_set(fi,..)

In order to avoid unneeded conversion, alter ADDRS_PER_PAGE and
addrs_per_inode to acept parameter with type of inode pointer.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

81ca7350

f2fs: use wait_for_stable_page to avoid contention · fec1d657

由 Jaegeuk Kim 提交于 1月 20, 2016

In write_begin, if storage supports stable_page, we don't need to wait for
writeback to update its contents.
This patch introduces to use wait_for_stable_page instead of
wait_on_page_writeback.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

fec1d657

f2fs: avoid multiple node page writes due to inline_data · 2049d4fc

由 Jaegeuk Kim 提交于 1月 25, 2016

The sceanrio is:
1. create fully node blocks
2. flush node blocks
3. write inline_data for all the node blocks again
4. flush node blocks redundantly

So, this patch tries to flush inline_data when flushing node blocks.
Reviewed-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

2049d4fc

f2fs: export dirty_nats_ratio in sysfs · 2304cb0c

由 Chao Yu 提交于 1月 18, 2016

This patch exports a new sysfs entry 'dirty_nat_ratio' to control threshold
of dirty nat entries, if current ratio exceeds configured threshold,
checkpoint will be triggered in f2fs_balance_fs_bg for flushing dirty nats.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

2304cb0c

12 1月, 2016 1 次提交

f2fs: fix wrong memory condition check · 1663cae4

由 Jaegeuk Kim 提交于 1月 09, 2016

This patch fixes wrong decision for avaliable_free_memory.
The return valus is already set as false, so we should consider true condition
below only.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

1663cae4

09 1月, 2016 3 次提交

f2fs: avoid unnecessary f2fs_balance_fs calls · 12719ae1

由 Jaegeuk Kim 提交于 1月 07, 2016

Only when node page is newly dirtied, it needs to check whether we need to do
f2fs_gc.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

12719ae1

f2fs: introduce __get_node_page to reuse common code · 0e022ea8

由 Chao Yu 提交于 1月 05, 2016

There are duplicated code in between get_node_page and get_node_page_ra,
introduce __get_node_page to includes common parts of these two, and
export get_node_page and get_node_page_ra by reusing __get_node_page.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

0e022ea8

f2fs: check node id earily when readaheading node page · e8458725

由 Chao Yu 提交于 1月 08, 2016

Add node id check in ra_node_page and get_node_page_ra like get_node_page.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

e8458725

07 1月, 2016 2 次提交

J
Revert "f2fs: check the node block address of newly allocated nid" · 957efb0c
由 Jaegeuk Kim 提交于 1月 02, 2016
```
Original issue is fixed by:

  f2fs: cover more area with nat_tree_lock

This reverts commit 24928634.
```
957efb0c

f2fs: cover more area with nat_tree_lock · a5131193

由 Jaegeuk Kim 提交于 1月 02, 2016

There was a subtle bug on nat cache management which incurs wrong nid allocation
or wrong block addresses when try_to_free_nats is triggered heavily.
This patch enlarges the previous coverage of nat_tree_lock to avoid data race.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

a5131193

01 1月, 2016 1 次提交

f2fs: write pending bios when cp_error is set · 8d4ea29b

由 Jaegeuk Kim 提交于 12月 31, 2015

When testing ioc_shutdown, put_super is able to be hanged by waiting for
writebacking pages as follows.

INFO: task umount:2723 blocked for more than 120 seconds.
      Tainted: G           O    4.4.0-rc3+ #8
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
umount          D ffff88000859f9d8     0  2723   2110 0x00000000
 ffff88000859f9d8 0000000000000000 0000000000000000 ffffffff81e11540
 ffff880078c225c0 ffff8800085a0000 ffff88007fc17440 7fffffffffffffff
 ffffffff818239f0 ffff88000859fb48 ffff88000859f9f0 ffffffff8182310c
Call Trace:
 [<ffffffff818239f0>] ? bit_wait+0x50/0x50
 [<ffffffff8182310c>] schedule+0x3c/0x90
 [<ffffffff81827fb9>] schedule_timeout+0x2d9/0x430
 [<ffffffff810e0f8f>] ? mark_held_locks+0x6f/0xa0
 [<ffffffff8111614d>] ? ktime_get+0x7d/0x140
 [<ffffffff818239f0>] ? bit_wait+0x50/0x50
 [<ffffffff8106a655>] ? kvm_clock_get_cycles+0x25/0x30
 [<ffffffff8111617c>] ? ktime_get+0xac/0x140
 [<ffffffff818239f0>] ? bit_wait+0x50/0x50
 [<ffffffff81822564>] io_schedule_timeout+0xa4/0x110
 [<ffffffff81823a25>] bit_wait_io+0x35/0x50
 [<ffffffff818235bd>] __wait_on_bit+0x5d/0x90
 [<ffffffff811b9e8b>] wait_on_page_bit+0xcb/0xf0
 [<ffffffff810d5f90>] ? autoremove_wake_function+0x40/0x40
 [<ffffffff811cf84c>] truncate_inode_pages_range+0x4bc/0x840
 [<ffffffff811cfc3d>] truncate_inode_pages_final+0x4d/0x60
 [<ffffffffc023ced5>] f2fs_evict_inode+0x75/0x400 [f2fs]
 [<ffffffff812639bc>] evict+0xbc/0x190
 [<ffffffff81263d19>] iput+0x229/0x2c0
 [<ffffffffc0241885>] f2fs_put_super+0x105/0x1a0 [f2fs]
 [<ffffffff8124756a>] generic_shutdown_super+0x6a/0xf0
 [<ffffffff812478f7>] kill_block_super+0x27/0x70
 [<ffffffffc0241290>] kill_f2fs_super+0x20/0x30 [f2fs]
 [<ffffffff81247b03>] deactivate_locked_super+0x43/0x70
 [<ffffffff81247f4c>] deactivate_super+0x5c/0x60
 [<ffffffff81268d2f>] cleanup_mnt+0x3f/0x90
 [<ffffffff81268dc2>] __cleanup_mnt+0x12/0x20
 [<ffffffff810ac463>] task_work_run+0x73/0xa0
 [<ffffffff810032ac>] exit_to_usermode_loop+0xcc/0xd0
 [<ffffffff81003e7c>] syscall_return_slowpath+0xcc/0xe0
 [<ffffffff81829ea2>] int_ret_from_sys_call+0x25/0x9f
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

8d4ea29b

31 12月, 2015 3 次提交

f2fs: let user being aware of IO error · 6d5a1495

由 Chao Yu 提交于 12月 24, 2015

Sometimes we keep dumb when IO error occur in lower layer device, so user
will not receive any error return value for some operation, but actually,
the operation did not succeed.

This sould be avoided, so this patch reports such kind of error to user.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

6d5a1495

f2fs: return early when trying to read null nid · 4aa69d56

由 Jaegeuk Kim 提交于 12月 23, 2015

If get_node_page() gets zero nid, we can return early without getting a wrong
page. For example, get_dnode_of_data() can try to do that.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

4aa69d56

f2fs: record node block allocation in dnode_of_data · 93bae099

由 Jaegeuk Kim 提交于 12月 22, 2015

This patch introduces recording node block allocation in dnode_of_data.
This information helps to figure out whether any node block is allocated during
specific file operations.
Reviewed-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

93bae099

23 12月, 2015 1 次提交

f2fs: use atomic variable for total_extent_tree · 7441ccef

由 Jaegeuk Kim 提交于 12月 21, 2015

It would be better to use atomic variable for total_extent_tree.
Reviewed-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

7441ccef

15 12月, 2015 1 次提交

f2fs: clean up node page updating flow · e1c51b9f

由 Chao Yu 提交于 12月 11, 2015

If read_node_page return LOCKED_PAGE, in its caller it's better a) skip
unneeded 'Update' flag and mapping info verfication; b) check nid value
stored in footer structure of node page.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

e1c51b9f

13 10月, 2015 5 次提交

f2fs: export ra_nid_pages to sysfs · ea1a29a0

由 Chao Yu 提交于 10月 12, 2015

After finishing building free nid cache, we will try to readahead
asynchronously 4 more pages for the next reloading, the count of
readahead nid pages is fixed.

In some case, like SMR drive, read less sectors with fixed count
each time we trigger RA may be low efficient, since we will face
high seeking overhead, so we'd better let user to configure this
parameter from sysfs in specific workload.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

ea1a29a0

f2fs: readahead for free nids building · 2db2388f

由 Chao Yu 提交于 10月 12, 2015

When there is no free nid in nid cache, all new node allocaters stop their
job to wait for reloading of free nids, however reloading is synchronous as
we will read 4 NAT pages for building nid cache, it cause the long latency.

This patch tries to readahead more NAT pages with READA request flag after
reloading of free nids. It helps to improve performance when users allocate
node id intensively.

Env: Sandisk 32G sd card
time for i in `seq 1 60000`; { echo -n > /mnt/f2fs/$i; echo XXXXXX > /mnt/f2fs/$i;}

Before:
real    0m2.814s
user    0m1.220s
sys     0m1.536s

After:
real    0m2.711s
user    0m1.136s
sys     0m1.568s
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

2db2388f

f2fs: support lower priority asynchronous readahead in ra_meta_pages · 26879fb1

由 Chao Yu 提交于 10月 12, 2015

Now, we use ra_meta_pages to reads continuous physical blocks as much as
possible to improve performance of following reads. However, ra_meta_pages
uses a synchronous readahead approach by submitting bio with READ, as READ
is with high priority, it can not be used in the case of preloading blocks,
and it's not sure when these RAed pages will be used.

This patch supports asynchronous readahead in ra_meta_pages by tagging bio
with READA flag in order to allow preloading.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

26879fb1

f2fs: don't tag REQ_META for temporary non-meta pages · 2b947003

由 Chao Yu 提交于 10月 12, 2015

In recovery or checkpoint flow, we grab pages temperarily in meta inode's
mapping for caching temperary data, actually, datas in these pages were
not meta data of f2fs, but still we tag them with REQ_META flag. However,
lower device like eMMC may do some optimization for data of such type.
So in order to avoid wrong optimization, we'd better remove such flag
for temperary non-meta pages.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

2b947003

Revert "f2fs: do not skip dentry block writes" · a1257023

由 Jaegeuk Kim 提交于 10月 08, 2015

The periodic checkpoint can resolve the previous issue.
So, now we can use this again to improve the reported performance regression:

https://lkml.org/lkml/2015/10/8/20

This reverts commit 15bec0ff5a9ba6d203178fa8772259df6207942a.

a1257023

10 10月, 2015 2 次提交

f2fs: do not skip dentry block writes · 90b803e6

由 Jaegeuk Kim 提交于 9月 25, 2015

Previously, we skip dentry block writes when wbc is SYNC_NONE with no memory
pressure and the number of dirty pages is pretty small.

But, we didn't skip for normal data writes, which gives us not much big impact
on overall performance.
Moreover, by skipping some data writes, kworker falls into infinite loop to try
to write blocks, when many dir inodes have only one dentry block.

So, this patch removes skipping data writes.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

90b803e6

f2fs: cover number of dirty node pages under node_write lock · 25b93346

由 Jaegeuk Kim 提交于 9月 12, 2015

This number is referenced by checkpoint under node_write lock.
Reviewed-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

25b93346

25 8月, 2015 3 次提交

f2fs: fix to release inode correctly · 13ec7297

由 Chao Yu 提交于 8月 24, 2015

In following call stack, if unfortunately we lose all chances to truncate
inode page in remove_inode_page, eventually we will add the nid allocated
previously into free nid cache, this nid is with NID_NEW status and with
NEW_ADDR in its blkaddr pointer:

 - f2fs_create
  - f2fs_add_link
   - __f2fs_add_link
    - init_inode_metadata
     - new_inode_page
      - new_node_page
       - set_node_addr(, NEW_ADDR)
     - f2fs_init_acl   failed
     - remove_inode_page  failed
  - handle_failed_inode
   - remove_inode_page  failed
   - iput
    - f2fs_evict_inode
     - remove_inode_page  failed
     - alloc_nid_failed   cache a nid with valid blkaddr: NEW_ADDR

This may not only cause resource leak of previous inode, but also may cause
incorrect use of the previous blkaddr which is located in NO.nid node entry
when this nid is reused by others.

This patch tries to add this inode to orphan list if we fail to truncate
inode, so that we can obtain a second chance to release it in orphan
recovery flow.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

13ec7297

f2fs: fix wrong pointer access during try_to_free_nids · f7409d0f

由 Jaegeuk Kim 提交于 8月 21, 2015

If we release the lock in list_for_each_entry_safe, we can lose the tmp
pointer by alloc_nid.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

f7409d0f

f2fs: use __GFP_NOFAIL to avoid infinite loop · 80c54505

由 Jaegeuk Kim 提交于 8月 20, 2015

__GFP_NOFAIL can avoid retrying the whole path of kmem_cache_alloc and
bio_alloc.
And, it also fixes the use cases of GFP_ATOMIC correctly.
Suggested-by: NChao Yu <chao2.yu@samsung.com>
Reviewed-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

80c54505

21 8月, 2015 3 次提交

f2fs: check the node block address of newly allocated nid · 24928634

由 Jaegeuk Kim 提交于 8月 16, 2015

This patch adds a routine which checks the block address of newly allocated nid.
If an nid has already allocated by other thread due to subtle data races, it
will result in filesystem corruption.
So, it needs to check whether its block address was already allocated or not
in prior to nid allocation as the last chance.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

24928634

f2fs: reuse nids more aggressively · 26834466

由 Jaegeuk Kim 提交于 8月 14, 2015

If we can reuse nids as many as possible, we can mitigate producing obsolete
node pages in the page cache.
Reviewed-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

26834466

f2fs: shrink free_nids entries · 31696580

由 Chao Yu 提交于 7月 28, 2015

This patch introduces __count_free_nids/try_to_free_nids and registers
them in slab shrinker for shrinking under memory pressure.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

31696580

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功