Commits · da85985c6142decea67ee5ff67eadf3f13103a91 · openeuler / raspberrypi-kernel

23 Feb 2016 · 28 commits

f2fs: speed up handling holes in fiemap · da85985c

Committed by Chao Yu on Jan 26, 2016

This patch makes f2fs_map_blocks support returning the next potential
page offset, skipping hole regions in the inode's indirect node tree, and
uses it to speed up fiemap when handling large holes.
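
To illustrate why returning the next potential offset helps, here is a toy user-space model, not f2fs code. The file has a single mapped run; the hypothetical `toy_map()` lookup reports, for a hole, the next offset worth probing (rounded up to an assumed `skip_unit`, standing in for the span of a missing node subtree); and `toy_fiemap()` counts how many lookups a full walk takes. Setting `skip_unit` to 1 models a walk that steps through a hole one page at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy file: pages [data_start, data_end) are mapped, everything else is a hole. */
struct toy_file {
	uint64_t data_start;
	uint64_t data_end;
	uint64_t skip_unit;	/* assumed span of one missing node subtree */
};

/*
 * Hypothetical lookup: returns 1 if pgofs is mapped, 0 for a hole.
 * *next receives the next page offset worth looking at.
 */
static int toy_map(const struct toy_file *f, uint64_t pgofs, uint64_t *next)
{
	if (pgofs >= f->data_start && pgofs < f->data_end) {
		*next = pgofs + 1;
		return 1;
	}
	/* Hole: jump to the next subtree boundary, but never past the data. */
	*next = (pgofs / f->skip_unit + 1) * f->skip_unit;
	if (pgofs < f->data_start && *next > f->data_start)
		*next = f->data_start;
	return 0;
}

/* Walk [0, end), count lookups, and return the number of mapped extents. */
static size_t toy_fiemap(const struct toy_file *f, uint64_t end, size_t *lookups)
{
	size_t extents = 0;
	int in_extent = 0;
	uint64_t pgofs = 0, next;

	*lookups = 0;
	while (pgofs < end) {
		int mapped = toy_map(f, pgofs, &next);

		(*lookups)++;
		if (mapped && !in_extent)
			extents++;	/* a new mapped run starts here */
		in_extent = mapped;
		pgofs = next;
	}
	return extents;
}
```

For instance, with mapped pages 1000..1007, `skip_unit = 100` and a walk over the first 1100 pages, the walk needs 19 lookups; with `skip_unit = 1` it needs 1100.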

Test method:
xfs_io -f /mnt/f2fs/file  -c "pwrite 1099511627776 4096"
time xfs_io -f /mnt/f2fs/file -c "fiemap -v"

Before:
time xfs_io -f /mnt/f2fs/file -c "fiemap -v"
/mnt/f2fs/file:
 EXT: FILE-OFFSET              BLOCK-RANGE      TOTAL FLAGS
   0: [0..2147483647]:         hole             2147483648
   1: [2147483648..2147483655]: 81920..81927         8   0x1

real    3m3.518s
user    0m0.000s
sys     3m3.456s

After:
time xfs_io -f /mnt/f2fs/file -c "fiemap -v"
/mnt/f2fs/file:
 EXT: FILE-OFFSET              BLOCK-RANGE      TOTAL FLAGS
   0: [0..2147483647]:         hole             2147483648
   1: [2147483648..2147483655]: 81920..81927         8   0x1

real    0m0.008s
user    0m0.000s
sys     0m0.008s
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce get_next_page_offset to speed up SEEK_DATA · 3cf45747

Committed by Chao Yu on Jan 26, 2016

When seeking data in ->llseek, if we encounter a big hole that covers
several dnode pages, we resume the search at the index of the first page
of the next dnode page, so at most we can skip searching
(ADDRS_PER_BLOCK - 1) pages at a time.

However, this is still inefficient: if an indirect or double-indirect
pointer is NULL, no dnode pages exist in the subtree that pointer would
point to, so there is no need to search that whole region.

This patch introduces get_next_page_offset, which calculates the next page
offset based on the current search level and the maximum search level
returned from get_dnode_of_data. With it, we can skip the entire area
covered by an indirect or double-indirect node block that does not exist.
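
The offset arithmetic can be sketched in user space as follows. This is a simplified model written for illustration, not the kernel's get_next_page_offset: the layout (addresses held in the inode, then two direct nodes, two indirect nodes and one double-indirect node) follows f2fs, but the toy sizes in the hypothetical `struct tree_geom` and the meaning given to `cur_level`/`max_level` (depth of the missing node / depth of the full lookup path) are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Toy node-tree geometry; the sizes are assumptions, far smaller than f2fs's. */
struct tree_geom {
	uint64_t direct_index;	/* block addresses held in the inode itself */
	uint64_t addrs_per_blk;	/* block addresses per direct node */
	uint64_t nids_per_blk;	/* node ids per (double-)indirect node */
};

/*
 * max_level: depth of the lookup path (0 = inode only, 1 = direct node,
 *            2 = indirect node, 3 = double-indirect node).
 * cur_level: depth at which the lookup hit a missing (NULL) node.
 * Returns the first page offset past the subtree known to be absent.
 */
static uint64_t next_page_offset(const struct tree_geom *g, uint64_t pgofs,
				 int cur_level, int max_level)
{
	const uint64_t direct_blks = g->addrs_per_blk;
	const uint64_t indirect_blks = g->addrs_per_blk * g->nids_per_blk;
	uint64_t unit = g->addrs_per_blk;	/* pages under one direct node */
	uint64_t base = 0;

	assert(cur_level <= max_level && max_level <= 3);
	if (max_level == 0)
		return pgofs + 1;	/* hole in the inode's own slots */

	/* Each missing level above the direct node multiplies the skipped span. */
	for (int level = max_level; level > cur_level; level--)
		unit *= g->nids_per_blk;

	/* First page offset addressed through this kind of node. */
	if (max_level >= 3)
		base += 2 * indirect_blks;	/* two indirect nodes precede */
	if (max_level >= 2)
		base += 2 * direct_blks;	/* two direct nodes precede */
	base += g->direct_index;
	assert(pgofs >= base);

	/* Round up to the start of the next unit-sized subtree. */
	return ((pgofs - base) / unit + 1) * unit + base;
}
```

With the toy geometry {4, 4, 4}, a missing first indirect node (max_level 2, cur_level 1) at page 13 yields 28, skipping the 16 pages (12..27) that node would have covered in a single step.
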
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove unneeded pointer conversion · 81ca7350

Committed by Chao Yu on Jan 26, 2016

There are redundant pointer conversions in the following call stack:
 - at position a, the inode is converted to f2fs_inode_info;
 - at position b, f2fs_inode_info is converted back to the inode.

 - truncate_blocks(inode,..)
  - fi = F2FS_I(inode)		---a
  - ADDRS_PER_PAGE(node_page, fi)
   - addrs_per_inode(fi)
    - inode = &fi->vfs_inode	---b
    - f2fs_has_inline_xattr(inode)
     - fi = F2FS_I(inode)
     - is_inode_flag_set(fi,..)

To avoid the unneeded conversions, change ADDRS_PER_PAGE and
addrs_per_inode to accept an inode pointer as their parameter.
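
The round trip in the call stack is a container_of-style conversion: F2FS_I() maps a VFS inode to the f2fs_inode_info that embeds it. The user-space sketch below mimics that with hypothetical stand-ins (`struct toy_inode_info`, `TOY_I()`, and assumed address counts); the point is only that once the helper takes the inode directly, it converts once where it needs the flags instead of going fi -> inode -> fi.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct inode and struct f2fs_inode_info. */
struct toy_inode { unsigned long i_ino; };
struct toy_inode_info {
	unsigned int flags;		/* e.g. an "inline xattr" bit */
	struct toy_inode vfs_inode;	/* embedded VFS inode */
};

#define TOY_INLINE_XATTR	0x1u
#define TOY_DEF_ADDRS		923	/* assumed address count */
#define TOY_XATTR_ADDRS		50	/* assumed slots taken by inline xattrs */

/* container_of-style conversion, in the spirit of F2FS_I(inode). */
#define TOY_I(inode) \
	((struct toy_inode_info *)((char *)(inode) - \
		offsetof(struct toy_inode_info, vfs_inode)))

/* New-style signature: take the inode, convert once where flags are needed. */
static int toy_addrs_per_inode(struct toy_inode *inode)
{
	if (TOY_I(inode)->flags & TOY_INLINE_XATTR)
		return TOY_DEF_ADDRS - TOY_XATTR_ADDRS;
	return TOY_DEF_ADDRS;
}
```
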
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: simplify __allocate_data_blocks · 5b8db7fa

Committed by Chao Yu on Jan 26, 2016

This patch uses the existing function f2fs_map_blocks to simplify the
implementation of __allocate_data_blocks.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: simplify f2fs_map_blocks · 4fe71e88

Committed by Chao Yu on Jan 26, 2016

In f2fs_map_blocks, duplicated code handles the mapping of the first block
and of the following blocks, which is unnecessary. This patch simplifies
f2fs_map_blocks to avoid the duplicated code.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce lifetime write IO statistics · 8f1dbbbb

Committed by Shuoran Liu on Jan 27, 2016

This patch introduces lifetime write IO statistics, exposed through the sysfs interface.
The amount of write IO is obtained from the block layer, accumulated in the file system,
and stored in the hot node summary of the checkpoint.
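
The bookkeeping boils down to simple arithmetic: the value saved at the last checkpoint plus whatever the block device has written since mount. The names below are hypothetical, and the conversion from 512-byte sectors to KiB (a right shift by 1) is an assumption about the units involved, not a statement of how f2fs stores the counter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-filesystem counters. */
struct toy_write_stats {
	uint64_t kbytes_at_mount;	/* lifetime KiB restored from the checkpoint */
	uint64_t sectors_at_mount;	/* device write counter (512-byte sectors) at mount */
};

/* Lifetime KiB written = saved value + sectors written since mount / 2. */
static uint64_t toy_lifetime_kbytes(const struct toy_write_stats *s,
				    uint64_t sectors_now)
{
	return s->kbytes_at_mount + ((sectors_now - s->sectors_at_mount) >> 1);
}
```

Writing this sum back at checkpoint time is what lets the figure survive remounts.
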
Signed-off-by: Shuoran Liu <liushuoran@huawei.com>
Signed-off-by: Pengyang Hou <houpengyang@huawei.com>
[Jaegeuk Kim: add sysfs documentation]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: give scheduling point in shrinking path · 6fe2bc95

Committed by Jaegeuk Kim on Jan 20, 2016

The shrinker should give the task a chance to be rescheduled while shrinking slab entries.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: improve shrink performance of extent nodes · 201ef5e0

Committed by Hou Pengyang on Jan 26, 2016

In the worst case, we need to scan the whole radix tree and every rb-tree to
free the victim extent_nodes when shrinking.

Pengyang initially introduced a victim_list to record the victim extent_nodes,
and freed these extent_nodes by just scanning a list.

Later, Chao Yu enhanced the original patch to reduce the memory footprint by
removing the victim list.

The policy of lru list shrinking becomes:
1) lock lru list's lock
2) trylock extent tree's lock
3) remove extent node from lru list
4) unlock lru list's lock
5) do shrink
6) repeat 1) to 5)
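
The six steps can be modeled in user space with a spinlock built on C11 `atomic_flag`. This is a single-threaded sketch, not the f2fs shrinker: `toy_lru`, `toy_tree`, `toy_node`, and the choice to rotate a node whose tree lock is busy to the list tail are assumptions made for illustration. The LRU lock is dropped (step 4) before the node is detached and freed (step 5), so that work does not run under the global list lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal spinlock on atomic_flag (stand-in for kernel spinlocks/rwlocks). */
struct toy_lock { atomic_flag held; };

static void lock_init(struct toy_lock *l) { atomic_flag_clear(&l->held); }
static void lock_acquire(struct toy_lock *l)
{
	while (atomic_flag_test_and_set(&l->held))
		;	/* spin */
}
static bool lock_try(struct toy_lock *l) { return !atomic_flag_test_and_set(&l->held); }
static void lock_release(struct toy_lock *l) { atomic_flag_clear(&l->held); }

struct toy_tree { struct toy_lock lock; int node_cnt; };
struct toy_node { struct toy_node *next; struct toy_tree *et; };
struct toy_lru { struct toy_lock lock; struct toy_node *head, *tail; };

static void lru_add_tail(struct toy_lru *lru, struct toy_node *en)
{
	en->next = NULL;
	if (lru->tail)
		lru->tail->next = en;
	else
		lru->head = en;
	lru->tail = en;
}

static struct toy_node *lru_pop_head(struct toy_lru *lru)
{
	struct toy_node *en = lru->head;

	lru->head = en->next;
	if (!lru->head)
		lru->tail = NULL;
	return en;
}

/* Make at most nr_try attempts; return how many nodes were freed. */
static int shrink_lru(struct toy_lru *lru, int nr_try)
{
	int freed = 0;

	lock_acquire(&lru->lock);				/* 1) */
	for (int i = 0; i < nr_try && lru->head; i++) {
		struct toy_node *en = lru->head;
		struct toy_tree *et = en->et;

		if (!lock_try(&et->lock)) {			/* 2) tree busy */
			lru_add_tail(lru, lru_pop_head(lru));	/* retry later */
			continue;
		}
		lru_pop_head(lru);				/* 3) */
		lock_release(&lru->lock);			/* 4) */
		et->node_cnt--;					/* 5) detach, free */
		free(en);
		lock_release(&et->lock);
		freed++;
		lock_acquire(&lru->lock);			/* 6) repeat */
	}
	lock_release(&lru->lock);
	return freed;
}

/* Demo: nodes of trees a, b, a; tree b's lock is held by "someone else". */
static int demo_shrink(int *a_left, int *b_left)
{
	struct toy_tree a = { .node_cnt = 2 }, b = { .node_cnt = 1 };
	struct toy_lru lru = { .head = NULL, .tail = NULL };
	struct toy_tree *owners[3] = { &a, &b, &a };
	int freed;

	lock_init(&a.lock);
	lock_init(&b.lock);
	lock_init(&lru.lock);
	for (int i = 0; i < 3; i++) {
		struct toy_node *en = malloc(sizeof(*en));

		assert(en);
		en->et = owners[i];
		lru_add_tail(&lru, en);
	}
	lock_acquire(&b.lock);
	freed = shrink_lru(&lru, 3);
	lock_release(&b.lock);
	*a_left = a.node_cnt;
	*b_left = b.node_cnt;
	free(lru_pop_head(&lru));	/* the skipped node is still listed */
	return freed;
}
```
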
Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: don't set cached_en if it will be freed · 42926744

Committed by Jaegeuk Kim on Jan 26, 2016

If en's list entry is empty, en will be freed soon, so there is no need to
set cached_en to it.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: move extent_node list operations being coupled with rbtree operation · 43a2fa18

Committed by Jaegeuk Kim on Jan 26, 2016

This patch moves extent_node list operations to be handled together with
its rbtree operations.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: reconstruct the code to free an extent_node · a03f01f2

Committed by Hou Pengyang on Jan 26, 2016

There are three steps to free an extent node:
1) list_del_init, 2) __detach_extent_node, 3) kmem_cache_free

In the f2fs_destroy_extent_tree path, a node is freed in the order 1->2->3,
but in the f2fs_update_extent_tree_range path the order is 2->1->3.

This patch makes both paths use the order 1->2->3.
This makes sense because the next patch introduces a victim list in the
shrink_extent_tree path, and we can check whether an extent_node is in the
victim list with list_empty(). So 1) must come first.
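
Why list_del_init() must come first can be shown with a small re-implementation of the kernel's circular list helpers (INIT_LIST_HEAD, list_add_tail, list_del_init, list_empty). This is a user-space sketch; `toy_extent_node` and `toy_free_extent_node` are hypothetical names. Once step 1 has run, list_empty() on the node's own list entry is true, which is the signal other paths can test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Minimal copies of the kernel's circular doubly linked list helpers. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *l) { l->next = l->prev = l; }
static bool list_empty(const struct list_head *l) { return l->next == l; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	INIT_LIST_HEAD(n);	/* entry points at itself: list_empty() is true */
}

/* Hypothetical extent node: linked on an LRU list and counted by its tree. */
struct toy_extent_node { struct list_head list; int *tree_node_cnt; };

/* Free in the order 1) list_del_init, 2) detach, 3) free. */
static void toy_free_extent_node(struct toy_extent_node *en)
{
	list_del_init(&en->list);	/* 1) off the LRU list first */
	(*en->tree_node_cnt)--;		/* 2) stand-in for __detach_extent_node */
	free(en);			/* 3) stand-in for kmem_cache_free */
}
```
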
Signed-off-by: Hou Pengyang <houpengyang@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use wq_has_sleeper for cp_wait wait_queue · 7c506896

Committed by Jaegeuk Kim on Jan 26, 2016

We need to use wq_has_sleeper, which includes an smp_mb, to handle concurrent access to the cp_wait wait queue safely.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid unnecessary search while finding victim in gc · 688159b6

Committed by Fan Li on Feb 03, 2016

The variable nsearched in get_victim_by_default() indicates the number of
dirty segments already checked. There are two problems with the way it
is updated:
1. When p.ofs_unit is greater than 1, the victim we find consists
   of multiple segments, possibly including more than one dirty segment.
   But nsearched always increases by 1.
2. If segments have been found but not chosen, nsearched does not
   increase. So even after we have checked all dirty segments, nsearched
   may still be less than p.max_search.
Both problems can cause unnecessary searching after all dirty
segments have already been checked.
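
A simplified model of the corrected accounting follows. It is not get_victim_by_default(): the dirty bitmap is a plain bool array, the per-unit cost and "cannot be chosen" flags are passed in, selection is a plain lowest-cost pick, and `toy_pick_victim` is a hypothetical name. It shows both fixes: nsearched grows by the number of dirty segments in each ofs_unit-sized unit, and it grows even when a unit is examined but not chosen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of dirty segments in [start, start + n). */
static size_t count_dirty(const bool *dirty, size_t start, size_t n)
{
	size_t c = 0;

	for (size_t i = start; i < start + n; i++)
		c += dirty[i];
	return c;
}

/*
 * Scan units of ofs_unit segments. skip[u] marks unit u as examined but not
 * selectable; cost[u] is its GC cost. Returns the first segment of the chosen
 * unit (nseg if none) and stores the number of units examined in *examined.
 */
static size_t toy_pick_victim(const bool *dirty, const bool *skip,
			      const unsigned *cost, size_t nseg,
			      size_t ofs_unit, size_t max_search,
			      size_t *examined)
{
	size_t nsearched = 0, best = nseg;
	unsigned best_cost = ~0u;

	assert(ofs_unit > 0 && nseg % ofs_unit == 0);
	*examined = 0;
	for (size_t start = 0; start < nseg; start += ofs_unit) {
		size_t u = start / ofs_unit;
		size_t n = count_dirty(dirty, start, ofs_unit);

		if (n == 0)
			continue;	/* nothing dirty in this unit */
		(*examined)++;
		/* Fix 1: count every dirty segment in the unit, not just 1. */
		nsearched += n;
		if (!skip[u] && cost[u] < best_cost) {
			best = start;
			best_cost = cost[u];
		}
		/* Fix 2: the count above also covers units that were skipped. */
		if (nsearched >= max_search)
			break;
	}
	return best;
}
```
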
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: delete unnecessary wait for page writeback · 85ead818

Committed by Yunlei He on Feb 03, 2016

There is no need to wait for writeback of an inline file's page, since no
one uses it, so this patch deletes the unnecessary wait.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use wait_for_stable_page to avoid contention · fec1d657

Committed by Jaegeuk Kim on Jan 20, 2016

In write_begin, if the storage does not require stable pages, we don't need
to wait for writeback before updating the page contents.
This patch uses wait_for_stable_page instead of
wait_on_page_writeback.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: enhance foreground GC · 718e53fa

Committed by Chao Yu on Jan 23, 2016

If a section is configured to consist of multiple segments, foreground GC
performs garbage collection as follows:

	for each segment in victim section
		blk_start_plug
		for each valid block in segment
			write out by OPU method
		submit bio cache   <---
		blk_finish_plug   <---

There are two issues:
1) most of the time, 'submit bio cache' breaks the merging of writes from
the next segment into the current bio buffer, resulting in smaller bios
being submitted.
2) the block plug only covers IO submitted within one segment, which
reduces the opportunity to merge IOs across multiple segments in the plug.

So refactor the code into the structure below to maximize the
opportunity to merge IOs:

	blk_start_plug
	for each segment in victim section
		for each valid block in segment
			write out by OPU method
	submit bio cache
	blk_finish_plug

Test method:
1. mkfs.f2fs -s 8 /dev/sdX
2. touch 32 files
3. write 2M data into each file
4. punch 1.5M data from offset 0 for each file
5. trigger foreground gc through ioctl

Before the patch, 40 bios in total were submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65776, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66016, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66256, size = 122880
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 66496, size = 32768
----repeat 8 times

After the patch, 35 bios in total were submitted.
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 65536, size = 122880
----repeat 34 times
f2fs_submit_write_bio: dev = (8,32), WRITE_SYNC, DATA, sector = 73696, size = 16384
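
The bio counts above can be reproduced with a back-of-the-envelope model. Assuming 4 KiB blocks, the largest bio in the trace carries 122880 / 4096 = 30 blocks, and each repeated group in the "before" trace adds up to 4 × 122880 + 32768 bytes = 512 KiB = 128 blocks, over 8 segments. The sketch further assumes every valid block merges with its neighbour, so only the bio size limit or a forced submit ends a bio; it is arithmetic, not f2fs code.

```c
#include <assert.h>

/* Bios needed for `blocks` contiguous blocks at `max_blocks` per bio. */
static unsigned bios_for(unsigned blocks, unsigned max_blocks)
{
	return (blocks + max_blocks - 1) / max_blocks;	/* ceiling division */
}

/* Old loop: the bio cache is submitted after every segment. */
static unsigned bios_per_segment_flush(unsigned segs, unsigned blocks_per_seg,
				       unsigned max_blocks)
{
	return segs * bios_for(blocks_per_seg, max_blocks);
}

/* New loop: one plug and one submit for the whole victim section. */
static unsigned bios_per_section_flush(unsigned segs, unsigned blocks_per_seg,
				       unsigned max_blocks)
{
	return bios_for(segs * blocks_per_seg, max_blocks);
}
```

With 8 segments of 128 blocks and 30 blocks per bio, the old structure needs 8 × 5 = 40 bios and the new one 35, and the last bio holds 1024 - 34 × 30 = 4 blocks (16384 bytes), matching the traces.
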
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: don't need to call set_page_dirty for io error · e3ef1876

Committed by Jaegeuk Kim on Jan 25, 2016

If end_io gets an error, we don't need to set the page dirty, since
f2fs_stop_checkpoint has already been called and no further data will be flushed.

This will resolve the following warning.

======================================================
[ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
4.4.0+ #9 Tainted: G           O
------------------------------------------------------
xfs_io/26773 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
 (&(&sbi->inode_lock[i])->rlock){+.+...}, at: [<ffffffffc025483f>] update_dirty_page+0x6f/0xd0 [f2fs]

and this task is already holding:
 (&(&q->__queue_lock)->rlock){-.-.-.}, at: [<ffffffff81396ea2>] blk_queue_bio+0x422/0x490
which would create a new lock dependency:
 (&(&q->__queue_lock)->rlock){-.-.-.} -> (&(&sbi->inode_lock[i])->rlock){+.+...}
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid needless sync_inode_page when reading inline_data · ae96e7bd

Committed by Jaegeuk Kim on Jan 25, 2016

In write_begin, if there is inline_data, f2fs loads it into the 0th data page.
Since this is a read path, we don't need to sync its inode page.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: don't need to sync node page at every time · 52f80337

Committed by Jaegeuk Kim on Jan 25, 2016

In write_end, we don't need to sync the inode page every time.
Instead, we can rely on f2fs_write_inode to update it later.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid multiple node page writes due to inline_data · 2049d4fc

Committed by Jaegeuk Kim on Jan 25, 2016

The scenario is:
1. create fully node blocks
2. flush node blocks
3. write inline_data for all the node blocks again
4. flush node blocks redundantly

So, this patch tries to flush inline_data when flushing node blocks.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: do f2fs_balance_fs when block is allocated · 3c082b7b

Committed by Jaegeuk Kim on Jan 23, 2016

Data block allocation should also be taken into account when triggering f2fs_balance_fs.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix to overcome inline_data floods · 6e17bfbc

Committed by Jaegeuk Kim on Jan 23, 2016

The scenario is:
1. create lots of node blocks
2. sync
3. write lots of inline_data
-> panic due to lack of free space

In that case, we should flush node blocks when writing inline_data in #3,
and trigger GC as well.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use writepages->lock for WB_SYNC_ALL · 25c13551

Committed by Jaegeuk Kim on Jan 20, 2016

If many threads call writepages in the background, there is no need to
serialize them just to merge all the bios, since this is background writeback.
In that case, it is better to run writepages concurrently.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove needless condition check · b483fadf

Committed by Jaegeuk Kim on Jan 20, 2016

This patch removes a needless condition variable.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: correct search area in get_new_segment · 0ab14356

Committed by Chao Yu on Jan 22, 2016

get_new_segment starts from the current segment position and tries to find
a free segment among its right-hand neighbors located in the same section.

But previously the search area was set to [current segment, max segment],
which means that in some worst cases we had to search more bits in the
free_segmap bitmap. So correct the search area to [current segment, last
segment in section] to avoid unnecessary searching.
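
The narrowed search window can be sketched with a plain bool array standing in for free_segmap (true = segment in use). `toy_next_free_in_section()` is a hypothetical helper, not f2fs's get_new_segment(); it computes the end of the current segment's section and stops there instead of scanning to the last segment of the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return the first free segment in [segno, end of segno's section),
 * or `total` if none is free there.
 */
static size_t toy_next_free_in_section(const bool *in_use, size_t total,
				       size_t segno, size_t segs_per_sec)
{
	size_t sec_end;

	assert(segs_per_sec > 0 && segno < total);
	/* One past the last segment of segno's section. */
	sec_end = (segno / segs_per_sec + 1) * segs_per_sec;
	if (sec_end > total)
		sec_end = total;

	/* Before the fix, the equivalent scan ran up to `total`. */
	for (size_t i = segno; i < sec_end; i++)
		if (!in_use[i])
			return i;
	return total;	/* no free segment left in this section */
}
```
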
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: export dirty_nats_ratio in sysfs · 2304cb0c

Committed by Chao Yu on Jan 18, 2016

This patch exports a new sysfs entry 'dirty_nats_ratio' to control the
threshold of dirty nat entries. If the current ratio exceeds the configured
threshold, a checkpoint will be triggered in f2fs_balance_fs_bg to flush dirty nats.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: flush dirty nat entries when exceeding threshold · 7d768d2c

Committed by Chao Yu on Jan 18, 2016

When testing f2fs with xfstests, generic/251 gets stuck for a long time.
The case uses the sequence below to obtain freshly released space on the
device in preparation for the subsequent fstrim test.

1. rm -rf /mnt/dir
2. mkdir /mnt/dir/
3. cp -axT `pwd`/ /mnt/dir/
4. goto 1

During the preparation step, all nat entries are cached in the nat cache.
Most of them are dirty entries with an invalid blkaddr, which means the
nodes related to these entries have been truncated, and the entries could
be reused once the dirty entries have been checkpointed.

However, no checkpoint was triggered, so nid allocators (e.g. mkdir,
creat) go on a long journey iterating over all NAT pages, looking for
free nids in alloc_nid->build_free_nids.

Here, f2fs_balance_fs_bg gets another chance to do a checkpoint, flushing
nat entries so they can be reused in the free nid cache, when the dirty
entry count exceeds 10% of the max count.
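
The trigger condition amounts to a percentage test. A minimal sketch with hypothetical field names: `dirty_nats_ratio` plays the role of the tunable threshold (the 10% figure above), and `max_nid` is assumed to be the "max count" the ratio is applied to.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical NAT-manager counters. */
struct toy_nm_info {
	unsigned int dirty_nat_cnt;	/* dirty NAT entries currently cached */
	unsigned int max_nid;		/* assumed "max count" of NAT entries */
	unsigned int dirty_nats_ratio;	/* threshold in percent, e.g. 10 */
};

/* True when a background checkpoint should flush dirty NAT entries. */
static bool toy_excess_dirty_nats(const struct toy_nm_info *nm)
{
	/* Compare cnt/max >= ratio/100 without division, in 64-bit arithmetic. */
	return (unsigned long long)nm->dirty_nat_cnt * 100 >=
	       (unsigned long long)nm->max_nid * nm->dirty_nats_ratio;
}
```
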
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: relocate is_merged_page · 0fd785eb

Committed by Chao Yu on Jan 18, 2016

Operations in is_merged_page are related to the internal bio cache, so move
it to data.c.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

23 Jan 2016 · 1 commit

wrappers for ->i_mutex access · 5955102c

Committed by Al Viro on Jan 22, 2016

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

15 Jan 2016 · 1 commit

kmemcg: account certain kmem allocations to memcg · 5d097056

Committed by Vladimir Davydov on Jan 14, 2016

Mark those kmem allocations that are known to be easily triggered from
userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
memcg.  For the list, see below:

 - threadinfo
 - task_struct
 - task_delay_info
 - pid
 - cred
 - mm_struct
 - vm_area_struct and vm_region (nommu)
 - anon_vma and anon_vma_chain
 - signal_struct
 - sighand_struct
 - fs_struct
 - files_struct
 - fdtable and fdtable->full_fds_bits
 - dentry and external_name
 - inode for all filesystems. This is the most tedious part, because
   most filesystems overwrite the alloc_inode method.

The list is far from complete, so feel free to add more objects.
Nevertheless, it should be close to "account everything" approach and
keep most workloads within bounds.  Malevolent users will be able to
breach the limit, but this was possible even with the former "account
everything" approach (simply because it did not account everything in
fact).

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

12 Jan 2016 · 5 commits

f2fs: should unset atomic flag after successful commit · 447135a8

Committed by Jaegeuk Kim on Jan 09, 2016

If there is an error during commit, we should keep the flag in order to
abort it.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix wrong memory condition check · 1663cae4

Committed by Jaegeuk Kim on Jan 09, 2016

This patch fixes a wrong decision in available_free_memory.
The return value is already set to false, so only the true condition
needs to be checked below.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: monitor the number of background checkpoint · 42190d2a

Committed by Jaegeuk Kim on Jan 09, 2016

This patch shows the number of background checkpoints.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: detect idle time depending on user behavior · d0239e1b

Committed by Jaegeuk Kim on Jan 08, 2016

This patch records the last time the user requested a filesystem operation.
This information is later used to detect whether the system is idle.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce time and interval facility · 6beceb54

Committed by Jaegeuk Kim on Jan 08, 2016

This patch adds time and interval arrays to store some timing variables.
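
Together with the idle-detection commit, the facility amounts to remembering a timestamp per event type and comparing it against a per-type interval. The sketch below is user-space and hypothetical (`toy_time_type`, `toy_update_time`, `toy_time_over` and `toy_is_idle` are invented names, and a plain counter stands in for jiffies); it uses a wraparound-safe comparison in the style of the kernel's time_after() macro.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_time_type { TOY_CP_TIME, TOY_REQ_TIME, TOY_MAX_TIME };

struct toy_sb_info {
	unsigned long last_time[TOY_MAX_TIME];		/* last event, in ticks */
	unsigned long interval_time[TOY_MAX_TIME];	/* threshold, in ticks */
};

/* Wraparound-safe "a is after b", in the style of the kernel's time_after(). */
static bool toy_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

static void toy_update_time(struct toy_sb_info *sbi, enum toy_time_type type,
			    unsigned long now)
{
	sbi->last_time[type] = now;
}

/* Has more than interval_time[type] elapsed since last_time[type]? */
static bool toy_time_over(const struct toy_sb_info *sbi,
			  enum toy_time_type type, unsigned long now)
{
	return toy_time_after(now, sbi->last_time[type] + sbi->interval_time[type]);
}

/* Idle = no user request for longer than the request interval. */
static bool toy_is_idle(const struct toy_sb_info *sbi, unsigned long now)
{
	return toy_time_over(sbi, TOY_REQ_TIME, now);
}
```
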
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

09 Jan 2016 · 5 commits

f2fs: skip releasing nodes in childless extent tree · 9b72a388

Committed by Chao Yu on Jan 08, 2016

If there are no nodes in extent tree, let's skip releasing step to avoid
any overhead of grabbing/releasing extent tree lock.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use atomic type for node count in extent tree · 68e35385

Committed by Chao Yu on Jan 08, 2016

1. rename the field in struct extent_tree from count to node_cnt for
   readability.
2. use an atomic type for node_cnt.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: recognize encrypted data in f2fs_fiemap · da5af127

Committed by Chao Yu on Jan 08, 2016

This patch teaches f2fs_fiemap to recognize encrypted data.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: clean up f2fs_balance_fs · 2c4db1a6

Committed by Jaegeuk Kim on Jan 07, 2016

This patch adds one parameter to f2fs_balance_fs to clean up all of its callers.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove redundant calls · 2a4b8e9f

Committed by Jaegeuk Kim on Jan 07, 2016

This patch removes redundant calls.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
