Commits · 1ecc0c5c50ce8834f7e35b63be7480bf1aaa4155 · openanolis / cloud-kernel

01 Oct, 2016 2 commits

f2fs: support configuring fault injection per superblock · 1ecc0c5c

Committed by Chao Yu on Sep 23, 2016

Previously, we only supported global fault injection configuration, so
configuring the type/rate of fault injection through sysfs or a mount
option affected every f2fs partition in use.

This does not make sense: it is inconvenient for a developer who wants
to test separate partitions with different fault injection rates/types
simultaneously, and it is impossible to enable fault injection on one
partition while disabling it on another.

From now on, we move the global fault injection configuration from the
module into the superblock, so injection testing can be more flexible.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
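
The idea in the message above can be illustrated with a minimal user-space sketch. It is not the f2fs code: the names `fault_info`, `sb_info`, `build_fault_attr`, `time_to_inject` and the two fault types are assumptions chosen for illustration. The point it shows is that once the rate, type mask and call counter live in a per-mount structure, two mounts can be configured independently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical fault types; illustrative only. */
enum fault_type { FAULT_ALLOC, FAULT_IO, FAULT_MAX };

/* Injection state that would otherwise be module-global: one copy per mount. */
struct fault_info {
    unsigned int inject_ops;   /* calls seen since the last injected fault */
    unsigned int inject_rate;  /* inject once every N calls; 0 disables */
    unsigned int inject_type;  /* bitmask of enabled fault types */
};

/* Hypothetical stand-in for a per-superblock info structure. */
struct sb_info {
    struct fault_info fault;
};

/* Configure injection for one mount only. */
void build_fault_attr(struct sb_info *sbi, unsigned int rate,
                      unsigned int type_mask)
{
    sbi->fault.inject_ops = 0;
    sbi->fault.inject_rate = rate;
    sbi->fault.inject_type = type_mask;
}

/* Return true when this call on this mount should fail artificially. */
bool time_to_inject(struct sb_info *sbi, enum fault_type type)
{
    struct fault_info *fi = &sbi->fault;

    assert(type < FAULT_MAX);
    if (fi->inject_rate == 0)
        return false;                       /* injection disabled here */
    if (!(fi->inject_type & (1u << type)))
        return false;                       /* this fault type not enabled */
    if (++fi->inject_ops >= fi->inject_rate) {
        fi->inject_ops = 0;                 /* restart the countdown */
        return true;
    }
    return false;
}
```

With this layout, one `sb_info` can run with a rate of 2 for `FAULT_IO` while another keeps a rate of 0, which is the flexibility the commit message describes.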

1ecc0c5c

f2fs: add customized migrate_page callback · 5b7a487c

Committed by Weichao Guo on Sep 20, 2016

This patch improves the migration of dirty pages and allows migrating atomic
written pages that F2FS keeps in the page cache. Compared with the fallback
path that releases pages, it gives better performance for memory compaction,
CMA and other users of page migration. Dirty pages no longer need to be
written back before migrating. An atomic written page that has not been
committed yet can be migrated while the related 'inmem_pages' list is updated
at the same time.

Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix some coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

5b7a487c

14 Sep, 2016 1 commit

f2fs: avoid ENOMEM during roll-forward recovery · e8ea9b3d

Committed by Jaegeuk Kim on Sep 09, 2016

This patch gives roll-forward recovery additional chances to retry when it
hits -ENOMEM.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

e8ea9b3d

30 Aug, 2016 1 commit

f2fs: remove redundant judgement condition in available_free_memory · 5f8eaf1f

Committed by Chao Yu on Aug 21, 2016

In available_free_memory, two identical conditions check for NAT excess;
remove one of them.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

5f8eaf1f

19 Aug, 2016 1 commit

Revert "f2fs: use percpu_rw_semaphore" · b873b798

Committed by Jaegeuk Kim on Aug 04, 2016

LKP reported a -36.3% regression of fsmark.files_per_sec due to this patch.
I've confirmed that fxmark [1] also shows a slight regression for DWAL.

[1] https://github.com/sslab-gatech/fxmark

This reverts commit ec795418.

b873b798

21 Jul, 2016 1 commit

block: get rid of bio_rw and READA · 70246286

Committed by Christoph Hellwig on Jul 19, 2016

These two are confusing leftovers of the old world order, combining
values of the REQ_OP_ and REQ_ namespaces.  For callers that don't
special case we mostly just replace bi_rw with bio_data_dir or
op_is_write, except for the few cases where a switch over the REQ_OP_
values makes more sense.  Any check for READA is replaced with an
explicit check for REQ_RAHEAD.  Also remove the READA alias for
REQ_RAHEAD.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

70246286

16 Jul, 2016 2 commits

f2fs: use blk_plug in all the possible paths · 9dfa1baf

Committed by Jaegeuk Kim on Jul 13, 2016

This patch reverts 19a5f5e2 (f2fs: drop any block plugging),
and additionally adds blk_plug in the write paths.

The main reason is that blk_start_plug can be used to wake up from low-power
mode before submitting further bios.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

9dfa1baf

f2fs: refactor __exchange_data_block for speed up · 0a2aa8fb

Committed by Jaegeuk Kim on Jul 08, 2016

This reduces the elapsed time to do xfstests/generic/017.

Before: 715 s
After:  458 s

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

0a2aa8fb

09 Jul, 2016 5 commits

f2fs: use percpu_rw_semaphore · ec795418

Committed by Jaegeuk Kim on Jun 30, 2016

This patch replaces rw_semaphore with percpu_rw_semaphore for:
sbi->cp_rwsem
nm_i->nat_tree_lock

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

ec795418

f2fs: skip to check the block address of node page · 3bdad3c7

Committed by Jaegeuk Kim on Jun 30, 2016

If the node page is up-to-date, it should be alive.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

3bdad3c7

f2fs: call SetPageUptodate if needed · 237c0790

Committed by Jaegeuk Kim on Jun 30, 2016

SetPageUptodate() issues a memory barrier, resulting in performance degradation.
Let's avoid that.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

237c0790

f2fs: introduce f2fs_set_page_dirty_nobuffer · fe76b796

Committed by Jaegeuk Kim on Jun 30, 2016

This patch adds f2fs_set_page_dirty_nobuffer() copied from __set_page_dirty_buffer.
When appending 4KB blocks in f2fs on pmem with multiple cores, this improves the
overall performance.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

fe76b796

f2fs: fix to detect truncation prior rather than EIO during read · 1563ac75

Committed by Chao Yu on Jul 03, 2016

In a synchronous read, after sending out the read request, the reader tries
to lock the page in order to wait for the device to finish the read and
unlock the page. Meanwhile, a truncater can race with the reader, so after
the reader gets the page lock it should check the page's mapping to detect
whether someone has already truncated the page. The reader then has the
chance to retry if truncation happened. Without this check, the read can
fail with EIO because of the existing condition check.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
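
The ordering described above (check for truncation first, report EIO only afterwards) can be sketched in user-space C. The structures below are simplified stand-ins, and `check_locked_page` and the `READ_*` values are hypothetical names, not f2fs functions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures. */
struct address_space { int unused; };

struct page {
    struct address_space *mapping; /* NULL once the page has been truncated */
    bool uptodate;                 /* set when the read completed successfully */
};

enum read_result { READ_OK = 0, READ_RETRY = 1 };

/*
 * Called with the page locked, after the read request has completed.
 * The truncation check comes first, so a page that was truncated during
 * the read leads to a retry instead of a spurious -EIO.
 */
int check_locked_page(const struct page *page,
                      const struct address_space *mapping)
{
    if (page->mapping != mapping)
        return READ_RETRY;   /* raced with truncation: look the page up again */
    if (!page->uptodate)
        return -EIO;         /* genuine read failure */
    return READ_OK;
}
```

If the two checks were swapped, a truncated page (which is typically not up to date) would be reported as an I/O error, which is the behaviour the commit title refers to.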

1563ac75

07 Jul, 2016 1 commit

f2fs: produce more nids and reduce readahead nats · ad4edb83

Committed by Jaegeuk Kim on Jun 16, 2016

The readahead nat pages are more likely to be reclaimed quickly, so it is
better to gather more free nids in advance.

Also, let's keep as many free nids as possible.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

ad4edb83

08 Jun, 2016 2 commits

f2fs: use bio op accessors · 04d328de

Committed by Mike Christie on Jun 05, 2016

Separate the op from the rq_flag_bits and have f2fs
set/get the bio using bio_set_op_attrs/bio_op.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

04d328de

f2fs: control not to exceed # of cached nat entries · e589c2c4

Committed by Jaegeuk Kim on Jun 02, 2016

This is to avoid the overhead of managing cache entries, including the radix tree.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

e589c2c4

03 Jun, 2016 6 commits

f2fs: return the errno to the caller to avoid using a wrong page · 0c9df7fb

Committed by Yunlong Song on May 26, 2016

Commit aaf96075 ("f2fs: check node page
contents all the time") pointed out that "sometimes it was reported that
its contents was missing", so it checks the page's mapping and contents.
When "nid != nid_of_node(page)", ERR_PTR(-EIO) is returned to the
caller. However, commit e1c51b9f ("f2fs:
clean up node page updating flow") moved the "nid != nid_of_node(page)" test
into "f2fs_bug_on(sbi, nid != nid_of_node(page))", which returns a
wrong page to the caller when F2FS_CHECK_FS is off and the "sometimes it
was reported that its contents was missing" case happens.

This patch restores checking the node page contents all the time, and
returns the errno so that the caller knows something is wrong and avoids
using the page. This patch also moves f2fs_bug_on to its proper location.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
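
As a rough illustration of the difference between a debug-only assertion and an unconditional check, here is a user-space sketch. `struct node_page` and `validate_node_page` are hypothetical names; only `nid_of_node` echoes an identifier from the message, and its body here is an assumption made for the example.

```c
#include <errno.h>
#include <stdint.h>

/* Simplified stand-in for a node page that records its own node id. */
struct node_page {
    uint32_t nid;
};

/* Assumed accessor for the sketch: read the node id stored in the page. */
uint32_t nid_of_node(const struct node_page *page)
{
    return page->nid;
}

/*
 * Always compare the requested nid with the one stored in the page and
 * report a mismatch to the caller, rather than relying on an assertion
 * that may be disabled by configuration.
 */
int validate_node_page(uint32_t nid, const struct node_page *page)
{
    if (nid != nid_of_node(page))
        return -EIO;   /* caller must not use this page */
    return 0;
}
```

A caller that receives `-EIO` can drop the page and propagate the error instead of continuing with mismatched contents.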

0c9df7fb

f2fs: avoid unnecessary updating inode during fsync · 26de9b11

Committed by Jaegeuk Kim on May 20, 2016

If roll-forward recovery can recover i_size, we don't need to update inode's
metadata during fsync.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

26de9b11

f2fs: remove syncing inode page in all the cases · ee6d182f

Committed by Jaegeuk Kim on May 20, 2016

This patch reduces calls to the following functions across the whole tree.
- sync_inode_page()
- update_inode_page()
- update_inode()
- f2fs_write_inode()

Instead, checkpoint will flush all the dirty inode metadata before syncing
node pages.
Note that this is doable because we call mark_inode_dirty_sync() for every
inode field change that also needs to update the on-disk inode.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

ee6d182f

f2fs: flush inode metadata when checkpoint is doing · 0f18b462

Committed by Jaegeuk Kim on May 20, 2016

This patch registers all the inodes which have dirty metadata so that they
are synced while a checkpoint is in progress.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

0f18b462

f2fs: call mark_inode_dirty_sync for i_field changes · 205b9822

Committed by Jaegeuk Kim on May 20, 2016

This patch calls mark_inode_dirty_sync() for the following on-disk inode
changes.

 -> largest
 -> ctime/mtime/atime
 -> i_current_depth
 -> i_xattr_nid
 -> i_pino
 -> i_advise
 -> i_flags
 -> i_mode

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

205b9822

f2fs: use inode pointer for {set, clear}_inode_flag · 91942321

Committed by Jaegeuk Kim on May 20, 2016

This patch refactors set_inode_flag and clear_inode_flag to take an inode
pointer.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

91942321

21 May, 2016 1 commit

f2fs: fix to update dirty page count correctly · 0f3311a8

Committed by Chao Yu on May 21, 2016

If we fail to merge inline data into the inode page while flushing an inline
inode, we skip invoking inode_dec_dirty_pages, which makes the dirty page
count incorrect and results in a panic in ->evict_inode. Fix it.

------------[ cut here ]------------
kernel BUG at /home/yuchao/git/devf2fs/inode.c:336!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 3 PID: 10004 Comm: umount Tainted: G           O    4.6.0-rc5+ #17
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
task: f0c33000 ti: c5212000 task.ti: c5212000
EIP: 0060:[<f89aacb5>] EFLAGS: 00010202 CPU: 3
EIP is at f2fs_evict_inode+0x85/0x490 [f2fs]
EAX: 00000001 EBX: c4529ea0 ECX: 00000001 EDX: 00000000
ESI: c0131000 EDI: f89dd0a0 EBP: c5213e9c ESP: c5213e78
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: b75878c0 CR3: 1a36a700 CR4: 000406f0
Stack:
 c4529ea0 c4529ef4 c5213e8c c176d45c c4529ef4 00000000 c4529ea0 c4529fac
 f89dd0a0 c5213eb0 c1204a68 c5213ed8 c452a2b4 c6680930 c5213ec0 c1204b64
 c6680d44 c6680620 c5213eec c120588d ee84b000 ee84b5c0 c5214000 ee84b5e0
Call Trace:
 [<c176d45c>] ? _raw_spin_unlock+0x2c/0x50
 [<c1204a68>] evict+0xa8/0x170
 [<c1204b64>] dispose_list+0x34/0x50
 [<c120588d>] evict_inodes+0x10d/0x130
 [<c11ea941>] generic_shutdown_super+0x41/0xe0
 [<c1185190>] ? unregister_shrinker+0x40/0x50
 [<c1185190>] ? unregister_shrinker+0x40/0x50
 [<c11eac52>] kill_block_super+0x22/0x70
 [<f89af23e>] kill_f2fs_super+0x1e/0x20 [f2fs]
 [<c11eae1d>] deactivate_locked_super+0x3d/0x70
 [<c11eb383>] deactivate_super+0x43/0x60
 [<c1208ec9>] cleanup_mnt+0x39/0x80
 [<c1208f50>] __cleanup_mnt+0x10/0x20
 [<c107d091>] task_work_run+0x71/0x90
 [<c105725a>] exit_to_usermode_loop+0x72/0x9e
 [<c1001c7c>] do_fast_syscall_32+0x19c/0x1c0
 [<c176dd48>] sysenter_past_esp+0x45/0x74
EIP: [<f89aacb5>] f2fs_evict_inode+0x85/0x490 [f2fs] SS:ESP 0068:c5213e78
---[ end trace d30536330b7fdc58 ]---

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

0f3311a8

08 May, 2016 4 commits

f2fs: read node blocks ahead when truncating blocks · 79344efb

Committed by Jaegeuk Kim on May 06, 2016

This patch enables reading node blocks in advance when truncating large
data blocks.

 > time rm $MNT/testfile (500GB) after drop_caches
Before : 9.422 s
After  : 4.821 s

Reported-by: Stephen Bates <stephen.bates@microsemi.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

79344efb

f2fs: remove an obsolete variable · fb58ae22

Committed by Jaegeuk Kim on May 04, 2016

This patch removes an obsolete variable used in add_free_nid.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

fb58ae22

f2fs: inject ENOSPC failures · cb78942b

Committed by Jaegeuk Kim on Apr 29, 2016

This patch injects ENOSPC failures.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

cb78942b

f2fs: use f2fs_grab_cache_page instead of grab_cache_page · 300e129c

Committed by Jaegeuk Kim on Apr 29, 2016

This patch converts grab_cache_page to f2fs_grab_cache_page.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

300e129c

28 Apr, 2016 1 commit

f2fs: move node pages only in victim section during GC · da011cc0

Committed by Chao Yu on Apr 27, 2016

For foreground GC, we cache node blocks in the victim section and set them
dirty, then we call sync_node_pages to flush these node pages. Meanwhile,
node pages which are not located in the victim section are flushed
together with them, so more bandwidth and contiguous free space are
consumed.

So for this case, it's better to leave those unrelated node pages
in cache for further write hits, and let CP or the VM flush them afterward.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

da011cc0

27 Apr, 2016 4 commits

f2fs: set fsync mark only for the last dnode · 608514de

Committed by Jaegeuk Kim on Apr 15, 2016

In order to provide atomic writes, we should consider power failure during
sync_node_pages in fsync.
So, this patch sets the fsync mark only in the last dnode block.

Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

608514de

f2fs: report unwritten status in fsync_node_pages · c267ec15

Committed by Jaegeuk Kim on Apr 15, 2016

fsync_node_pages should report success or failure so that the user knows
whether fsync completed.

Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

c267ec15

f2fs: split sync_node_pages with fsync_node_pages · 52681375

Committed by Jaegeuk Kim on Apr 13, 2016

This patch splits the existing sync_node_pages into (f)sync_node_pages.
The fsync_node_pages is used for f2fs_sync_file only.

Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

52681375

f2fs: avoid needless lock for node pages when fsyncing a file · eca76e78

Committed by Jaegeuk Kim on Apr 13, 2016

When fsync is called, sync_node_pages finds the proper direct node pages to flush.
However, it also locks unrelated direct node pages unnecessarily.

Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

eca76e78

15 Apr, 2016 2 commits

f2fs: add BUG_ON to avoid unnecessary flow · ff373558

Committed by Jaegeuk Kim on Mar 29, 2016

This patch adds BUG_ON instead of a retry loop.
In the case of node pages, we already got this inode page, but unlocked it.
Because we don't truncate any node pages in these operations, the page's
mapping should not change.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

ff373558

f2fs: use PGP_LOCK to check its truncation · 4a6de50d

Committed by Jaegeuk Kim on Mar 30, 2016

Previously, after trylock_page succeeded, the page's mapping was not checked.
In order to fix that, we can just give PGP_LOCK to pagecache_get_page.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

4a6de50d

05 Apr, 2016 1 commit

mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf

Committed by Kirill A. Shutemov on Apr 01, 2016

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
ago with the promise that one day it would be possible to implement the page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And it is unlikely it ever will.

We have many places where PAGE_CACHE_SIZE is assumed to be equal to
PAGE_SIZE.  And it's a constant source of confusion on whether a
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is a revert of the changes to the
PAGE_CACHE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

09cbfeaf

18 Mar, 2016 3 commits

f2fs: submit node page write bios when really required · 12bb0a8f

Committed by Jaegeuk Kim on Mar 11, 2016

If many threads call fsync with data writes, we don't need to flush every
bio that has node page writes.
f2fs_wait_on_page_writeback will flush its bios when the page is really
needed.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

12bb0a8f

f2fs: declare static functions · 17a0ee55

Committed by Jaegeuk Kim on Mar 08, 2016

Just to avoid sparse warnings.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

17a0ee55

f2fs: modify the readahead method in ra_node_page() · 999270de

Committed by Fan Li on Feb 29, 2016

ra_node_page() is used to read ahead one node page. Compared to a regular
read, it's faster because it doesn't wait for IO completion.
But if it is called twice for the same block, and the IO request
from the first call hasn't completed before the second call, the second
call has to wait until the read is over.

This patch uses the approach of __do_page_cache_readahead() to solve the
problem. It does nothing when someone else has already put the page in the
mapping. The status of the page should be ensured by whoever put it there.
This implementation also avoids altering the page reference count.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

999270de

27 Feb, 2016 1 commit

f2fs: fix to avoid deadlock when merging inline data · 19c7377b

Committed by Chao Yu on Feb 26, 2016

When testing with fsstress, kworker and user threads were both blocked:

INFO: task kworker/u16:1:16580 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u16:1   D ffff8803f2595390     0 16580      2 0x00000000
Workqueue: writeback bdi_writeback_workfn (flush-251:0)
 ffff8802730e5760 0000000000000046 ffff880274729fc0 0000000000012440
 ffff8802730e5fd8 ffff8802730e4010 0000000000012440 0000000000012440
 ffff8802730e5fd8 0000000000012440 ffff880274729fc0 ffff88026eb50000
Call Trace:
 [<ffffffff816fe9d9>] schedule+0x29/0x70
 [<ffffffff816ff895>] rwsem_down_read_failed+0xa5/0xf9
 [<ffffffff81378584>] call_rwsem_down_read_failed+0x14/0x30
 [<ffffffffa0694feb>] f2fs_write_data_page+0x31b/0x420 [f2fs]
 [<ffffffffa0690f1a>] __f2fs_writepage+0x1a/0x50 [f2fs]
 [<ffffffffa06922a0>] f2fs_write_data_pages+0xe0/0x290 [f2fs]
 [<ffffffff811473b3>] do_writepages+0x23/0x40
 [<ffffffff811cc3ee>] __writeback_single_inode+0x4e/0x250
 [<ffffffff811cd4f1>] writeback_sb_inodes+0x2c1/0x470
 [<ffffffff811cd73e>] __writeback_inodes_wb+0x9e/0xd0
 [<ffffffff811cda0b>] wb_writeback+0x1fb/0x2d0
 [<ffffffff811cdb7c>] wb_do_writeback+0x9c/0x220
 [<ffffffff811ce232>] bdi_writeback_workfn+0x72/0x1c0
 [<ffffffff8106b74e>] process_one_work+0x1de/0x5b0
 [<ffffffff8106e78f>] worker_thread+0x11f/0x3e0
 [<ffffffff810750ce>] kthread+0xde/0xf0
 [<ffffffff817093f8>] ret_from_fork+0x58/0x90

fsstress thread stack:
 [<ffffffff81139f0e>] sleep_on_page+0xe/0x20
 [<ffffffff81139ef7>] __lock_page+0x67/0x70
 [<ffffffff8113b100>] find_lock_page+0x50/0x80
 [<ffffffff8113b24f>] find_or_create_page+0x3f/0xb0
 [<ffffffffa06983a9>] sync_node_pages+0x259/0x810 [f2fs]
 [<ffffffffa068d874>] write_checkpoint+0x1a4/0xce0 [f2fs]
 [<ffffffffa0686b0c>] f2fs_sync_fs+0x7c/0xd0 [f2fs]
 [<ffffffffa067c813>] f2fs_sync_file+0x143/0x5f0 [f2fs]
 [<ffffffff811d301b>] vfs_fsync_range+0x2b/0x40
 [<ffffffff811d304c>] vfs_fsync+0x1c/0x20
 [<ffffffff811d3291>] do_fsync+0x41/0x70
 [<ffffffff811d32d3>] SyS_fdatasync+0x13/0x20
 [<ffffffff817094a2>] system_call_fastpath+0x16/0x1b
 [<ffffffffffffffff>] 0xffffffffffffffff

The reason of this issue is:
CPU0:					CPU1:
 - f2fs_write_data_pages
					 - f2fs_sync_fs
					  - write_checkpoint
					   - block_operations
					    - f2fs_lock_all
					     - down_write(sbi->cp_rwsem)
  - lock_page(page)
  - f2fs_write_data_page
					    - sync_node_pages
					     - flush_inline_data
					      - pagecache_get_page(page, GFP_LOCK)
   - f2fs_lock_op
    - down_read(sbi->cp_rwsem)

This patch switches flush_inline_data to trylock_page to fix this ABBA
deadlock issue.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
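
The fix relies on a general pattern: a path that already holds lock A never blocks on lock B, it only tries B and skips the work if B is busy. A minimal user-space sketch follows, using a C11 `atomic_flag` as a stand-in for the page lock. The structure and the names `init_page` and `flush_inline_data_one` are hypothetical, and `trylock_page`/`unlock_page` here are simplified re-implementations for the example, not the kernel helpers.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-in for a page with a lock bit. */
struct page {
    atomic_flag locked;
    bool flushed;
};

void init_page(struct page *p)
{
    atomic_flag_clear(&p->locked);
    p->flushed = false;
}

/* Non-blocking lock attempt: true if the caller now owns the lock. */
bool trylock_page(struct page *p)
{
    return !atomic_flag_test_and_set(&p->locked);
}

void unlock_page(struct page *p)
{
    atomic_flag_clear(&p->locked);
}

/*
 * Checkpoint-side helper, assumed to run while the checkpoint lock is held.
 * If another thread (for example writeback) owns the page lock, the page is
 * skipped instead of waiting, so the lock-order inversion cannot deadlock.
 */
bool flush_inline_data_one(struct page *p)
{
    if (!trylock_page(p))
        return false;      /* busy: skip this page */
    p->flushed = true;     /* placeholder for the real flush work */
    unlock_page(p);
    return true;
}
```

The trade-off is that a skipped page is not flushed in this pass, so the design only works when skipping is acceptable for the caller.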

19c7377b

26 Feb, 2016 1 commit

f2fs: fix incorrect upper bound when iterating inode mapping tree · 80dd9c0e

Committed by Chao Yu on Feb 24, 2016

1. The inode mapping tree can index pages in the range [0, ULONG_MAX];
however, in some places f2fs only searches or iterates pages in the range
[0, LONG_MAX], which results in missing pages in the page cache.

2. filemap_fdatawait_range accepts range parameters in units of bytes, so
the max range it covers should be [0, LLONG_MAX]. If we use [0, LONG_MAX]
as the range for waiting on writeback, a large number of pages will not be
covered.

This patch corrects the above two issues.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
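
The second point is simple arithmetic, sketched below. The example assumes 4 KiB pages and uses `INT32_MAX` and `INT64_MAX` as stand-ins for LONG_MAX on a 32-bit build and for LLONG_MAX; `pages_covered` is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* Number of pages touched by the byte range [0, last_byte]. */
uint64_t pages_covered(int64_t last_byte)
{
    return ((uint64_t)last_byte >> PAGE_SHIFT) + 1;
}
```

Under these assumptions `pages_covered(INT32_MAX)` is 524288 pages (2 GiB of data), while `pages_covered(INT64_MAX)` is 2^51 pages, so a byte range capped at a 32-bit LONG_MAX leaves everything beyond the first 2 GiB of a file unwaited.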

80dd9c0e

openanolis / cloud-kernel synced successfully more than 1 year ago