提交 · e03b07d9084d03e896b7f1a598a7f6aa18f6eeda · openeuler / raspberrypi-kernel

11 4月, 2015 40 次提交

f2fs: do not recover wrong data index · e03b07d9

由 Jaegeuk Kim 提交于 4月 01, 2015

During the roll-forward recovery, if we found a new data index written fsync
lastly, we need to recover new block address.
But, if that address was corrupted, we should not recover that.
Otherwise, f2fs gets kernel panic from:

 In check_index_in_prev_nodes(),

    sentry = get_seg_entry(sbi, segno);
             --------------------------> out-of-range segno.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

e03b07d9

f2fs: do not increase link count during recovery · 418f6c27

由 Jaegeuk Kim 提交于 3月 31, 2015

If there are multiple fsynced dnodes having a dent flag, roll-forward routine
sets FI_INC_LINK for their inode, and recovery_dentry increases its link count
accordingly.
That results in normal file having a link count as 2, so we can't unlink those
files.

This was added to handle several inode blocks having same inode number with
different directory paths.
But, current f2fs doesn't replay all of path changes and only recover its dentry
for the last fsynced inode block.
So, there is no reason to do this.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

418f6c27

f2fs: assign parent's i_mode for empty dir · cb58463b

由 Jaegeuk Kim 提交于 3月 30, 2015

When assigning i_mode for dotdot, it needs to assign parent's i_mode.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

cb58463b

f2fs: add F2FS_INLINE_DOTS to recover missing dot dentries · 510022a8

由 Jaegeuk Kim 提交于 3月 30, 2015

If f2fs was corrupted with missing dot dentries, it needs to recover them after
fsck.f2fs detection.

The underlying precedure is:

1. The fsck.f2fs remains F2FS_INLINE_DOTS flag in directory inode, if it detects
missing dot dentries.

2. When f2fs looks up the corrupted directory, it triggers f2fs_add_link with
proper inode numbers and their dot and dotdot names.

3. Once f2fs recovers the directory without errors, it removes F2FS_INLINE_DOTS
finally.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

510022a8

f2fs: fix mismatching lock and unlock pages for roll-forward recovery · c9ef4810

由 Jaegeuk Kim 提交于 3月 26, 2015

Previously, inode page is not correctly locked and unlocked in pair during
the roll-forward recovery.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

c9ef4810

f2fs: fix sparse warnings · adad81ed

由 Jaegeuk Kim 提交于 3月 24, 2015

This patch fixes the below warning.

sparse warnings: (new ones prefixed by >>)

>> fs/f2fs/inode.c:56:23: sparse: restricted __le32 degrades to integer
>> fs/f2fs/inode.c:56:52: sparse: restricted __le32 degrades to integer
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

adad81ed

f2fs: limit b_size of mapped bh in f2fs_map_bh · 1b3e27a9

由 Chao Yu 提交于 3月 24, 2015

Map bh over max size which caller defined is not needed, limit it in
f2fs_map_bh.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

1b3e27a9

f2fs: persist system.advise into on-disk inode · 30c62fdb

由 Chao Yu 提交于 3月 23, 2015

This patch fixes to dirty inode for persisting i_advise of f2fs inode info into
on-disk inode if user sets system.advise through setxattr. Otherwise the new
value will be lost.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

30c62fdb

f2fs: avoid NULL pointer dereference in f2fs_xattr_advise_get · 84e97c27

由 Chao Yu 提交于 3月 23, 2015

We will encounter oops by executing below command.
getfattr -n system.advise /mnt/f2fs/file
Killed

message log:
BUG: unable to handle kernel NULL pointer dereference at   (null)
IP: [<f8b54d69>] f2fs_xattr_advise_get+0x29/0x40 [f2fs]
*pdpt = 00000000319b7001 *pde = 0000000000000000
Oops: 0002 [#1] SMP
Modules linked in: f2fs(O) snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq joydev
snd_seq_device snd_timer bnep snd rfcomm microcode bluetooth soundcore i2c_piix4 mac_hid serio_raw parport_pc ppdev lp parport
binfmt_misc hid_generic psmouse usbhid hid e1000 [last unloaded: f2fs]
CPU: 3 PID: 3134 Comm: getfattr Tainted: G           O    4.0.0-rc1 #6
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
task: f3a71b60 ti: f19a6000 task.ti: f19a6000
EIP: 0060:[<f8b54d69>] EFLAGS: 00010246 CPU: 3
EIP is at f2fs_xattr_advise_get+0x29/0x40 [f2fs]
EAX: 00000000 EBX: f19a7e71 ECX: 00000000 EDX: f8b5b467
ESI: 00000000 EDI: f2008570 EBP: f19a7e14 ESP: f19a7e08
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: 00000000 CR3: 319b8000 CR4: 000007f0
Stack:
 f8b5a634 c0cbb580 00000000 f19a7e34 c1193850 00000000 00000007 f19a7e71
 f19a7e64 c0cbb580 c1193810 f19a7e50 c1193c00 00000000 00000000 00000000
 c0cbb580 00000000 f19a7f70 c1194097 00000000 00000000 00000000 74737973
Call Trace:
 [<c1193850>] generic_getxattr+0x40/0x50
 [<c1193810>] ? xattr_resolve_name+0x80/0x80
 [<c1193c00>] vfs_getxattr+0x70/0xa0
 [<c1194097>] getxattr+0x87/0x190
 [<c11801d7>] ? path_lookupat+0x57/0x5f0
 [<c11819d2>] ? putname+0x32/0x50
 [<c116653a>] ? kmem_cache_alloc+0x2a/0x130
 [<c11819d2>] ? putname+0x32/0x50
 [<c11819d2>] ? putname+0x32/0x50
 [<c11819d2>] ? putname+0x32/0x50
 [<c11827f9>] ? user_path_at_empty+0x49/0x70
 [<c118283f>] ? user_path_at+0x1f/0x30
 [<c11941e7>] path_getxattr+0x47/0x80
 [<c11948e7>] SyS_getxattr+0x27/0x30
 [<c163f748>] sysenter_do_call+0x12/0x12
Code: 66 90 55 89 e5 57 56 53 66 66 66 66 90 8b 78 20 89 d3 ba 67 b4 b5 f8 89 d8 89 ce e8 42 7c 7b c8 85 c0 75 16 0f b6 87 44 01 00
00 <88> 06 b8 01 00 00 00 5b 5e 5f 5d c3 8d 76 00 b8 ea ff ff ff eb
EIP: [<f8b54d69>] f2fs_xattr_advise_get+0x29/0x40 [f2fs] SS:ESP 0068:f19a7e08
CR2: 0000000000000000
---[ end trace 860260654f1f416a ]---

The reason is that in getfattr there are two steps which is indicated by strace info:
1) try to lookup and get size of specified xattr.
2) get value of the extented attribute.

strace info:
getxattr("/mnt/f2fs/file", "system.advise", 0x0, 0) = 1
getxattr("/mnt/f2fs/file", "system.advise", "\x00", 256) = 1

For the first step, getfattr may pass a NULL pointer in @value and zero in @size
as parameters for ->getxattr, but we access this @value pointer directly without
checking whether the pointer is valid or not in f2fs_xattr_advise_get, so the
oops occurs.

This patch fixes this issue by verifying @value pointer before using.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

84e97c27

f2fs: preallocate fallocated blocks for direct IO · df6136ef

由 Chao Yu 提交于 3月 23, 2015

Normally, due to DIO_SKIP_HOLES flag is set by default, blockdev_direct_IO in
f2fs_direct_IO tries to skip DIO in holes when writing inside i_size, this
makes us falling back to buffered IO which shows lower performance.

So in commit 59b802e5 ("f2fs: allocate data blocks in advance for
f2fs_direct_IO"), we improve perfromance by allocating data blocks in advance
if we meet holes no matter in i_size or not, since with it we can avoid falling
back to buffered IO.

But we forget to consider for unwritten fallocated block in this commit.
This patch tries to fix it for fallocate case, this helps to improve
performance.

Test result:
Storage info: sandisk ultra 64G micro sd card.

touch /mnt/f2fs/file
truncate -s 67108864 /mnt/f2fs/file
fallocate -o 0 -l 67108864 /mnt/f2fs/file
time dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=64 conv=notrunc oflag=direct

Time before applying the patch:
67108864 bytes (67 MB) copied, 36.16 s, 1.9 MB/s
real    0m36.162s
user    0m0.000s
sys     0m0.180s

Time after applying the patch:
67108864 bytes (67 MB) copied, 27.7776 s, 2.4 MB/s
real    0m27.780s
user    0m0.000s
sys     0m0.036s
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

df6136ef

f2fs: enable inline data by default · 75342797

由 Wanpeng Li 提交于 3月 24, 2015

Enable inline_data feature by default since it brings us better
performance and space utilization and now has already stable.
Add another option noinline_data to disable it during mount.
Suggested-by: NJaegeuk Kim <jaegeuk@kernel.org>
Suggested-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NWanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

75342797

f2fs: preserve extent info for extent cache · 0bdee482

由 Chao Yu 提交于 3月 19, 2015

This patch tries to preserve last extent info in extent tree cache into on-disk
inode, so this can help us to reuse the last extent info next time for
performance.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

0bdee482

f2fs: initialize extent tree with on-disk extent info of inode · 028a41e8

由 Chao Yu 提交于 3月 19, 2015

With normal extent info cache, we records largest extent mapping between logical
block and physical block into extent info, and we persist extent info in on-disk
inode.

When we enable extent tree cache, if extent info of on-disk inode is exist, and
the extent is not a small fragmented mapping extent. We'd better to load the
extent info into extent tree cache when inode is loaded. By this way we can have
more chance to hit extent tree cache rather than taking more time to read dnode
page for block address.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

028a41e8

f2fs: introduce __{find,grab}_extent_tree · 93dfc526

由 Chao Yu 提交于 3月 19, 2015

This patch introduces __{find,grab}_extent_tree for reusing by following
patches.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

93dfc526

f2fs: split set_data_blkaddr from f2fs_update_extent_cache · 216a620a

由 Chao Yu 提交于 3月 19, 2015

Split __set_data_blkaddr from f2fs_update_extent_cache for readability.

Additionally rename __set_data_blkaddr to set_data_blkaddr for exporting.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

216a620a

f2fs: enable fast symlink by utilizing inline data · 368a0e40

由 Wanpeng Li 提交于 3月 19, 2015

Fast symlink can utilize inline data flow to avoid using any
i_addr region, since we need to handle many cases such as
truncation, roll-forward recovery, and fsck/dump tools.
Signed-off-by: NWanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

368a0e40

J
f2fs: add some tracepoints to debug volatile and atomic writes · 8ce67cb0
由 Jaegeuk Kim 提交于 3月 17, 2015
```
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
```
8ce67cb0

f2fs: avoid punch_hole overhead when releasing volatile data · 3c6c2beb

由 Jaegeuk Kim 提交于 3月 17, 2015

This patch is to avoid some punch_hole overhead when releasing volatile data.
If volatile data was not written yet, we just can make the first page as zero.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

3c6c2beb

f2fs: avoid wrong f2fs_bug_on when truncating inline_data · 83e21db6

由 Jaegeuk Kim 提交于 3月 16, 2015

This patch removes wrong f2fs_bug_on in truncate_inline_inode.

When there is no space, it can happen a corner case where i_isze is over
MAX_INLINE_SIZE while its inode is still inline_data.

The scenario is
 1. write small data into file #A.
 2. fill the whole partition to 100%.
 3. truncate 4096 on file #A.
 4. write data at 8192 offset.
  --> f2fs_write_begin
    -> -ENOSPC = f2fs_convert_inline_page
    -> f2fs_write_failed
      -> truncate_blocks
        -> truncate_inline_inode
	  BUG_ON, since i_size is 4096.
Reviewed-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

83e21db6

f2fs: enhance multi-threads performance · 78373b73

由 Jaegeuk Kim 提交于 3月 13, 2015

Previously, f2fs_write_data_pages has a mutex, sbi->writepages, to serialize
data writes to maximize write bandwidth, while sacrificing multi-threads
performance.
Practically, however, multi-threads environment is much more important for
users. So this patch tries to remove the mutex.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

78373b73

f2fs: set buffer_new when new blocks are allocated · 3402e87c

由 Jaegeuk Kim 提交于 3月 11, 2015

This patch modifies to call set_buffer_new, if new blocks are allocated.
Reviewed-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

3402e87c

f2fs: set SBI_NEED_FSCK when encountering exception in recovery · 2adc3505

由 Chao Yu 提交于 3月 16, 2015

This patch tries to set SBI_NEED_FSCK flag into sbi only when we fail to recover
in fill_super, so we could skip fscking image when we fail to fill super for
other reason.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

2adc3505

f2fs: fix to cover sentry_lock for block allocation · 21cb1d99

由 Jaegeuk Kim 提交于 3月 11, 2015

In the following call stack, f2fs changes the bitmap for dirty segments and # of
dirty sentries without grabbing sit_i->sentry_lock.
This can result in mismatch on bitmap and # of dirty sentries, since if there
are some direct_io operations.

In allocate_data_block,
 - __allocate_new_segments
  - mutex_lock(&curseg->curseg_mutex);
  - s_ops->allocate_segment
   - new_curseg/change_curseg
    - reset_curseg
     - __set_sit_entry_type
      - __mark_sit_entry_dirty
       - set_bit(dirty_sentries_bitmap)
       - dirty_sentries++;
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

21cb1d99

f2fs: fix to check current blkaddr in __allocate_data_blocks · d6d4f1cb

由 Chao Yu 提交于 3月 12, 2015

In __allocate_data_blocks, we should check current blkaddr which is located at
ofs_in_node of dnode page instead of checking first blkaddr all the time.
Otherwise we can only allocate one blkaddr in each dnode page. Fix it.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

d6d4f1cb

f2fs: fix to truncate inline data past EOF · 0bfcfcca

由 Chao Yu 提交于 3月 10, 2015

Previously if inode is with inline data, we will try to invalid partial inline
data in page #0 when we truncate size of inode in truncate_partial_data_page().
And then we set page #0 to dirty, after this we can synchronize inode page with
page #0 at ->writepage().

But sometimes we will fail to operate page #0 in truncate_partial_data_page()
due to below reason:
a) if offset is zero, we will skip setting page #0 to dirty.
b) if page #0 is not uptodate, we will fail to update it as it has no mapping
data.

So with following operations, we will meet recent data which should be
truncated.

1.write inline data to file
2.sync first data page to inode page
3.truncate file size to 0
4.truncate file size to max_inline_size
5.echo 1 > /proc/sys/vm/drop_caches
6.read file --> meet original inline data which is remained in inode page.

This patch renames truncate_inline_data() to truncate_inline_inode() for code
readability, then use truncate_inline_inode() to truncate inline data in inode
page in truncate_blocks() and truncate page #0 in truncate_partial_data_page()
for fixing.

v2:
 o truncate partially #0 page in truncate_partial_data_page to avoid keeping
   old data in #0 page.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

0bfcfcca

f2fs: fix reference leaks in f2fs_acl_create · 83dfe53c

由 Chao Yu 提交于 3月 09, 2015

Our f2fs_acl_create is copied and modified from posix_acl_create to avoid
deadlock bug when inline_dentry feature is enabled.

Now, we got reference leaks in posix_acl_create, and this has been fixed in
commit fed0b588 ("posix_acl: fix reference leaks in posix_acl_create")
by Omar Sandoval.
https://lkml.org/lkml/2015/2/9/5

Let's fix this issue in f2fs_acl_create too.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Reviewed-by: NChangman Lee <cm224.lee@ssamsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

83dfe53c

f2fs: fix to calculate max length of contiguous free slots correctly · bda19076

由 Chao Yu 提交于 3月 09, 2015

When lookuping for creating, we will try to record the level of current dentry
hash table if current dentry has enough contiguous slots for storing name of new
file which will be created later, this can save our lookup time when add a link
into parent dir.

But currently in find_target_dentry, our current length of contiguous free slots
is not calculated correctly. This make us leaving some holes in dentry block
occasionally, it wastes our space of dentry block.

Let's refactor the lookup flow for max slots as following to fix this issue:
a) increase max_len if current slot is free;
b) update max_slots with max_len if max_len is larger than max_slots;
c) reset max_len to zero if current slot is not free.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

bda19076

f2fs: fix unlocked nat set cache operation · 57ed1e95

由 Wanpeng Li 提交于 3月 09, 2015

nm_i->nat_tree_lock is used to sync both the operations of nat entry
cache tree and nat set cache tree, however, it isn't held when flush
nat entries during checkpoint which lead to potential race, this patch
fix it by holding the lock when gang lookup nat set cache and delete
item from nat set cache.
Signed-off-by: NWanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

57ed1e95

f2fs: cleanup statement about max orphan inodes calc · e0150392

由 Changman Lee 提交于 3月 09, 2015

Through each macro, we can read the meaning easily.
Signed-off-by: NChangman Lee <cm224.lee@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

e0150392

f2fs: remove unnecessary condition judgment · d9f46bb1

由 Yuan Zhong 提交于 3月 09, 2015

Remove the unnecessary condition judgment, because
'max_slots' has been initialized to '0' at the beginging
of the function, as following:
if (max_slots)
       *max_slots = 0;
Signed-off-by: NYuan Zhong <yuan.mark.zhong@samsung.com>
Reviewed-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

d9f46bb1

f2fs: set the correct place of initializing *res_page · b1f73b79

由 Yuan Zhong 提交于 3月 07, 2015

The function 'find_in_inline_dir()' contain 'res_page'
as an argument. So, we should initiaize 'res_page' before
this function.
Signed-off-by: NYuan Zhong <yuan.mark.zhong@samsung.com>
Reviewed-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

b1f73b79

f2fs: reduce searching region of segmap when set free section · 7fd97019

由 Wanpeng Li 提交于 3月 06, 2015

In __set_free we will check whether all segment are free in one section
when free one segment, in order to set section to free status. But the
searching region of segmap is from start segno to last segno of main
area, it's not necessary. So let's just only check all segment bitmap
of target section.
Signed-off-by: NWanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

7fd97019

f2fs: fix extent cache memory leak · fdf6c8be

由 Wanpeng Li 提交于 3月 06, 2015

extent tree/node slab cache is created during f2fs insmod,
how, it isn't destroyed during f2fs rmmod, this patch fix
it by destroy extent tree/node slab cache once rmmod f2fs.
Signed-off-by: NWanpeng Li <wanpeng.li@linux.intel.com>
Reviewed-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

fdf6c8be

f2fs: relocate Kconfig from misc filesystems · d7196c5a

由 Jaegeuk Kim 提交于 3月 03, 2015

The f2fs has been shipped on many smartphone devices during a couple of years.
So, it is worth to relocate Kconfig into main page from misc filesystems for
developers to choose it more easily.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

d7196c5a

f2fs: report -ENOENT for unreached data indices · 76629165

由 Jaegeuk Kim 提交于 3月 02, 2015

If inode has inline_data, it should report -ENOENT when accessing out-of-bound
region.
This is used by f2fs_fiemap which treats -ENOENT with no error.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

76629165

f2fs: clear append/update flags once fsync is done · cff28521

由 Jaegeuk Kim 提交于 3月 02, 2015

When fsync is done through checkpoint, previous f2fs missed to clear append
and update flag. This patch fixes to clear them.

This was originally catched by Changman Lee before.
Signed-off-by: NChangman Lee <cm224.lee@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

cff28521

f2fs: avoid to trigger writepage during POR · d5669f7b

由 Jaegeuk Kim 提交于 2月 27, 2015

This patch doesn't make any effect on previous behavior, since
f2fs_write_data_page bypasses writing the page during POR.

But, the difference is that this patch avoids holding writepages mutex.
This is to avoid the following false warning, since this can happen only
when mount and shutdown are triggered at the same time.

 ======================================================
 [ INFO: possible circular locking dependency detected ]
 4.0.0-rc1+ #3 Tainted: G           O
 -------------------------------------------------------
 kworker/u8:0/2270 is trying to acquire lock:
  (&sbi->gc_mutex){+.+.+.}, at: [<ffffffffa02bdd33>] f2fs_balance_fs+0x73/0x90 [f2fs]

 but task is already holding lock:
  (&sbi->writepages){+.+...}, at: [<ffffffffa02b261b>] f2fs_write_data_pages+0xcb/0x3a0 [f2fs]

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #2 (&sbi->writepages){+.+...}:
        [<ffffffff810e2b11>] lock_acquire+0xe1/0x2f0
        [<ffffffff8185e1b3>] mutex_lock_nested+0x63/0x530
        [<ffffffffa02b261b>] f2fs_write_data_pages+0xcb/0x3a0 [f2fs]
        [<ffffffff811c38c1>] do_writepages+0x21/0x50
        [<ffffffff8126c5a6>] __writeback_single_inode+0x76/0xbf0
        [<ffffffff8126e23a>] writeback_single_inode+0xea/0x1c0
        [<ffffffff8126e425>] write_inode_now+0x95/0xa0
        [<ffffffff81259dab>] iput+0x20b/0x3f0
        [<ffffffffa02c1c8b>] recover_data.constprop.14+0x26b/0xa80 [f2fs]
        [<ffffffffa02c2776>] recover_fsync_data+0x2b6/0x5e0 [f2fs]
        [<ffffffffa02a9744>] f2fs_fill_super+0xb24/0xb90 [f2fs]
        [<ffffffff8123d7f4>] mount_bdev+0x1a4/0x1e0
        [<ffffffffa02a3c85>] f2fs_mount+0x15/0x20 [f2fs]
        [<ffffffff8123e159>] mount_fs+0x39/0x180
        [<ffffffff8125e51b>] vfs_kern_mount+0x6b/0x160
        [<ffffffff81261554>] do_mount+0x204/0xbe0
        [<ffffffff8126223b>] SyS_mount+0x8b/0xe0
        [<ffffffff81863e6d>] system_call_fastpath+0x16/0x1b

 -> #1 (&sbi->cp_mutex){+.+...}:
        [<ffffffff810e2b11>] lock_acquire+0xe1/0x2f0
        [<ffffffff8185e1b3>] mutex_lock_nested+0x63/0x530
        [<ffffffffa02acbf2>] write_checkpoint+0x42/0x1230 [f2fs]
        [<ffffffffa02a847d>] f2fs_sync_fs+0x9d/0x2a0 [f2fs]
        [<ffffffff81272f82>] sync_filesystem+0x82/0xb0
        [<ffffffff8123c214>] generic_shutdown_super+0x34/0x100
        [<ffffffff8123c5f7>] kill_block_super+0x27/0x70
        [<ffffffffa02a3c60>] kill_f2fs_super+0x20/0x30 [f2fs]
        [<ffffffff8123ca49>] deactivate_locked_super+0x49/0x80
        [<ffffffff8123d05e>] deactivate_super+0x4e/0x70
        [<ffffffff8125df63>] cleanup_mnt+0x43/0x90
        [<ffffffff8125e002>] __cleanup_mnt+0x12/0x20
        [<ffffffff810a82e4>] task_work_run+0xc4/0xf0
        [<ffffffff8101f0bd>] do_notify_resume+0x8d/0xa0
        [<ffffffff81864141>] int_signal+0x12/0x17

 -> #0 (&sbi->gc_mutex){+.+.+.}:
        [<ffffffff810e2866>] __lock_acquire+0x1ac6/0x1c90
        [<ffffffff810e2b11>] lock_acquire+0xe1/0x2f0
        [<ffffffff8185e1b3>] mutex_lock_nested+0x63/0x530
        [<ffffffffa02bdd33>] f2fs_balance_fs+0x73/0x90 [f2fs]
        [<ffffffffa02b5938>] f2fs_write_data_page+0x348/0x5b0 [f2fs]
        [<ffffffffa02af9da>] __f2fs_writepage+0x1a/0x50 [f2fs]
        [<ffffffff811c1b54>] write_cache_pages+0x274/0x6f0
        [<ffffffffa02b2630>] f2fs_write_data_pages+0xe0/0x3a0 [f2fs]
        [<ffffffff811c38c1>] do_writepages+0x21/0x50
        [<ffffffff8126c5a6>] __writeback_single_inode+0x76/0xbf0
        [<ffffffff8126d44a>] writeback_sb_inodes+0x32a/0x710
        [<ffffffff8126d8cf>] __writeback_inodes_wb+0x9f/0xd0
        [<ffffffff8126dcdb>] wb_writeback+0x3db/0x850
        [<ffffffff8126e848>] bdi_writeback_workfn+0x148/0x980
        [<ffffffff810a3782>] process_one_work+0x1e2/0x840
        [<ffffffff810a3f01>] worker_thread+0x121/0x460
        [<ffffffff810a9dc8>] kthread+0xf8/0x110
        [<ffffffff81863dbc>] ret_from_fork+0x7c/0xb0
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

d5669f7b

f2fs: add stat info for moved blocks by background gc · e1235983

由 Changman Lee 提交于 12月 23, 2014

This patch is for looking into gc performance of f2fs in detail.
Signed-off-by: NChangman Lee <cm224.lee@samsung.com>
[Jaegeuk Kim: fix build errors]
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

e1235983

f2fs: fix to issue small discard in real-time mode discard · b28c3f94

由 Chao Yu 提交于 2月 28, 2015

Now in f2fs, we share functions and structures for batch mode and real-time mode
discard. For real-time mode discard, in shared function add_discard_addrs, we
will use uninitialized trim_minlen in struct cp_control to compare with length
of contiguous free blocks to decide whether skipping discard fragmented freespace
or not, this makes us ignore small discard sometimes. Fix it.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Reviewed-by : Changman Lee <cm224.lee@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

b28c3f94

f2fs: add cond_resched() to sync_dirty_dir_inodes() · 7ecebe5e

由 Sebastian Andrzej Siewior 提交于 2月 27, 2015

In a preempt-off enviroment a alot of FS activity (write/delete) I run
into a CPU stall:

| NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u2:2:59]
| Modules linked in:
| CPU: 0 PID: 59 Comm: kworker/u2:2 Tainted: G        W      3.19.0-00010-g10c11c51ffed #153
| Workqueue: writeback bdi_writeback_workfn (flush-179:0)
| task: df230000 ti: df23e000 task.ti: df23e000
| PC is at __submit_merged_bio+0x6c/0x110
| LR is at f2fs_submit_merged_bio+0x74/0x80
…
| [<c00085c4>] (gic_handle_irq) from [<c0012e84>] (__irq_svc+0x44/0x5c)
| Exception stack(0xdf23fb48 to 0xdf23fb90)
| fb40:                   deef3484 ffff0001 ffff0001 00000027 deef3484 00000000
| fb60: deef3440 00000000 de426000 deef34ec deefc440 df23fbb4 df23fbb8 df23fb90
| fb80: c02191f0 c0218fa0 60000013 ffffffff
| [<c0012e84>] (__irq_svc) from [<c0218fa0>] (__submit_merged_bio+0x6c/0x110)
| [<c0218fa0>] (__submit_merged_bio) from [<c02191f0>] (f2fs_submit_merged_bio+0x74/0x80)
| [<c02191f0>] (f2fs_submit_merged_bio) from [<c021624c>] (sync_dirty_dir_inodes+0x70/0x78)
| [<c021624c>] (sync_dirty_dir_inodes) from [<c0216358>] (write_checkpoint+0x104/0xc10)
| [<c0216358>] (write_checkpoint) from [<c021231c>] (f2fs_sync_fs+0x80/0xbc)
| [<c021231c>] (f2fs_sync_fs) from [<c0221eb8>] (f2fs_balance_fs_bg+0x4c/0x68)
| [<c0221eb8>] (f2fs_balance_fs_bg) from [<c021e9b8>] (f2fs_write_node_pages+0x40/0x110)
| [<c021e9b8>] (f2fs_write_node_pages) from [<c00de620>] (do_writepages+0x34/0x48)
| [<c00de620>] (do_writepages) from [<c0145714>] (__writeback_single_inode+0x50/0x228)
| [<c0145714>] (__writeback_single_inode) from [<c0146184>] (writeback_sb_inodes+0x1a8/0x378)
| [<c0146184>] (writeback_sb_inodes) from [<c01463e4>] (__writeback_inodes_wb+0x90/0xc8)
| [<c01463e4>] (__writeback_inodes_wb) from [<c01465f8>] (wb_writeback+0x1dc/0x28c)
| [<c01465f8>] (wb_writeback) from [<c0146dd8>] (bdi_writeback_workfn+0x2ac/0x460)
| [<c0146dd8>] (bdi_writeback_workfn) from [<c003c3fc>] (process_one_work+0x11c/0x3a4)
| [<c003c3fc>] (process_one_work) from [<c003c844>] (worker_thread+0x17c/0x490)
| [<c003c844>] (worker_thread) from [<c0041398>] (kthread+0xec/0x100)
| [<c0041398>] (kthread) from [<c000ed10>] (ret_from_fork+0x14/0x24)

As it turns out, the code loops in sync_dirty_dir_inodes() and waits for
others to make progress but since it never leaves the CPU there is no
progress made. At the time of this stall, there is also a rm process
blocked:
| rm              R running      0  1989   1774 0x00000000
| [<c047c55c>] (__schedule) from [<c00486dc>] (__cond_resched+0x30/0x4c)
| [<c00486dc>] (__cond_resched) from [<c047c8c8>] (_cond_resched+0x4c/0x54)
| [<c047c8c8>] (_cond_resched) from [<c00e1aec>] (truncate_inode_pages_range+0x1f0/0x5e8)
| [<c00e1aec>] (truncate_inode_pages_range) from [<c00e1fd8>] (truncate_inode_pages+0x28/0x30)
| [<c00e1fd8>] (truncate_inode_pages) from [<c00e2148>] (truncate_inode_pages_final+0x60/0x64)
| [<c00e2148>] (truncate_inode_pages_final) from [<c020c92c>] (f2fs_evict_inode+0x4c/0x268)
| [<c020c92c>] (f2fs_evict_inode) from [<c0137214>] (evict+0x94/0x140)
| [<c0137214>] (evict) from [<c01377e8>] (iput+0xc8/0x134)
| [<c01377e8>] (iput) from [<c01333e4>] (d_delete+0x154/0x180)
| [<c01333e4>] (d_delete) from [<c0129870>] (vfs_rmdir+0x114/0x12c)
| [<c0129870>] (vfs_rmdir) from [<c012d644>] (do_rmdir+0x158/0x168)
| [<c012d644>] (do_rmdir) from [<c012dd90>] (SyS_unlinkat+0x30/0x3c)
| [<c012dd90>] (SyS_unlinkat) from [<c000ec40>] (ret_fast_syscall+0x0/0x4c)

As explained by Jaegeuk Kim:
|This inode is the directory (c.f., do_rmdir) causing a infinite loop on
|sync_dirty_dir_inodes.
|The sync_dirty_dir_inodes tries to flush dirty dentry pages, but if the
|inode is under eviction, it submits bios and do it again until eviction
|is finished.

This patch adds a cond_resched() (as suggested by Jaegeuk) after a BIO
is submitted so other thread can make progress.
Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
[Jaegeuk Kim: change fs/f2fs to f2fs in subject as naming convention]
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

7ecebe5e