- 22 August 2015, 4 commits
-
-
Submitted by Chao Yu

The test steps are as below:

1. touch file
2. truncate -s $((1024*1024)) file
3. fallocate -o 0 -l $((1024*1024)) file
4. fibmap.f2fs file

Our fibmap.f2fs result shown below is not correct:

file_pos   start_blk    end_blk      blks
0          -937166132   -937166132   1
4096       -937166132   -937166132   1
8192       -937166132   -937166132   1
12288      -937166132   -937166132   1
16384      -937166132   -937166132   1
20480      -937166132   -937166132   1
...
1040384    -937166132   -937166132   1
1044480    -937166132   -937166132   1

This is because f2fs_map_blocks returns with no error when it meets a hole or a preallocated block, so the caller __get_data_block maps an uninitialized variable value into bh->b_blocknr. Unfortunately, generic_block_bmap neither checks the return value of get_data() nor the mapping info of the buffer_head, resulting in a random block address being returned.

After fixing the issue, our result shows correctly:

file_pos   start_blk   end_blk   blks
0          0           0         256

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
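A minimal user-space model of the failure mode described above (the map_blocks/bmap names, the mapped flag and the block numbers are illustrative stand-ins, not the actual f2fs/VFS symbols): when the mapping callback reports success for a hole without marking the buffer as mapped, an unchecked caller hands back whatever garbage happened to be in its output variable.

    #include <stdbool.h>
    #include <stdio.h>

    /* Stand-in for a get_block-style callback: returns 0 (success) even for
     * a hole, and only fills *blocknr when the block is really mapped. */
    static int map_blocks(long file_blk, long *blocknr, bool *mapped)
    {
        if (file_blk < 256) {       /* pretend the whole range is a hole */
            *mapped = false;        /* nothing is written to *blocknr */
            return 0;
        }
        *blocknr = 1000 + file_blk;
        *mapped = true;
        return 0;
    }

    /* Buggy bmap-like caller: ignores the mapped state, so for holes it
     * returns an uninitialized value (like the -937166132 above). */
    static long buggy_bmap(long file_blk)
    {
        long blocknr;               /* deliberately left uninitialized */
        bool mapped;

        map_blocks(file_blk, &blocknr, &mapped);
        return blocknr;
    }

    /* Fixed caller: only trusts blocknr when the block is really mapped. */
    static long fixed_bmap(long file_blk)
    {
        long blocknr = 0;
        bool mapped = false;

        if (map_blocks(file_blk, &blocknr, &mapped) || !mapped)
            return 0;               /* report "no physical block" */
        return blocknr;
    }

    int main(void)
    {
        printf("buggy: %ld, fixed: %ld\n", buggy_bmap(0), fixed_bmap(0));
        return 0;
    }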
-
Submitted by Chao Yu

Add annotations to describe more clearly the space utilization of regular dentries and inline dentries.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Submitted by Fan Li

In f2fs_lookup_extent_tree, et->cached_en was read and updated with only the read lock held, which could cause __lookup_extent_tree to return an entirely wrong extent_node if another thread updates et->cached_en just before __lookup_extent_tree returns. However, there are two things about this patch that need to be noted:

1. It does nothing to order concurrent reads/writes; the result would still be random in such a case.
2. It is built on the assumption that mixed reads and writes of a single pointer never leave the pointer partially written at any time.

Please let me know if I'm wrong, thanks.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
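A small user-space sketch of the assumption in point 2 (names are illustrative and f2fs itself uses plain pointer accesses under its rwlock; C11 atomics simply make the "a pointer is never seen half-written" expectation explicit):

    #include <stdatomic.h>
    #include <stddef.h>

    struct extent_node { unsigned int fofs, len, blk; };

    /* Shared most-recently-used cache pointer, analogous to et->cached_en. */
    static _Atomic(struct extent_node *) cached_en;

    /* Reader: may run concurrently with the writer below; it sees either
     * the old node or the new one, never a torn pointer value. */
    static struct extent_node *lookup_cached(unsigned int fofs)
    {
        struct extent_node *en =
            atomic_load_explicit(&cached_en, memory_order_acquire);

        if (en && en->fofs <= fofs && fofs < en->fofs + en->len)
            return en;
        return NULL;    /* fall back to a full tree lookup */
    }

    /* Writer: publishes a new cached node with a single atomic store. */
    static void update_cached(struct extent_node *en)
    {
        atomic_store_explicit(&cached_en, en, memory_order_release);
    }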
-
Submitted by Junesung Lee

Fix typo.

Signed-off-by: Junesung Lee <junesoung412@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
- 21 August 2015, 12 commits
-
-
Submitted by Jaegeuk Kim

This patch adds a routine which checks the block address of a newly allocated nid. If an nid has already been allocated by another thread due to a subtle data race, it will result in filesystem corruption. So, as a last chance, we need to check whether the nid's block address was already allocated prior to using it.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
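A hedged sketch of the kind of last-chance check described above (NULL_ADDR, the structure and the helper are illustrative placeholders, not the exact f2fs code):

    #define NULL_ADDR 0U    /* placeholder for "no block allocated yet" */

    struct nat_entry_sketch {
        unsigned int nid;
        unsigned int blk_addr;  /* block address recorded in the NAT */
    };

    /* A nid handed out from the free-nid list is only safe to use if its
     * NAT entry still shows no block address; otherwise another thread
     * already allocated it, and reusing it would corrupt the filesystem. */
    static int nid_still_free(const struct nat_entry_sketch *ne)
    {
        return ne->blk_addr == NULL_ADDR;
    }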
-
Submitted by Jaegeuk Kim

We should not call unlock_new_inode when insert_inode_locked fails.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Submitted by Jaegeuk Kim

If FG_GC fails to reclaim one section, let's retry with another section from the start, since we can get another good candidate.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Submitted by Jaegeuk Kim

Previously, update_inode_page was not called under f2fs_lock_op. Instead, we should call it via f2fs_write_inode.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Submitted by Jaegeuk Kim

If we reuse nids as much as possible, we can mitigate the production of obsolete node pages in the page cache.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Submitted by Jaegeuk Kim

If node blocks were already moved, we don't need to move them again.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Submitted by Jaegeuk Kim

As the comment of bio_alloc_bioset below explains, f2fs can allocate multiple bios at the same time, so we can't guarantee that a bio can be allocated all the time:

"
 * When @bs is not NULL, if %__GFP_WAIT is set then bio_alloc will always be
 * able to allocate a bio. This is due to the mempool guarantees. To make this
 * work, callers must never allocate more than 1 bio at a time from this pool.
 * Callers that need to allocate more than 1 bio must always submit the
 * previously allocated bio for IO before attempting to allocate a new one.
 * Failure to do so can cause deadlocks under memory pressure.
"

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Submitted by Jaegeuk Kim

This patch increases the maximum number of hard links for one file.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Submitted by Jaegeuk Kim

We should avoid needless checkpoints when there are no dirty or prefree segments.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Submitted by Chao Yu

This patch introduces __count_free_nids/try_to_free_nids and registers them in the slab shrinker for shrinking free nids under memory pressure.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
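A hedged, kernel-style sketch of how count/scan callbacks plug into the slab shrinker API (the f2fs_*_sketch names and the fixed object count are illustrative placeholders, not the actual f2fs implementation):

    #include <linux/shrinker.h>

    /* Count callback: report how many objects (e.g. cached free nids)
     * could be reclaimed right now. */
    static unsigned long f2fs_count_sketch(struct shrinker *shrink,
                                           struct shrink_control *sc)
    {
        return 128;     /* a fixed number keeps the sketch self-contained */
    }

    /* Scan callback: free up to sc->nr_to_scan objects and report how many
     * were actually freed; try_to_free_nids()-like work would go here. */
    static unsigned long f2fs_scan_sketch(struct shrinker *shrink,
                                          struct shrink_control *sc)
    {
        unsigned long freed = sc->nr_to_scan < 128 ? sc->nr_to_scan : 128;

        return freed ? freed : SHRINK_STOP;
    }

    static struct shrinker f2fs_shrinker_sketch = {
        .count_objects = f2fs_count_sketch,
        .scan_objects  = f2fs_scan_sketch,
        .seeks         = DEFAULT_SEEKS,
    };

    /* Registered once at mount/module init, e.g.:
     * register_shrinker(&f2fs_shrinker_sketch); */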
-
Submitted by Chao Yu

In f2fs_delete_entry, if the last dirent is removed from a dentry page, we will try to punch that page out since it has no valid data in it. But truncate_hole, which is used for the punching, can fail because of lack of memory or an IO error; if that happens, we had better skip clearing this still-valid dentry page.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Submitted by Chao Yu

I volunteer to be a dedicated reviewer of f2fs; add my email address to the maintainership entry of f2fs.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
- 20 August 2015, 1 commit
-
-
Submitted by Jaegeuk Kim

We should not write node pages when deleting orphan inodes. In order to do that, we can easily set the POR_DOING flag earlier, before entering the orphan inode routine.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
- 15 August 2015, 3 commits
-
-
Submitted by Jaegeuk Kim

If F2FS_CHECK_FS is turned off, we can get a build warning for an unused variable.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Submitted by Chao Yu

In recover_orphan_inode, whenever f2fs_iget fails we panic the kernel, but that is not reasonable, because f2fs_iget can fail for a lot of reasons, including out of memory. So we change the error handling method as below:
a) when no entry is found for the orphan inode, bug_on to catch bugs;
b) for other reasons, report the error to the caller.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Submitted by Jaegeuk Kim

If there are not enough free segments, we should not assign a new segment explicitly. Otherwise, we can run out of free segments.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
- 12 August 2015, 2 commits
-
-
Submitted by Chao Yu

Previously, we used a radix tree to index all registered page entries for an atomic file, but now we only use the radix tree to see whether the current page is indexed or not, since the other user of the radix tree is gone in commit 042b7816 ("f2fs: remove unnecessary call to invalidate inmemory pages"). So in this patch, we try a more efficient way: introduce a macro, ATOMIC_WRITTEN_PAGE, and set it as the page private value to indicate the page's indexing status. This way, we can save memory and lookup time.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
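A hedged sketch of the private-value tagging idea (the sentinel value and helper names are illustrative; the real patch defines its own ATOMIC_WRITTEN_PAGE macro inside f2fs):

    #include <linux/mm.h>
    #include <linux/pagemap.h>

    /* Illustrative sentinel stored in page->private to mark a page that is
     * already registered for an atomic write. */
    #define ATOMIC_WRITTEN_PAGE_SKETCH ((unsigned long)-1)

    static inline void tag_atomic_written(struct page *page)
    {
        SetPagePrivate(page);
        set_page_private(page, ATOMIC_WRITTEN_PAGE_SKETCH);
    }

    static inline bool is_atomic_written(struct page *page)
    {
        return PagePrivate(page) &&
               page_private(page) == ATOMIC_WRITTEN_PAGE_SKETCH;
    }

    static inline void clear_atomic_written(struct page *page)
    {
        set_page_private(page, 0);
        ClearPagePrivate(page);
    }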
-
Submitted by Chao Yu

We ran the ltp testcase with f2fs and obtained a TFAIL in diotest4; the detailed result is as follows:

dio04
<<<test_start>>>
tag=dio04 stime=1432278894
cmdline="diotest4"
contacts=""
analysis=exit
<<<test_output>>>
diotest4    1  TPASS  :  Negative Offset
diotest4    2  TPASS  :  removed
diotest4    3  TFAIL  :  diotest4.c:129: write allows odd count.returns 1: Success
diotest4    4  TFAIL  :  diotest4.c:183: Odd count of read and write
diotest4    5  TPASS  :  Read beyond the file size
......

The result of ext4 in the same environment:

dio04
<<<test_start>>>
tag=dio04 stime=1432259643
cmdline="diotest4"
contacts=""
analysis=exit
<<<test_output>>>
diotest4    1  TPASS  :  Negative Offset
diotest4    2  TPASS  :  removed
diotest4    3  TPASS  :  Odd count of read and write
diotest4    4  TPASS  :  Read beyond the file size
......

The reason is that when triggering DIO in f2fs, we return zero from ->direct_IO if the writer's buffer offset, file offset and transfer size are not aligned to the block size of the filesystem, falling back to buffered write instead of returning -EINVAL.

This patch fixes that problem by returning the correct error number for the above case, and removes the judgement condition in check_direct_IO so that the verification is enabled for direct readers too.

Besides, Jaegeuk Kim pointed out that there are exceptional cases where we should always make direct-io fall back to buffered write, such as dio on an encrypted file.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
[Chao Yu made small changes and added a detailed description in the commit message]
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
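A minimal user-space model of the alignment check described above (the block size, function name and exact set of checks are illustrative; the real verification lives in f2fs's check_direct_IO and also walks the iovec):

    #include <errno.h>
    #include <stdio.h>

    #define BLKSIZE 4096UL  /* illustrative filesystem block size */

    /* Reject a direct-IO request whose file offset, buffer address or
     * length is not block-aligned, instead of silently falling back to
     * buffered IO with a zero return value. */
    static int check_dio_aligned(unsigned long long offset,
                                 unsigned long buf_addr,
                                 unsigned long len)
    {
        if (offset & (BLKSIZE - 1))
            return -EINVAL;
        if (buf_addr & (BLKSIZE - 1))
            return -EINVAL;
        if (len & (BLKSIZE - 1))
            return -EINVAL;
        return 0;
    }

    int main(void)
    {
        /* An odd count (as in diotest4 case 3) must now fail with -EINVAL. */
        printf("odd count -> %d\n", check_dio_aligned(0, 0, 4095));
        printf("aligned   -> %d\n", check_dio_aligned(0, 0, 8192));
        return 0;
    }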
-
- 11 August 2015, 1 commit
-
-
Submitted by Chao Yu

fill_zero can fail for a lot of reasons, but previously we did not handle its return value, so its callers such as punch_hole/f2fs_zero_range may report success while actually failing because an error occurred inside fill_zero. This patch fixes them to report the correct return value of fill_zero.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
- 06 August 2015, 2 commits
-
-
Submitted by Chao Yu

When testing with generic/101 in xfstests, the following error message is output:

--- tests/generic/101.out
+++ results//generic/101.out.bad
@@ -10,10 +10,14 @@
 File foo content after log replay:
 0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
 *
-0200000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+0200000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
 *
 0372000
 ...
(Run 'diff -u tests/generic/101.out results/generic/101.out.bad' to see the entire diff)

The test flow is like below:
1. pwrite foo -S 0xaa 0 64K
2. pwrite foo -S 0xbb 64K 61K
3. sync
4. truncate foo 64K
5. truncate foo 125K
6. fsync foo
7. flakey drop writes
8. umount

After this test, we expect the data of the recovered file to have the first 64k filled with value 0xaa and the next 61k filled with value 0x00, because we fsynced it before dropping writes in dm.

In f2fs, during recovery we only recover the valid block addresses in a direct node page if it is marked as a fsynced dnode, but block addresses which mean invalid/reserved (with value NULL_ADDR/NEW_ADDR) are not recovered. So the recovered file shows incorrect data 0xbb in the range [64k, 125k].

In this patch, we fix this by also recovering invalid/reserved block addresses during the recovery flow.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Submitted by Fan Li

In some cases, we only need the block address when we call f2fs_reserve_block; the other fields of struct dnode_of_data aren't necessary. We can try the extent cache first in such cases in order to speed up the process.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
- 05 August 2015, 15 commits
-
-
Submitted by Chao Yu

To avoid meeting garbage data in the next free node block at the end of the warm node chain when doing recovery, we try to zero out that invalid block. If the device does not support discard, our way of zeroing out the block is: grab a temporary zeroed page in the meta inode, then issue a write request with this page. But we forgot to release that temporary page, so our memory usage increases without gaining any hit ratio benefit; it is better to free it to save memory.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Submitted by Chao Yu

In the following call path, we pass a locked and referenced ipage pointer to get_new_data_page:

- init_inode_metadata
 - make_empty_dir
  - get_new_data_page

There are two exit paths in get_new_data_page when an error occurs:
1) grab_cache_page fails: ipage will not be released;
2) f2fs_reserve_block fails: ipage will be released in the callee.

So the error handling in get_new_data_page is not consistent. For f2fs_reserve_block, it's not very easy to change the rule of error handling, since it's already complicated. Here we decide to choose an easy way to fix this issue: if any error occurs in get_new_data_page, we ensure that ipage is released in this function. The same issue exists in f2fs_convert_inline_dir; fix that too.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Submitted by Liu Xue

Replace BUG_ON with f2fs_bug_on to deal with failed block and segment validity checks.

Signed-off-by: Xue Liu <liuxueliu.liu@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Submitted by Chao Yu

In get_meta_page, we guarantee no failure for the returned page, but sometimes an IO error from the device results in a non-updated page being returned. If we then still use this page as an updated one, exceptions could happen when using this kind of page. So in this condition, we had better freeze the fs by making it readonly and stop doing checkpoints.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Submitted by Fan Li

Some backing devices need pages to be stable during writeback. It doesn't matter if the page is completely overwritten or already uptodate; it needs to wait before the write.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Submitted by Jaegeuk Kim

This patch adds handling of error cases in commit_inmem_pages. If an error occurs, it stops writing the pages and returns the error right away.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Submitted by Chao Yu

When there are not enough free nids in the free nid cache, we try to readahead FREE_NID_PAGES (4) nat pages into the page cache of meta_inode and then read the nat entries in those nat pages to add free nids to the free nid cache. But when traversing all the nat pages we readaheaded in a loop, our exit condition is not set right: one more nat page will be scanned without readahead, resulting in worse read performance. This patch fixes the code to read the correct number of nat pages, avoiding the bad performance.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
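A minimal sketch of the loop-bound problem described above (FREE_NID_PAGES and the helper are placeholders; the real code walks NAT pages of the meta inode, so this only illustrates the "one page past the readahead window" mistake):

    #define FREE_NID_PAGES 4    /* number of NAT pages readahead at once */

    /* Hypothetical per-page scan; in f2fs this parses nat entries and adds
     * free nids to the cache. */
    static void scan_nat_page_sketch(unsigned int pageno) { (void)pageno; }

    static void scan_readahead_window(unsigned int start)
    {
        unsigned int i;

        /* A buggy bound such as "i <= FREE_NID_PAGES" would scan one page
         * beyond the readahead window, so that last page is read
         * synchronously. Fixed bound: stop exactly at the window's end. */
        for (i = 0; i < FREE_NID_PAGES; i++)
            scan_nat_page_sketch(start + i);
    }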
-
Submitted by Chao Yu

If we clear the inline data/dentry flag in handle_failed_inode, we will fail to decrement the stat count of inline data/dentry in f2fs_evict_inode, because the flag is no longer set in the inode. So remove the wrong clearing.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Submitted by Chao Yu

In f2fs_ioc_start_{atomic,volatile}_write, if we fail to convert inline data, we report the error to the user but still leave the atomic/volatile flag set in the inode, which will impact further writes to this file. Fix it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Submitted by Chao Yu

This patch fixes the incorrect range (0, LONG_MAX) used in ranged fsync. If we use LONG_MAX as the parameter indicating the end of the file we want to synchronize, then on a 32-bit architecture machine data at offsets beyond LONG_MAX (only about 2GB there) may not be persisted to storage after ->fsync returns. Here, we change LONG_MAX to LLONG_MAX to fix this issue.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
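A small user-space illustration of why LONG_MAX is the wrong end-of-range sentinel for a 64-bit byte offset on 32-bit builds (the loff_t_sketch typedef stands in for the kernel's always-64-bit loff_t):

    #include <limits.h>
    #include <stdio.h>

    typedef long long loff_t_sketch;    /* kernel loff_t is always 64-bit */

    int main(void)
    {
        /* On a 32-bit build LONG_MAX is 2^31 - 1 (about 2GB), so using it
         * as the "end" of a ranged fsync silently excludes everything
         * beyond that offset; LLONG_MAX (2^63 - 1) covers the whole file. */
        loff_t_sketch end_long  = LONG_MAX;
        loff_t_sketch end_llong = LLONG_MAX;

        printf("LONG_MAX  as range end: %lld\n", end_long);
        printf("LLONG_MAX as range end: %lld\n", end_llong);
        return 0;
    }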
-
Submitted by Chao Yu

When flushing comes from background writeback, if there is no dirty page in the mapping of the inode, we had better skip seeking dirty pages in the mapping for writeback.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Submitted by Tiezhu Yang

The "goto continue_unlock" statement is exactly the same in each branch, and whether each if condition is true depends on the values of "step" and "is_cold_data(page)", both of which are 0 or 1. That means the condition is true exactly when the value of "step" equals "is_cold_data(page)", so the branches can be combined and "goto continue_unlock" needs to appear only once, reducing the duplicated code.

Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-
Submitted by Chao Yu

In handle_failed_inode, there is a potential deadlock which can happen in the call path below:

- f2fs_create
- f2fs_lock_op              down_read(cp_rwsem)
- f2fs_add_link
- __f2fs_add_link
- init_inode_metadata
- f2fs_init_security        failed
- truncate_blocks           failed
- handle_failed_inode
- f2fs_truncate
- truncate_blocks(.., true)
- write_checkpoint
- block_operations
- f2fs_lock_all             down_write(cp_rwsem)
- f2fs_lock_op              down_read(cp_rwsem)

So in this path, we pass a parameter to f2fs_truncate to make sure that cp_rwsem in truncate_blocks will not be locked again.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
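A hedged sketch of the fix pattern (a caller that already holds the read side passes a flag so the callee does not take cp_rwsem again; the names and the pthread rwlock are illustrative, not the f2fs signatures):

    #include <pthread.h>

    static pthread_rwlock_t cp_rwsem_sketch = PTHREAD_RWLOCK_INITIALIZER;

    static void do_truncate_blocks(void)
    {
        /* ... shrink the file's blocks ... */
    }

    /* need_lock mirrors the extra parameter the patch threads through
     * f2fs_truncate/truncate_blocks: the failed-create path, which already
     * holds the read side via f2fs_lock_op, passes 0 so the same thread
     * never queues behind a pending writer on a lock it already holds. */
    static void truncate_blocks_sketch(int need_lock)
    {
        if (need_lock)
            pthread_rwlock_rdlock(&cp_rwsem_sketch);

        do_truncate_blocks();

        if (need_lock)
            pthread_rwlock_unlock(&cp_rwsem_sketch);
    }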
-
Submitted by Chao Yu

In f2fs_do_collapse, the region covered by cp_rwsem is large, since it is held until all blocks have been left-shifted; so if we try to collapse a small area at the beginning of a large file, a checkpoint that wants to grab the writer's side of cp_rwsem will be delayed for a long time. In order to avoid this condition, change the code to lock/unlock cp_rwsem for each shift operation.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
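A small sketch of the locking-granularity change (illustrative names; the real code brackets each block shift with f2fs_lock_op/f2fs_unlock_op):

    #include <pthread.h>

    static pthread_rwlock_t cp_rwsem_sketch = PTHREAD_RWLOCK_INITIALIZER;

    static void shift_one_block(long blk) { (void)blk; /* move blk left by one */ }

    /* Before: one long read-side critical section for the whole collapse,
     * which can keep a checkpoint waiting for the write lock. */
    static void collapse_coarse(long nr_blocks)
    {
        long blk;

        pthread_rwlock_rdlock(&cp_rwsem_sketch);
        for (blk = 0; blk < nr_blocks; blk++)
            shift_one_block(blk);
        pthread_rwlock_unlock(&cp_rwsem_sketch);
    }

    /* After: take and drop the lock per shift, so a pending checkpoint can
     * grab the write lock between iterations. */
    static void collapse_fine(long nr_blocks)
    {
        long blk;

        for (blk = 0; blk < nr_blocks; blk++) {
            pthread_rwlock_rdlock(&cp_rwsem_sketch);
            shift_one_block(blk);
            pthread_rwlock_unlock(&cp_rwsem_sketch);
        }
    }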
-
Submitted by Fan Li

Add a lookup and an insertion interface for the extent tree. The new lookup returns the insert position and the prev/next extents closest to the offset we look up when no match is found. The new insertion uses those parameters to improve performance.

There are three possible insertions after the lookup in f2fs_update_extent_tree: two of them insert parts of the removed extent back into the tree; since no merge happens during this process, the new insertion skips the merge check in this scenario. The other insertion inserts a new extent into the tree; the new insertion uses the prev/next extents and the insert position to insert this extent directly, saving the time of searching down the tree. As long as the tree remains unchanged between lookup and insertion, this works fine. The new lookup will also be useful when adding multi-block extent support to the insertion interface.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
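A simplified, self-contained model of the "lookup returns the insert position plus the closest prev/next entries, and insertion reuses them" idea (a sorted array stands in for the rb-tree, so the data structure and names are illustrative only):

    #include <stddef.h>

    struct extent { unsigned int fofs, len, blk; };

    struct lookup_result {
        int found;                          /* 1 if an extent covers fofs */
        size_t pos;                         /* match index, or insert position */
        const struct extent *prev, *next;   /* closest neighbours, may be NULL */
    };

    /* Lookup in an array of disjoint extents sorted by fofs; besides the
     * hit/miss answer it records where a new extent would go and which
     * extents sit immediately before and after that position. */
    static struct lookup_result lookup(const struct extent *ext, size_t n,
                                       unsigned int fofs)
    {
        struct lookup_result r = { 0, 0, NULL, NULL };
        size_t lo = 0, hi = n;

        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;

            if (fofs < ext[mid].fofs)
                hi = mid;
            else if (fofs >= ext[mid].fofs + ext[mid].len)
                lo = mid + 1;
            else {
                r.found = 1;
                r.pos = mid;
                return r;
            }
        }
        r.pos = lo;                             /* insert position */
        r.prev = lo > 0 ? &ext[lo - 1] : NULL;  /* closest on the left  */
        r.next = lo < n ? &ext[lo] : NULL;      /* closest on the right */
        return r;
    }

    /* A later insertion can place a new extent at r.pos directly and only
     * check r.prev / r.next for merging, instead of searching the whole
     * structure again, as long as nothing changed in between. */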
-