Commits · 1e84371ffeef451e8532e0cd04c2fe59ff10c514 · openeuler / Kernel

10 Jan, 2015 · 1 commit

f2fs: change atomic and volatile write policies · 1e84371f

Committed by Jaegeuk Kim 10 years ago

This patch adds two new ioctls to release inmemory pages grabbed by atomic
writes.
 o f2fs_ioc_abort_volatile_write
  - If the transaction fails, all the grabbed pages and data should be written.
 o f2fs_ioc_release_volatile_write
  - This is to enhance the performance of PERSIST mode in sqlite.

To avoid the huge memory consumption that can cause OOM, this patch changes
volatile writes to use normal dirty pages; instead, their flushing to the disk
is blocked as long as the system is not under memory pressure.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

09 Dec, 2014 · 4 commits

f2fs: avoid to ra unneeded blocks in recover flow · 635aee1f

Committed by Chao Yu 10 years ago

To improve recovery speed, f2fs tries to read ahead many contiguous blocks in
the warm node segment. However, abnormal power-off does not occur frequently, so
when mounting an f2fs image after a normal power-off, reading ahead so many
blocks and then invalidating them actually hurts mount performance.
It's better to read ahead only the first next block in the normal case.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use atomic for counting inode with inline_{dir,inode} flag · 03e14d52

Committed by Chao Yu 10 years ago

Since the inline_{dir,inode} stats are increased and decreased concurrently by
multiple threads, their values are not accurate; use an atomic type so that
they are counted accurately.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
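
A user-space sketch of why the counter needs to be atomic, using C11 `<stdatomic.h>` and POSIX threads in place of the kernel's `atomic_t`, `atomic_inc()` and `atomic_dec()`. The counter name `inline_dir_count` and the worker layout are hypothetical. With a plain `int`, concurrent read-modify-write updates could be lost; the atomic version always ends at the expected total.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for an inline_dir stat shared by many threads. */
static atomic_int inline_dir_count;

struct worker_arg {
	int iterations;
};

/* Each iteration nets +1: two increments and one decrement. */
static void *bump_stat(void *p)
{
	const struct worker_arg *arg = p;

	for (int i = 0; i < arg->iterations; i++) {
		atomic_fetch_add(&inline_dir_count, 1);	/* like atomic_inc() */
		atomic_fetch_add(&inline_dir_count, 1);
		atomic_fetch_sub(&inline_dir_count, 1);	/* like atomic_dec() */
	}
	return NULL;
}

/* Runs up to 16 workers and returns the final counter value. */
static int run_workers(int nthreads, int iterations)
{
	pthread_t tids[16];
	struct worker_arg arg = { .iterations = iterations };
	int started = 0;

	if (nthreads > 16)
		nthreads = 16;
	atomic_store(&inline_dir_count, 0);
	for (int i = 0; i < nthreads; i++)
		if (pthread_create(&tids[started], NULL, bump_stat, &arg) == 0)
			started++;
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	return atomic_load(&inline_dir_count);
}
```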

f2fs: count the number of inmemory pages · 8dcf2ff7

Committed by Jaegeuk Kim 10 years ago

This patch counts the number of inmemory pages in the page cache.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: do retry operations with cond_resched · 9be32d72

Committed by Jaegeuk Kim 10 years ago

This patch revisits the retry paths in f2fs.
The basic idea is to call cond_resched and retry the failed step, instead of
retrying from the very early stage.
Suggested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
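
A user-space sketch of the retry pattern described above, under stated assumptions: `try_alloc()` is a hypothetical allocator that fails a configurable number of times, and `sched_yield()` stands in for the kernel's `cond_resched()`. Only the failed step is retried, so nothing that already succeeded has to be undone.

```c
#include <sched.h>
#include <stdlib.h>

/* Hypothetical: how many more times try_alloc() should fail, standing in
 * for a transient allocation failure inside a kernel path. */
static int failures_left;

static void *try_alloc(size_t size)
{
	if (failures_left > 0) {
		failures_left--;
		return NULL;
	}
	return malloc(size);
}

/* Retry just this step, yielding the CPU between attempts, instead of
 * unwinding and restarting the whole operation from its first stage.
 * *attempts receives the number of tries it took. */
static void *alloc_retry(size_t size, int *attempts)
{
	void *p;

	*attempts = 0;
	for (;;) {
		(*attempts)++;
		p = try_alloc(size);
		if (p)
			return p;
		sched_yield();	/* plays the role of cond_resched() */
	}
}
```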

04 Dec, 2014 · 1 commit

f2fs: use rw_semaphore for nat entry lock · 8b26ef98

Committed by Jaegeuk Kim 10 years ago

Previously, we used an rwlock for the nat_entry lock.
But now set_node_addr performs a lot of complex operations
(e.g., allocating kernel memory, handling radix trees, and so on).

So, this patch changes the spinning rwlock to an rw_semaphore, so that waiting
threads sleep and give the CPU to other threads.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

24 Nov, 2014 · 1 commit

f2fs: introduce f2fs_dentry_kunmap to clean up · 9486ba44

Committed by Jaegeuk Kim 10 years ago

This patch introduces f2fs_dentry_kunmap to clean up messy code.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

20 Nov, 2014 · 1 commit

f2fs: introduce struct inode_management to wrap inner fields · 67298804

Committed by Chao Yu 10 years ago

f2fs now has three inode caches: ORPHAN_INO, APPEND_INO, and UPDATE_INO, and
the fields related to each inode cache type are managed separately in struct
f2fs_sb_info.
This makes the code a bit messy, so this patch introduces a new struct,
inode_management, to wrap these fields as follows, which makes the code neater.

/* for inner inode cache management */
struct inode_management {
	struct radix_tree_root ino_root;	/* ino entry array */
	spinlock_t ino_lock;			/* for ino entry lock */
	struct list_head ino_list;		/* inode list head */
	unsigned long ino_num;			/* number of entries */
};

struct f2fs_sb_info {
	...
	struct inode_management im[MAX_INO_ENTRY];      /* manage inode cache */
	...
};
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
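
A simplified user-space model of the `im[MAX_INO_ENTRY]` array, assuming a singly linked list in place of the kernel's radix tree and `list_head`, with locking omitted. The names `struct sb_info`, `struct ino_entry` and `add_ino_entry()` are hypothetical. It shows the benefit of the wrapper: one helper serves all three cache types, and `ino_num` (the counter from 8c402946) is kept per type.

```c
#include <stdbool.h>
#include <stdlib.h>

enum { ORPHAN_INO, APPEND_INO, UPDATE_INO, MAX_INO_ENTRY };

struct ino_entry {
	unsigned int ino;
	struct ino_entry *next;
};

/* Simplified inode_management: a list replaces radix tree + list_head. */
struct inode_management {
	struct ino_entry *ino_list;	/* entries of one cache type */
	unsigned long ino_num;		/* number of entries */
};

struct sb_info {
	struct inode_management im[MAX_INO_ENTRY];
};

/* One helper for every cache type: the type just selects im[type].
 * Returns false if the ino is already cached or allocation fails. */
static bool add_ino_entry(struct sb_info *sbi, unsigned int ino, int type)
{
	struct inode_management *im = &sbi->im[type];
	struct ino_entry *e;

	for (e = im->ino_list; e; e = e->next)
		if (e->ino == ino)
			return false;

	e = malloc(sizeof(*e));
	if (!e)
		return false;
	e->ino = ino;
	e->next = im->ino_list;
	im->ino_list = e;
	im->ino_num++;
	return true;
}
```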

07 Nov, 2014 · 1 commit

f2fs: introduce the number of inode entries · 8c402946

Committed by Jaegeuk Kim 10 years ago

This patch adds a counter to monitor the number of ino entries.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

05 Nov, 2014 · 3 commits

f2fs: introduce -o fastboot for reducing booting time only · d5053a34

Committed by Jaegeuk Kim 10 years ago

If reducing boot time is a system's top priority, it can now use the mount
option -o fastboot.
With this option, f2fs performs a slightly slower write_checkpoint, but it can
avoid node page reads during the next mount.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid race condition in handling wait_io · 6a8f8ca5

Committed by Jaegeuk Kim 10 years ago

__submit_merged_bio    f2fs_write_end_io        f2fs_write_end_io
                       wait_io = X              wait_io = X
                       complete(X)              complete(X)
                       wait_io = NULL
wait_for_completion()
free(X)
                                                 spin_lock(X)
                                                 kernel panic

To avoid this race, this patch removes the wait_io facility.
Instead, we use wait_on_all_pages_writeback(sbi) to wait for the end_io callbacks.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: revisit inline_data to avoid data races and potential bugs · b3d208f9

Committed by Jaegeuk Kim 10 years ago

This patch simplifies inline_data usage with the following rules.
1. inline_data is set when the file is created.
2. If new data is to be written beyond the inline_data range,
 f2fs converts that inode permanently.
3. There is no case that converts a non-inline_data inode to inline_data.
4. The inline_data flag should be changed under the inode page lock.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
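
A toy model of the four inline_data rules. `struct toy_inode`, `toy_create()`, `toy_write()` and the `INLINE_CAPACITY` value are hypothetical and illustrative, not f2fs code; the inode page lock from rule 4 is only noted in a comment because the model is single-threaded.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative inline capacity in bytes; check MAX_INLINE_DATA in your tree. */
#define INLINE_CAPACITY 3488

struct toy_inode {
	bool inline_data;
};

/* Rule 1: inline_data is set when the file is created. */
static void toy_create(struct toy_inode *inode)
{
	inode->inline_data = true;
}

/* Rule 2: a write reaching past the inline area converts the inode for good.
 * Rule 3: no path ever sets inline_data back to true.
 * Rule 4: in f2fs the flag changes under the inode page lock; this
 * single-threaded model omits the lock. */
static void toy_write(struct toy_inode *inode, size_t pos, size_t len)
{
	if (inode->inline_data && pos + len > INLINE_CAPACITY)
		inode->inline_data = false;	/* data now lives in a regular block */
}
```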

04 Nov, 2014 · 15 commits

f2fs: rename f2fs_set/clear_bit to f2fs_test_and_set/clear_bit · 52aca074

Committed by Gu Zheng 10 years ago

Rename f2fs_set/clear_bit to f2fs_test_and_set/clear_bit, which set/clear a bit
and return its old value, for better readability.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
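
A user-space sketch of what the renamed helpers do. It assumes f2fs's MSB-first bit numbering within each byte (bit 0 is the 0x80 bit of byte 0), which is how we recall the f2fs bitmap helpers of this era; verify against your tree. The kernel versions take `char *` and return the raw masked value; this sketch uses `unsigned char *` and normalizes the result to 0 or 1.

```c
/* Sets bit nr and returns its previous value (0 or 1). */
static int f2fs_test_and_set_bit(unsigned int nr, unsigned char *addr)
{
	unsigned char mask;
	int ret;

	addr += nr >> 3;					/* byte holding the bit */
	mask = (unsigned char)(1u << (7 - (nr & 0x07)));	/* MSB first */
	ret = (*addr & mask) != 0;
	*addr |= mask;
	return ret;
}

/* Clears bit nr and returns its previous value (0 or 1). */
static int f2fs_test_and_clear_bit(unsigned int nr, unsigned char *addr)
{
	unsigned char mask;
	int ret;

	addr += nr >> 3;
	mask = (unsigned char)(1u << (7 - (nr & 0x07)));
	ret = (*addr & mask) != 0;
	*addr &= (unsigned char)~mask;
	return ret;
}
```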

f2fs: introduce f2fs_change_bit to simplify the change bit logic · c6ac4c0e

Committed by Gu Zheng 10 years ago

Introduce f2fs_change_bit to simplify the bit-toggling logic in
set_to_next_nat{sit}.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
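
A sketch of the simplification, assuming f2fs's MSB-first bit numbering within each byte (an assumption to check against your tree). The NAT bitmap bit for a block selects which of its two on-disk copies is current, so switching copies becomes a single XOR instead of a test followed by a set or a clear. `toggle_nat_copy()` is a hypothetical wrapper standing in for set_to_next_nat().

```c
/* Flips bit nr (MSB first within each byte). */
static void f2fs_change_bit(unsigned int nr, unsigned char *addr)
{
	addr += nr >> 3;
	*addr ^= (unsigned char)(1u << (7 - (nr & 0x07)));
}

/* Hypothetical stand-in for set_to_next_nat(): switch block_off to its
 * other on-disk copy and return which copy (0 or 1) is now current. */
static int toggle_nat_copy(unsigned char *nat_bitmap, unsigned int block_off)
{
	f2fs_change_bit(block_off, nat_bitmap);
	return (nat_bitmap[block_off >> 3] >> (7 - (block_off & 0x07))) & 1;
}
```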

f2fs: remove the redundant function cond_clear_inode_flag · fa528722

Committed by Gu Zheng 10 years ago

Use clear_inode_flag to replace the redundant cond_clear_inode_flag.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: reuse make_empty_dir code for inline_dentry · 062a3e7b

Committed by Jaegeuk Kim 10 years ago

This patch introduces do_make_empty_dir to mitigate code redundancy
for inline_dentry.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce f2fs_dentry_ptr structure for code clean-up · 7b3cd7d6

Committed by Jaegeuk Kim 10 years ago

This patch introduces the f2fs_dentry_ptr structure to be used as a function
parameter in inline_dentry operations.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
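
A user-space sketch of the idea behind f2fs_dentry_ptr (and the shared f2fs_fill_dentries walker from 38594de7): one descriptor points at the bitmap, dentry array and name slots of either a regular dentry block or the inline area, so a single function can walk both. All struct layouts and sizes here are shrunk and hypothetical, and the LSB-first bitmap order is an assumption modeled on the kernel's `test_bit_le()`.

```c
#include <stddef.h>
#include <string.h>

#define SLOT_LEN 8	/* name bytes per slot (assumed, like F2FS_SLOT_LEN) */

struct dir_entry {	/* trimmed dentry: only the name length is kept */
	unsigned short name_len;
};

/* One view over either directory layout. */
struct dentry_ptr {
	const unsigned char *bitmap;		/* used-slot bitmap, LSB first */
	const struct dir_entry *dentry;		/* dentry array */
	const char (*filename)[SLOT_LEN];	/* name slots */
	int max;				/* number of slots */
};

struct dentry_block {	/* regular block layout, shrunk to 16 slots */
	unsigned char bitmap[2];
	struct dir_entry dentry[16];
	char filename[16][SLOT_LEN];
};

struct inline_dentry {	/* inline layout, shrunk to 4 slots */
	unsigned char bitmap[1];
	struct dir_entry dentry[4];
	char filename[4][SLOT_LEN];
};

static struct dentry_ptr from_block(const struct dentry_block *b)
{
	struct dentry_ptr d = { b->bitmap, b->dentry, b->filename, 16 };
	return d;
}

static struct dentry_ptr from_inline(const struct inline_dentry *b)
{
	struct dentry_ptr d = { b->bitmap, b->dentry, b->filename, 4 };
	return d;
}

/* Shared walker: writes used names into out, space separated, and returns
 * how many were found. A long name spans several consecutive slots. */
static int walk_dentries(const struct dentry_ptr *d, char *out, size_t outlen)
{
	size_t used = 0;
	int n = 0;
	int bit = 0;

	while (bit < d->max) {
		unsigned int len, slots;

		if (!(d->bitmap[bit >> 3] & (1u << (bit & 7)))) {
			bit++;
			continue;
		}
		len = d->dentry[bit].name_len;
		if (used + len + 2 > outlen)	/* separator + name + NUL */
			break;
		if (n > 0)
			out[used++] = ' ';
		memcpy(out + used,
		       (const char *)d->filename + (size_t)bit * SLOT_LEN, len);
		used += len;
		n++;
		slots = len ? (len + SLOT_LEN - 1) / SLOT_LEN : 1;
		bit += slots;			/* skip the slots this name spans */
	}
	if (outlen > 0)
		out[used] = '\0';
	return n;
}
```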

f2fs: reuse core function in f2fs_readdir for inline_dentry · 38594de7

Committed by Jaegeuk Kim 10 years ago

This patch introduces a core function, f2fs_fill_dentries, to remove
redundant code in f2fs_readdir and f2fs_read_inline_dir.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add stat info for inline_dentry inodes · 3289c061

Committed by Jaegeuk Kim 10 years ago

This patch adds status information for inline_dentry inodes.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid deadlock on init_inode_metadata · bce8d112

Committed by Jaegeuk Kim 10 years ago

Previously, init_inode_metadata did not hold any parent directory's inode page,
so f2fs_init_acl could grab its parent inode page without any problem.
But with inline_dentry, that page is already grabbed during f2fs_add_link,
so we can fall into a deadlock like the one below.

INFO: task mknod:11006 blocked for more than 120 seconds.
      Tainted: G           OE  3.17.0-rc1+ #13
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mknod           D ffff88003fc94580     0 11006  11004 0x00000000
 ffff880007717b10 0000000000000002 ffff88003c323220 ffff880007717fd8
 0000000000014580 0000000000014580 ffff88003daecb30 ffff88003c323220
 ffff88003fc94e80 ffff88003ffbb4e8 ffff880007717ba0 0000000000000002
Call Trace:
 [<ffffffff8173dc40>] ? bit_wait+0x50/0x50
 [<ffffffff8173d4cd>] io_schedule+0x9d/0x130
 [<ffffffff8173dc6c>] bit_wait_io+0x2c/0x50
 [<ffffffff8173da3b>] __wait_on_bit_lock+0x4b/0xb0
 [<ffffffff811640a7>] __lock_page+0x67/0x70
 [<ffffffff810acf50>] ? autoremove_wake_function+0x40/0x40
 [<ffffffff811652cc>] pagecache_get_page+0x14c/0x1e0
 [<ffffffffa029afa9>] get_node_page+0x59/0x130 [f2fs]
 [<ffffffffa02a63ad>] read_all_xattrs+0x24d/0x430 [f2fs]
 [<ffffffffa02a6ca2>] f2fs_getxattr+0x52/0xe0 [f2fs]
 [<ffffffffa02a7481>] f2fs_get_acl+0x41/0x2d0 [f2fs]
 [<ffffffff8122d847>] get_acl+0x47/0x70
 [<ffffffff8122db5a>] posix_acl_create+0x5a/0x150
 [<ffffffffa02a7759>] f2fs_init_acl+0x29/0xcb [f2fs]
 [<ffffffffa0286a8d>] init_inode_metadata+0x5d/0x340 [f2fs]
 [<ffffffffa029253a>] f2fs_add_inline_entry+0x12a/0x2e0 [f2fs]
 [<ffffffffa0286ea5>] __f2fs_add_link+0x45/0x4a0 [f2fs]
 [<ffffffffa028b5b6>] ? f2fs_new_inode+0x146/0x220 [f2fs]
 [<ffffffffa028b816>] f2fs_mknod+0x86/0xf0 [f2fs]
 [<ffffffff811e3ec1>] vfs_mknod+0xe1/0x160
 [<ffffffff811e4b26>] SyS_mknod+0x1f6/0x200
 [<ffffffff81741d7f>] tracesys+0xe1/0xe6
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: reuse find_in_block code for find_in_inline_dir · 4e6ebf6d

Committed by Jaegeuk Kim 10 years ago

This patch removes redundant copied code in find_in_inline_dir.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: reuse room_for_filename for inline dentry operation · a82afa20

Committed by Jaegeuk Kim 10 years ago

This patch reuses the existing room_for_filename for inline dentry
operations.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
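
A user-space sketch of what room_for_filename computes: the index of the first run of `slots` consecutive free bits in a dentry bitmap, or `max_slots` when no such run exists. It uses a linear scan instead of the kernel's `find_next_zero_bit_le()`/`find_next_bit_le()` pair and assumes little-endian (LSB-first) bit order; `test_bit_le()` here is a local helper, not the kernel's.

```c
/* Local helper: bit nr of an LSB-first bitmap. */
static int test_bit_le(int nr, const unsigned char *map)
{
	return (map[nr >> 3] >> (nr & 7)) & 1;
}

/* Returns the first index of `slots` (>= 1) consecutive free slots, or
 * max_slots if the bitmap has no such gap. */
static int room_for_filename(const unsigned char *bitmap, int slots,
			     int max_slots)
{
	int run = 0;

	for (int i = 0; i < max_slots; i++) {
		if (test_bit_le(i, bitmap)) {
			run = 0;	/* used slot breaks the gap */
			continue;
		}
		if (++run == slots)
			return i - slots + 1;
	}
	return max_slots;
}
```

The same search serves both a regular dentry block and the inline area; only `bitmap` and `max_slots` differ.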

f2fs: add key function to handle inline dir · 201a05be

Committed by Chao Yu 10 years ago

Add functions to implement inline dir init/lookup/insert/delete/convert ops.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
[Jaegeuk Kim: remove needless reserved area copy, pointed out by Dan Carpenter]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: export dir operations for inline dir · dbeacf02

Committed by Chao Yu 10 years ago

This patch exports some dir operations for inline dir, and additionally splits
f2fs_drop_nlink out of f2fs_delete_entry so that inline dir functions can reuse it.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add infra struct and helper for inline dir · 34d67deb

Committed by Chao Yu 10 years ago

This patch defines macros and the inline dentry structure, and adds some helpers
for the inline dir infrastructure.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
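
A worked version of the inline dentry geometry as compile-time arithmetic. The constants are as we recall them from the f2fs headers of this period (923 address slots per inode, 50 reserved for inline xattrs, an 11-byte packed dentry, 8-byte name slots) and should be checked against your tree. Each inline slot costs one dentry, one name slot and one bitmap bit, which is where the `+ 1` in the divisor comes from.

```c
/* Assumed constants; verify against include/linux/f2fs_fs.h in your tree. */
#define DEF_ADDRS_PER_INODE	923
#define F2FS_INLINE_XATTR_ADDRS	50
#define BITS_PER_BYTE		8
#define SIZE_OF_DIR_ENTRY	11	/* hash(4) + ino(4) + name_len(2) + type(1) */
#define F2FS_SLOT_LEN		8

/* Bytes available in the inode for inline data (4-byte address slots). */
#define MAX_INLINE_DATA \
	(4 * (DEF_ADDRS_PER_INODE - F2FS_INLINE_XATTR_ADDRS - 1))

/* Slots that fit when each costs a dentry, a name slot and one bit. */
#define NR_INLINE_DENTRY \
	(MAX_INLINE_DATA * BITS_PER_BYTE / \
	 ((SIZE_OF_DIR_ENTRY + F2FS_SLOT_LEN) * BITS_PER_BYTE + 1))

#define INLINE_DENTRY_BITMAP_SIZE \
	((NR_INLINE_DENTRY + BITS_PER_BYTE - 1) / BITS_PER_BYTE)

/* Leftover bytes after the dentries, names and bitmap are laid out. */
#define INLINE_RESERVED_SIZE \
	(MAX_INLINE_DATA - \
	 ((SIZE_OF_DIR_ENTRY + F2FS_SLOT_LEN) * NR_INLINE_DENTRY + \
	  INLINE_DENTRY_BITMAP_SIZE))
```

With these inputs: 4 × 872 = 3488 bytes; 27904 / 153 gives 182 slots; the bitmap takes 23 bytes; and 3488 − (19 × 182 + 23) leaves 7 reserved bytes.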

f2fs: invalidate inmemory page · cbcb2872

Committed by Jaegeuk Kim 10 years ago

If a user truncates a file's data, we should truncate its inmemory pages too.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: do not make dirty any inmemory pages · 34ba94ba

Committed by Jaegeuk Kim 10 years ago

This patch keeps inmemory pages clean all the time.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

08 Oct, 2014 · 1 commit

f2fs: support volatile operations for transient data · 02a1335f

Committed by Jaegeuk Kim 10 years ago

This patch adds support for volatile writes, which keep data pages in memory
until f2fs_evict_inode is called by iput.

For instance, we can use this feature for an sqlite database as follows.
While using atomic writes for the main database file, we can keep its journal
data temporarily in the page cache with the following sequence.

1. open
 -> ioctl(F2FS_IOC_START_VOLATILE_WRITE);
2. writes
 : keep all the data in the page cache.
3. flush to the database file with atomic writes
  a. ioctl(F2FS_IOC_START_ATOMIC_WRITE);
  b. writes
  c. ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE);
4. close
 -> drop the cached data
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

07 Oct, 2014 · 1 commit

f2fs: support atomic writes · 88b88a66

Committed by Jaegeuk Kim 10 years ago

This patch introduces very limited functionality for atomic write support.
To support atomic writes, this patch adds two ioctls:
 o F2FS_IOC_START_ATOMIC_WRITE
 o F2FS_IOC_COMMIT_ATOMIC_WRITE

The database engine should be aware of the following sequence.
1. open
 -> ioctl(F2FS_IOC_START_ATOMIC_WRITE);
2. writes
  : all the written data will be treated as atomic pages.
3. commit
 -> ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE);
  : this flushes all the data blocks to the disk, which the f2fs recovery
  procedure will expose as all or nothing.
4. repeat from #2.

The IO pattern should be:

  ,- START_ATOMIC_WRITE                  ,- COMMIT_ATOMIC_WRITE
 CP | D D D D D D | FSYNC | D D D D | FSYNC ...
                      `- COMMIT_ATOMIC_WRITE
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
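
A user-space sketch of steps 1 to 3 for a single transaction. The ioctl numbers are the ones later exported in `include/uapi/linux/f2fs.h` (magic `0xf5`, numbers 1 and 2) and are defined locally in case that header is unavailable. They only succeed on an f2fs file, and handling a failed transaction beyond returning `-1` is left out; `atomic_update()` is a hypothetical helper name.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Defined locally; values match the later include/uapi/linux/f2fs.h. */
#ifndef F2FS_IOCTL_MAGIC
#define F2FS_IOCTL_MAGIC		0xf5
#endif
#ifndef F2FS_IOC_START_ATOMIC_WRITE
#define F2FS_IOC_START_ATOMIC_WRITE	_IO(F2FS_IOCTL_MAGIC, 1)
#endif
#ifndef F2FS_IOC_COMMIT_ATOMIC_WRITE
#define F2FS_IOC_COMMIT_ATOMIC_WRITE	_IO(F2FS_IOCTL_MAGIC, 2)
#endif

/* One transaction: start, write, commit. Returns 0 on success, or -1 with
 * errno set at the first failing step. Works only on an f2fs file. */
static int atomic_update(int fd, const void *buf, size_t len, off_t off)
{
	ssize_t n;

	if (ioctl(fd, F2FS_IOC_START_ATOMIC_WRITE) < 0)
		return -1;
	n = pwrite(fd, buf, len, off);
	if (n < 0)
		return -1;
	if ((size_t)n != len) {
		errno = EIO;	/* treat a short write as a failed transaction */
		return -1;
	}
	if (ioctl(fd, F2FS_IOC_COMMIT_ATOMIC_WRITE) < 0)
		return -1;
	return 0;
}
```

Step 4 of the sequence corresponds to calling `atomic_update()` again on the same open descriptor for the next transaction.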

01 Oct, 2014 · 4 commits

f2fs: call f2fs_unlock_op after error was handled · 44c16156

Committed by Jaegeuk Kim 10 years ago

This patch moves f2fs_unlock_op in every directory operation so that it is
called after any error has been handled.
Otherwise, when -ENOSPC occurs, a checkpoint can be taken that contains valid
node ids without their dentries.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: refactor flush_nat_entries to remove costly reorganizing ops · 309cc2b6

Committed by Jaegeuk Kim 10 years ago

Previously, f2fs reorganized the dirty nat entries into multiple sets according
to their nid ranges. This can improve nat page flushing; however, if there are a
lot of cached nat entries, the reorganization becomes a bottleneck.

This patch introduces a new set management flow that removes the dirty nat list
and instead performs set operations whenever a nat entry becomes dirty.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
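
A sketch of the new flow with hypothetical names: each dirty entry is charged to a set keyed by its NAT block (`nid / NAT_ENTRY_PER_BLOCK`) at the moment it becomes dirty, so checkpoint time no longer has to regroup entries. The figure of 455 entries per 4 KiB block assumes a 9-byte on-disk NAT entry. A linked list stands in for the kernel's set index, and each nid is assumed to be marked dirty only once.

```c
#include <stdlib.h>

/* Assumed geometry: 4 KiB NAT block / 9-byte nat entry = 455 entries. */
#define NAT_ENTRY_PER_BLOCK	(4096 / 9)

/* Hypothetical set: dirty entries that live in the same NAT block. */
struct nat_entry_set {
	unsigned int set;		/* NAT block index */
	unsigned int entry_cnt;		/* dirty entries in that block */
	struct nat_entry_set *next;
};

/* Called once when the entry for nid becomes dirty: find or create its
 * set and bump the count. Returns the set, or NULL on allocation failure. */
static struct nat_entry_set *mark_nat_dirty(struct nat_entry_set **head,
					    unsigned int nid)
{
	unsigned int set = nid / NAT_ENTRY_PER_BLOCK;
	struct nat_entry_set *s;

	for (s = *head; s; s = s->next)
		if (s->set == set)
			break;
	if (!s) {
		s = calloc(1, sizeof(*s));
		if (!s)
			return NULL;
		s->set = set;
		s->next = *head;
		*head = s;
	}
	s->entry_cnt++;
	return s;
}
```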

f2fs: introduce FITRIM in f2fs_ioctl · 4b2fecc8

Committed by Jaegeuk Kim 10 years ago

This patch introduces FITRIM in f2fs_ioctl.
In this case, f2fs issues small discards and prefree discards, as many as
possible, for the given area.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
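
From user space, FITRIM is issued through the generic `ioctl(2)` interface with `struct fstrim_range` from `<linux/fs.h>`, the same call `fstrim(8)` makes. A minimal sketch; `trim_range()` is a hypothetical helper name. The call typically requires `CAP_SYS_ADMIN`, and on success the kernel writes the number of trimmed bytes back into `range.len`.

```c
#include <fcntl.h>
#include <limits.h>
#include <linux/fs.h>		/* FITRIM, struct fstrim_range */
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Ask the filesystem containing `path` to discard free space in
 * [start, start + len). Returns the ioctl result (0 on success, -1 on
 * error) and, on success, stores the trimmed byte count in *trimmed. */
static int trim_range(const char *path, unsigned long long start,
		      unsigned long long len, unsigned long long *trimmed)
{
	struct fstrim_range range;
	int fd, ret;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	memset(&range, 0, sizeof(range));
	range.start = start;
	range.len = len;
	range.minlen = 0;	/* let the filesystem pick its minimum */
	ret = ioctl(fd, FITRIM, &range);
	if (ret == 0 && trimmed)
		*trimmed = range.len;
	close(fd);
	return ret;
}
```

For example, `trim_range("/mnt/f2fs", 0, ULLONG_MAX, &n)` would request a trim of the whole filesystem mounted there.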

f2fs: introduce cp_control structure · 75ab4cb8

Committed by Jaegeuk Kim 10 years ago

This patch adds a new data structure to control checkpoint parameters.
Currently, it carries the reason for the checkpoint, such as is_umount or a
normal sync.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

24 Sep, 2014 · 3 commits

f2fs: remove redundant operation during roll-forward recovery · c52e1b10

Committed by Jaegeuk Kim 10 years ago

If the same data is updated multiple times, we don't need to redo all the
operations.
Let's just apply the latest one.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
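
A generic illustration of "apply only the latest update", not the kernel's recovery code: `struct logged_update` and `replay_latest()` are hypothetical. Walking the log backwards means the first record seen for each block offset is the newest, so older records for that offset are skipped instead of replayed.

```c
#include <stddef.h>

struct logged_update {		/* hypothetical fsync'ed data record */
	unsigned int ofs;	/* file block offset */
	unsigned int blkaddr;	/* where that version of the data lives */
};

/* Replays only the newest update per offset into map[0..nofs). seen[]
 * must be zeroed by the caller. Returns how many updates were applied. */
static size_t replay_latest(const struct logged_update *log, size_t n,
			    unsigned int *map, unsigned char *seen, size_t nofs)
{
	size_t applied = 0;

	for (size_t i = n; i-- > 0;) {		/* newest record first */
		unsigned int ofs = log[i].ofs;

		if (ofs >= nofs || seen[ofs])
			continue;		/* out of range or already newer */
		seen[ofs] = 1;
		map[ofs] = log[i].blkaddr;
		applied++;
	}
	return applied;
}
```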

f2fs: fix conditions to remain recovery information in f2fs_sync_file · 88bd02c9

Committed by Jaegeuk Kim 10 years ago

This patch revisits all the recovery information used during f2fs_sync_file.

In this patch, three pieces of information are used to make the decision.

a) IS_CHECKPOINTED,	/* is it checkpointed before? */
b) HAS_FSYNCED_INODE,	/* is the inode fsynced before? */
c) HAS_LAST_FSYNC,	/* has the latest node fsync mark? */

The scenarios our rule is based on are:

[Term] F: fsync_mark, D: dentry_mark

1. inode(x) | CP | inode(x) | dnode(F)
2. inode(x) | CP | inode(F) | dnode(F)
3. inode(x) | CP | dnode(F) | inode(x) | inode(F)
4. inode(x) | CP | dnode(F) | inode(F)
5. CP | inode(x) | dnode(F) | inode(DF)
6. CP | inode(DF) | dnode(F)
7. CP | dnode(F) | inode(DF)
8. CP | dnode(F) | inode(x) | inode(DF)

For example, in #3, the three conditions change as follows.

   inode(x) | CP | dnode(F) | inode(x) | inode(F)
a)    x       o      o          o          o
b)    x       x      x          x          o
c)    x       o      o          x          o

If f2fs_sync_file stops   ------^,
 it should write inode(F)    --------------^

So need_inode_block_update should return true, since
 c) get_nat_flag(e, HAS_LAST_FSYNC) is false.

For example, in #8:
      CP | alloc | dnode(F) | inode(x) | inode(DF)
a)    o      x        x          x          x
b)    x               x          x          o
c)    o               o          x          o

If f2fs_sync_file stops   -------^,
 it should write inode(DF)    --------------^

Note that the roll-forward policy should follow this rule, which means that
if there are any missing blocks, we don't need to recover that inode.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
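
The decision can be condensed into a predicate over the three flags. This is a sketch, not the kernel's function body: the rule that c) must hold together with a) or b) is our reading of the scenarios above, and the kernel reads these flags from a cached NAT entry under a lock, which is omitted here. `struct nat_flags` is hypothetical.

```c
#include <stdbool.h>

/* Per-inode flags from the message above (hypothetical container). */
struct nat_flags {
	bool is_checkpointed;		/* a) IS_CHECKPOINTED */
	bool has_fsynced_inode;		/* b) HAS_FSYNCED_INODE */
	bool has_last_fsync;		/* c) HAS_LAST_FSYNC */
};

/* The inode block may be skipped only if the latest node already carries
 * the fsync mark and the inode is otherwise safe, via the last checkpoint
 * or an earlier fsync'ed inode block. */
static bool need_inode_block_update(const struct nat_flags *e)
{
	if (e->has_last_fsync &&
	    (e->is_checkpointed || e->has_fsynced_inode))
		return false;
	return true;
}
```

Checked against the tables: at the marked stop point of #3 the flags are (o, x, x), so the inode must be written; after inode(F) they are (o, o, o), so nothing more is needed. At the stop point of #8 all three are x, so inode(DF) must be written.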

f2fs: use meta_inode cache to improve roll-forward speed · 4c521f49

Committed by Jaegeuk Kim 10 years ago

Previously, all the dnode pages had to be read during roll-forward recovery.
Even worse, the whole chain was traversed twice.
This patch removes those redundant and costly read operations by using the page
cache of meta_inode as well as its readahead function.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

16 Sep, 2014 · 2 commits

f2fs: give an option to enable in-place-updates during fsync to users · c1ce1b02

Committed by Jaegeuk Kim 10 years ago

If a user writes F2FS_IPU_FSYNC:4 to /sys/fs/f2fs/ipu_policy, f2fs tries
in-place updates only from f2fs_sync_file.
And, if the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it
keeps using out-of-place updates. Otherwise, it triggers in-place updates.

This may be useful for storage showing very high random write performance.

For example, it can be used when,

Seq. writes (Data) + wait + Seq. writes (Node)

is much slower than,

Rand. writes (Data)
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
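
A sketch of the decision described above. `should_ipu()` is hypothetical and models only the `F2FS_IPU_FSYNC` (value 4) policy; when the dirty page count equals `min_fsync_blocks` we assume the in-place path, since the message only says that counts "over" the threshold stay out of place.

```c
#include <stdbool.h>

/* Policy value from the message above; other policies are not modeled. */
enum { F2FS_IPU_FSYNC = 4 };

/* Decide between an in-place update (true) and the default out-of-place
 * write (false) for a data page. */
static bool should_ipu(unsigned int ipu_policy, bool in_fsync,
		       unsigned int dirty_pages, unsigned int min_fsync_blocks)
{
	if (ipu_policy != F2FS_IPU_FSYNC)
		return false;	/* not modeled here */
	if (!in_fsync)
		return false;	/* this policy only acts inside fsync */
	return dirty_pages <= min_fsync_blocks;	/* small fsync: write in place */
}
```

The design choice behind the threshold: a small fsync pays more for the sequential data, wait and node write chain than for a few random in-place writes, while a large one benefits from staying sequential.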

f2fs: expand counting dirty pages in the inode page cache · a7ffdbe2

Committed by Jaegeuk Kim 10 years ago

Previously, f2fs only counted dirty dentry pages, but there is no reason not to
expand the scope.

This patch renames the dirty page management code and also counts dirty pages
in each inode info.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

10 Sep, 2014 · 2 commits

f2fs: use lock-less list(llist) to simplify the flush cmd management · 721bd4d5

Committed by Gu Zheng 10 years ago

We use flush cmd control to collect many flush cmds and flush them
together. Currently, two lists manage the flush cmds (collect and
dispatch), and one spin lock protects them.
In fact, the lock-less list (llist) is very well suited to this case,
so we use it to simplify this routine.

-
v2:
-use llist_for_each_entry_safe to fix a possible use-after-free issue.
-remove the unused field from struct flush_cmd.
Thanks to Yu for the suggestion.
-
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
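
A user-space sketch of the llist pattern with C11 atomics: producers push with a compare-and-swap loop (like `llist_add()`), and the flush thread detaches the whole list with one exchange (like `llist_del_all()`), then reverses it to restore submission order. `push_cmd()` and `take_all_cmds()` are hypothetical names. When the dispatcher later completes each command, it must read `next` before completing it, because the waiter may free the command at that point; that is the use-after-free the v2 note refers to.

```c
#include <stdatomic.h>
#include <stddef.h>

struct flush_cmd {
	int id;
	struct flush_cmd *next;
};

/* Head of the pending list; producers and the flusher share only this. */
static _Atomic(struct flush_cmd *) issue_list;

/* Like llist_add(): lock-free push onto the head with a CAS loop. */
static void push_cmd(struct flush_cmd *cmd)
{
	struct flush_cmd *first = atomic_load(&issue_list);

	do {
		cmd->next = first;
	} while (!atomic_compare_exchange_weak(&issue_list, &first, cmd));
}

/* Like llist_del_all() plus a reversal: detach every pending command in
 * one atomic exchange, then flip the LIFO chain into submission order. */
static struct flush_cmd *take_all_cmds(void)
{
	struct flush_cmd *node = atomic_exchange(&issue_list, NULL);
	struct flush_cmd *rev = NULL;

	while (node) {
		struct flush_cmd *next = node->next;

		node->next = rev;
		rev = node;
		node = next;
	}
	return rev;
}
```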

f2fs: refactor flush_sit_entries codes for reducing SIT writes · 184a5cd2

Committed by Chao Yu 10 years ago

In commit aec71382 ("f2fs: refactor flush_nat_entries codes for reducing NAT
writes"), we described the issue as follows:

"Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
   nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
   journal is full, then flush the left dirty entries to disk without merge
   journaled entries, so these journaled entries may be flushed to disk at next
   checkpoint but lost chance to flushed last time."

We actually have the same problem with the SIT journal area.

In this patch, we first update the sit journal with as many dirty entries as
possible. Second, if there is no space left in the sit journal, we remove all
the entries in the journal and walk through the whole dirty entry bitmap of sit,
accounting dirty sit entries located in the same SIT block to one sit entry set.
All entry sets are linked to the list sit_entry_set in sm_info, sorted in
ascending order of entry count. Later, we flush the sets with the fewest entries
into the journal, as many as we can, and then flush the dense sets with merged
entries to disk.

In this way, we can use the sit journal area more effectively and reduce SIT
updates, which improves performance and extends the lifetime of the flash
device.

In my testing environment, this patch noticeably reduces SIT block updates.

virtual machine + hard disk:
fsstress -p 20 -n 400 -l 5
		sit page num	cp count	sit pages/cp
based		2006.50		1349.75		1.486
patched		1566.25		1463.25		1.070

The latency of the merge operation stays small even when flush_sit_entries
handles a great number of dirty SIT entries:
latency(ns)	dirty sit count
36038		2151
49168		2123
37174		2232
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
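
A sketch of the set-selection step, with hypothetical names: sets are sorted by ascending entry count, whole sets go into the journal while they fit, and the rest are counted as SIT blocks that still must be written. It omits the first phase (filling the journal directly with dirty entries) and the actual I/O.

```c
#include <stdlib.h>

/* Hypothetical record: dirty SIT entries that share one SIT block. */
struct sit_set {
	unsigned int sit_block;		/* index of the SIT block */
	unsigned int entry_cnt;		/* dirty entries in this block */
	int in_journal;			/* 1 if placed in the journal */
};

static int by_count(const void *a, const void *b)
{
	const struct sit_set *x = a, *y = b;

	return (x->entry_cnt > y->entry_cnt) - (x->entry_cnt < y->entry_cnt);
}

/* Sort sets by ascending entry count, move whole sets into the journal
 * while they fit, and return how many SIT blocks still must be written. */
static int plan_sit_flush(struct sit_set *sets, int nsets,
			  unsigned int journal_free)
{
	int pages = 0;

	qsort(sets, nsets, sizeof(*sets), by_count);
	for (int i = 0; i < nsets; i++) {
		if (sets[i].entry_cnt <= journal_free) {
			journal_free -= sets[i].entry_cnt;
			sets[i].in_journal = 1;
		} else {
			sets[i].in_journal = 0;	/* dense set: write its block */
			pages++;
		}
	}
	return pages;
}
```

Filling the journal from the sparsest sets first maximizes how many SIT blocks avoid a write in this checkpoint, which is the effect the sit pages/cp column above measures.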