提交 · edd748e6c8e824a8281f8a8450f12c4a95ec61ee · openeuler / Kernel

06 9月, 2017 2 次提交

f2fs: avoid race in between atomic_read & atomic_inc · edd748e6

由 Chao Yu 提交于 8月 31, 2017

Previously, we will miss merging flush command during fsync due to below
race condition:

Thread A 		Thread B		Thread C
- f2fs_issue_flush
 - atomic_read(&issing_flush)
			- f2fs_issue_flush
			 - atomic_read(&issing_flush)
						- f2fs_issue_flush
						 - atomic_read(&issing_flush)
  - atomic_inc(&issing_flush)
			  - atomic_inc(&issing_flush)
						  - atomic_inc(&issing_flush)
   - submit_flush_wait
			   - submit_flush_wait
						   - submit_flush_wait

It needs to use atomic_inc_return instead to avoid such race.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

edd748e6

f2fs: remove unneeded parameter of change_curseg · 025d63a4

由 Chao Yu 提交于 8月 30, 2017

allocate_segment_by_default is the only caller of change_curseg passing
@reuse with 'false', but commit 763bfe1b ("f2fs: remove reusing any
prefree segments") removes the calling, after that, @reuse in
change_curseg always be true, so, let's clean up the unneeded parameter.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

025d63a4

30 8月, 2017 4 次提交

f2fs: wake up discard_thread iff there is a candidate · 01983c71

由 Jaegeuk Kim 提交于 8月 22, 2017

This patch fixes to avoid needless wake ups.
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

01983c71

f2fs: clear FI_HOT_DATA correctly · 84a23fbe

由 Chao Yu 提交于 8月 18, 2017

This patch fixes to clear FI_HOT_DATA correctly in below path:
- error handling in f2fs_ioc_start_atomic_write
- after commit atomic write in f2fs_ioc_commit_atomic_write
- after drop atomic write in drop_inmem_pages
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

84a23fbe

f2fs: fix out-of-order execution in f2fs_issue_flush · 6f890df0

由 Chao Yu 提交于 8月 21, 2017

In f2fs_issue_flush, due to out-of-order execution of CPU, wake_up can
be called before we insert issue_list, result in long latency of
wait_for_completion. Fix this by adding smp_mb() to force the order of
related codes.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

6f890df0

f2fs: issue discard commands if gc_urgent is set · 5f656541

由 Jaegeuk Kim 提交于 8月 15, 2017

It's time to issue all the discard commands, if user sets the idle time.
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

5f656541

22 8月, 2017 2 次提交

f2fs: introduce discard_granularity sysfs entry · 969d1b18

由 Chao Yu 提交于 8月 07, 2017

Commit d618ebaf ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.

This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.

Jaegeuk Kim:
 We must issue all the accumulated discard commands when fstrim is called.
 So, I've added pend_list_tag[] to indicate whether we should issue the
 commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
 P_TRIM is set once at a time, given fstrim trigger.
 In addition, issue_discard_thread is calling too much due to the number of
 discard commands remaining in the pending list. I added a timer to control
 it likewise gc_thread.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

969d1b18

f2fs: retry to revoke atomic commit in -ENOMEM case · 7f2b4e8e

由 Chao Yu 提交于 8月 08, 2017

During atomic committing, if we encounter -ENOMEM in revoke path, it's
better to give a chance to retry revoking.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

7f2b4e8e

16 8月, 2017 1 次提交

f2fs: fix the size value in __check_sit_bitmap · 008396e1

由 Yunlong Song 提交于 8月 04, 2017

The current size value is not correct and will miss bitmap check.
Signed-off-by: NYunlong Song <yunlong.song@huawei.com>
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

008396e1

10 8月, 2017 3 次提交

f2fs: add app/fs io stat · b0af6d49

由 Chao Yu 提交于 8月 02, 2017

This patch enables inner app/fs io stats and introduces below virtual fs
nodes for exposing stats info:
/sys/fs/f2fs/<dev>/iostat_enable
/proc/fs/f2fs/<dev>/iostat_info
Signed-off-by: NChao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix wrong stat assignment]
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

b0af6d49

f2fs: do not change the valid_block value if cur_valid_map was wrongly set or cleared · 35ee82ca

由 Yunlong Song 提交于 8月 02, 2017

Signed-off-by: NYunlong Song <yunlong.song@huawei.com>
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

35ee82ca

f2fs: update cur_valid_map_mir together with cur_valid_map · 6415fedc

由 Yunlong Song 提交于 8月 02, 2017

When cur_valid_map passes the f2fs_test_and_set(,clear)_bit test,
cur_valid_map_mir update is skipped unlikely, so fix it. The fix
now changes the mirror check together with cur_valid_map all the
time.
Signed-off-by: NYunlong Song <yunlong.song@huawei.com>
Signed-off-by: NChao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: Fix unused variable and add unlikely for corner condition.]
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

6415fedc

04 8月, 2017 1 次提交

f2fs: support inode checksum · 704956ec

由 Chao Yu 提交于 7月 31, 2017

This patch adds to support inode checksum in f2fs.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix verification flow]
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

704956ec

01 8月, 2017 1 次提交

f2fs: make background threads of f2fs being aware of freezing · dc6febb6

由 Chao Yu 提交于 7月 22, 2017

When ->freeze_fs is called from lvm for doing snapshot, it needs to
make sure there will be no more changes in filesystem's data, however,
previously, background threads like GC thread wasn't aware of freezing,
so in environment with active background threads, data of snapshot
becomes unstable.

This patch fixes this issue by adding sb_{start,end}_intwrite in
below background threads:
- GC thread
- flush thread
- discard thread

Note that, don't use sb_start_intwrite() in gc_thread_func() due to:

generic/241 reports below bug:

 ======================================================
 WARNING: possible circular locking dependency detected
 4.13.0-rc1+ #32 Tainted: G           O
 ------------------------------------------------------
 f2fs_gc-250:0/22186 is trying to acquire lock:
  (&sbi->gc_mutex){+.+...}, at: [<f8fa7f0b>] f2fs_sync_fs+0x7b/0x1b0 [f2fs]

 but task is already holding lock:
  (sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #2 (sb_internal#2){++++.-}:
        __lock_acquire+0x405/0x7b0
        lock_acquire+0xae/0x220
        __sb_start_write+0x11d/0x1f0
        f2fs_evict_inode+0x2d6/0x4e0 [f2fs]
        evict+0xa8/0x170
        iput+0x1fb/0x2c0
        f2fs_sync_inode_meta+0x3f/0xf0 [f2fs]
        write_checkpoint+0x1b1/0x750 [f2fs]
        f2fs_sync_fs+0x85/0x1b0 [f2fs]
        f2fs_do_sync_file.isra.24+0x137/0xa30 [f2fs]
        f2fs_sync_file+0x34/0x40 [f2fs]
        vfs_fsync_range+0x4a/0xa0
        do_fsync+0x3c/0x60
        SyS_fdatasync+0x15/0x20
        do_fast_syscall_32+0xa1/0x1b0
        entry_SYSENTER_32+0x4c/0x7b

 -> #1 (&sbi->cp_mutex){+.+...}:
        __lock_acquire+0x405/0x7b0
        lock_acquire+0xae/0x220
        __mutex_lock+0x4f/0x830
        mutex_lock_nested+0x25/0x30
        write_checkpoint+0x2f/0x750 [f2fs]
        f2fs_sync_fs+0x85/0x1b0 [f2fs]
        sync_filesystem+0x67/0x80
        generic_shutdown_super+0x27/0x100
        kill_block_super+0x22/0x50
        kill_f2fs_super+0x3a/0x40 [f2fs]
        deactivate_locked_super+0x3d/0x70
        deactivate_super+0x40/0x60
        cleanup_mnt+0x39/0x70
        __cleanup_mnt+0x10/0x20
        task_work_run+0x69/0x80
        exit_to_usermode_loop+0x57/0x92
        do_fast_syscall_32+0x18c/0x1b0
        entry_SYSENTER_32+0x4c/0x7b

 -> #0 (&sbi->gc_mutex){+.+...}:
        validate_chain.isra.36+0xc50/0xdb0
        __lock_acquire+0x405/0x7b0
        lock_acquire+0xae/0x220
        __mutex_lock+0x4f/0x830
        mutex_lock_nested+0x25/0x30
        f2fs_sync_fs+0x7b/0x1b0 [f2fs]
        f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
        gc_thread_func+0x302/0x4a0 [f2fs]
        kthread+0xe9/0x120
        ret_from_fork+0x19/0x24

 other info that might help us debug this:

 Chain exists of:
   &sbi->gc_mutex --> &sbi->cp_mutex --> sb_internal#2

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(sb_internal#2);
                                lock(&sbi->cp_mutex);
                                lock(sb_internal#2);
   lock(&sbi->gc_mutex);

  *** DEADLOCK ***

 1 lock held by f2fs_gc-250:0/22186:
  #0:  (sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]

 stack backtrace:
 CPU: 2 PID: 22186 Comm: f2fs_gc-250:0 Tainted: G           O    4.13.0-rc1+ #32
 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
 Call Trace:
  dump_stack+0x5f/0x92
  print_circular_bug+0x1b3/0x1bd
  validate_chain.isra.36+0xc50/0xdb0
  ? __this_cpu_preempt_check+0xf/0x20
  __lock_acquire+0x405/0x7b0
  lock_acquire+0xae/0x220
  ? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
  __mutex_lock+0x4f/0x830
  ? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
  mutex_lock_nested+0x25/0x30
  ? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
  f2fs_sync_fs+0x7b/0x1b0 [f2fs]
  f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
  gc_thread_func+0x302/0x4a0 [f2fs]
  ? preempt_schedule_common+0x2f/0x4d
  ? f2fs_gc+0x540/0x540 [f2fs]
  kthread+0xe9/0x120
  ? f2fs_gc+0x540/0x540 [f2fs]
  ? kthread_create_on_node+0x30/0x30
  ret_from_fork+0x19/0x24

The deadlock occurs in below condition:
GC Thread			Thread B
- sb_start_intwrite
				- f2fs_sync_file
				 - f2fs_sync_fs
				  - mutex_lock(&sbi->gc_mutex)
				   - write_checkpoint
				    - block_operations
				     - f2fs_sync_inode_meta
				      - iput
				       - sb_start_intwrite
 - mutex_lock(&sbi->gc_mutex)

Fix this by altering sb_start_intwrite to sb_start_write_trylock.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

dc6febb6

29 7月, 2017 1 次提交

f2fs: give a try to do atomic write in -ENOMEM case · 640cc189

由 Jaegeuk Kim 提交于 7月 19, 2017

It'd be better to retry writing atomic pages when we get -ENOMEM.
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

640cc189

08 7月, 2017 3 次提交

f2fs: introduce __check_sit_bitmap · 6915ea9d

由 Chao Yu 提交于 6月 30, 2017

After we introduce discard thread, discard command can be issued
concurrently with data allocating, this patch adds new function to
heck sit bitmap to ensure that userdata was invalid in which on-going
discard command covered.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

6915ea9d

f2fs: stop gc/discard thread in prior during umount · cce13252

由 Chao Yu 提交于 6月 29, 2017

This patch resolves kernel panic for xfstests/081, caused by recent f2fs_bug_on

 f2fs: add f2fs_bug_on in __remove_discard_cmd

For fixing, we will stop gc/discard thread in prior in ->kill_sb in order to
avoid referring and releasing race among them.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

cce13252

f2fs: avoid redundant f2fs_flush after remount · d871cd04

由 Yunlong Song 提交于 6月 24, 2017

create_flush_cmd_control will create redundant issue_flush_thread after each
remount with flush_merge option.
Signed-off-by: NYunlong Song <yunlong.song@huawei.com>
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

d871cd04

04 7月, 2017 5 次提交

f2fs: add f2fs_bug_on in __remove_discard_cmd · d9703d90

由 Chao Yu 提交于 6月 05, 2017

Recently, discard related codes have changed a lot, so add f2fs_bug_on to
detect potential bug.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

d9703d90

f2fs: introduce __wait_one_discard_bio · 2a510c00

由 Chao Yu 提交于 6月 05, 2017

In order to avoid copied codes.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

2a510c00

f2fs: sanity check size of nat and sit cache · 21d3f8e1

由 Jin Qian 提交于 6月 01, 2017

Make sure number of entires doesn't exceed max journal size.

Cc: stable@vger.kernel.org
Signed-off-by: NJin Qian <jinqian@android.com>
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

21d3f8e1

f2fs: fix a panic caused by NULL flush_cmd_control · d4fdf8ba

由 Yunlei He 提交于 6月 01, 2017

Mount fs with option noflush_merge, boot failed for illegal address
fcc in function f2fs_issue_flush:

        if (!test_opt(sbi, FLUSH_MERGE)) {
                ret = submit_flush_wait(sbi);
                atomic_inc(&fcc->issued_flush);   ->  Here, fcc illegal
                return ret;
        }
Signed-off-by: NYunlei He <heyunlei@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

d4fdf8ba

f2fs: Do not issue small discards in LFS mode · acfd2810

由 Damien Le Moal 提交于 5月 26, 2017

clear_prefree_segments() issues small discards after discarding full
segments. These small discards may not be section aligned, so not zone
aligned on a zoned block device, causing __f2fs_iissue_discard_zone() to fail.
Fix this by not issuing small discards for a volume mounted with the BLKZONED
feature enabled.

Cc: stable@vger.kernel.org
Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

acfd2810

09 6月, 2017 1 次提交

block: switch bios to blk_status_t · 4e4cbee9

由 Christoph Hellwig 提交于 6月 03, 2017

Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep arround at least for now and thus propagate to a
proper blk_status_t value.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

4e4cbee9

24 5月, 2017 11 次提交

f2fs: wait discard IO completion without cmd_lock held · 6afae633

由 Chao Yu 提交于 5月 19, 2017

Wait discard IO completion outside cmd_lock to avoid long latency
of holding cmd_lock in IO busy scenario.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

6afae633

f2fs: wake up all waiters in f2fs_submit_discard_endio · e31b9821

由 Chao Yu 提交于 5月 19, 2017

There could be more than one waiter waiting discard IO completion, so we
need use complete_all() instead of complete() in f2fs_submit_discard_endio
to avoid hungtask.

Fixes: 	ec9895ad ("f2fs: don't hold cmd_lock during waiting discard
command")
Cc: <stable@vger.kernel.org>
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

e31b9821

f2fs: show more info if fail to issue discard · 04dfc230

由 Chao Yu 提交于 5月 19, 2017

Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

04dfc230

f2fs: introduce io_list for serialize data/node IOs · fb830fc5

由 Chao Yu 提交于 5月 19, 2017

Serialize data/node IOs by using fifo list instead of mutex lock,
it will help to enhance concurrency of f2fs, meanwhile keeping LFS
IO semantics.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

fb830fc5

f2fs: split wio_mutex · e41e6d75

由 Chao Yu 提交于 5月 19, 2017

Split wio_mutex to adjust different temperature bio cache.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

e41e6d75

f2fs: combine huge num of discard rb tree consistence checks · 963932a9

由 Yunlei He 提交于 5月 19, 2017

Came across a hungtask caused by huge number of rb tree traversing
during adding discard addrs in cp. This patch combine these consistence
checks and move it to discard thread.
Signed-off-by: NYunlei He <heyunlei@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

963932a9

f2fs: try to freeze in gc and discard threads · 1d7be270

由 Jaegeuk Kim 提交于 5月 17, 2017

This allows to freeze gc and discard threads.

Cc: stable@vger.kernel.org
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

1d7be270

f2fs: avoid f2fs_lock_op for IPU writes · cc15620b

由 Jaegeuk Kim 提交于 5月 12, 2017

Currently, if we do get_node_of_data before f2fs_lock_op, there may be dead lock
as follows, where process A would be in infinite loop, and B will NOT be awaked.

Process A(cp):            Process B:
f2fs_lock_all(sbi)
                        get_dnode_of_data <---- lock dn.node_page
flush_nodes             f2fs_lock_op

So, this patch adds f2fs_trylock_op to avoid f2fs_lock_op done by IPU.
Signed-off-by: NHou Pengyang <houpengyang@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

cc15620b

f2fs: split bio cache · a912b54d

由 Jaegeuk Kim 提交于 5月 10, 2017

Split DATA/NODE type bio cache according to different temperature,
so write IOs with the same temperature can be merged in corresponding
bio cache as much as possible, otherwise, different temperature write
IOs submitting into one bio cache will always cause split of bio.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

a912b54d

f2fs: use fio instead of multiple parameters · 81377bd6

由 Jaegeuk Kim 提交于 5月 10, 2017

This patch just changes using fio instead of parameters.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

81377bd6

f2fs: remove unnecessary read cases in merged IO flow · b9109b0e

由 Jaegeuk Kim 提交于 5月 10, 2017

Merged IO flow doesn't need to care about read IOs.

f2fs_submit_merged_bio -> f2fs_submit_merged_write
f2fs_submit_merged_bios -> f2fs_submit_merged_writes
f2fs_submit_merged_bio_cond -> f2fs_submit_merged_write_cond
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

b9109b0e

09 5月, 2017 2 次提交

fs: f2fs: use ktime_get_real_seconds for sit_info times · 48fbfe50

由 Deepa Dinamani 提交于 5月 08, 2017

CURRENT_TIME_SEC is not y2038 safe.

Replace use of CURRENT_TIME_SEC with ktime_get_real_seconds in segment
timestamps used by GC algorithm including the segment mtime timestamps.

Link: http://lkml.kernel.org/r/1491613030-11599-2-git-send-email-deepa.kernel@gmail.comSigned-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: NArnd Bergmann <arnd@arndb.de>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <yuchao0@huawei.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

48fbfe50

mm: introduce kv[mz]alloc helpers · a7c3e901

由 Michal Hocko 提交于 5月 08, 2017

Patch series "kvmalloc", v5.

There are many open coded kmalloc with vmalloc fallback instances in the
tree.  Most of them are not careful enough or simply do not care about
the underlying semantic of the kmalloc/page allocator which means that
a) some vmalloc fallbacks are basically unreachable because the kmalloc
part will keep retrying until it succeeds b) the page allocator can
invoke a really disruptive steps like the OOM killer to move forward
which doesn't sound appropriate when we consider that the vmalloc
fallback is available.

As it can be seen implementing kvmalloc requires quite an intimate
knowledge if the page allocator and the memory reclaim internals which
strongly suggests that a helper should be implemented in the memory
subsystem proper.

Most callers, I could find, have been converted to use the helper
instead.  This is patch 6.  There are some more relying on __GFP_REPEAT
in the networking stack which I have converted as well and Eric Dumazet
was not opposed [2] to convert them as well.

[1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org
[2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com

This patch (of 9):

Using kmalloc with the vmalloc fallback for larger allocations is a
common pattern in the kernel code.  Yet we do not have any common helper
for that and so users have invented their own helpers.  Some of them are
really creative when doing so.  Let's just add kv[mz]alloc and make sure
it is implemented properly.  This implementation makes sure to not make
a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
to not warn about allocation failures.  This also rules out the OOM
killer as the vmalloc is a more approapriate fallback than a disruptive
user visible action.

This patch also changes some existing users and removes helpers which
are specific for them.  In some cases this is not possible (e.g.
ext4_kvmalloc, libcfs_kvzalloc) because those seems to be broken and
require GFP_NO{FS,IO} context which is not vmalloc compatible in general
(note that the page table allocation is GFP_KERNEL).  Those need to be
fixed separately.

While we are at it, document that __vmalloc{_node} about unsupported gfp
mask because there seems to be a lot of confusion out there.
kvmalloc_node will warn about GFP_KERNEL incompatible (which are not
superset) flags to catch new abusers.  Existing ones would have to die
slowly.

[sfr@canb.auug.org.au: f2fs fixup]
  Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>	[ext4 part]
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a7c3e901

04 5月, 2017 3 次提交

f2fs: Make flush bios explicitely sync · 3adc5fcb

由 Jan Kara 提交于 5月 02, 2017

Commit b685d3d6 "block: treat REQ_FUA and REQ_PREFLUSH as
synchronous" removed REQ_SYNC flag from WRITE_{FUA|PREFLUSH|...}
definitions.  generic_make_request_checks() however strips REQ_FUA and
REQ_PREFLUSH flags from a bio when the storage doesn't report volatile
write cache and thus write effectively becomes asynchronous which can
lead to performance regressions.

Fix the problem by making sure all bios which are synchronous are
properly marked with REQ_SYNC.

Fixes: b685d3d6
Cc: stable@vger.kernel.org # 4.9+
CC: Jaegeuk Kim <jaegeuk@kernel.org>
CC: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: NJan Kara <jack@suse.cz>
Acked-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

3adc5fcb

f2fs: flush dirty nats periodically · 1c0f4bf5

由 Jaegeuk Kim 提交于 5月 01, 2017

This patch flushes dirty nats in order to acquire available nids by writing
checkpoint. Otherwise, we can have no chance to get freed nids.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

1c0f4bf5

f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard · 1f43e2ad

由 Chao Yu 提交于 4月 28, 2017

Introduce CP_TRIMMED_FLAG to indicate all invalid block were trimmed
before umount, so once we do mount with image which contain the flag,
we don't record invalid blocks as undiscard one, when fstrim is being
triggered, we can avoid issuing redundant discard commands.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

1f43e2ad

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功