提交 · 325c4ef5c4b17372c3222d896040d7848e67fbdb · OpenHarmony / kernel_linux

12 9月, 2013 2 次提交

mm/writeback: make writeback_inodes_wb static · 7d9f073b

由 Wanpeng Li 提交于 9月 11, 2013

It's not used globally and could be static.
Signed-off-by: NWanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7d9f073b

writeback: fix occasional slow sync(1) · 47df3dde

由 Jan Kara 提交于 9月 11, 2013

In case when system contains no dirty pages, wakeup_flusher_threads() will
submit WB_SYNC_NONE writeback for 0 pages so wb_writeback() exits
immediately without doing anything, even though there are dirty inodes in
the system.  Thus sync(1) will write all the dirty inodes from a
WB_SYNC_ALL writeback pass which is slow.

Fix the problem by using get_nr_dirty_pages() in wakeup_flusher_threads()
instead of calculating number of dirty pages manually.  That function also
takes number of dirty inodes into account.
Signed-off-by: NJan Kara <jack@suse.cz>
Reported-by: NPaul Taysom <taysom@chromium.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

47df3dde

10 7月, 2013 2 次提交

mm/writeback: don't check force_wait to handle bdi->work_list · 25d130ba

由 Wanpeng Li 提交于 7月 08, 2013

After commit 839a8e86 ("writeback: replace custom worker pool
implementation with unbound workqueue"), bdi_writeback_workfn runs off
bdi_writeback->dwork, on each execution, it processes bdi->work_list and
reschedules if there are more things to do instead of flush any work
that race with us existing.  It is unecessary to check force_wait in
wb_do_writeback since it is always 0 after the mentioned commit.  This
patch remove the force_wait in wb_do_writeback.
Signed-off-by: NWanpeng Li <liwanp@linux.vnet.ibm.com>
Reviewed-by: NTejun Heo <tj@kernel.org>
Reviewed-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

25d130ba

fs/fs-writeback.c: : make wb_do_writeback() as static · 12057841

由 Haicheng Li 提交于 7月 08, 2013

It's not used globally and could be static.
Signed-off-by: NHaicheng Li <haicheng.li@linux.intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

12057841

03 7月, 2013 1 次提交

sync: don't block the flusher thread waiting on IO · 7747bd4b

由 Dave Chinner 提交于 7月 02, 2013

When sync does it's WB_SYNC_ALL writeback, it issues data Io and
then immediately waits for IO completion. This is done in the
context of the flusher thread, and hence completely ties up the
flusher thread for the backing device until all the dirty inodes
have been synced. On filesystems that are dirtying inodes constantly
and quickly, this means the flusher thread can be tied up for
minutes per sync call and hence badly affect system level write IO
performance as the page cache cannot be cleaned quickly.

We already have a wait loop for IO completion for sync(2), so cut
this out of the flusher thread and delegate it to wait_sb_inodes().
Hence we can do rapid IO submission, and then wait for it all to
complete.

Effect of sync on fsmark before the patch:

FSUse%        Count         Size    Files/sec     App Overhead
.....
     0       640000         4096      35154.6          1026984
     0       720000         4096      36740.3          1023844
     0       800000         4096      36184.6           916599
     0       880000         4096       1282.7          1054367
     0       960000         4096       3951.3           918773
     0      1040000         4096      40646.2           996448
     0      1120000         4096      43610.1           895647
     0      1200000         4096      40333.1           921048

And a single sync pass took:

  real    0m52.407s
  user    0m0.000s
  sys     0m0.090s

After the patch, there is no impact on fsmark results, and each
individual sync(2) operation run concurrently with the same fsmark
workload takes roughly 7s:

  real    0m6.930s
  user    0m0.000s
  sys     0m0.039s

IOWs, sync is 7-8x faster on a busy filesystem and does not have an
adverse impact on ongoing async data write operations.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7747bd4b

01 5月, 2013 1 次提交

writeback: set worker desc to identify writeback workers in task dumps · ef3b1019

由 Tejun Heo 提交于 4月 30, 2013

Writeback has been recently converted to use workqueue instead of its
private thread pool implementation.  One negative side effect of this
conversion is that there's no easy to tell which backing device a
writeback work item was working on at the time of task dump, be it
sysrq-t, BUG, WARN or whatever, which, according to our writeback
brethren, is important in tracking down issues with a lot of mounted
file systems on a lot of different devices.

This patch restores that information using the new worker description
facility.  bdi_writeback_workfn() calls set_work_desc() to identify
which bdi it's working on.  The description is printed out together with
the worqueue name and worker function as in the following example dump.

 WARNING: at fs/fs-writeback.c:1015 bdi_writeback_workfn+0x2b4/0x3c0()
 Modules linked in:
 Pid: 28, comm: kworker/u18:0 Not tainted 3.9.0-rc1-work+ #24 empty empty/S3992
 Workqueue: writeback bdi_writeback_workfn (flush-8:16)
  ffffffff820a3a98 ffff88015b927cb8 ffffffff81c61855 ffff88015b927cf8
  ffffffff8108f500 0000000000000000 ffff88007a171948 ffff88007a1716b0
  ffff88015b49df00 ffff88015b8d3940 0000000000000000 ffff88015b927d08
 Call Trace:
  [<ffffffff81c61855>] dump_stack+0x19/0x1b
  [<ffffffff8108f500>] warn_slowpath_common+0x70/0xa0
  [<ffffffff8108f54a>] warn_slowpath_null+0x1a/0x20
  [<ffffffff81200144>] bdi_writeback_workfn+0x2b4/0x3c0
  [<ffffffff810b4c87>] process_one_work+0x1d7/0x660
  [<ffffffff810b5c72>] worker_thread+0x122/0x380
  [<ffffffff810bdfea>] kthread+0xea/0xf0
  [<ffffffff81c6cedc>] ret_from_fork+0x7c/0xb0
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ef3b1019

02 4月, 2013 1 次提交

writeback: replace custom worker pool implementation with unbound workqueue · 839a8e86

由 Tejun Heo 提交于 4月 01, 2013

Writeback implements its own worker pool - each bdi can be associated
with a worker thread which is created and destroyed dynamically.  The
worker thread for the default bdi is always present and serves as the
"forker" thread which forks off worker threads for other bdis.

there's no reason for writeback to implement its own worker pool when
using unbound workqueue instead is much simpler and more efficient.
This patch replaces custom worker pool implementation in writeback
with an unbound workqueue.

The conversion isn't too complicated but the followings are worth
mentioning.

* bdi_writeback->last_active, task and wakeup_timer are removed.
  delayed_work ->dwork is added instead.  Explicit timer handling is
  no longer necessary.  Everything works by either queueing / modding
  / flushing / canceling the delayed_work item.

* bdi_writeback_thread() becomes bdi_writeback_workfn() which runs off
  bdi_writeback->dwork.  On each execution, it processes
  bdi->work_list and reschedules itself if there are more things to
  do.

  The function also handles low-mem condition, which used to be
  handled by the forker thread.  If the function is running off a
  rescuer thread, it only writes out limited number of pages so that
  the rescuer can serve other bdis too.  This preserves the flusher
  creation failure behavior of the forker thread.

* INIT_LIST_HEAD(&bdi->bdi_list) is used to tell
  bdi_writeback_workfn() about on-going bdi unregistration so that it
  always drains work_list even if it's running off the rescuer.  Note
  that the original code was broken in this regard.  Under memory
  pressure, a bdi could finish unregistration with non-empty
  work_list.

* The default bdi is no longer special.  It now is treated the same as
  any other bdi and bdi_cap_flush_forker() is removed.

* BDI_pending is no longer used.  Removed.

* Some tracepoints become non-applicable.  The following TPs are
  removed - writeback_nothread, writeback_wake_thread,
  writeback_wake_forker_thread, writeback_thread_start,
  writeback_thread_stop.

Everything, including devices coming and going away and rescuer
operation under simulated memory pressure, seems to work fine in my
test setup.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reviewed-by: NJan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Jeff Moyer <jmoyer@redhat.com>

839a8e86

14 1月, 2013 1 次提交

writeback: add more tracepoints · 9fb0a7da

由 Tejun Heo 提交于 1月 11, 2013

Add tracepoints for page dirtying, writeback_single_inode start, inode
dirtying and writeback.  For the latter two inode events, a pair of
events are defined to denote start and end of the operations (the
starting one has _start suffix and the one w/o suffix happens after
the operation is complete).  These inode ops are FS specific and can
be non-trivial and having enclosing tracepoints is useful for external
tracers.

This is part of tracepoint additions to improve visiblity into
dirtying / writeback operations for io tracer and userland.

v2: writeback_dirty_inode[_start] TPs may be called for files on
    pseudo FSes w/ unregistered bdi.  Check whether bdi->dev is %NULL
    before dereferencing.

v3: buffer dirtying moved to a block TP.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9fb0a7da

12 1月, 2013 1 次提交

vfs: re-implement writeback_inodes_sb(_nr)_if_idle() and rename them · 10ee27a0

由 Miao Xie 提交于 1月 10, 2013

writeback_inodes_sb(_nr)_if_idle() is re-implemented by replacing down_read()
with down_read_trylock() because

- If ->s_umount is write locked, then the sb is not idle. That is
  writeback_inodes_sb(_nr)_if_idle() needn't wait for the lock.

- writeback_inodes_sb(_nr)_if_idle() grabs s_umount lock when it want to start
  writeback, it may bring us deadlock problem when doing umount. In order to
  fix the problem, ext4 and btrfs implemented their own writeback functions
  instead of writeback_inodes_sb(_nr)_if_idle(), but it introduced the redundant
  code, it is better to implement a new writeback_inodes_sb(_nr)_if_idle().

The name of these two functions is cumbersome, so rename them to
try_to_writeback_inodes_sb(_nr).

This idea came from Christoph Hellwig.
Some code is from the patch of Kamal Mostafa.
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>

10ee27a0

13 12月, 2012 1 次提交

writeback: fix a typo in comment · 5aaea51d

由 Yan Hong 提交于 12月 12, 2012

Signed-off-by: NYan Hong <clouds.yan@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5aaea51d

27 11月, 2012 1 次提交

writeback: put unused inodes to LRU after writeback completion · 4eff96dd

由 Jan Kara 提交于 11月 26, 2012

Commit 169ebd90 ("writeback: Avoid iput() from flusher thread")
removed iget-iput pair from inode writeback.  As a side effect, inodes
that are dirty during iput_final() call won't be ever added to inode LRU
(iput_final() doesn't add dirty inodes to LRU and later when the inode
is cleaned there's noone to add the inode there).  Thus inodes are
effectively unreclaimable until someone looks them up again.

The practical effect of this bug is limited by the fact that inodes are
pinned by a dentry for long enough that the inode gets cleaned.  But
still the bug can have nasty consequences leading up to OOM conditions
under certain circumstances.  Following can easily reproduce the
problem:

  for (( i = 0; i < 1000; i++ )); do
    mkdir $i
    for (( j = 0; j < 1000; j++ )); do
      touch $i/$j
      echo 2 > /proc/sys/vm/drop_caches
    done
  done

then one needs to run 'sync; ls -lR' to make inodes reclaimable again.

We fix the issue by inserting unused clean inodes into the LRU after
writeback finishes in inode_sync_complete().
Signed-off-by: NJan Kara <jack@suse.cz>
Reported-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: <stable@vger.kernel.org>		[3.5+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4eff96dd

09 10月, 2012 1 次提交

fs/fs-writeback.c: remove unneccesary parameter of __writeback_single_inode() · cd8ed2a4

由 Yan Hong 提交于 10月 08, 2012

The parameter 'wb' is never used in this function.
Signed-off-by: NYan Hong <clouds.yan@gmail.com>
Acked-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cd8ed2a4

21 9月, 2012 1 次提交

fs/fs-writeback.c: cleanup riteback_sb_inodes kerneldoc · 7dfd8cc5

由 Liu Bo 提交于 9月 20, 2012

Argument @only_this_sb has been removed.
Signed-off-by: NLiu Bo <liub.liubo@gmail.com>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

7dfd8cc5

20 9月, 2012 1 次提交

ext4: fix potential deadlock in ext4_nonda_switch() · 00d4e736

由 Theodore Ts'o 提交于 9月 19, 2012

In ext4_nonda_switch(), if the file system is getting full we used to
call writeback_inodes_sb_if_idle().  The problem is that we can be
holding i_mutex already, and this causes a potential deadlock when
writeback_inodes_sb_if_idle() when it tries to take s_umount.  (See
lockdep output below).

As it turns out we don't need need to hold s_umount; the fact that we
are in the middle of the write(2) system call will keep the superblock
pinned.  Unfortunately writeback_inodes_sb() checks to make sure
s_umount is taken, and the VFS uses a different mechanism for making
sure the file system doesn't get unmounted out from under us.  The
simplest way of dealing with this is to just simply grab s_umount
using a trylock, and skip kicking the writeback flusher thread in the
very unlikely case that we can't take a read lock on s_umount without
blocking.

Also, we now check the cirteria for kicking the writeback thread
before we decide to whether to fall back to non-delayed writeback, so
if there are any outstanding delayed allocation writes, we try to get
them resolved as soon as possible.

   [ INFO: possible circular locking dependency detected ]
   3.6.0-rc1-00042-gce894ca #367 Not tainted
   -------------------------------------------------------
   dd/8298 is trying to acquire lock:
    (&type->s_umount_key#18){++++..}, at: [<c02277d4>] writeback_inodes_sb_if_idle+0x28/0x46

   but task is already holding lock:
    (&sb->s_type->i_mutex_key#8){+.+...}, at: [<c01ddcce>] generic_file_aio_write+0x5f/0xd3

   which lock already depends on the new lock.

   2 locks held by dd/8298:
    #0:  (sb_writers#2){.+.+.+}, at: [<c01ddcc5>] generic_file_aio_write+0x56/0xd3
    #1:  (&sb->s_type->i_mutex_key#8){+.+...}, at: [<c01ddcce>] generic_file_aio_write+0x5f/0xd3

   stack backtrace:
   Pid: 8298, comm: dd Not tainted 3.6.0-rc1-00042-gce894ca #367
   Call Trace:
    [<c015b79c>] ? console_unlock+0x345/0x372
    [<c06d62a1>] print_circular_bug+0x190/0x19d
    [<c019906c>] __lock_acquire+0x86d/0xb6c
    [<c01999db>] ? mark_held_locks+0x5c/0x7b
    [<c0199724>] lock_acquire+0x66/0xb9
    [<c02277d4>] ? writeback_inodes_sb_if_idle+0x28/0x46
    [<c06db935>] down_read+0x28/0x58
    [<c02277d4>] ? writeback_inodes_sb_if_idle+0x28/0x46
    [<c02277d4>] writeback_inodes_sb_if_idle+0x28/0x46
    [<c026f3b2>] ext4_nonda_switch+0xe1/0xf4
    [<c0271ece>] ext4_da_write_begin+0x27/0x193
    [<c01dcdb0>] generic_file_buffered_write+0xc8/0x1bb
    [<c01ddc47>] __generic_file_aio_write+0x1dd/0x205
    [<c01ddce7>] generic_file_aio_write+0x78/0xd3
    [<c026d336>] ext4_file_write+0x480/0x4a6
    [<c0198c1d>] ? __lock_acquire+0x41e/0xb6c
    [<c0180944>] ? sched_clock_cpu+0x11a/0x13e
    [<c01967e9>] ? trace_hardirqs_off+0xb/0xd
    [<c018099f>] ? local_clock+0x37/0x4e
    [<c0209f2c>] do_sync_write+0x67/0x9d
    [<c0209ec5>] ? wait_on_retry_sync_kiocb+0x44/0x44
    [<c020a7b9>] vfs_write+0x7b/0xe6
    [<c020a9a6>] sys_write+0x3b/0x64
    [<c06dd4bd>] syscall_call+0x7/0xb
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

00d4e736

11 9月, 2012 1 次提交

writeback: correct comment for move_expired_inodes() · 0e2f2b23

由 Wang Sheng-Hui 提交于 9月 11, 2012

The function scans @delaying_queue and stops at the first inode
whose dirtied_when is after *work->older_than_this. So the expired
ones being moved are those before *work->older_than_this. Correct
the comment here.
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NWang Sheng-Hui <shhuiw@gmail.com>

0e2f2b23

01 8月, 2012 1 次提交

mm: prepare for removal of obsolete /proc/sys/vm/nr_pdflush_threads · 3965c9ae

由 Wanpeng Li 提交于 7月 31, 2012

Since per-BDI flusher threads were introduced in 2.6, the pdflush
mechanism is not used any more.  But the old interface exported through
/proc/sys/vm/nr_pdflush_threads still exists and is obviously useless.

For back-compatibility, printk warning information and return 2 to notify
the users that the interface is removed.
Signed-off-by: NWanpeng Li <liwp@linux.vnet.ibm.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3965c9ae

23 7月, 2012 1 次提交

vfs: Move noop_backing_dev_info check from sync into writeback · 6eedc701

由 Jan Kara 提交于 7月 03, 2012

In principle, a filesystem may want to have ->sync_fs() called during sync(1)
although it does not have a bdi (i.e. s_bdi is set to noop_backing_dev_info).
Only writeback code really needs bdi set to something reasonable. So move the
checks where they are more logical.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

6eedc701

09 6月, 2012 2 次提交

writeback: Fix some comment errors · 331cbdee

由 Wanpeng Li 提交于 6月 09, 2012

Signed-off-by: NWanpeng Li <liwp@linux.vnet.ibm.com>
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>

331cbdee

writeback: Fix lock imbalance in writeback_sb_inodes() · ead188f9

由 Jan Kara 提交于 6月 08, 2012

Fix bug introduced by 169ebd90.  We have to have wb_list_lock locked when
restarting writeback loop after having waited for inode writeback.

Bug description by Ted Tso:

  I can reproduce this fairly easily by using ext4 w/o a journal, running
  under KVM with 1024megs memory, with fsstress (xfstests #13):

  [   45.153294] =====================================
  [   45.154784] [ BUG: bad unlock balance detected! ]
  [   45.155591] 3.5.0-rc1-00002-gb22b1f17 #124 Not tainted
  [   45.155591] -------------------------------------
  [   45.155591] flush-254:16/2499 is trying to release lock (&(&wb->list_lock)->rlock) at:
  [   45.155591] [<c022c3da>] writeback_sb_inodes+0x160/0x327
  [   45.155591] but there are no more locks to release!
Reported-by: NTheodore Ts'o <tytso@mit.edu>
Tested-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>

ead188f9

06 5月, 2012 7 次提交

writeback: Avoid iput() from flusher thread · 169ebd90

由 Jan Kara 提交于 5月 03, 2012

Doing iput() from flusher thread (writeback_sb_inodes()) can create problems
because iput() can do a lot of work - for example truncate the inode if it's
the last iput on unlinked file. Some filesystems depend on flusher thread
progressing (e.g. because they need to flush delay allocated blocks to reduce
allocation uncertainty) and so flusher thread doing truncate creates
interesting dependencies and possibilities for deadlocks.

We get rid of iput() in flusher thread by using the fact that I_SYNC inode
flag effectively pins the inode in memory. So if we take care to either hold
i_lock or have I_SYNC set, we can get away without taking inode reference
in writeback_sb_inodes().

As a side effect of these changes, we also fix possible use-after-free in
wb_writeback() because inode_wait_for_writeback() call could try to reacquire
i_lock on the inode that was already free.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>

169ebd90

writeback: Refactor writeback_single_inode() · 4f8ad655

由 Jan Kara 提交于 5月 03, 2012

The code in writeback_single_inode() is relatively complex. The list requeing
logic makes sense only for flusher thread but not really for sync_inode() or
write_inode_now() callers. Also when we want to get rid of inode references
held by flusher thread, we will need a special I_SYNC handling there.

So separate part of writeback_single_inode() which does the real writeback work
into __writeback_single_inode() and make writeback_single_inode() do only stuff
necessary for callers writing only one inode, moving the special list handling
into writeback_sb_inodes(). As a sideeffect this fixes a possible race where we
could skip some inode during sync(2) because other writer refiled it from b_io
to b_dirty list. Also I_SYNC handling is moved into the callers of
__writeback_single_inode() to make locking easier.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>

4f8ad655

writeback: Remove wb->list_lock from writeback_single_inode() · f0d07b7f

由 Jan Kara 提交于 5月 03, 2012

writeback_single_inode() doesn't need wb->list_lock for anything on entry now.
So remove the requirement. This makes locking of writeback_single_inode()
temporarily awkward (entering with i_lock, returning with i_lock and
wb->list_lock) but it will be sanitized in the next patch.

Also inode_wait_for_writeback() doesn't need wb->list_lock for anything. It was
just taking it to make usage convenient for callers but with
writeback_single_inode() changing it's not very convenient anymore. So remove
the lock from that function.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>

f0d07b7f

writeback: Separate inode requeueing after writeback · ccb26b5a

由 Jan Kara 提交于 5月 03, 2012

Move inode requeueing after inode has been written out into a separate
function.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>

ccb26b5a

writeback: Move I_DIRTY_PAGES handling · 6290be1c

由 Jan Kara 提交于 5月 03, 2012

Instead of clearing I_DIRTY_PAGES and resetting it when we didn't succeed in
writing them all, just clear the bit only when we succeeded writing all the
pages. We also move the clearing of the bit close to other i_state handling to
separate it from writeback list handling. This is desirable because list
handling will differ for flusher thread and other writeback_single_inode()
callers in future. No filesystem plays any tricks with I_DIRTY_PAGES (like
checking it in ->writepages or ->write_inode implementation) so this movement
is safe.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>

6290be1c

writeback: Move requeueing when I_SYNC set to writeback_sb_inodes() · cc1676d9

由 Jan Kara 提交于 5月 03, 2012

When writeback_single_inode() is called on inode which has I_SYNC already
set while doing WB_SYNC_NONE, inode is moved to b_more_io list. However
this makes sense only if the caller is flusher thread. For other callers of
writeback_single_inode() it doesn't really make sense and may be even wrong
- flusher thread may be doing WB_SYNC_ALL writeback in parallel.

So we move requeueing from writeback_single_inode() to writeback_sb_inodes().
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>

cc1676d9

writeback: Move clearing of I_SYNC into inode_sync_complete() · 365b94ae

由 Jan Kara 提交于 5月 03, 2012

Move clearing of I_SYNC into inode_sync_complete(). It is more logical to have
clearing of I_SYNC bit and waking of waiters in one place. Also later we will
have two places needing to clear I_SYNC and wake up waiters so this allows them
to use the common helper. Moving of I_SYNC clearing to a later stage of
writeback_single_inode() is safe since we hold i_lock all the time.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>

365b94ae

21 3月, 2012 2 次提交

writeback: Remove outdated comment · 697e6fed

由 Jan Kara 提交于 3月 09, 2012

The comment is hopelessly outdated and misplaced. We no longer have 'bdi'
part of writeback work, the comment about blockdev super is outdated,
comment about throttling as well. Information about list handling is in
more detail at queue_io(). So just move the bit about older_than_this to
close to move_expired_inodes() and remove the rest.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>

697e6fed

fs: Remove bogus wait in write_inode_now() · f469ec9c

由 Jan Kara 提交于 3月 09, 2012

inode_sync_wait() in write_inode_now() is just bogus. That function waits for
I_SYNC bit to be cleared but writeback_single_inode() clears the bit on return
so the wait is effectivelly a nop unless someone else submits the inode for
writeback again. All the waiting write_inode_now() needs is achieved by using
WB_SYNC_ALL writeback mode.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>

f469ec9c

07 3月, 2012 1 次提交
- F
  writeback: fix fn name in writeback_inodes_sb_nr_if_idle() comment header · c097b2ca
  由 Fengguang Wu 提交于 3月 05, 2012
```
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>
```
  c097b2ca
29 2月, 2012 1 次提交

fs: reduce the use of module.h wherever possible · 630d9c47

由 Paul Gortmaker 提交于 11月 16, 2011

For files only using THIS_MODULE and/or EXPORT_SYMBOL, map
them onto including export.h -- or if the file isn't even
using those, then just delete the include.  Fix up any implicit
include dependencies that were being masked by module.h along
the way.
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

630d9c47

01 2月, 2012 1 次提交

writeback: fix NULL bdi->dev in trace writeback_single_inode · 15eb77a0

由 Wu Fengguang 提交于 1月 17, 2012

bdi_prune_sb() resets sb->s_bdi to default_backing_dev_info when the
tearing down the original bdi. Fix trace_writeback_single_inode to
use sb->s_bdi=default_backing_dev_info rather than bdi->dev=NULL for a
teared down bdi.

Cc: <stable@kernel.org>
Reported-by: NRabin Vincent <rabin@rab.in>
Tested-by: NRabin Vincent <rabin@rab.in>
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>

15eb77a0

08 1月, 2012 1 次提交

writeback: move MIN_WRITEBACK_PAGES to fs-writeback.c · bc31b86a

由 Wu Fengguang 提交于 1月 07, 2012

Fix compile error

 fs/fs-writeback.c:515:33: error: ‘PAGE_CACHE_SHIFT’ undeclared (first use in this function)
Reported-by: NRandy Dunlap <rdunlap@xenotime.net>
Acked-by: NRandy Dunlap <rdunlap@xenotime.net>
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>

bc31b86a

04 1月, 2012 1 次提交

fs: move code out of buffer.c · ff01bb48

由 Al Viro 提交于 9月 16, 2011

Move invalidate_bdev, block_sync_page into fs/block_dev.c.  Export
kill_bdev as well, so brd doesn't have to open code it.  Reduce
buffer_head.h requirement accordingly.

Removed a rather large comment from invalidate_bdev, as it looked a bit
obsolete to bother moving.  The small comment replacing it says enough.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

ff01bb48

18 12月, 2011 2 次提交

writeback: Include all dirty inodes in background writeback · 1bc36b64

由 Jan Kara 提交于 10月 19, 2011

Current livelock avoidance code makes background work to include only inodes
that were dirtied before background writeback has started. However background
writeback can be running for a long time and thus excluding newly dirtied
inodes can eventually exclude significant portion of dirty inodes making
background writeback inefficient. Since background writeback avoids livelocking
the flusher thread by yielding to any other work, there is no real reason why
background work should not include all dirty inodes so change the logic in
wb_writeback().
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>

1bc36b64

writeback: show writeback reason with __print_symbolic · b3bba872

由 Wu Fengguang 提交于 12月 08, 2011

This makes the binary trace understandable by trace-cmd.

CC: Dave Chinner <david@fromorbit.com>
CC: Curt Wohlgemuth <curtw@google.com>
CC: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>

b3bba872

29 11月, 2011 1 次提交

writeback: Fix issue on make htmldocs · 786228ab

由 Marcos Paulo de Souza 提交于 11月 23, 2011

Document the @reason parameter to make "make htmldocs" happy.
Acked-by: NRandy Dunlap <rdunlap@xenotime.net>
Signed-off-by: NMarcos Paulo de Souza <marcos.mage@gmail.com>
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>

786228ab

22 11月, 2011 1 次提交

freezer: implement and use kthread_freezable_should_stop() · 8a32c441

由 Tejun Heo 提交于 11月 21, 2011

Writeback and thinkpad_acpi have been using thaw_process() to prevent
deadlock between the freezer and kthread_stop(); unfortunately, this
is inherently racy - nothing prevents freezing from happening between
thaw_process() and kthread_stop().

This patch implements kthread_freezable_should_stop() which enters
refrigerator if necessary but is guaranteed to return if
kthread_stop() is invoked.  Both thaw_process() users are converted to
use the new function.

Note that this deadlock condition exists for many of freezable
kthreads.  They need to be converted to use the new should_stop or
freezable workqueue.

Tested with synthetic test case.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NHenrique de Moraes Holschuh <ibm-acpi@hmh.eng.br>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>

8a32c441

31 10月, 2011 2 次提交

writeback: Add a 'reason' to wb_writeback_work · 0e175a18

由 Curt Wohlgemuth 提交于 10月 07, 2011

This creates a new 'reason' field in a wb_writeback_work
structure, which unambiguously identifies who initiates
writeback activity.  A 'wb_reason' enumeration has been
added to writeback.h, to enumerate the possible reasons.

The 'writeback_work_class' and tracepoint event class and
'writeback_queue_io' tracepoints are updated to include the
symbolic 'reason' in all trace events.

And the 'writeback_inodes_sbXXX' family of routines has had
a wb_stats parameter added to them, so callers can specify
why writeback is being started.
Acked-by: NJan Kara <jack@suse.cz>
Signed-off-by: NCurt Wohlgemuth <curtw@google.com>
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>

0e175a18

writeback: send work item to queue_io, move_expired_inodes · ad4e38dd

由 Curt Wohlgemuth 提交于 10月 07, 2011

Instead of sending ->older_than_this to queue_io() and
move_expired_inodes(), send the entire wb_writeback_work
structure.  There are other fields of a work item that are
useful in these routines and in tracepoints.
Acked-by: NJan Kara <jack@suse.cz>
Signed-off-by: NCurt Wohlgemuth <curtw@google.com>
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>

ad4e38dd

03 10月, 2011 1 次提交

writeback: per-bdi background threshold · b00949aa

由 Wu Fengguang 提交于 11月 18, 2010

One thing puzzled me is that in JBOD case, the per-disk writeout
performance is smaller than the corresponding single-disk case even
when they have comparable bdi_thresh. Tracing shows find that in single
disk case, bdi_writeback is always kept high while in JBOD case, it
could drop low from time to time and correspondingly bdi_reclaimable
could sometimes rush high.

The fix is to watch bdi_reclaimable and kick background writeback as
soon as it goes high. This resembles the global background threshold
but in per-bdi manner. The trick is, as long as bdi_reclaimable does
not go high, bdi_writeback naturally won't go low because
bdi_reclaimable+bdi_writeback ~= bdi_thresh.

With less fluctuated writeback pages, JBOD performance is observed to
increase noticeably in various cases.

vmstat:nr_written values before/after patch:

3.1.0-rc4-wo-underrun+ 3.1.0-rc4-bgthresh3+
------------------------ ------------------------
125596480 +25.9% 158179363 JBOD-10HDD-16G/ext4-100dd-1M-24p-16384M-20:10-X
61790815 +110.4% 130032231 JBOD-10HDD-16G/ext4-10dd-1M-24p-16384M-20:10-X
58853546 -0.1% 58823828 JBOD-10HDD-16G/ext4-1dd-1M-24p-16384M-20:10-X
110159811 +24.7% 137355377 JBOD-10HDD-16G/xfs-100dd-1M-24p-16384M-20:10-X
69544762 +10.8% 77080047 JBOD-10HDD-16G/xfs-10dd-1M-24p-16384M-20:10-X
50644862 +0.5% 50890006 JBOD-10HDD-16G/xfs-1dd-1M-24p-16384M-20:10-X
42677090 +28.0% 54643527 JBOD-10HDD-thresh=100M/ext4-100dd-1M-24p-16384M-100M:10-X
47491324 +13.3% 53785605 JBOD-10HDD-thresh=100M/ext4-10dd-1M-24p-16384M-100M:10-X
52548986 +0.9% 53001031 JBOD-10HDD-thresh=100M/ext4-1dd-1M-24p-16384M-100M:10-X
26783091 +36.8% 36650248 JBOD-10HDD-thresh=100M/xfs-100dd-1M-24p-16384M-100M:10-X
35526347 +14.0% 40492312 JBOD-10HDD-thresh=100M/xfs-10dd-1M-24p-16384M-100M:10-X
44670723 -1.1% 44177606 JBOD-10HDD-thresh=100M/xfs-1dd-1M-24p-16384M-100M:10-X
127996037 +22.4% 156719990 JBOD-10HDD-thresh=2G/ext4-100dd-1M-24p-16384M-2048M:10-X
57518856 +3.8% 59677625 JBOD-10HDD-thresh=2G/ext4-10dd-1M-24p-16384M-2048M:10-X
51919909 +12.2% 58269894 JBOD-10HDD-thresh=2G/ext4-1dd-1M-24p-16384M-2048M:10-X
86410514 +79.0% 154660433 JBOD-10HDD-thresh=2G/xfs-100dd-1M-24p-16384M-2048M:10-X
40132519 +38.6% 55617893 JBOD-10HDD-thresh=2G/xfs-10dd-1M-24p-16384M-2048M:10-X
48423248 +7.5% 52042927 JBOD-10HDD-thresh=2G/xfs-1dd-1M-24p-16384M-2048M:10-X
206041046 +44.1% 296846536 JBOD-10HDD-thresh=4G/xfs-100dd-1M-24p-16384M-4096M:10-X
72312903 -19.4% 58272885 JBOD-10HDD-thresh=4G/xfs-10dd-1M-24p-16384M-4096M:10-X
50635672 -0.5% 50384787 JBOD-10HDD-thresh=4G/xfs-1dd-1M-24p-16384M-4096M:10-X
68308534 +115.7% 147324758 JBOD-10HDD-thresh=800M/ext4-100dd-1M-24p-16384M-800M:10-X
57882933 +14.5% 66269621 JBOD-10HDD-thresh=800M/ext4-10dd-1M-24p-16384M-800M:10-X
52183472 +12.8% 58855181 JBOD-10HDD-thresh=800M/ext4-1dd-1M-24p-16384M-800M:10-X
53788956 +94.2% 104460352 JBOD-10HDD-thresh=800M/xfs-100dd-1M-24p-16384M-800M:10-X
44493342 +35.5% 60298210 JBOD-10HDD-thresh=800M/xfs-10dd-1M-24p-16384M-800M:10-X
42641209 +18.9% 50681038 JBOD-10HDD-thresh=800M/xfs-1dd-1M-24p-16384M-800M:10-X
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>

b00949aa

OpenHarmony / kernel_linux 上一次同步 接近 4 年

OpenHarmony / kernel_linux
上一次同步接近 4 年