提交 · e913fc825dc685a444cb4c1d0f9d32f372f59861 · openanolis / cloud-kernel

17 5月, 2010 2 次提交

writeback: fix WB_SYNC_NONE writeback from umount · e913fc82

由 Jens Axboe 提交于 5月 17, 2010

When umount calls sync_filesystem(), we first do a WB_SYNC_NONE
writeback to kick off writeback of pending dirty inodes, then follow
that up with a WB_SYNC_ALL to wait for it. Since umount already holds
the sb s_umount mutex, WB_SYNC_NONE ends up doing nothing and all
writeback happens as WB_SYNC_ALL. This can greatly slow down umount,
since WB_SYNC_ALL writeback is a data integrity operation and thus
a bigger hammer than simple WB_SYNC_NONE. For barrier aware file systems
it's a lot slower.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

e913fc82

writeback: disable periodic old data writeback for !dirty_writeback_centisecs · 69b62d01

由 Jens Axboe 提交于 5月 17, 2010

Prior to 2.6.32, setting /proc/sys/vm/dirty_writeback_centisecs disabled
periodic dirty writeback from kupdate. This got broken and now causes
excessive sys CPU usage if set to zero, as we'll keep beating on
schedule().

Cc: stable@kernel.org
Reported-by: NJustin Maggard <jmaggard10@gmail.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

69b62d01

30 3月, 2010 1 次提交

include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6

由 Tejun Heo 提交于 3月 24, 2010

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: NTejun Heo <tj@kernel.org>
Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

5a0e3ad6

12 3月, 2010 1 次提交

vfs: improve writeback_inodes_wb() · f11c9c5c

由 Edward Shishkin 提交于 3月 11, 2010

Do not pin/unpin superblock for every inode in writeback_inodes_wb(), pin
it for the whole group of inodes which belong to the same superblock and
call writeback_sb_inodes() handler for them.
Signed-off-by: NEdward Shishkin <edward.shishkin@gmail.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

f11c9c5c

06 3月, 2010 2 次提交

pass writeback_control to ->write_inode · a9185b41

由 Christoph Hellwig 提交于 3月 05, 2010

This gives the filesystem more information about the writeback that
is happening.  Trond requested this for the NFS unstable write handling,
and other filesystems might benefit from this too by beeing able to
distinguish between the different callers in more detail.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

a9185b41

make sure data is on disk before calling ->write_inode · 26821ed4

由 Christoph Hellwig 提交于 3月 05, 2010

Similar to the fsync issue fixed a while ago in commit
2daea67e we need to write for data to
actually hit the disk before writing out the metadata to guarantee
data integrity for filesystems that modify the inode in the data I/O
completion path.  Currently XFS and NFS handle this manually, and AFS
has a write_inode method that does nothing but waiting for data, while
others are possibly missing out on this.

Fortunately this change has a lot less impact than the fsync change
as none of the write_inode methods starts data writeout of any form
by itself.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

26821ed4

03 1月, 2010 1 次提交

writeback: add missing kernel-doc notation · 4b6764fa

由 Jaswinder Singh Rajput 提交于 1月 01, 2010

Fix the following htmldocs warning:

  Warning(fs/fs-writeback.c:255): No description found for parameter 'sb'
Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Acked-by: NWu Fengguang <fengguang.wu@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4b6764fa

23 12月, 2009 1 次提交

fs-writeback: Add helper function to start writeback if idle · 17bd55d0

由 Eric Sandeen 提交于 12月 23, 2009

ext4, at least, would like to start pushing on writeback if it starts
to get close to ENOSPC when reserving worst-case blocks for delalloc
writes.  Writing out delalloc data will convert those worst-case
predictions into usually smaller actual usage, freeing up space
before we hit ENOSPC based on this speculation.

Thanks to Jens for the suggestion for the helper function,
& the naming help.

I've made the helper return status on whether writeback was
started even though I don't plan to use it in the ext4 patch;
it seems like it would be potentially useful to test this
in some cases.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Acked-by: NJan Kara <jack@suse.cz>

17bd55d0

03 12月, 2009 3 次提交

writeback: remove unused nonblocking and congestion checks · 0d99519e

由 Wu Fengguang 提交于 12月 03, 2009

- no one is calling wb_writeback and write_cache_pages with
  wbc.nonblocking=1 any more
- lumpy pageout will want to do nonblocking writeback without the
  congestion wait

So remove the congestion checks as suggested by Chris.
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Evgeniy Polyakov <zbr@ioremap.net>
Cc: Alex Elder <aelder@sgi.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

0d99519e

writeback: introduce wbc.for_background · b17621fe

由 Wu Fengguang 提交于 12月 03, 2009

It will lower the flush priority for NFS, and maybe more in future.
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

b17621fe

writeback: remove the always false bdi_cap_writeback_dirty() test · 951c30d1

由 Wu Fengguang 提交于 12月 03, 2009

This is dead code because no bdi flush thread will be started for
!bdi_cap_writeback_dirty bdi.
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

951c30d1

26 9月, 2009 12 次提交

writeback: pass in super_block to bdi_start_writeback() · a72bfd4d

由 Jens Axboe 提交于 9月 26, 2009

Sometimes we only want to write pages from a specific super_block,
so allow that to be passed in.

This fixes a problem with commit 56a131dc
causing writeback on all super_blocks on a bdi, where we only really
want to sync a specific sb from writeback_inodes_sb().
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

a72bfd4d

writeback: writeback_inodes_sb() should use bdi_start_writeback() · 56a131dc

由 Jens Axboe 提交于 9月 25, 2009

Pointless to iterate other devices looking for a super, when
we have a bdi mapping.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

56a131dc

writeback: don't delay inodes redirtied by a fast dirtier · b3af9468

由 Wu Fengguang 提交于 9月 25, 2009

Debug traces show that in per-bdi writeback, the inode under writeback
almost always get redirtied by a busy dirtier.  We used to call
redirty_tail() in this case, which could delay inode for up to 30s.

This is unacceptable because it now happens so frequently for plain cp/dd,
that the accumulated delays could make writeback of big files very slow.

So let's distinguish between data redirty and metadata only redirty.
The first one is caused by a busy dirtier, while the latter one could
happen in XFS, NFS, etc. when they are doing delalloc or updating isize.

The inode being busy dirtied will now be requeued for next io, while
the inode being redirtied by fs will continue to be delayed to avoid
repeated IO.

CC: Jan Kara <jack@suse.cz>
CC: Theodore Ts'o <tytso@mit.edu>
CC: Dave Chinner <david@fromorbit.com>
CC: Chris Mason <chris.mason@oracle.com>
CC: Christoph Hellwig <hch@infradead.org>
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

b3af9468

writeback: make the super_block pinning more efficient · 9ecc2738

由 Jens Axboe 提交于 9月 24, 2009

Currently we pin the inode->i_sb for every single inode. This
increases cache traffic on sb->s_umount sem. Lets instead
cache the inode sb pin state and keep the super_block pinned
for as long as keep writing out inodes from the same
super_block.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

9ecc2738

writeback: don't resort for a single super_block in move_expired_inodes() · cf137307

由 Jens Axboe 提交于 9月 24, 2009

If we only moved inodes from a single super_block to the temporary
list, there's no point in doing a resort for multiple super_blocks.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

cf137307

writeback: move inodes from one super_block together · 5c03449d

由 Shaohua Li 提交于 9月 24, 2009

__mark_inode_dirty adds inode to wb dirty list in random order. If a disk has
several partitions, writeback might keep spindle moving between partitions.
To reduce the move, better write big chunk of one partition and then move to
another. Inodes from one fs usually are in one partion, so idealy move indoes
from one fs together should reduce spindle move. This patch tries to address
this. Before per-bdi writeback is added, the behavior is write indoes
from one fs first and then another, so the patch restores previous behavior.
The loop in the patch is a bit ugly, should we add a dirty list for each
superblock in bdi_writeback?

Test in a two partition disk with attached fio script shows about 3% ~ 6%
improvement.
Signed-off-by: NShaohua Li <shaohua.li@intel.com>
Reviewed-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

5c03449d

J
writeback: get rid to incorrect references to pdflush in comments · 5b0830cb
由 Jens Axboe 提交于 9月 23, 2009
```
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
```
5b0830cb

writeback: improve readability of the wb_writeback() continue/break logic · 71fd05a8

由 Jens Axboe 提交于 9月 23, 2009

And throw some comments in there, too.
Reviewed-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

71fd05a8

writeback: cleanup writeback_single_inode() · ae1b7f7d

由 Wu Fengguang 提交于 9月 23, 2009

Make the if-else straight in writeback_single_inode().
No behavior change.

Cc: Jan Kara <jack@suse.cz>
Cc: Michael Rubin <mrubin@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: NFengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

ae1b7f7d

writeback: kupdate writeback shall not stop when more io is possible · 7fbdea32

由 Wu Fengguang 提交于 9月 23, 2009

Fix the kupdate case, which disregards wbc.more_io and stop writeback
prematurely even when there are more inodes to be synced.

wbc.more_io should always be respected.

Also remove the pages_skipped check. It will set when some page(s) of some
inode(s) cannot be written for now. Such inodes will be delayed for a while.
This variable has nothing to do with whether there are other writeable inodes.

CC: Jan Kara <jack@suse.cz>
CC: Dave Chinner <david@fromorbit.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

7fbdea32

writeback: stop background writeback when below background threshold · d3ddec76

由 Wu Fengguang 提交于 9月 23, 2009

Treat bdi_start_writeback(0) as a special request to do background write,
and stop such work when we are below the background dirty threshold.

Also simplify the (nr_pages <= 0) checks. Since we already pass in
nr_pages=LONG_MAX for WB_SYNC_ALL and background writes, we don't
need to worry about it being decreased to zero.
Reported-by: NRichard Kennedy <richard@rsk.demon.co.uk>
CC: Jan Kara <jack@suse.cz>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

d3ddec76

fs: Fix busyloop in wb_writeback() · a5989bdc

由 Jan Kara 提交于 9月 16, 2009

If all inodes are under writeback (e.g. in case when there's only one inode
with dirty pages), wb_writeback() with WB_SYNC_NONE work basically degrades
to busylooping until I_SYNC flags of the inode is cleared. Fix the problem by
waiting on I_SYNC flags of an inode on b_more_io list in case we failed to
write anything.
Tested-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

a5989bdc

16 9月, 2009 12 次提交

writeback: fix possible bdi writeback refcounting problem · 1ef7d9aa

由 Nick Piggin 提交于 9月 15, 2009

wb_clear_pending AFAIKS should not be called after the item has been
put on the list, except by the worker threads. It could lead to the
situation where the refcount is decremented below 0 and cause lots of
problems.

Presumably the !wb_has_dirty_io case is not a common one, so it can
be discovered when the thread wakes up to check?

Also add a comment in bdi_work_clear.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

1ef7d9aa

writeback: Fix bdi use after free in wb_work_complete() · 77b9d059

由 Nick Piggin 提交于 9月 15, 2009

By the time bdi_work_on_stack gets evaluated again in bdi_work_free, it
can already have been deallocated and used for something else in the
!on stack case, giving a false positive in this test and causing
corruption.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

77b9d059

writeback: improve scalability of bdi writeback work queues · 77fad5e6

由 Nick Piggin 提交于 9月 15, 2009

If you're going to do an atomic RMW on each list entry, there's not much
point in all the RCU complexities of the list walking. This is only going
to help the multi-thread case I guess, but it doesn't hurt to do now.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

77fad5e6

writeback: remove smp_mb(), it's not needed with list_add_tail_rcu() · deed62ed

由 Nick Piggin 提交于 9月 15, 2009

list_add_tail_rcu contains required barriers.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

deed62ed

J
writeback: use schedule_timeout_interruptible() · 49db0414
由 Jens Axboe 提交于 9月 15, 2009
```
Gets rid of a manual set_current_state().
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
```
49db0414

writeback: add comments to bdi_work structure · 8010c3b6

由 Jens Axboe 提交于 9月 15, 2009

And document its retriever, get_next_work_item().
Acked-by: NJan Kara <jack@suse.cz>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

8010c3b6

writeback: separate starting of sync vs opportunistic writeback · b6e51316

由 Jens Axboe 提交于 9月 16, 2009

bdi_start_writeback() is currently split into two paths, one for
WB_SYNC_NONE and one for WB_SYNC_ALL. Add bdi_sync_writeback()
for WB_SYNC_ALL writeback and let bdi_start_writeback() handle
only WB_SYNC_NONE.

Push down the writeback_control allocation and only accept the
parameters that make sense for each function. This cleans up
the API considerably.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

b6e51316

writeback: inline allocation failure handling in bdi_alloc_queue_work() · bcddc3f0

由 Jens Axboe 提交于 9月 13, 2009

This gets rid of work == NULL in bdi_queue_work() and puts the
OOM handling where it belongs.
Acked-by: NJan Kara <jack@suse.cz>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

bcddc3f0

writeback: use RCU to protect bdi_list · cfc4ba53

由 Jens Axboe 提交于 9月 14, 2009

Now that bdi_writeback_all() no longer handles integrity writeback,
it doesn't have to block anymore. This means that we can switch
bdi_list reader side protection to RCU.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

cfc4ba53

writeback: only use bdi_writeback_all() for WB_SYNC_NONE writeout · f11fcae8

由 Jens Axboe 提交于 9月 15, 2009

Data integrity writeback must use bdi_start_writeback() and ensure
that wbc->sb and wbc->bdi are set.
Acked-by: NJan Kara <jack@suse.cz>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

f11fcae8

writeback: make wb_writeback() take an argument structure · c4a77a6c

由 Jens Axboe 提交于 9月 16, 2009

We need to be able to pass in range_cyclic as well, so instead
of growing yet another argument, split the arguments into a
struct wb_writeback_args structure that we can use internally.
Also makes it easier to just copy all members to an on-stack
struct, since we can't access work after clearing the pending
bit.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

c4a77a6c

writeback: merely wakeup flusher thread if work allocation fails for WB_SYNC_NONE · f0fad8a5

由 Christoph Hellwig 提交于 9月 11, 2009

Since it's an opportunistic writeback and not a data integrity action,
don't punt to blocking writeback. Just wakeup the thread and it will
flush old data.
Acked-by: NJan Kara <jack@suse.cz>
Signed-off-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

f0fad8a5

14 9月, 2009 1 次提交
- J
  vfs: Remove generic_osync_inode() and sync_page_range{_nolock}() · 18f2ee70
  由 Jan Kara 提交于 8月 18, 2009
```
Remove these three functions since nobody uses them anymore.
Signed-off-by: NJan Kara <jack@suse.cz>
```
  18f2ee70
11 9月, 2009 4 次提交

writeback: check for registered bdi in flusher add and inode dirty · 500b067c

由 Jens Axboe 提交于 9月 09, 2009

Also a debugging aid. We want to catch dirty inodes being added to
backing devices that don't do writeback.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

500b067c

J
writeback: get rid of pdflush completely · d0bceac7
由 Jens Axboe 提交于 5月 18, 2009
```
It is now unused, so kill it off.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
```
d0bceac7

writeback: switch to per-bdi threads for flushing data · 03ba3782

由 Jens Axboe 提交于 9月 09, 2009

This gets rid of pdflush for bdi writeout and kupdated style cleaning.
pdflush writeout suffers from lack of locality and also requires more
threads to handle the same workload, since it has to work in a
non-blocking fashion against each queue. This also introduces lumpy
behaviour and potential request starvation, since pdflush can be starved
for queue access if others are accessing it. A sample ffsb workload that
does random writes to files is about 8% faster here on a simple SATA drive
during the benchmark phase. File layout also seems a LOT more smooth in
vmstat:

r b swpd free buff cache si so bi bo in cs us sy id wa
0 1 0 608848 2652 375372 0 0 0 71024 604 24 1 10 48 42
0 1 0 549644 2712 433736 0 0 0 60692 505 27 1 8 48 44
1 0 0 476928 2784 505192 0 0 4 29540 553 24 0 9 53 37
0 1 0 457972 2808 524008 0 0 0 54876 331 16 0 4 38 58
0 1 0 366128 2928 614284 0 0 4 92168 710 58 0 13 53 34
0 1 0 295092 3000 684140 0 0 0 62924 572 23 0 9 53 37
0 1 0 236592 3064 741704 0 0 4 58256 523 17 0 8 48 44
0 1 0 165608 3132 811464 0 0 0 57460 560 21 0 8 54 38
0 1 0 102952 3200 873164 0 0 4 74748 540 29 1 10 48 41
0 1 0 48604 3252 926472 0 0 0 53248 469 29 0 7 47 45

where vanilla tends to fluctuate a lot in the creation phase:

r b swpd free buff cache si so bi bo in cs us sy id wa
1 1 0 678716 5792 303380 0 0 0 74064 565 50 1 11 52 36
1 0 0 662488 5864 319396 0 0 4 352 302 329 0 2 47 51
0 1 0 599312 5924 381468 0 0 0 78164 516 55 0 9 51 40
0 1 0 519952 6008 459516 0 0 4 78156 622 56 1 11 52 37
1 1 0 436640 6092 541632 0 0 0 82244 622 54 0 11 48 41
0 1 0 436640 6092 541660 0 0 0 8 152 39 0 0 51 49
0 1 0 332224 6200 644252 0 0 4 102800 728 46 1 13 49 36
1 0 0 274492 6260 701056 0 0 4 12328 459 49 0 7 50 43
0 1 0 211220 6324 763356 0 0 0 106940 515 37 1 10 51 39
1 0 0 160412 6376 813468 0 0 0 8224 415 43 0 6 49 45
1 1 0 85980 6452 886556 0 0 4 113516 575 39 1 11 54 34
0 2 0 85968 6452 886620 0 0 0 1640 158 211 0 0 46 54

A 10 disk test with btrfs performs 26% faster with per-bdi flushing. A
SSD based writeback test on XFS performs over 20% better as well, with
the throughput being very stable around 1GB/sec, where pdflush only
manages 750MB/sec and fluctuates wildly while doing so. Random buffered
writes to many files behave a lot better as well, as does random mmap'ed
writes.

A separate thread is added to sync the super blocks. In the long term,
adding sync_supers_bdi() functionality could get rid of this thread again.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

03ba3782

writeback: move dirty inodes from super_block to backing_dev_info · 66f3b8e2

由 Jens Axboe 提交于 9月 02, 2009

This is a first step at introducing per-bdi flusher threads. We should
have no change in behaviour, although sb_has_dirty_inodes() is now
ridiculously expensive, as there's no easy way to answer that question.
Not a huge problem, since it'll be deleted in subsequent patches.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

66f3b8e2

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功