- 06 3月, 2010 2 次提交
-
-
由 Christoph Hellwig 提交于
This gives the filesystem more information about the writeback that is happening. Trond requested this for the NFS unstable write handling, and other filesystems might benefit from this too by beeing able to distinguish between the different callers in more detail. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
Similar to the fsync issue fixed a while ago in commit 2daea67e we need to write for data to actually hit the disk before writing out the metadata to guarantee data integrity for filesystems that modify the inode in the data I/O completion path. Currently XFS and NFS handle this manually, and AFS has a write_inode method that does nothing but waiting for data, while others are possibly missing out on this. Fortunately this change has a lot less impact than the fsync change as none of the write_inode methods starts data writeout of any form by itself. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 03 1月, 2010 1 次提交
-
-
由 Jaswinder Singh Rajput 提交于
Fix the following htmldocs warning: Warning(fs/fs-writeback.c:255): No description found for parameter 'sb' Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com> Acked-by: NWu Fengguang <fengguang.wu@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 12月, 2009 1 次提交
-
-
由 Eric Sandeen 提交于
ext4, at least, would like to start pushing on writeback if it starts to get close to ENOSPC when reserving worst-case blocks for delalloc writes. Writing out delalloc data will convert those worst-case predictions into usually smaller actual usage, freeing up space before we hit ENOSPC based on this speculation. Thanks to Jens for the suggestion for the helper function, & the naming help. I've made the helper return status on whether writeback was started even though I don't plan to use it in the ext4 patch; it seems like it would be potentially useful to test this in some cases. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Acked-by: NJan Kara <jack@suse.cz>
-
- 03 12月, 2009 3 次提交
-
-
由 Wu Fengguang 提交于
- no one is calling wb_writeback and write_cache_pages with wbc.nonblocking=1 any more - lumpy pageout will want to do nonblocking writeback without the congestion wait So remove the congestion checks as suggested by Chris. Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Evgeniy Polyakov <zbr@ioremap.net> Cc: Alex Elder <aelder@sgi.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Wu Fengguang 提交于
It will lower the flush priority for NFS, and maybe more in future. Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Wu Fengguang 提交于
This is dead code because no bdi flush thread will be started for !bdi_cap_writeback_dirty bdi. Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 26 9月, 2009 12 次提交
-
-
由 Jens Axboe 提交于
Sometimes we only want to write pages from a specific super_block, so allow that to be passed in. This fixes a problem with commit 56a131dc causing writeback on all super_blocks on a bdi, where we only really want to sync a specific sb from writeback_inodes_sb(). Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
Pointless to iterate other devices looking for a super, when we have a bdi mapping. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Wu Fengguang 提交于
Debug traces show that in per-bdi writeback, the inode under writeback almost always get redirtied by a busy dirtier. We used to call redirty_tail() in this case, which could delay inode for up to 30s. This is unacceptable because it now happens so frequently for plain cp/dd, that the accumulated delays could make writeback of big files very slow. So let's distinguish between data redirty and metadata only redirty. The first one is caused by a busy dirtier, while the latter one could happen in XFS, NFS, etc. when they are doing delalloc or updating isize. The inode being busy dirtied will now be requeued for next io, while the inode being redirtied by fs will continue to be delayed to avoid repeated IO. CC: Jan Kara <jack@suse.cz> CC: Theodore Ts'o <tytso@mit.edu> CC: Dave Chinner <david@fromorbit.com> CC: Chris Mason <chris.mason@oracle.com> CC: Christoph Hellwig <hch@infradead.org> Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
Currently we pin the inode->i_sb for every single inode. This increases cache traffic on sb->s_umount sem. Lets instead cache the inode sb pin state and keep the super_block pinned for as long as keep writing out inodes from the same super_block. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
If we only moved inodes from a single super_block to the temporary list, there's no point in doing a resort for multiple super_blocks. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Shaohua Li 提交于
__mark_inode_dirty adds inode to wb dirty list in random order. If a disk has several partitions, writeback might keep spindle moving between partitions. To reduce the move, better write big chunk of one partition and then move to another. Inodes from one fs usually are in one partion, so idealy move indoes from one fs together should reduce spindle move. This patch tries to address this. Before per-bdi writeback is added, the behavior is write indoes from one fs first and then another, so the patch restores previous behavior. The loop in the patch is a bit ugly, should we add a dirty list for each superblock in bdi_writeback? Test in a two partition disk with attached fio script shows about 3% ~ 6% improvement. Signed-off-by: NShaohua Li <shaohua.li@intel.com> Reviewed-by: NWu Fengguang <fengguang.wu@intel.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
And throw some comments in there, too. Reviewed-by: NWu Fengguang <fengguang.wu@intel.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Wu Fengguang 提交于
Make the if-else straight in writeback_single_inode(). No behavior change. Cc: Jan Kara <jack@suse.cz> Cc: Michael Rubin <mrubin@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: NFengguang Wu <wfg@mail.ustc.edu.cn> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Wu Fengguang 提交于
Fix the kupdate case, which disregards wbc.more_io and stop writeback prematurely even when there are more inodes to be synced. wbc.more_io should always be respected. Also remove the pages_skipped check. It will set when some page(s) of some inode(s) cannot be written for now. Such inodes will be delayed for a while. This variable has nothing to do with whether there are other writeable inodes. CC: Jan Kara <jack@suse.cz> CC: Dave Chinner <david@fromorbit.com> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Wu Fengguang 提交于
Treat bdi_start_writeback(0) as a special request to do background write, and stop such work when we are below the background dirty threshold. Also simplify the (nr_pages <= 0) checks. Since we already pass in nr_pages=LONG_MAX for WB_SYNC_ALL and background writes, we don't need to worry about it being decreased to zero. Reported-by: NRichard Kennedy <richard@rsk.demon.co.uk> CC: Jan Kara <jack@suse.cz> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jan Kara 提交于
If all inodes are under writeback (e.g. in case when there's only one inode with dirty pages), wb_writeback() with WB_SYNC_NONE work basically degrades to busylooping until I_SYNC flags of the inode is cleared. Fix the problem by waiting on I_SYNC flags of an inode on b_more_io list in case we failed to write anything. Tested-by: NWu Fengguang <fengguang.wu@intel.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 16 9月, 2009 12 次提交
-
-
由 Nick Piggin 提交于
wb_clear_pending AFAIKS should not be called after the item has been put on the list, except by the worker threads. It could lead to the situation where the refcount is decremented below 0 and cause lots of problems. Presumably the !wb_has_dirty_io case is not a common one, so it can be discovered when the thread wakes up to check? Also add a comment in bdi_work_clear. Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Nick Piggin 提交于
By the time bdi_work_on_stack gets evaluated again in bdi_work_free, it can already have been deallocated and used for something else in the !on stack case, giving a false positive in this test and causing corruption. Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Nick Piggin 提交于
If you're going to do an atomic RMW on each list entry, there's not much point in all the RCU complexities of the list walking. This is only going to help the multi-thread case I guess, but it doesn't hurt to do now. Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Nick Piggin 提交于
list_add_tail_rcu contains required barriers. Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
Gets rid of a manual set_current_state(). Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
And document its retriever, get_next_work_item(). Acked-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
bdi_start_writeback() is currently split into two paths, one for WB_SYNC_NONE and one for WB_SYNC_ALL. Add bdi_sync_writeback() for WB_SYNC_ALL writeback and let bdi_start_writeback() handle only WB_SYNC_NONE. Push down the writeback_control allocation and only accept the parameters that make sense for each function. This cleans up the API considerably. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
This gets rid of work == NULL in bdi_queue_work() and puts the OOM handling where it belongs. Acked-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
Now that bdi_writeback_all() no longer handles integrity writeback, it doesn't have to block anymore. This means that we can switch bdi_list reader side protection to RCU. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
Data integrity writeback must use bdi_start_writeback() and ensure that wbc->sb and wbc->bdi are set. Acked-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
We need to be able to pass in range_cyclic as well, so instead of growing yet another argument, split the arguments into a struct wb_writeback_args structure that we can use internally. Also makes it easier to just copy all members to an on-stack struct, since we can't access work after clearing the pending bit. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Christoph Hellwig 提交于
Since it's an opportunistic writeback and not a data integrity action, don't punt to blocking writeback. Just wakeup the thread and it will flush old data. Acked-by: NJan Kara <jack@suse.cz> Signed-off-by: NChristoph Hellwig <hch@infradead.org> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 14 9月, 2009 1 次提交
-
-
由 Jan Kara 提交于
Remove these three functions since nobody uses them anymore. Signed-off-by: NJan Kara <jack@suse.cz>
-
- 11 9月, 2009 5 次提交
-
-
由 Jens Axboe 提交于
Also a debugging aid. We want to catch dirty inodes being added to backing devices that don't do writeback. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
It is now unused, so kill it off. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
This gets rid of pdflush for bdi writeout and kupdated style cleaning. pdflush writeout suffers from lack of locality and also requires more threads to handle the same workload, since it has to work in a non-blocking fashion against each queue. This also introduces lumpy behaviour and potential request starvation, since pdflush can be starved for queue access if others are accessing it. A sample ffsb workload that does random writes to files is about 8% faster here on a simple SATA drive during the benchmark phase. File layout also seems a LOT more smooth in vmstat: r b swpd free buff cache si so bi bo in cs us sy id wa 0 1 0 608848 2652 375372 0 0 0 71024 604 24 1 10 48 42 0 1 0 549644 2712 433736 0 0 0 60692 505 27 1 8 48 44 1 0 0 476928 2784 505192 0 0 4 29540 553 24 0 9 53 37 0 1 0 457972 2808 524008 0 0 0 54876 331 16 0 4 38 58 0 1 0 366128 2928 614284 0 0 4 92168 710 58 0 13 53 34 0 1 0 295092 3000 684140 0 0 0 62924 572 23 0 9 53 37 0 1 0 236592 3064 741704 0 0 4 58256 523 17 0 8 48 44 0 1 0 165608 3132 811464 0 0 0 57460 560 21 0 8 54 38 0 1 0 102952 3200 873164 0 0 4 74748 540 29 1 10 48 41 0 1 0 48604 3252 926472 0 0 0 53248 469 29 0 7 47 45 where vanilla tends to fluctuate a lot in the creation phase: r b swpd free buff cache si so bi bo in cs us sy id wa 1 1 0 678716 5792 303380 0 0 0 74064 565 50 1 11 52 36 1 0 0 662488 5864 319396 0 0 4 352 302 329 0 2 47 51 0 1 0 599312 5924 381468 0 0 0 78164 516 55 0 9 51 40 0 1 0 519952 6008 459516 0 0 4 78156 622 56 1 11 52 37 1 1 0 436640 6092 541632 0 0 0 82244 622 54 0 11 48 41 0 1 0 436640 6092 541660 0 0 0 8 152 39 0 0 51 49 0 1 0 332224 6200 644252 0 0 4 102800 728 46 1 13 49 36 1 0 0 274492 6260 701056 0 0 4 12328 459 49 0 7 50 43 0 1 0 211220 6324 763356 0 0 0 106940 515 37 1 10 51 39 1 0 0 160412 6376 813468 0 0 0 8224 415 43 0 6 49 45 1 1 0 85980 6452 886556 0 0 4 113516 575 39 1 11 54 34 0 2 0 85968 6452 886620 0 0 0 1640 158 211 0 0 46 54 A 10 disk test with btrfs performs 26% faster with per-bdi flushing. A SSD based writeback test on XFS performs over 20% better as well, with the throughput being very stable around 1GB/sec, where pdflush only manages 750MB/sec and fluctuates wildly while doing so. Random buffered writes to many files behave a lot better as well, as does random mmap'ed writes. A separate thread is added to sync the super blocks. In the long term, adding sync_supers_bdi() functionality could get rid of this thread again. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
This is a first step at introducing per-bdi flusher threads. We should have no change in behaviour, although sb_has_dirty_inodes() is now ridiculously expensive, as there's no easy way to answer that question. Not a huge problem, since it'll be deleted in subsequent patches. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
This adds two new exported functions: - writeback_inodes_sb(), which only attempts to writeback dirty inodes on this super_block, for WB_SYNC_NONE writeout. - sync_inodes_sb(), which writes out all dirty inodes on this super_block and also waits for the IO to complete. Acked-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 24 6月, 2009 1 次提交
-
-
由 Christoph Hellwig 提交于
There is no reason to for the split between __writeback_single_inode and __sync_single_inode, the former just does a couple of checks before tail-calling the latter. So merge the two, and while we're at it split out the I_SYNC waiting case for data integrity writers, as it's logically separate function. Finally rename __writeback_single_inode to writeback_single_inode. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 17 6月, 2009 1 次提交
-
-
由 Wu Fengguang 提交于
1) I_FREEING tests should be coupled with I_CLEAR The two I_FREEING tests are racy because clear_inode() can set i_state to I_CLEAR between the clear of I_SYNC and the test of I_FREEING. 2) skip I_WILL_FREE inodes in generic_sync_sb_inodes() to avoid possible races with generic_forget_inode() generic_forget_inode() sets I_WILL_FREE call writeback on its own, so generic_sync_sb_inodes() shall not try to step in and create possible races: generic_forget_inode inode->i_state |= I_WILL_FREE; spin_unlock(&inode_lock); generic_sync_sb_inodes() spin_lock(&inode_lock); __iget(inode); __writeback_single_inode // see non zero i_count may WARN here ==> WARN_ON(inode->i_state & I_WILL_FREE); spin_unlock(&inode_lock); may call generic_forget_inode again ==> iput(inode); The above race and warning didn't turn up because writeback_inodes() holds the s_umount lock, so generic_forget_inode() finds MS_ACTIVE and returns early. But we are not sure the UBIFS calls and future callers will guarantee that. So skip I_WILL_FREE inodes for the sake of safety. Cc: Eric Sandeen <sandeen@sandeen.net> Acked-by: NJeff Layton <jlayton@redhat.com> Cc: Masayoshi MIZUMA <m.mizuma@jp.fujitsu.com> Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Acked-by: NJan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 6月, 2009 1 次提交
-
-
由 Nick Piggin 提交于
I think the block_dump output in __mark_inode_dirty is missing dentry locking. Surely the i_dentry list can change any time, so we may not even *get* a dentry there. If we do get one by chance, then it would appear to be able to go away or get renamed at any time... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-