- 22 9月, 2009 8 次提交
-
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kara 提交于
In theory it could happen that on one CPU we initialize a new inode but clearing of I_NEW | I_LOCK gets reordered before some of the initialization. Thus on another CPU we return not fully uptodate inode from iget_locked(). This seems to fix a corruption issue on ext3 mounted over NFS. [akpm@linux-foundation.org: add some commentary] Signed-off-by: NJan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 9月, 2009 3 次提交
-
-
由 Jens Axboe 提交于
NFS may free the server structure without ever having used the bdi, so we either need to flag the bdi as being uninitialized or initialize it up front. This does the latter. This fixes a crash with mounting more than one NFS file system, should people ever need that kind of obscure NFS functionality. Tested-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
Otherwise we could be attempting to flush data for a writeback thread and bdi that have already disappeared. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Ingo Molnar 提交于
Bye-bye Performance Counters, welcome Performance Events! In the past few months the perfcounters subsystem has grown out its initial role of counting hardware events, and has become (and is becoming) a much broader generic event enumeration, reporting, logging, monitoring, analysis facility. Naming its core object 'perf_counter' and naming the subsystem 'perfcounters' has become more and more of a misnomer. With pending code like hw-breakpoints support the 'counter' name is less and less appropriate. All in one, we've decided to rename the subsystem to 'performance events' and to propagate this rename through all fields, variables and API names. (in an ABI compatible fashion) The word 'event' is also a bit shorter than 'counter' - which makes it slightly more convenient to write/handle as well. Thanks goes to Stephane Eranian who first observed this misnomer and suggested a rename. User-space tooling and ABI compatibility is not affected - this patch should be function-invariant. (Also, defconfigs were not touched to keep the size down.) This patch has been generated via the following script: FILES=$(find * -type f | grep -vE 'oprofile|[^K]config') sed -i \ -e 's/PERF_EVENT_/PERF_RECORD_/g' \ -e 's/PERF_COUNTER/PERF_EVENT/g' \ -e 's/perf_counter/perf_event/g' \ -e 's/nb_counters/nb_events/g' \ -e 's/swcounter/swevent/g' \ -e 's/tpcounter_event/tp_event/g' \ $FILES for N in $(find . -name perf_counter.[ch]); do M=$(echo $N | sed 's/perf_counter/perf_event/g') mv $N $M done FILES=$(find . -name perf_event.*) sed -i \ -e 's/COUNTER_MASK/REG_MASK/g' \ -e 's/COUNTER/EVENT/g' \ -e 's/\<event\>/event_id/g' \ -e 's/counter/event/g' \ -e 's/Counter/Event/g' \ $FILES ... to keep it as correct as possible. This script can also be used by anyone who has pending perfcounters patches - it converts a Linux kernel tree over to the new naming. We tried to time this change to the point in time where the amount of pending patches is the smallest: the end of the merge window. Namespace clashes were fixed up in a preparatory patch - and some stylistic fallout will be fixed up in a subsequent patch. ( NOTE: 'counters' are still the proper terminology when we deal with hardware registers - and these sed scripts are a bit over-eager in renaming them. I've undone some of that, but in case there's something left where 'counter' would be better than 'event' we can undo that on an individual basis instead of touching an otherwise nicely automated patch. ) Suggested-by: NStephane Eranian <eranian@google.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: NPaul Mackerras <paulus@samba.org> Reviewed-by: NArjan van de Ven <arjan@linux.intel.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Howells <dhowells@redhat.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 17 9月, 2009 10 次提交
-
-
由 Eric Sandeen 提交于
There's no reason to redefine the maximum allowable offset in an extent-based file just for defrag; EXT_MAX_BLOCK already does this. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
In an attempt to avoid doing an unneeded flush after opening a (previously non-existent) file with O_CREAT|O_TRUNC, the code only triggered the hueristic if ei->disksize was non-zero. Turns out that the VFS doesn't call ->truncate() if the file doesn't exist, and ei->disksize is always zero even if the file previously existed. So remove the test, since it isn't necessary and in fact disabled the hueristic. Thanks to Clemens Eisserer that he was seeing problems with files written using kwrite and eclipse after sudden crashes caused by a buggy Intel video driver. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Artem Bityutskiy 提交于
In 'dbg_check_space_info()' we want to dump current lprops statistics, but actually dump old statistics. Fix this. Signed-off-by: NArtem Bityutskiy <Artem.Bityutskiy@nokia.com>
-
由 Theodore Ts'o 提交于
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
EXT4_EXT_MIGRATE is only intended to be used for an in-memory flag, and the hex value assigned to it collides with FS_DIRECTIO_FL (which is also stored in i_flags). There's no reason for the EXT4_EXT_MIGRATE bit to be stored in i_flags, so we switch it to use i_state instead. Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Sandeen 提交于
Today, the ext4 allocator will happily allocate blocks past 2^32 for indirect-block files, which results in the block numbers getting truncated, and corruption ensues. This patch limits such allocations to < 2^32, and adds BUG_ONs if we do get blocks larger than that. This should address RH Bug 519471, ext4 bitmap allocator must limit blocks to < 2^32 * ext4_find_goal() is modified to choose a goal < UINT_MAX, so that our starting point is in an acceptable range. * ext4_xattr_block_set() is modified such that the goal block is < UINT_MAX, as above. * ext4_mb_regular_allocator() is modified so that the group search does not continue into groups which are too high * ext4_mb_use_preallocated() has a check that we don't use preallocated space which is too far out * ext4_alloc_blocks() and ext4_xattr_block_set() add some BUG_ONs No attempt has been made to limit inode locations to < 2^32, so we may wind up with blocks far from their inodes. Doing this much already will lead to some odd ENOSPC issues when the "lower 32" gets full, and further restricting inodes could make that even weirder. For high inodes, choosing a goal of the original, % UINT_MAX, may be a bit odd, but then we're in an odd situation anyway, and I don't know of a better heuristic. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Akira Fujita 提交于
If logical block offset of original file which is passed to EXT4_IOC_MOVE_EXT is different from donor file's, a calculation error occurs in ext4_calc_swap_extents(), therefore wrong block is exchanged between original file and donor file. As a result, we hit ext4_error() in check_block_validity(). To detect the logical offset difference in EXT4_IOC_MOVE_EXT, add checks to mext_calc_swap_extents() and handle it as error, since data exchange must be done between the same blocks in EXT4_IOC_MOVE_EXT. Reported-by: NPeng Tao <bergwolf@gmail.com> Signed-off-by: NAkira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Akira Fujita 提交于
There is the possibility that path structure which is taken by ext4_ext_find_extent() indicates null extents. Because during data block exchanging in ext4_move_extents(), constitution of an extent tree may be changed. As a solution, the patch adds null extent check to ext_get_path(). Reported-by: NPeng Tao <bergwolf@gmail.com> Signed-off-by: NAkira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Akira Fujita 提交于
Replace BUG_ON calls with a call to ext4_error() to print an error message if EXT4_IOC_MOVE_EXT failed with some kind of reasons. This will help to debug. Ted pointed this out, thanks. Signed-off-by: NAkira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Akira Fujita 提交于
Replace get_ext_path macro with an inline function, since this macro looks like a function call but its arguments get modified. Ted pointed this out, thanks. Signed-off-by: NAkira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 16 9月, 2009 19 次提交
-
-
由 Jan Kara 提交于
In case we fsync() a file and inode is not dirty, we don't force a transaction to disk and hence don't flush disk caches. Thus file data could be just in disk caches and not on persistent storage. Fix the problem by flushing disk caches if we didn't force a transaction commit. Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Chris Mason 提交于
I've been struggling with this off and on while I've been testing the data=guarded work. The symptom is corrupted orphan lists and inodes with the wrong i_size stored on disk. I was convinced the data=guarded code was just missing a call to ext3_mark_inode_dirty, but tracing showed the i_disksize I was sending to ext3_mark_inode_dirty wasn't actually making it to the drive. ext3_mark_inode_dirty can be called without locks held (atime updates and a few others), so the data=guarded code uses locks while updating the in-memory inode, and then calls ext3_mark_inode_dirty without any locks held. But, ext3_mark_inode_dirty has no internal locking to make sure that only one CPU is updating the buffer head at a time. Generally this works out ok because everyone that changes the inode then calls ext3_mark_inode_dirty themselves. Even though it races, eventually someone updates the buffer heads and things move on. But there is still a risk of the wrong values getting in, and the data=guarded code seems to hit the race very often. Since everyone that changes the inode also logs it, it should be possible to fix this with some memory barriers. I'll leave that as an exercise to the reader and lock the buffer head instead. It it probably a good idea to have a different patch series for lockless bit flipping on the ext3 i_state field. ext3_do_update_inode &= clears EXT3_STATE_NEW without any locks held. Signed-off-by: NChris Mason <chris.mason@oracle.com> Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Jan Kara 提交于
During truncate we are sometimes forced to start a new transaction as the amount of blocks to be journaled is both quite large and hard to predict. So far we restarted a transaction while holding truncate_mutex and that violates lock ordering because truncate_mutex ranks below transaction start (and it can lead to a real deadlock with ext3_get_blocks() allocating new blocks from ext3_writepage()). Luckily, the problem is easy to fix: We just drop the truncate_mutex before restarting the transaction and acquire it afterwards. We are safe to do this as by the time ext3_truncate() is called, all the page cache for the truncated part of the file is dropped and so writepage() cannot come and allocate new blocks in the part of the file we are truncating. The rest of writers is stopped by us holding i_mutex. Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Jan Kara 提交于
lockdep annotation for a transaction start has been at the end of journal_start(). But a transaction is also started from journal_restart(). Move the lockdep annotation to start_this_handle() which covers both cases. Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Jan Kara 提交于
It does not make sense to store block number for journal as unsigned long since they can be only 32-bit (because of on-disk format limitation). So change in-memory structures and variables to use unsigned int instead. Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Andreas Dilger 提交于
Fix jiffie rounding in jbd commit timer setup code. Rounding down could cause the timer to be fired before the corresponding transaction has expired. That transaction can stay not committed forever if no new transaction is created or explicit sync/umount happens. Signed-off-by: NAndreas Dilger <adilger@sun.com> Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Nick Piggin 提交于
wb_clear_pending AFAIKS should not be called after the item has been put on the list, except by the worker threads. It could lead to the situation where the refcount is decremented below 0 and cause lots of problems. Presumably the !wb_has_dirty_io case is not a common one, so it can be discovered when the thread wakes up to check? Also add a comment in bdi_work_clear. Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Nick Piggin 提交于
By the time bdi_work_on_stack gets evaluated again in bdi_work_free, it can already have been deallocated and used for something else in the !on stack case, giving a false positive in this test and causing corruption. Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Nick Piggin 提交于
If you're going to do an atomic RMW on each list entry, there's not much point in all the RCU complexities of the list walking. This is only going to help the multi-thread case I guess, but it doesn't hurt to do now. Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Nick Piggin 提交于
list_add_tail_rcu contains required barriers. Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
Gets rid of a manual set_current_state(). Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
And document its retriever, get_next_work_item(). Acked-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
bdi_start_writeback() is currently split into two paths, one for WB_SYNC_NONE and one for WB_SYNC_ALL. Add bdi_sync_writeback() for WB_SYNC_ALL writeback and let bdi_start_writeback() handle only WB_SYNC_NONE. Push down the writeback_control allocation and only accept the parameters that make sense for each function. This cleans up the API considerably. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
This gets rid of work == NULL in bdi_queue_work() and puts the OOM handling where it belongs. Acked-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
Now that bdi_writeback_all() no longer handles integrity writeback, it doesn't have to block anymore. This means that we can switch bdi_list reader side protection to RCU. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
Data integrity writeback must use bdi_start_writeback() and ensure that wbc->sb and wbc->bdi are set. Acked-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
We do this automatically in get_sb_bdev() from the set_bdev_super() callback. Filesystems that have their own private backing_dev_info must assign that in ->fill_super(). Note that ->s_bdi assignment is required for proper writeback! Acked-by: NChristoph Hellwig <hch@infradead.org> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Jens Axboe 提交于
We need to be able to pass in range_cyclic as well, so instead of growing yet another argument, split the arguments into a struct wb_writeback_args structure that we can use internally. Also makes it easier to just copy all members to an on-stack struct, since we can't access work after clearing the pending bit. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Christoph Hellwig 提交于
Since it's an opportunistic writeback and not a data integrity action, don't punt to blocking writeback. Just wakeup the thread and it will flush old data. Acked-by: NJan Kara <jack@suse.cz> Signed-off-by: NChristoph Hellwig <hch@infradead.org> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-