- 20 12月, 2010 3 次提交
-
-
由 Theodore Ts'o 提交于
This function gets called a lot for large directories, and the answer is almost always "no, no, there's no problem". This means using unlikely() is a good thing. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jesper Juhl 提交于
Use advantage of kmem_cache_zalloc() to remove a memset() call in ext4_init_io_end() and save a few bytes. Before: [jj@dragon linux-2.6]$ size fs/ext4/page-io.o text data bss dec hex filename 3016 0 624 3640 e38 fs/ext4/page-io.o After: [jj@dragon linux-2.6]$ size fs/ext4/page-io.o text data bss dec hex filename 3000 0 624 3624 e28 fs/ext4/page-io.o Signed-off-by: NJesper Juhl <jj@chaosbits.net> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Tobias Klauser 提交于
IS_ERR() already implies unlikely(), so it can be omitted here. Signed-off-by: NTobias Klauser <tklauser@distanz.ch> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 17 12月, 2010 2 次提交
-
-
由 Theodore Ts'o 提交于
printk_ratelimit() is deprecated since it is a global instead of a per-printk ratelimit. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
This fixes up some broken argument descriptions that Namhyung Kim had originally submitted for ext3. This fixes the comments that were still applicable in ext4. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 16 12月, 2010 3 次提交
-
-
由 Theodore Ts'o 提交于
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Move the ext4_mount_options structure definition from ext4.h, since it is only used in super.c. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Change clear_opt() and set_opt() to take a superblock pointer instead of a pointer to EXT4_SB(sb)->s_mount_opt. This makes it easier for us to support a second mount option field. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 15 12月, 2010 2 次提交
-
-
由 Aaro Koskinen 提交于
There should be a check for the NUL character instead of '0'. Fortunately the only thing that cares about this is NFS serving, which is why we didn't notice this in the merge window testing. Reported-by: NPhil Carmody <ext-phil.2.carmody@nokia.com> Signed-off-by: NAaro Koskinen <aaro.koskinen@nokia.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Jon Nelson has found a test case which causes postgresql to fail with the error: psql:t.sql:4: ERROR: invalid page header in block 38269 of relation base/16384/16581 Under memory pressure, it looks like part of a file can end up getting replaced by zero's. Until we can figure out the cause, we'll roll back the change and use block_write_full_page() instead of ext4_bio_write_page(). The new, more efficient writing function can be used via the mount option mblk_io_submit, so we can test and fix the new page I/O code. To reproduce the problem, install postgres 8.4 or 9.0, and pin enough memory such that the system just at the end of triggering writeback before running the following sql script: begin; create temporary table foo as select x as a, ARRAY[x] as b FROM generate_series(1, 10000000 ) AS x; create index foo_a_idx on foo (a); create index foo_b_idx on foo USING GIN (b); rollback; If the temporary table is created on a hard drive partition which is encrypted using dm_crypt, then under memory pressure, approximately 30-40% of the time, pgsql will issue the above failure. This patch should fix this problem, and the problem will come back if the file system is mounted with the mblk_io_submit mount option. Reported-by: NJon Nelson <jnelson@jamponi.net> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 20 11月, 2010 2 次提交
-
-
由 Lukas Czerner 提交于
Filesystem independent ioctl was rejected as not common enough to be in core vfs ioctl. Since we still need to access to this functionality this commit adds ext4 specific ioctl EXT4_IOC_TRIM to dispatch ext4_trim_fs(). It takes fstrim_range structure as an argument. fstrim_range is definec in the include/linux/fs.h and its definition is as follows. struct fstrim_range { __u64 start; __u64 len; __u64 minlen; } start - first Byte to trim len - number of Bytes to trim from start minlen - minimum extent length to trim, free extents shorter than this number of Bytes will be ignored. This will be rounded up to fs block size. After the FITRIM is done, the number of actually discarded Bytes is stored in fstrim_range.len to give the user better insight on how much storage space has been really released for wear-leveling. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Lukas Czerner 提交于
There was concern that FITRIM ioctl is not common enough to be included in core vfs ioctl, as Christoph Hellwig pointed out there's no real point in dispatching this out to a separate vector instead of just through ->ioctl. So this commit removes ioctl_fstrim() from vfs ioctl and trim_fs from super_operation structure. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 19 11月, 2010 1 次提交
-
-
由 Darrick J. Wong 提交于
At the start of ext4_fill_super, ret is set to -EINVAL, and any failure path out of that function returns ret. However, the generic_check_addressable clause sets ret = 0 (if it passes), which means that a subsequent failure (e.g. a group checksum error) returns 0 even though the mount should fail. This causes vfs_kern_mount in turn to think that the mount succeeded, leading to an oops. A simple fix is to avoid using ret for the generic_check_addressable check, which was last changed in commit 30ca22c7. Signed-off-by: NDarrick J. Wong <djwong@us.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 18 11月, 2010 2 次提交
-
-
由 Dan Carpenter 提交于
If the the li_request_list was empty then it returned with the lock held. Instead of adding a "goto unlock" I just removed that special case and let it go past the empty list_for_each_safe(). Signed-off-by: NDan Carpenter <error27@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Markus Trippelsdorf 提交于
ext4_end_bio calls put_page and kmem_cache_free before calling SetPageUpdate(). This can result in setting the PageUptodate bit on random pages and causes the following BUG: BUG: Bad page state in process rm pfn:52e54 page:ffffea0001222260 count:0 mapcount:0 mapping: (null) index:0x0 arch kernel: page flags: 0x4000000000000008(uptodate) Fix the problem by moving put_io_page() after the SetPageUpdate() call. Thanks to Hugh Dickins for analyzing this problem. Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de> Tested-by: NMarkus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: NMarkus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 09 11月, 2010 5 次提交
-
-
由 Theodore Ts'o 提交于
Add ext4_evict_inode, ext4_drop_inode, ext4_mark_inode_dirty, and ext4_begin_ordered_truncate() Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Commit 5c521830 (ext4: Support discard requests when running in no-journal mode) attempts to add sb_issue_discard() for data blocks (in data=writeback mode) and in no-journal mode. Unfortunately, this no longer works, because in commit dd3932ed (block: remove BLKDEV_IFL_WAIT), sb_issue_discard() only presents a synchronous interface, and there are times when we call ext4_free_blocks() when we are are holding a spinlock, or are otherwise in an atomic context. For now, I've removed the call to sb_issue_discard() to prevent a deadlock or (if spinlock debugging is enabled) failures like this: BUG: scheduling while atomic: rc.sysinit/1376/0x00000002 Pid: 1376, comm: rc.sysinit Not tainted 2.6.36-ARCH #1 Call Trace: [<ffffffff810397ce>] __schedule_bug+0x5e/0x70 [<ffffffff81403110>] schedule+0x950/0xa70 [<ffffffff81060bad>] ? insert_work+0x7d/0x90 [<ffffffff81060fbd>] ? queue_work_on+0x1d/0x30 [<ffffffff81061127>] ? queue_work+0x37/0x60 [<ffffffff8140377d>] schedule_timeout+0x21d/0x360 [<ffffffff812031c3>] ? generic_make_request+0x2c3/0x540 [<ffffffff81402680>] wait_for_common+0xc0/0x150 [<ffffffff81041490>] ? default_wake_function+0x0/0x10 [<ffffffff812034bc>] ? submit_bio+0x7c/0x100 [<ffffffff810680a0>] ? wake_bit_function+0x0/0x40 [<ffffffff814027b8>] wait_for_completion+0x18/0x20 [<ffffffff8120a969>] blkdev_issue_discard+0x1b9/0x210 [<ffffffff811ba03e>] ext4_free_blocks+0x68e/0xb60 [<ffffffff811b1650>] ? __ext4_handle_dirty_metadata+0x110/0x120 [<ffffffff811b098c>] ext4_ext_truncate+0x8cc/0xa70 [<ffffffff810d713e>] ? pagevec_lookup+0x1e/0x30 [<ffffffff81191618>] ext4_truncate+0x178/0x5d0 [<ffffffff810eacbb>] ? unmap_mapping_range+0xab/0x280 [<ffffffff810d8976>] vmtruncate+0x56/0x70 [<ffffffff811925cb>] ext4_setattr+0x14b/0x460 [<ffffffff811319e4>] notify_change+0x194/0x380 [<ffffffff81117f80>] do_truncate+0x60/0x90 [<ffffffff811e08fa>] ? security_inode_permission+0x1a/0x20 [<ffffffff811eaec1>] ? tomoyo_path_truncate+0x11/0x20 [<ffffffff81127539>] do_last+0x5d9/0x770 [<ffffffff811278bd>] do_filp_open+0x1ed/0x680 [<ffffffff8140644f>] ? page_fault+0x1f/0x30 [<ffffffff81132bfc>] ? alloc_fd+0xec/0x140 [<ffffffff81118db1>] do_sys_open+0x61/0x120 [<ffffffff81118e8b>] sys_open+0x1b/0x20 [<ffffffff81002e6b>] system_call_fastpath+0x16/0x1b https://bugzilla.kernel.org/show_bug.cgi?id=22302Reported-by: NMathias Burén <mathias.buren@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: jiayingz@google.com
-
由 Dmitry Monakhov 提交于
It's not needed to sync the filesystem, and it fixes a lock_dep complaint. Signed-off-by: NDmitry Monakhov <dmonakhov@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz>
-
由 Theodore Ts'o 提交于
Use an atomic_t and make sure we don't free the structure while we might still be submitting I/O for that page. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
The following BUG can occur when an inode which is getting freed when it still has dirty pages outstanding, and it gets deleted (in this because it was the target of a rename). In ordered mode, we need to make sure the data pages are written just in case we crash before the rename (or unlink) is committed. If the inode is being freed then when we try to igrab the inode, we end up tripping the BUG_ON at fs/ext4/page-io.c:146. To solve this problem, we need to keep track of the number of io callbacks which are pending, and avoid destroying the inode until they have all been completed. That way we don't have to bump the inode count to keep the inode from being destroyed; an approach which doesn't work because the count could have already been dropped down to zero before the inode writeback has started (at which point we're not allowed to bump the count back up to 1, since it's already started getting freed). Thanks to Dave Chinner for suggesting this approach, which is also used by XFS. kernel BUG at /scratch_space/linux-2.6/fs/ext4/page-io.c:146! Call Trace: [<ffffffff811075b1>] ext4_bio_write_page+0x172/0x307 [<ffffffff811033a7>] mpage_da_submit_io+0x2f9/0x37b [<ffffffff811068d7>] mpage_da_map_and_submit+0x2cc/0x2e2 [<ffffffff811069b3>] mpage_add_bh_to_extent+0xc6/0xd5 [<ffffffff81106c66>] write_cache_pages_da+0x2a4/0x3ac [<ffffffff81107044>] ext4_da_writepages+0x2d6/0x44d [<ffffffff81087910>] do_writepages+0x1c/0x25 [<ffffffff810810a4>] __filemap_fdatawrite_range+0x4b/0x4d [<ffffffff810815f5>] filemap_fdatawrite_range+0xe/0x10 [<ffffffff81122a2e>] jbd2_journal_begin_ordered_truncate+0x7b/0xa2 [<ffffffff8110615d>] ext4_evict_inode+0x57/0x24c [<ffffffff810c14a3>] evict+0x22/0x92 [<ffffffff810c1a3d>] iput+0x212/0x249 [<ffffffff810bdf16>] dentry_iput+0xa1/0xb9 [<ffffffff810bdf6b>] d_kill+0x3d/0x5d [<ffffffff810be613>] dput+0x13a/0x147 [<ffffffff810b990d>] sys_renameat+0x1b5/0x258 [<ffffffff81145f71>] ? _atomic_dec_and_lock+0x2d/0x4c [<ffffffff810b2950>] ? cp_new_stat+0xde/0xea [<ffffffff810b29c1>] ? sys_newlstat+0x2d/0x38 [<ffffffff810b99c6>] sys_rename+0x16/0x18 [<ffffffff81002a2b>] system_call_fastpath+0x16/0x1b Reported-by: NNick Bowler <nbowler@elliptictech.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Tested-by: NNick Bowler <nbowler@elliptictech.com>
-
- 04 11月, 2010 1 次提交
-
-
由 Theodore Ts'o 提交于
We now initialize the percpu counters before replaying the journal, but after the journal, we recalculate the global counters, to deal with the possibility of the per-blockgroup counts getting updated by the journal replay. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 03 11月, 2010 2 次提交
-
-
由 Theodore Ts'o 提交于
Newer GCC's reported the following build warning: fs/ext4/super.c: In function 'ext4_lazyinit_thread': fs/ext4/super.c:2702: warning: 'ret' may be used uninitialized in this function Fix it by removing the need for the ret variable in the first place. Signed-off-by: N"Lukas Czerner" <lczerner@redhat.com> Reported-by: N"Stefan Richter" <stefanr@s5r6.in-berlin.de> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Lukas Czerner 提交于
When the request has been removed from the list and no other request has been issued, we will end up with next wakeup scheduled to MAX_JIFFY_OFFSET which is bad. So check for that. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 02 11月, 2010 1 次提交
-
-
由 Theodore Ts'o 提交于
Linus noted, and complained to me, that doing while lots of "git diff"'s of kernel sources, these spinlocks were responsible for 27% of the spinlock cost on his two-processor system as reported by perf. Git was doing lots of parallel stats, and this was putting a lot of pressure on ext4_getattr(). A spinlock to protect a single memory-to-memory copy is pointless, so remove it. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 10月, 2010 3 次提交
-
-
由 Al Viro 提交于
... and switch of the obvious get_sb_bdev() users to ->mount() Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Theodore Ts'o 提交于
We need to make check if a page does not have buffes by checking page_has_buffers(page) before calling page_buffers(page) in ext4_writepage(). Otherwise page_buffers() could throw a BUG_ON. Thanks also to Markus Trippelsdorf and Avinash Kurup who also reported the problem. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reported-by: NSedat Dilek <sedat.dilek@googlemail.com> Tested-by: NSedat Dilek <sedat.dilek@googlemail.com>
-
由 Ingo Molnar 提交于
Commit 5dabfc78 ("ext4: rename {exit,init}_ext4_*() to ext4_{exit,init}_*()") causes fs/ext4/super.c:4776: error: implicit declaration of function ‘ext4_init_xattr’ when CONFIG_EXT4_FS_XATTR is disabled. It renamed init_ext4_xattr to ext4_init_xattr but forgot to update the dummy definition in fs/ext4/xattr.h. Signed-off-by: NIngo Molnar <mingo@elte.hu> Acked-by: N"Theodore Ts'o" <tytso@mit.edu> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 28 10月, 2010 13 次提交
-
-
由 Dmitry Monakhov 提交于
Surprisingly chown() on ext4 is not SMP scalable operation. Due to unconditional orphan_del(NULL, inode) in ext4_setattr() result in significant performance overhead because of global orphan mutex, especially in no-journal mode (where orphan_add() is noop). It is possible to skip explicit orphan_del if possible. Results of fchown() micro-benchmark in no-journal mode while (1) { iteration++; fchown(fd, uid, gid); fchown(fd, uid + 1, gid + 1) } measured: iterations per millisecond | nr_tasks | w/o patch | with patch | | 1 | 142 | 185 | | 4 | 109 | 642 | Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Nicolas Kaiser 提交于
Signed-off-by: NNicolas Kaiser <nikai@nikai.net> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Kazuya Mio 提交于
When I compiled 2.6.36-rc3 kernel with EXT4FS_DEBUG definition, I got the following compile error. CC [M] fs/ext4/extents.o fs/ext4/extents.c: In function 'ext4_fallocate': fs/ext4/extents.c:3772: error: 'block' undeclared (first use in this function) fs/ext4/extents.c:3772: error: (Each undeclared identifier is reported only once fs/ext4/extents.c:3772: error: for each function it appears in.) make[2]: *** [fs/ext4/extents.o] Error 1 The patch fixes this problem. Signed-off-by: NKazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Sandeen 提交于
These functions are only used within fs/ext4/mballoc.c, so move them so they are used after they are defined, and then make them be static. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Fix a namespace leak from fs/ext4 Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Fix a namespace leak by moving the function to the file where it is used and making it static. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Cleanup namespace leaks from fs/ext4 and the inline trivial functions ext4_{ext,idx}_pblock() and ext4_{ext,idx}_store_pblock() since the code size actually shrinks when we make these functions inline, they're so trivial. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
These functions have no need to be exported beyond file context. No functions needed to be moved for this commit; just some function declarations changed to be static and removed from header files. (A similar patch was submitted by Eric Sandeen, but I wanted to handle code movement in separate patches to make sure code changes didn't accidentally get dropped.) Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
This is a cleanup to avoid namespace leaks out of fs/ext4 Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Commit 84061e07 fixed an accounting bug only to introduce the possibility of a kernel OOPS if the journal has a non-zero j_errno field indicating that the file system had detected a fs inconsistency. After the journal replay, if the journal superblock indicates that the file system has an error, this indication is transfered to the file system and then ext4_commit_super() is called to write this to the disk. But since the percpu counters are now initialized after the journal replay, the call to ext4_commit_super() will cause a kernel oops since it needs to use the percpu counters the ext4 superblock structure. The fix is to skip setting the ext4 free block and free inode fields if the percpu counter has not been set. Thanks to Ken Sumrall for reporting and analyzing the root causes of this bug. Addresses-Google-Bug: #3054080 Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Sandeen 提交于
As pointed out in a prior patch, updating the mapping's writeback_index based on pages written isn't quite right; what the writeback index is really supposed to reflect is the next page which should be scanned for writeback during periodic flush. As in write_cache_pages(), write_cache_pages_da() does this scanning for us as we assemble the mpd for later writeout. If we keep track of the next page after the current scan, we can easily update writeback_index without worrying about pages written vs. pages skipped, etc. Without this, an fsync will reset writeback_index to 0 (its starting index) + however many pages it wrote, which can mess up the progress of periodic flush. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Sandeen 提交于
This is analogous to Jan Kara's commit, f446daae mm: implement writeback livelock avoidance using page tagging but since we forked write_cache_pages, we need to reimplement it there (and in ext4_da_writepages, since range_cyclic handling was moved to there) If you start a large buffered IO to a file, and then set fsync after it, you'll find that fsync does not complete until the other IO stops. If you continue re-dirtying the file (say, putting dd with conv=notrunc in a loop), when fsync finally completes (after all IO is done), it reports via tracing that it has written many more pages than the file contains; in other words it has synced and re-synced pages in the file multiple times. This then leads to problems with our writeback_index update, since it advances it by pages written, and essentially sets writeback_index off the end of the file... With the following patch, we only sync as much as was dirty at the time of the sync. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Sandeen 提交于
This doesn't fix anything at all, it just removes a vestige of prior use from __mpage_da_writepage() __mpage_da_writepage() had a *void argument leftover from its previous life as a callback; make it reflect the actual type. Fixing this up makes it slightly more obvious to read, and enables proper typechecking. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-