- 08 2月, 2011 1 次提交
-
-
由 Curt Wohlgemuth 提交于
This fixes a corruption problem with the multi-block writepages submittal change for ext4, from commit bd2d0210 ("ext4: use bio layer instead of buffer layer in mpage_da_submit_io"). (Note that this corruption is not present in 2.6.37 on ext4, because the corruption was detected after the feature was merged in 2.6.37-rc1, and so it was turned off by adding a non-default mount option, mblk_io_submit. With this commit, which hopefully fixes the last of the bugs with this feature, we'll be able to turn on this performance feature by default in 2.6.38, and remove the mblk_io_submit option.) The ext4 code path to bundle multiple pages for writeback in ext4_bio_write_page() had a bug: we should be clearing buffer head dirty flags *before* we submit the bio, not in the completion routine. The patch below was tested on 2.6.37 under KVM with the postgresql script which was submitted by Jon Nelson as documented in commit 1449032b. Without the patch, I'd hit the corruption problem about 50-70% of the time. With the patch, I executed the script > 100 times with no corruption seen. I also fixed a bug to make sure ext4_end_bio() doesn't dereference the bio after the bio_put() call. Reported-by: NJon Nelson <jnelson@jamponi.net> Reported-by: NMatthias Bayer <jackdachef@gmail.com> Signed-off-by: NCurt Wohlgemuth <curtw@google.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
-
- 04 2月, 2011 3 次提交
-
-
由 Theodore Ts'o 提交于
Make sure we the correct cleanup happens if we die while trying to load the ext4 file system. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Lukas Czerner 提交于
Ext4 features interface was not properly unregistered which led to problems while unloading/reloading ext4 module. This commit fixes that by adding proper kobject unregistration code into ext4_exit_fs() as well as fail-path of ext4_init_fs() Reported-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
-
由 Eric Sandeen 提交于
https://bugzilla.kernel.org/show_bug.cgi?id=27652 If the lazyinit thread is running, the teardown function ext4_destroy_lazyinit_thread() has problems: ext4_clear_request_list(); while (ext4_li_info->li_task) { wake_up(&ext4_li_info->li_wait_daemon); wait_event(ext4_li_info->li_wait_task, ext4_li_info->li_task == NULL); } Clearing the request list will cause the thread to exit and free ext4_li_info, so then we're waiting on something which is getting freed. Fix this up by making the thread respond to kthread_stop, and exit, without the need to wait for that exit in some other homegrown way. Cc: stable@kernel.org Reported-and-Tested-by: NTao Ma <boyu.mt@taobao.com> Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 17 1月, 2011 2 次提交
-
-
由 Christoph Hellwig 提交于
Currently all filesystems except XFS implement fallocate asynchronously, while XFS forced a commit. Both of these are suboptimal - in case of O_SYNC I/O we really want our allocation on disk, especially for the !KEEP_SIZE case where we actually grow the file with user-visible zeroes. On the other hand always commiting the transaction is a bad idea for fast-path uses of fallocate like for example in recent Samba versions. Given that block allocation is a data plane operation anyway change it from an inode operation to a file operation so that we have the file structure available that lets us check for O_SYNC. This also includes moving the code around for a few of the filesystems, and remove the already unnedded S_ISDIR checks given that we only wire up fallocate for regular files. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
Instead of various home grown checks that might need updates for new flags just check for any bit outside the mask of the features supported by the filesystem. This makes the check future proof for any newly added flag. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 14 1月, 2011 1 次提交
-
-
由 Andrew Morton 提交于
pr_warning_ratelimited() doesn't exist. Also include printk.h, which defines these things. Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 1月, 2011 2 次提交
-
-
由 Josef Bacik 提交于
Ext4 doesn't have the ability to punch holes yet, so make sure we return EOPNOTSUPP if we try to use hole punching through fallocate. This support can be added later. Thanks, Acked-by: NJan Kara <jack@suse.cz> Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Jan Kara 提交于
As Al Viro pointed out path resolution during Q_QUOTAON calls to quotactl is prone to deadlocks. We hold s_umount semaphore for reading during the path resolution and resolution itself may need to acquire the semaphore for writing when e. g. autofs mountpoint is passed. Solve the problem by performing the resolution before we get hold of the superblock (and thus s_umount semaphore). The whole thing is complicated by the fact that some filesystems (OCFS2) ignore the path argument. So to distinguish between filesystem which want the path and which do not we introduce new .quota_on_meta callback which does not get the path. OCFS2 then uses this callback instead of old .quota_on. CC: Al Viro <viro@ZenIV.linux.org.uk> CC: Christoph Hellwig <hch@lst.de> CC: Ted Ts'o <tytso@mit.edu> CC: Joel Becker <joel.becker@oracle.com> Signed-off-by: NJan Kara <jack@suse.cz>
-
- 12 1月, 2011 2 次提交
-
-
由 Jan Kara 提交于
When s_first_data_block is not zero (which happens e.g. when block size is 1KB) and trim ioctl is called to start trimming from block 0, the math in ext4_get_group_no_and_offset() overflows. The overall result is that ioctl returns EINVAL which is kind of unexpected and we probably don't want userspace tools to bother with internal details of filesystem structure. So just silently increase starting offset (and shorten length) when starting block is below s_first_data_block. CC: Lukas Czerner <lczerner@redhat.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
This reverts commit 4f531501: ext4: fix possible overflow in ext4_trim_fs() Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 11 1月, 2011 23 次提交
-
-
由 Eric Sandeen 提交于
Since check_eofblocks_fl() only uses the m_lblk portion of the map structure, we may as well pass that directly, rather than passing the entire map, which IMHO obfuscates what parameters check_eofblocks_fl() cares about. Not a big deal, but seems tidier and less confusing, to me. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Commit 40389687 moved a call to ext4_forget() out of ext4_free_branches and let ext4_free_blocks() handle calling bforget(). But that change unfortunately did not replace the call to ext4_forget() with brelse(), which was needed to drop the in-use count of the indirect block's buffer head, which lead to a memory leak when deleting files that used indirect blocks. Fix this. Thanks to Hugh Dickins for pointing this out. Cc: stable@kernel.org Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
This function was never implemented, except for a BUG_ON which was tripping when ext4 is run without a journal. The problem is that although the comment asserts that "truncate (which is the only way to free block) discards all preallocations", ext4_free_blocks() is also called in various error recovery paths when blocks have been allocated, but for various reasons, we were not able to use those data blocks (for example, because we ran out of memory while trying to manipulate the extent tree, or some other similar situation). In addition to the fact that this function isn't implemented except for the incorrect BUG_ON, the single caller of this function, ext4_free_blocks(), doesn't use it all if the journal is enabled. So remove the (stub) function entirely for now. If we decide it's better to add it back, it's only going to be useful with a relatively large number of code changes anyway. Google-Bug-Id: 3236408 Cc: Jiaying Zhang <jiayingz@google.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jiaying Zhang 提交于
Ted first found the bug when running 2.6.36 kernel with dioread_nolock mount option that xfstests #13 complained about wrong file size during fsck. However, the bug exists in the older kernels as well although it is somehow harder to trigger. The problem is that ext4_end_io_work() can happen after we have truncated an inode to a smaller size. Then when ext4_end_io_work() calls ext4_convert_unwritten_extents(), we may reallocate some blocks that have been truncated, so the inode size becomes inconsistent with the allocated blocks. The following patch flushes the i_completed_io_list during truncate to reduce the risk that some pending end_io requests are executed later and convert already truncated blocks to initialized. Note that although the fix helps reduce the problem a lot there may still be a race window between vmtruncate() and ext4_end_io_work(). The fundamental problem is that if vmtruncate() is called without either i_mutex or i_alloc_sem held, it can race with an ongoing write request so that the io_end request is processed later when the corresponding blocks have been truncated. Ted and I have discussed the problem offline and we saw a few ways to fix the race completely: a) We guarantee that i_mutex lock and i_alloc_sem write lock are both hold whenever vmtruncate() is called. The i_mutex lock prevents any new write requests from entering writeback and the i_alloc_sem prevents the race from ext4_page_mkwrite(). Currently we hold both locks if vmtruncate() is called from do_truncate(), which is probably the most common case. However, there are places where we may call vmtruncate() without holding either i_mutex or i_alloc_sem. I would like to ask for other people's opinions on what locks are expected to be held before calling vmtruncate(). There seems a disagreement among the callers of that function. b) We change the ext4 write path so that we change the extent tree to contain the newly allocated blocks and update i_size both at the same time --- when the write of the data blocks is completed. c) We add some additional locking to synchronize vmtruncate() and ext4_end_io_work(). This approach may have performance implications so we need to be careful. All of the above proposals may require more substantial changes, so we may consider to take the following patch as a bandaid. Signed-off-by: NJiaying Zhang <jiayingz@google.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Call ext4_std_error() in various places when we can't bail out cleanly, so the file system can be marked as in error. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Jan Kara 提交于
When ext4_trim_fs() is called to trim a part of a single group, the logic will wrongly set last block of the interval to 'len' instead of 'first_block + len'. Thus a shorter interval is possibly trimmed. Fix it. CC: Lukas Czerner <lczerner@redhat.com> Cc: stable@kernel.org Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Andrew Morton 提交于
fs/ext4/super.c: In function 'ext4_register_li_request': fs/ext4/super.c:2936: warning: 'ret' may be used uninitialized in this function It looks buggy to me, too. Cc: Lukas Czerner <lczerner@redhat.com> Cc: stable@kernel.org Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Replace the jbd2_inode structure (which is 48 bytes) with a pointer and only allocate the jbd2_inode when it is needed --- that is, when the file system has a journal present and the inode has been opened for writing. This allows us to further slim down the ext4_inode_info structure. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
We can store the dynamic inode state flags in the high bits of EXT4_I(inode)->i_flags, and eliminate i_state_flags. This saves 8 bytes from the size of ext4_inode_info structure, which when multiplied by the number of the number of in the inode cache, can save a lot of memory. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
By reordering the elements in the ext4_inode_info structure, we can reduce the padding needed on an x86_64 system by 16 bytes. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
We can encode the ec_type information by using ee_len == 0 to denote EXT4_EXT_CACHE_NO, ee_start == 0 to denote EXT4_EXT_CACHE_GAP, and if neither is true, then the cache type must be EXT4_EXT_CACHE_EXTENT. This allows us to reduce the size of ext4_ext_inode by another 8 bytes. (ec_type is 4 bytes, plus another 4 bytes of padding) Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
This fixes a number of places where we used sector_t instead of ext4_lblk_t for logical blocks, which for ext4 are still 32-bit data types. No point wasting space in the ext4_inode_info structure, and requiring 64-bit arithmetic on 32-bit systems, when it isn't necessary. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Remove the short element i_delalloc_reserved_flag from the ext4_inode_info structure and replace it a new bit in i_state_flags. Since we have an ext4_inode_info for every ext4 inode cached in the inode cache, any savings we can produce here is a very good thing from a memory utilization perspective. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Kazuya Mio 提交于
ext4_ext_find_goal() returns an ideal physical block number that the block allocator tries to allocate first. However, if a required file offset is smaller than the existing extent's one, ext4_ext_find_goal() returns a wrong block number because it may overflow at "block - le32_to_cpu(ex->ee_block)". This patch fixes the problem. ext4_ext_find_goal() will also return a wrong block number in case a file offset of the existing extent is too big. In this case, the ideal physical block number is fixed in ext4_mb_initialize_context(), so it's no problem. reproduce: # dd if=/dev/zero of=/mnt/mp1/tmp bs=127M count=1 oflag=sync # dd if=/dev/zero of=/mnt/mp1/file bs=512K count=1 seek=1 oflag=sync # filefrag -v /mnt/mp1/file Filesystem type is: ef53 File size of /mnt/mp1/file is 1048576 (256 blocks, blocksize 4096) ext logical physical expected length flags 0 128 67456 128 eof /mnt/mp1/file: 2 extents found # rm -rf /mnt/mp1/tmp # echo $((512*4096)) > /sys/fs/ext4/loop0/mb_stream_req # dd if=/dev/zero of=/mnt/mp1/file bs=512K count=1 oflag=sync conv=notrunc result (linux-2.6.37-rc2 + ext4 patch queue): # filefrag -v /mnt/mp1/file Filesystem type is: ef53 File size of /mnt/mp1/file is 1048576 (256 blocks, blocksize 4096) ext logical physical expected length flags 0 0 33280 128 1 128 67456 33407 128 eof /mnt/mp1/file: 2 extents found result(apply this patch): # filefrag -v /mnt/mp1/file Filesystem type is: ef53 File size of /mnt/mp1/file is 1048576 (256 blocks, blocksize 4096) ext logical physical expected length flags 0 0 66560 128 1 128 67456 66687 128 eof /mnt/mp1/file: 2 extents found Signed-off-by: NKazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Namhyung Kim 提交于
Check return value of ext4_journal_get_write_access, ext4_journal_dirty_metadata and ext4_mark_inode_dirty. Move brelse() under 'out_stop' to release bh properly in case of journal error. Signed-off-by: NNamhyung Kim <namhyung@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Paris 提交于
ext4_ext_migrate() calls ext4_new_inode() and passes 0 instead of a pointer to a struct qstr. This patch uses NULL, to make it obvious to the caller that this was a pointer. Signed-off-by: NEric Paris <eparis@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Where the file pointer is available, use ext4_error_file() instead of ext4_error_inode(). Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Dan Carpenter 提交于
d_path() returns an ERR_PTR and it doesn't return NULL. This is in ext4_error_file() and no one actually calls ext4_error_file(). Signed-off-by: NDan Carpenter <error27@gmail.com>
-
由 Dan Carpenter 提交于
This is a copy and paste error. The intent was to check "io_page_cachep". We tested "io_page_cachep" earlier. Signed-off-by: NDan Carpenter <error27@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Wang Sheng-Hui 提交于
Signed-off-by: NWang Sheng-Hui <crosslonelyover@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Any time you see code that tries to add error codes together, you should want to claw your eyes out... Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Lukas Czerner 提交于
ext4_issue_discard is supposed to be helper for calling discard, however in case that underlying device does not support discard it prints out the warning message and clears the DISCARD t_mount_opt flag. Since it can be (and is) used by others, it should not do anything and let the caller to handle the error case. This commit removes warning message and flag setting from ext4_issue_discard and use it just in place where it is really needed (release_blocks_on_commit). FITRIM ioctl should not set any flags nor it should print out warning messages, so get rid of the warning as well. Signed-off-by: NLukas Czerner <lczerner@redhat.com>
-
由 Lukas Czerner 提交于
When determining last group through ext4_get_group_no_and_offset() the result may be wrong in cases when range->start and range-len are too big, because it may overflow when summing up those two numbers. Fix that by checking range->len and limit its value to ext4_blocks_count(). This commit was tested by myself with expected result. Signed-off-by: NLukas Czerner <lczerner@redhat.com>
-
- 07 1月, 2011 3 次提交
-
-
由 Nick Piggin 提交于
This simple implementation just checks for no ACLs on the inode, and if so, then the rcu-walk may proceed, otherwise fail it. Signed-off-by: NNick Piggin <npiggin@kernel.dk>
-
由 Nick Piggin 提交于
Signed-off-by: NNick Piggin <npiggin@kernel.dk>
-
由 Nick Piggin 提交于
RCU free the struct inode. This will allow: - Subsequent store-free path walking patch. The inode must be consulted for permissions when walking, so an RCU inode reference is a must. - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want to take i_lock no longer need to take sb_inode_list_lock to walk the list in the first place. This will simplify and optimize locking. - Could remove some nested trylock loops in dcache code - Could potentially simplify things a bit in VM land. Do not need to take the page lock to follow page->mapping. The downsides of this is the performance cost of using RCU. In a simple creat/unlink microbenchmark, performance drops by about 10% due to inability to reuse cache-hot slab objects. As iterations increase and RCU freeing starts kicking over, this increases to about 20%. In cases where inode lifetimes are longer (ie. many inodes may be allocated during the average life span of a single inode), a lot of this cache reuse is not applicable, so the regression caused by this patch is smaller. The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU, however this adds some complexity to list walking and store-free path walking, so I prefer to implement this at a later date, if it is shown to be a win in real situations. I haven't found a regression in any non-micro benchmark so I doubt it will be a problem. Signed-off-by: NNick Piggin <npiggin@kernel.dk>
-
- 24 12月, 2010 1 次提交
-
-
由 Theodore Ts'o 提交于
https://bugzilla.kernel.org/show_bug.cgi?id=25352 This regression was caused by commit a31437b8: "ext4: use sb_issue_zeroout in setup_new_group_blocks", by accidentally dropping the code which reserved the block group descriptor and inode table blocks. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 20 12月, 2010 2 次提交
-
-
由 Theodore Ts'o 提交于
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Joe Perches 提交于
Using %pV reduces the number of printk calls and eliminates any possible message interleaving from other printk calls. In function __ext4_grp_locked_error also added KERN_CONT to some printks. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-