- 17 8月, 2013 4 次提交
-
-
由 Theodore Ts'o 提交于
Don't use an unsigned long long for the es_status flags; this requires that we pass 64-bit values around which is painful on 32-bit systems. Instead pass the extent status flags around using the low 4 bits of an unsigned int, and shift them into place when we are reading or writing es_pblk. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
-
由 Theodore Ts'o 提交于
When we find an invalid extent tree block, report the block number of the bad block for debugging purposes. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
-
由 Theodore Ts'o 提交于
Refactor out the code needed to read the extent tree block into a single read_extent_tree_block() function. In addition to simplifying the code, it also makes sure that we call the ext4_ext_load_extent tracepoint whenever we need to read an extent tree block from disk. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
-
由 Jan Kara 提交于
Commit 0713ed0c added jbd2_journal_file_inode() call into ext4_block_zero_page_range(). However that function gets called from truncate path and thus inode needn't have jinode attached - that happens in ext4_file_open() but the file needn't be ever open since mount. Calling jbd2_journal_file_inode() without jinode attached results in the oops. We fix the problem by attaching jinode to inode also in ext4_truncate() and ext4_punch_hole() when we are going to zero out partial blocks. Reported-by: Nmajianpeng <majianpeng@gmail.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 12 8月, 2013 2 次提交
-
-
由 Jan Kara 提交于
When jbd2_journal_dirty_metadata() returns error, __ext4_handle_dirty_metadata() stops the handle. However callers of this function do not count with that fact and still happily used now freed handle. This use after free can result in various issues but very likely we oops soon. The motivation of adding __ext4_journal_stop() into __ext4_handle_dirty_metadata() in commit 9ea7a0df seems to be only to improve error reporting. So replace __ext4_journal_stop() with ext4_journal_abort_handle() which was there before that commit and add WARN_ON_ONCE() to dump stack to provide useful information. Reported-by: NSage Weil <sage@inktank.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org # 3.2+
-
由 Theodore Ts'o 提交于
Previously we weren't swapping only some of the extent_status LRU fields during the processing of the EXT4_IOC_SWAP_BOOT ioctl. The much safer thing to do is to just completely flush the extent status tree when doing the swap. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Zheng Liu <gnehzuil.liu@gmail.com> Cc: stable@vger.kernel.org
-
- 09 8月, 2013 2 次提交
-
-
由 Piotr Sarna 提交于
Commit 56889787 ("ext4: improve handling of conflicting mount options") introduced incorrect messages shown while choosing wrong mount options. First of all, both cases of incorrect mount options, "data=journal,delalloc" and "data=journal,dioread_nolock" result in the same error message. Secondly, the problem above isn't solved for remount option: the mismatched parameter is simply ignored. Moreover, ext4_msg states that remount with options "data=journal,delalloc" succeeded, which is not true. To fix it up, I added a simple check after parse_options() call to ensure that data=journal and delalloc/dioread_nolock parameters are not present at the same time. Signed-off-by: NPiotr Sarna <p.sarna@partner.samsung.com> Acked-by: NBartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: NKyungmin Park <kyungmin.park@samsung.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
由 Theodore Ts'o 提交于
Commit 26092bf5 ("ext4: use a table-driven handler for mount options") wrongly disallows the specifying the mount options nodelalloc and data=journal simultaneously. This is incorrect; it should have only disallowed the combination of delalloc and data=journal simultaneously. Reported-by: NPiotr Sarna <p.sarna@partner.samsung.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 30 7月, 2013 2 次提交
-
-
由 Zheng Liu 提交于
In commit 921f266b: ext4: add self-testing infrastructure to do a sanity check, some sanity checks were added in map_blocks to make sure 'retval == map->m_len'. Enable these checks by default and report any assertion failures using ext4_warning() and WARN_ON() since they can help us to figure out some bugs that are otherwise hard to hit. Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
We tested for ENOMEM instead of -ENOMEM. Oops. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 27 7月, 2013 2 次提交
-
-
由 Eric Sandeen 提交于
Without this, module can't be reloaded. [ 500.521980] kmem_cache_sanity_check (ext4_extent_status): Cache name already exists. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org # v3.8+
-
由 Theodore Ts'o 提交于
When we try to allocate an inode, and there is a race between two CPU's trying to grab the same inode, _and_ this inode is the last free inode in the block group, make sure the group number is bumped before we continue searching the rest of the block groups. Otherwise, we end up searching the current block group twice, and we end up skipping searching the last block group. So in the unlikely situation where almost all of the inodes are allocated, it's possible that we will return ENOSPC even though there might be free inodes in that last block group. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 21 7月, 2013 1 次提交
-
-
由 Zheng Liu 提交于
When we try to open a file with O_TMPFILE flag, we will trigger a bug. The root cause is that in ext4_orphan_add() we check ->i_nlink == 0 and this check always fails because we set ->i_nlink = 1 in inode_init_always(). We can use the following program to trigger it: int main(int argc, char *argv[]) { int fd; fd = open(argv[1], O_TMPFILE, 0666); if (fd < 0) { perror("open "); return -1; } close(fd); return 0; } The oops message looks like this: kernel BUG at fs/ext4/namei.c:2572! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: dlci bridge stp hidp cmtp kernelcapi l2tp_ppp l2tp_netlink l2tp_core sctp libcrc32c rfcomm tun fuse nfnetli nk can_raw ipt_ULOG can_bcm x25 scsi_transport_iscsi ipx p8023 p8022 appletalk phonet psnap vmw_vsock_vmci_transport af_key vmw_vmci rose vsock atm can netrom ax25 af_rxrpc ir da pppoe pppox ppp_generic slhc bluetooth nfc rfkill rds caif_socket caif crc_ccitt af_802154 llc2 llc snd_hda_codec_realtek snd_hda_intel snd_hda_codec serio_raw snd_pcm pcsp kr edac_core snd_page_alloc snd_timer snd soundcore r8169 mii sr_mod cdrom pata_atiixp radeon backlight drm_kms_helper ttm CPU: 1 PID: 1812571 Comm: trinity-child2 Not tainted 3.11.0-rc1+ #12 Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010 task: ffff88007dfe69a0 ti: ffff88010f7b6000 task.ti: ffff88010f7b6000 RIP: 0010:[<ffffffff8125ce69>] [<ffffffff8125ce69>] ext4_orphan_add+0x299/0x2b0 RSP: 0018:ffff88010f7b7cf8 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff8800966d3020 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff88007dfe70b8 RDI: 0000000000000001 RBP: ffff88010f7b7d40 R08: ffff880126a3c4e0 R09: ffff88010f7b7ca0 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801271fd668 R13: ffff8800966d2f78 R14: ffff88011d7089f0 R15: ffff88007dfe69a0 FS: 00007f70441a3740(0000) GS:ffff88012a800000(0000) knlGS:00000000f77c96c0 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000002834000 CR3: 0000000107964000 CR4: 00000000000007e0 DR0: 0000000000780000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600 Stack: 0000000000002000 00000020810b6dde 0000000000000000 ffff88011d46db00 ffff8800966d3020 ffff88011d7089f0 ffff88009c7f4c10 ffff88010f7b7f2c ffff88007dfe69a0 ffff88010f7b7da8 ffffffff8125cfac ffff880100000004 Call Trace: [<ffffffff8125cfac>] ext4_tmpfile+0x12c/0x180 [<ffffffff811cba78>] path_openat+0x238/0x700 [<ffffffff8100afc4>] ? native_sched_clock+0x24/0x80 [<ffffffff811cc647>] do_filp_open+0x47/0xa0 [<ffffffff811db73f>] ? __alloc_fd+0xaf/0x200 [<ffffffff811ba2e4>] do_sys_open+0x124/0x210 [<ffffffff81010725>] ? syscall_trace_enter+0x25/0x290 [<ffffffff811ba3ee>] SyS_open+0x1e/0x20 [<ffffffff816ca8d4>] tracesys+0xdd/0xe2 [<ffffffff81001001>] ? start_thread_common.constprop.6+0x1/0xa0 Code: 04 00 00 00 89 04 24 31 c0 e8 c4 77 04 00 e9 43 fe ff ff 66 25 00 d0 66 3d 00 80 0f 84 0e fe ff ff 83 7b 48 00 0f 84 04 fe ff ff <0f> 0b 49 8b 8c 24 50 07 00 00 e9 88 fe ff ff 0f 1f 84 00 00 00 Here we couldn't call clear_nlink() directly because in d_tmpfile() we will call inode_dec_link_count() to decrease ->i_nlink. So this commit tries to call d_tmpfile() before ext4_orphan_add() to fix this problem. Reported-by: NDave Jones <davej@redhat.com> Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Tested-by: NDarrick J. Wong <darrick.wong@oracle.com> Tested-by: NDave Jones <davej@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 16 7月, 2013 2 次提交
-
-
由 Theodore Ts'o 提交于
If there are no items in the extent status tree, ext4_es_lru_add() is a no-op. So it is not sufficient to call ext4_es_lru_add() before we try to lookup an entry in the extent status tree. We also need to call it at the end of ext4_ext_map_blocks(), after items have been added to the extent status tree. This could lead to inodes with that have extent status trees but which are not in the LRU list, which means they won't get considered for eviction by the es_shrinker. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Zheng Liu <wenqing.lz@taobao.com> Cc: stable@vger.kernel.org
-
由 Theodore Ts'o 提交于
During large unlink operations on files with extents, we can use a lot of CPU time. This adds a cond_resched() call when starting to examine the next level of a multi-level extent tree. Multi-level extent trees are rare in the first place, and this should rarely be executed. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 15 7月, 2013 3 次提交
-
-
由 Theodore Ts'o 提交于
Some callers of ext4_es_remove_extent() and ext4_es_insert_extent() may not be completely robust against ENOMEM failures (or the consequences of reflecting ENOMEM back up to userspace may lead to xfstest or user application failure). To mitigate against this, when trying to insert an entry in the extent status tree, try to shrink the inode's extent status tree before returning ENOMEM. If there are entries which don't record information about extents under delayed allocations, freeing one of them is preferable to returning ENOMEM. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
-
由 Theodore Ts'o 提交于
In ext4_ext_map_blocks(), if we have successfully allocated the data blocks, but then run into trouble inserting the extent into the extent tree, most likely due to an ENOSPC condition, determine the arguments to ext4_free_blocks() in a simpler way which is easier to prove to be correct. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Previously ext4_ext_truncate() was ignoring potential error returns from ext4_es_remove_extent() and ext4_ext_remove_space(). This can lead to the on-diks extent tree and the extent status tree cache getting out of sync, which is particuarlly bad, and can lead to file system corruption and potential data loss. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 13 7月, 2013 2 次提交
-
-
由 Theodore Ts'o 提交于
The filesystem should not be marked inconsistent if ext4_free_blocks() is not able to allocate memory. Unfortunately some callers (most notably ext4_truncate) don't have a way to reflect an error back up to the VFS. And even if we did, most userspace applications won't deal with most system calls returning ENOMEM anyway. Reported-by: NNagachandra P <nagachandra@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
由 Theodore Ts'o 提交于
Replace "assertation" with "assertion" in lots and lots of debugging messages. Correct the comment stating when ext4_es_insert_extent() is used. It was no doubt tree at one point, but it is no longer true... Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Zheng Liu <gnehzuil.liu@gmail.com>
-
- 12 7月, 2013 2 次提交
-
-
由 Anatol Pomozov 提交于
If there are a lot of outstanding buffered IOs when a device is taken offline (due to hardware errors etc), ext4_end_bio prints out a message for each failed logical block. While this is desirable, we see thousands of such lines being printed out before the serial console gets overwhelmed, causing ext4_end_bio() wait for the printk to complete. This in itself isn't a disaster, except for the detail that this function is being called with the queue lock held. This causes any other function in the block layer to spin on its spin_lock_irqsave while the serial console is draining. If NMI watchdog is enabled on this machine then it eventually comes along and shoots the machine in the head. The end result is that losing any one disk causes the machine to go down. This patch rate limits the printk to bandaid around the problem. Tested: xfstests Change-Id: I8ab5690dcf4f3a67e78be147d45e489fdf4a88d8 Signed-off-by: NAnatol Pomozov <anatol.pomozov@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
We now print mount options in a generic fashion in ext4_show_options(), so we shouldn't be explicitly printing the {usr,grp}quota options in ext4_show_quota_options(). Without this patch, /proc/mounts can look like this: /dev/vdb /vdb ext4 rw,relatime,quota,usrquota,data=ordered,usrquota 0 0 ^^^^^^^^ ^^^^^^^^ Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 11 7月, 2013 1 次提交
-
-
由 Jan Kara 提交于
The following race can lead to ext4_evict_inode() seeing i_ioend_count > 0 and thus triggering a sanity check warning: CPU1 CPU2 ext4_end_bio() ext4_evict_inode() ext4_finish_bio() end_page_writeback(); truncate_inode_pages() evict page WARN_ON(i_ioend_count > 0); ext4_put_io_end_defer() ext4_release_io_end() dec i_ioend_count This is possible use-after-free bug since we decrement i_ioend_count in possibly released inode. Since i_ioend_count is used only for sanity checks one possible solution would be to just remove it but for now I'd like to keep those sanity checks to help debugging the new ext4 writeback code. This patch changes ext4_end_bio() to call ext4_put_io_end_defer() before ext4_finish_bio() in the shortcut case when unwritten extent conversion isn't needed. In that case we don't need the io_end so we are safe to drop it early. Reported-by: NGuenter Roeck <linux@roeck-us.net> Tested-by: NGuenter Roeck <linux@roeck-us.net> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 06 7月, 2013 2 次提交
-
-
由 Theodore Ts'o 提交于
The function ext4_get_group_number() was introduced as an optimization in commit bd86298e. Unfortunately, this commit incorrectly calculate the group number for file systems with a 1k block size (when s_first_data_block is 1 instead of zero). This could cause the following kernel BUG: [ 568.877799] ------------[ cut here ]------------ [ 568.877833] kernel BUG at fs/ext4/mballoc.c:3728! [ 568.877840] Oops: Exception in kernel mode, sig: 5 [#1] [ 568.877845] SMP NR_CPUS=32 NUMA pSeries [ 568.877852] Modules linked in: binfmt_misc [ 568.877861] CPU: 1 PID: 3516 Comm: fs_mark Not tainted 3.10.0-03216-g7c6809ff-dirty #1 [ 568.877867] task: c0000001fb0b8000 ti: c0000001fa954000 task.ti: c0000001fa954000 [ 568.877873] NIP: c0000000002f42a4 LR: c0000000002f4274 CTR: c000000000317ef8 [ 568.877879] REGS: c0000001fa956ed0 TRAP: 0700 Not tainted (3.10.0-03216-g7c6809ff-dirty) [ 568.877884] MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 24000428 XER: 00000000 [ 568.877902] SOFTE: 1 [ 568.877905] CFAR: c0000000002b5464 [ 568.877908] GPR00: 0000000000000001 c0000001fa957150 c000000000c6a408 c0000001fb588000 GPR04: 0000000000003fff c0000001fa9571c0 c0000001fa9571c4 000138098c50625f GPR08: 1301200000000000 0000000000000002 0000000000000001 0000000000000000 GPR12: 0000000024000422 c00000000f33a300 0000000000008000 c0000001fa9577f0 GPR16: c0000001fb7d0100 c000000000c29190 c0000000007f46e8 c000000000a14672 GPR20: 0000000000000001 0000000000000008 ffffffffffffffff 0000000000000000 GPR24: 0000000000000100 c0000001fa957278 c0000001fdb2bc78 c0000001fa957288 GPR28: 0000000000100100 c0000001fa957288 c0000001fb588000 c0000001fdb2bd10 [ 568.877993] NIP [c0000000002f42a4] .ext4_mb_release_group_pa+0xec/0x1c0 [ 568.877999] LR [c0000000002f4274] .ext4_mb_release_group_pa+0xbc/0x1c0 [ 568.878004] Call Trace: [ 568.878008] [c0000001fa957150] [c0000000002f4274] .ext4_mb_release_group_pa+0xbc/0x1c0 (unreliable) [ 568.878017] [c0000001fa957200] [c0000000002fb070] .ext4_mb_discard_lg_preallocations+0x394/0x444 [ 568.878025] [c0000001fa957340] [c0000000002fb45c] .ext4_mb_release_context+0x33c/0x734 [ 568.878032] [c0000001fa957440] [c0000000002fbcf8] .ext4_mb_new_blocks+0x4a4/0x5f4 [ 568.878039] [c0000001fa957510] [c0000000002ef56c] .ext4_ext_map_blocks+0xc28/0x1178 [ 568.878047] [c0000001fa957640] [c0000000002c1a94] .ext4_map_blocks+0x2c8/0x490 [ 568.878054] [c0000001fa957730] [c0000000002c536c] .ext4_writepages+0x738/0xc60 [ 568.878062] [c0000001fa957950] [c000000000168a78] .do_writepages+0x5c/0x80 [ 568.878069] [c0000001fa9579d0] [c00000000015d1c4] .__filemap_fdatawrite_range+0x88/0xb0 [ 568.878078] [c0000001fa957aa0] [c00000000015d23c] .filemap_write_and_wait_range+0x50/0xfc [ 568.878085] [c0000001fa957b30] [c0000000002b8edc] .ext4_sync_file+0x220/0x3c4 [ 568.878092] [c0000001fa957be0] [c0000000001f849c] .vfs_fsync_range+0x64/0x80 [ 568.878098] [c0000001fa957c70] [c0000000001f84f0] .vfs_fsync+0x38/0x4c [ 568.878105] [c0000001fa957d00] [c0000000001f87f4] .do_fsync+0x54/0x90 [ 568.878111] [c0000001fa957db0] [c0000000001f8894] .SyS_fsync+0x28/0x3c [ 568.878120] [c0000001fa957e30] [c000000000009c88] syscall_exit+0x0/0x7c [ 568.878125] Instruction dump: [ 568.878130] 60000000 813d0034 81610070 38000000 7f8b4800 419e001c 813f007c 7d2bfe70 [ 568.878144] 7d604a78 7c005850 54000ffe 7c0007b4 <0b000000> e8a10076 e87f0090 7fa4eb78 [ 568.878160] ---[ end trace 594d911d9654770b ]--- In addition fix the STD_GROUP optimization so that it works for bigalloc file systems as well. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reported-by: NLi Zhong <lizhongfs@gmail.com> Reviewed-by: NLukas Czerner <lczerner@redhat.com> Cc: stable@vger.kernel.org # 3.10
-
由 Jan Kara 提交于
The loop in mpage_map_and_submit_extent() is guaranteed to always run at least once since the caller of mpage_map_and_submit_extent() makes sure map->m_len > 0. So make that explicit using do-while instead of pure while which also silences the compiler warning about uninitialized 'err' variable. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NLukas Czerner <lczerner@redhat.com>
-
- 03 7月, 2013 2 次提交
-
-
由 Al Viro 提交于
very similar to ext3 counterpart... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Jie Liu 提交于
For those file systems(btrfs/ext4/ocfs2/tmpfs) that support SEEK_DATA/SEEK_HOLE functions, we end up handling the similar matter in lseek_execute() to update the current file offset to the desired offset if it is valid, ceph also does the simliar things at ceph_llseek(). To reduce the duplications, this patch make lseek_execute() public accessible so that we can call it directly from the underlying file systems. Thanks Dave Chinner for this suggestion. [AV: call it vfs_setpos(), don't bring the removed 'inode' argument back] v2->v1: - Add kernel-doc comments for lseek_execute() - Call lseek_execute() in ceph->llseek() Signed-off-by: NJie Liu <jeff.liu@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <andi@firstfloor.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Chris Mason <chris.mason@fusionio.com> Cc: Josef Bacik <jbacik@fusionio.com> Cc: Ben Myers <bpm@sgi.com> Cc: Ted Tso <tytso@mit.edu> Cc: Hugh Dickins <hughd@google.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Sage Weil <sage@inktank.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 01 7月, 2013 13 次提交
-
-
由 Ashish Sangwan 提交于
Both hole punch and truncate use ext4_ext_rm_leaf() for removing blocks. Currently we choose the last extent as the starting point for removing blocks: ex = EXT_LAST_EXTENT(eh); This is OK for truncate but for hole punch we can optimize the extent selection as the path is already initialized. We could use this information to select proper starting extent. The code change in this patch will not affect truncate as for truncate path[depth].p_ext will always be NULL. Signed-off-by: NAshish Sangwan <a.sangwan@samsung.com> Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Translate the bitfields used in various flags argument to strings to make the tracepoint output more human-readable. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
The function mpage_released_unused_page() must only be called once; otherwise the kernel will BUG() when the second call to mpage_released_unused_page() tries to unlock the pages which had been unlocked by the first call. Also restructure the error handling so that we only give up on writing the dirty pages in the case of ENOSPC where retrying the allocation won't help. Otherwise, a transient failure, such as a kmalloc() failure in calling ext4_map_blocks() might cause us to give up on those pages, leading to a scary message in /var/log/messages plus data loss. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz>
-
由 Lukas Czerner 提交于
Currently if we pass range into ext4_zero_partial_blocks() which covers entire block we would attempt to zero it even though we should only zero unaligned part of the block. Fix this by checking whether the range covers the whole block skip zeroing if so. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
The function ext4_write_inline_data_end() can return an error. So we need to assign it to a signed integer variable to check for an error return (since copied is an unsigned int). Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Zheng Liu <wenqing.lz@taobao.com> Cc: stable@vger.kernel.org
-
由 jon ernst 提交于
Comparing unsigned variable with 0 always returns false. err = 0 is duplicated and unnecessary. [ tytso: Also cleaned up error handling in ext4_block_zero_page_range() ] Signed-off-by: N"Jon Ernst" <jonernst07@gmx.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Al Viro 提交于
Both ext3 and ext4 htree_dirblock_to_tree() is just filling the in-core rbtree for use by call_filldir(). All updates of ->f_pos are done by the latter; bumping it here (on error) is obviously wrong - we might very well have it nowhere near the block we'd found an error in. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
由 Ashish Sangwan 提交于
No need to pass file pointer when we can directly pass inode pointer. Signed-off-by: NAshish Sangwan <a.sangwan@samsung.com> Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 boxi liu 提交于
In ext4 feature inline_data,it use the xattr's space to store the inline data in inode.When we calculate the inline data as the xattr,we add the pad.But in get_max_inline_xattr_value_size() function we count the free space without pad.It cause some contents are moved to a block even if it can be stored in the inode. Signed-off-by: Nliulei <lewis.liulei@huawei.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NTao Ma <boyu.mt@taobao.com>
-
由 Joe Perches 提交于
Reduce the object size ~10% could be useful for embedded systems. Add #ifdef CONFIG_PRINTK #else #endif blocks to hold formats and arguments, passing " " to functions when !CONFIG_PRINTK and still verifying format and arguments with no_printk. $ size fs/ext4/built-in.o* text data bss dec hex filename 239375 610 888 240873 3ace9 fs/ext4/built-in.o.new 264167 738 888 265793 40e41 fs/ext4/built-in.o.old $ grep -E "CONFIG_EXT4|CONFIG_PRINTK" .config # CONFIG_PRINTK is not set CONFIG_EXT4_FS=y CONFIG_EXT4_USE_FOR_EXT23=y CONFIG_EXT4_FS_POSIX_ACL=y # CONFIG_EXT4_FS_SECURITY is not set # CONFIG_EXT4_DEBUG is not set Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Zheng Liu 提交于
Now we maintain an proper in-order LRU list in ext4 to reclaim entries from extent status tree when we are under heavy memory pressure. For keeping this order, a spin lock is used to protect this list. But this lock burns a lot of CPU time. We can use the following steps to trigger it. % cd /dev/shm % dd if=/dev/zero of=ext4-img bs=1M count=2k % mkfs.ext4 ext4-img % mount -t ext4 -o loop ext4-img /mnt % cd /mnt % for ((i=0;i<160;i++)); do truncate -s 64g $i; done % for ((i=0;i<160;i++)); do cp $i /dev/null &; done % perf record -a -g % perf report This commit tries to fix this problem. Now a new member called i_touch_when is added into ext4_inode_info to record the last access time for an inode. Meanwhile we never need to keep a proper in-order LRU list. So this can avoid to burns some CPU time. When we try to reclaim some entries from extent status tree, we use list_sort() to get a proper in-order list. Then we traverse this list to discard some entries. In ext4_sb_info, we use s_es_last_sorted to record the last time of sorting this list. When we traverse the list, we skip the inode that is newer than this time, and move this inode to the tail of LRU list. When the head of the list is newer than s_es_last_sorted, we will sort the LRU list again. In this commit, we break the loop if s_extent_cache_cnt == 0 because that means that all extents in extent status tree have been reclaimed. Meanwhile in this commit, ext4_es_{un}register_shrinker()'s prototype is changed to save a local variable in these functions. Reported-by: NDave Hansen <dave.hansen@intel.com> Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Alexey Khoroshilov 提交于
If memory allocation in ext4_mb_new_group_pa() is failed, it returns error code, ext4_mb_new_preallocation() propages it, but ext4_mb_new_blocks() ignores it. An observed result was: - allocation fail means ext4_mb_new_group_pa() does not update ext4_allocation_context; - ext4_mb_new_blocks() sets ext4_allocation_request->len (ar->len = ac->ac_b_ex.fe_len;) to number of blocks preallocated (512) instead of number of blocks requested (1); - that activates update cycle in ext4_splice_branch(): for (i = 1; i < blks; i++) <-- blks is 512 instead of 1 here *(where->p + i) = cpu_to_le32(current_block++); - it iterates 511 times and corrupts a chunk of memory including inode structure; - page fault happens at EXT4_SB(inode->i_sb) in ext4_mark_inode_dirty(); - system hangs with 'scheduling while atomic' BUG. The patch implements a check for ext4_mb_new_preallocation() error code and handles its failure as if ext4_mb_regular_allocator() fails. Found by Linux File System Verification project (linuxtesting.org). [ Patch restructed by tytso to make the flow of control easier to follow. ] Signed-off-by: NAlexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Maarten ter Huurne 提交于
Subtracting the number of the first data block places the superblock backups one block too early, corrupting the file system. When the block size is larger than 1K, the first data block is 0, so the subtraction has no effect and no corruption occurs. Signed-off-by: NMaarten ter Huurne <maarten@treewalker.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz> CC: stable@vger.kernel.org
-