- 30 6月, 2016 4 次提交
-
-
由 Arnd Bergmann 提交于
The jbd2 journal stores the commit time in 64-bit seconds and 32-bit nanoseconds, which avoids an overflow in 2038, but it gets the numbers from current_kernel_time(), which uses 'long' seconds on 32-bit architectures. This simply changes the code to call current_kernel_time64() so we use 64-bit seconds consistently. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz> Cc: stable@vger.kernel.org
-
由 Jan Kara 提交于
So far we were tracking only dependency on transaction commit due to starting a new handle (which may require commit to start a new transaction). Now add tracking also for other cases where we wait for transaction commit. This way lockdep can catch deadlocks e. g. because we call jbd2_journal_stop() for a synchronous handle with some locks held which rank below transaction start. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Jan Kara 提交于
Currently lockdep map is tracked in each journal handle. To be able to expand lockdep support to cover also other cases where we depend on transaction commit and where handle is not available, move lockdep map into struct journal_s. Since this makes the lockdep map shared for all handles, we have to use rwsem_acquire_read() for acquisitions now. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Jan Kara 提交于
The transaction the handle references is free to commit once we've decremented t_updates counter. Move the lockdep instrumentation to that place. Currently it was a bit later which did not really matter but subsequent improvements to lockdep instrumentation would cause false positives with it. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 24 4月, 2016 1 次提交
-
-
由 Jan Kara 提交于
Currently when filesystem needs to make sure data is on permanent storage before committing a transaction it adds inode to transaction's inode list. During transaction commit, jbd2 writes back all dirty buffers that have allocated underlying blocks and waits for the IO to finish. However when doing writeback for delayed allocated data, we allocate blocks and immediately submit the data. Thus asking jbd2 to write dirty pages just unnecessarily adds more work to jbd2 possibly writing back other redirtied blocks. Add support to jbd2 to allow filesystem to ask jbd2 to only wait for outstanding data writes before committing a transaction and thus avoid unnecessary writes. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 18 4月, 2016 1 次提交
-
-
由 Masanari Iida 提交于
This patch fix spelling typos found in DocBook/filesystem.xml. It is because the file was generated from comments in code, I have to fix the comments in codes, instead of xml file. Signed-off-by: NMasanari Iida <standby24x7@gmail.com> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
- 05 4月, 2016 1 次提交
-
-
由 Kirill A. Shutemov 提交于
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 3月, 2016 1 次提交
-
-
由 Michal Hocko 提交于
Journal transaction might fail prematurely because the frozen_buffer is allocated by GFP_NOFS request: [ 72.440013] do_get_write_access: OOM for frozen_buffer [ 72.440014] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access [ 72.440015] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4735: Out of memory (...snipped....) [ 72.495559] do_get_write_access: OOM for frozen_buffer [ 72.495560] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access [ 72.496839] do_get_write_access: OOM for frozen_buffer [ 72.496841] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access [ 72.505766] Aborting journal on device sda1-8. [ 72.505851] EXT4-fs (sda1): Remounting filesystem read-only This wasn't a problem until "mm: page_alloc: do not lock up GFP_NOFS allocations upon OOM" because small GPF_NOFS allocations never failed. This allocation seems essential for the journal and GFP_NOFS is too restrictive to the memory allocator so let's use __GFP_NOFAIL here to emulate the previous behavior. Signed-off-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 10 3月, 2016 1 次提交
-
-
由 OGAWA Hirofumi 提交于
On umount path, jbd2_journal_destroy() writes latest transaction ID (->j_tail_sequence) to be used at next mount. The bug is that ->j_tail_sequence is not holding latest transaction ID in some cases. So, at next mount, there is chance to conflict with remaining (not overwritten yet) transactions. mount (id=10) write transaction (id=11) write transaction (id=12) umount (id=10) <= the bug doesn't write latest ID mount (id=10) write transaction (id=11) crash mount [recovery process] transaction (id=11) transaction (id=12) <= valid transaction ID, but old commit must not replay Like above, this bug become the cause of recovery failure, or FS corruption. So why ->j_tail_sequence doesn't point latest ID? Because if checkpoint transactions was reclaimed by memory pressure (i.e. bdev_try_to_free_page()), then ->j_tail_sequence is not updated. (And another case is, __jbd2_journal_clean_checkpoint_list() is called with empty transaction.) So in above cases, ->j_tail_sequence is not pointing latest transaction ID at umount path. Plus, REQ_FLUSH for checkpoint is not done too. So, to fix this problem with minimum changes, this patch updates ->j_tail_sequence, and issue REQ_FLUSH. (With more complex changes, some optimizations would be possible to avoid unnecessary REQ_FLUSH for example though.) BTW, journal->j_tail_sequence = ++journal->j_transaction_sequence; Increment of ->j_transaction_sequence seems to be unnecessary, but ext3 does this. Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 23 2月, 2016 4 次提交
-
-
由 Jan Kara 提交于
Currently we used atomic bit operations to manipulate __JI_COMMIT_RUNNING bit. However this is unnecessary as i_flags are always written and read under j_list_lock. So just change the operations to standard bit operations. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Jan Kara 提交于
Revoke and tag descriptor blocks are just different kinds of descriptor blocks and thus have checksum in the same place. Unify computation and checking of checksums for these. Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Jan Kara 提交于
Descriptor block header is initialized in several places. Factor out the common code into jbd2_journal_get_descriptor_buffer(). Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Jan Kara 提交于
jbd2_journal_write_revoke_records() takes journal pointer and write_op, although journal can be obtained from the passed transaction and write_op is always WRITE_SYNC. Remove these superfluous arguments. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 07 1月, 2016 1 次提交
-
-
由 Dmitry Monakhov 提交于
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 05 12月, 2015 1 次提交
-
-
由 Junxiao Bi 提交于
introduced jbd2_write_access_granted() to improve write|undo_access speed, but missed to check the status of b_committed_data which caused a kernel panic on ocfs2. [ 6538.405938] ------------[ cut here ]------------ [ 6538.406686] kernel BUG at fs/ocfs2/suballoc.c:2400! [ 6538.406686] invalid opcode: 0000 [#1] SMP [ 6538.406686] Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront xen_netfront xen_fbfront parport_pc parport pcspkr i2c_piix4 acpi_cpufreq ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix cirrus ttm drm_kms_helper drm fb_sys_fops sysimgblt sysfillrect i2c_core syscopyarea dm_mirror dm_region_hash dm_log dm_mod [ 6538.406686] CPU: 1 PID: 16265 Comm: mmap_truncate Not tainted 4.3.0 #1 [ 6538.406686] Hardware name: Xen HVM domU, BIOS 4.3.1OVM 05/14/2014 [ 6538.406686] task: ffff88007c2bab00 ti: ffff880075b78000 task.ti: ffff880075b78000 [ 6538.406686] RIP: 0010:[<ffffffffa06a286b>] [<ffffffffa06a286b>] ocfs2_block_group_clear_bits+0x23b/0x250 [ocfs2] [ 6538.406686] RSP: 0018:ffff880075b7b7f8 EFLAGS: 00010246 [ 6538.406686] RAX: ffff8800760c5b40 RBX: ffff88006c06a000 RCX: ffffffffa06e6df0 [ 6538.406686] RDX: 0000000000000000 RSI: ffff88007a6f6ea0 RDI: ffff88007a760430 [ 6538.406686] RBP: ffff880075b7b878 R08: 0000000000000002 R09: 0000000000000001 [ 6538.406686] R10: ffffffffa06769be R11: 0000000000000000 R12: 0000000000000001 [ 6538.406686] R13: ffffffffa06a1750 R14: 0000000000000001 R15: ffff88007a6f6ea0 [ 6538.406686] FS: 00007f17fde30720(0000) GS:ffff88007f040000(0000) knlGS:0000000000000000 [ 6538.406686] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6538.406686] CR2: 0000000000601730 CR3: 000000007aea0000 CR4: 00000000000406e0 [ 6538.406686] Stack: [ 6538.406686] ffff88007c2bb5b0 ffff880075b7b8e0 ffff88007a7604b0 ffff88006c640800 [ 6538.406686] ffff88007a7604b0 ffff880075d77390 0000000075b7b878 ffffffffa06a309d [ 6538.406686] ffff880075d752d8 ffff880075b7b990 ffff880075b7b898 0000000000000000 [ 6538.406686] Call Trace: [ 6538.406686] [<ffffffffa06a309d>] ? ocfs2_read_group_descriptor+0x6d/0xa0 [ocfs2] [ 6538.406686] [<ffffffffa06a3654>] _ocfs2_free_suballoc_bits+0xe4/0x320 [ocfs2] [ 6538.406686] [<ffffffffa06a1750>] ? ocfs2_put_slot+0xf0/0xf0 [ocfs2] [ 6538.406686] [<ffffffffa06a397e>] _ocfs2_free_clusters+0xee/0x210 [ocfs2] [ 6538.406686] [<ffffffffa06a1750>] ? ocfs2_put_slot+0xf0/0xf0 [ocfs2] [ 6538.406686] [<ffffffffa06a1750>] ? ocfs2_put_slot+0xf0/0xf0 [ocfs2] [ 6538.406686] [<ffffffffa0682d50>] ? ocfs2_extend_trans+0x50/0x1a0 [ocfs2] [ 6538.406686] [<ffffffffa06a3ad5>] ocfs2_free_clusters+0x15/0x20 [ocfs2] [ 6538.406686] [<ffffffffa065072c>] ocfs2_replay_truncate_records+0xfc/0x290 [ocfs2] [ 6538.406686] [<ffffffffa06843ac>] ? ocfs2_start_trans+0xec/0x1d0 [ocfs2] [ 6538.406686] [<ffffffffa0654600>] __ocfs2_flush_truncate_log+0x140/0x2d0 [ocfs2] [ 6538.406686] [<ffffffffa0654394>] ? ocfs2_reserve_blocks_for_rec_trunc.clone.0+0x44/0x170 [ocfs2] [ 6538.406686] [<ffffffffa065acd4>] ocfs2_remove_btree_range+0x374/0x630 [ocfs2] [ 6538.406686] [<ffffffffa017486b>] ? jbd2_journal_stop+0x25b/0x470 [jbd2] [ 6538.406686] [<ffffffffa065d5b5>] ocfs2_commit_truncate+0x305/0x670 [ocfs2] [ 6538.406686] [<ffffffffa0683430>] ? ocfs2_journal_access_eb+0x20/0x20 [ocfs2] [ 6538.406686] [<ffffffffa067adb7>] ocfs2_truncate_file+0x297/0x380 [ocfs2] [ 6538.406686] [<ffffffffa01759e4>] ? jbd2_journal_begin_ordered_truncate+0x64/0xc0 [jbd2] [ 6538.406686] [<ffffffffa067c7a2>] ocfs2_setattr+0x572/0x860 [ocfs2] [ 6538.406686] [<ffffffff810e4a3f>] ? current_fs_time+0x3f/0x50 [ 6538.406686] [<ffffffff812124b7>] notify_change+0x1d7/0x340 [ 6538.406686] [<ffffffff8121abf9>] ? generic_getxattr+0x79/0x80 [ 6538.406686] [<ffffffff811f5876>] do_truncate+0x66/0x90 [ 6538.406686] [<ffffffff81120e30>] ? __audit_syscall_entry+0xb0/0x110 [ 6538.406686] [<ffffffff811f5bb3>] do_sys_ftruncate.clone.0+0xf3/0x120 [ 6538.406686] [<ffffffff811f5bee>] SyS_ftruncate+0xe/0x10 [ 6538.406686] [<ffffffff816aa2ae>] entry_SYSCALL_64_fastpath+0x12/0x71 [ 6538.406686] Code: 28 48 81 ee b0 04 00 00 48 8b 92 50 fb ff ff 48 8b 80 b0 03 00 00 48 39 90 88 00 00 00 0f 84 30 fe ff ff 0f 0b eb fe 0f 0b eb fe <0f> 0b 0f 1f 00 eb fb 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 [ 6538.406686] RIP [<ffffffffa06a286b>] ocfs2_block_group_clear_bits+0x23b/0x250 [ocfs2] [ 6538.406686] RSP <ffff880075b7b7f8> [ 6538.691128] ---[ end trace 31cd7011d6770d7e ]--- [ 6538.694492] Kernel panic - not syncing: Fatal exception [ 6538.695484] Kernel Offset: disabled Fixes: de92c8ca("jbd2: speedup jbd2_journal_get_[write|undo]_access()") Cc: <stable@vger.kernel.org> Signed-off-by: NJunxiao Bi <junxiao.bi@oracle.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 25 11月, 2015 1 次提交
-
-
由 Jan Kara 提交于
Ted and Namjae have reported that truncated pages don't get timely reclaimed after being truncated in data=journal mode. The following test triggers the issue easily: for (i = 0; i < 1000; i++) { pwrite(fd, buf, 1024*1024, 0); fsync(fd); fsync(fd); ftruncate(fd, 0); } The reason is that journal_unmap_buffer() finds that truncated buffers are not journalled (jh->b_transaction == NULL), they are part of checkpoint list of a transaction (jh->b_cp_transaction != NULL) and have been already written out (!buffer_dirty(bh)). We clean such buffers but we leave them in the checkpoint list. Since checkpoint transaction holds a reference to the journal head, these buffers cannot be released until the checkpoint transaction is cleaned up. And at that point we don't call release_buffer_page() anymore so pages detached from mapping are lingering in the system waiting for reclaim to find them and free them. Fix the problem by removing buffers from transaction checkpoint lists when journal_unmap_buffer() finds out they don't have to be there anymore. Reported-and-tested-by: NNamjae Jeon <namjae.jeon@samsung.com> Fixes: de1b7941Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 07 11月, 2015 1 次提交
-
-
由 Mel Gorman 提交于
mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd __GFP_WAIT has been used to identify atomic context in callers that hold spinlocks or are in interrupts. They are expected to be high priority and have access one of two watermarks lower than "min" which can be referred to as the "atomic reserve". __GFP_HIGH users get access to the first lower watermark and can be called the "high priority reserve". Over time, callers had a requirement to not block when fallback options were available. Some have abused __GFP_WAIT leading to a situation where an optimisitic allocation with a fallback option can access atomic reserves. This patch uses __GFP_ATOMIC to identify callers that are truely atomic, cannot sleep and have no alternative. High priority users continue to use __GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify callers that want to wake kswapd for background reclaim. __GFP_WAIT is redefined as a caller that is willing to enter direct reclaim and wake kswapd for background reclaim. This patch then converts a number of sites o __GFP_ATOMIC is used by callers that are high priority and have memory pools for those requests. GFP_ATOMIC uses this flag. o Callers that have a limited mempool to guarantee forward progress clear __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall into this category where kswapd will still be woken but atomic reserves are not used as there is a one-entry mempool to guarantee progress. o Callers that are checking if they are non-blocking should use the helper gfpflags_allow_blocking() where possible. This is because checking for __GFP_WAIT as was done historically now can trigger false positives. Some exceptions like dm-crypt.c exist where the code intent is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to flag manipulations. o Callers that built their own GFP flags instead of starting with GFP_KERNEL and friends now also need to specify __GFP_KSWAPD_RECLAIM. The first key hazard to watch out for is callers that removed __GFP_WAIT and was depending on access to atomic reserves for inconspicuous reasons. In some cases it may be appropriate for them to use __GFP_HIGH. The second key hazard is callers that assembled their own combination of GFP flags instead of starting with something like GFP_KERNEL. They may now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless if it's missed in most cases as other activity will wake kswapd. Signed-off-by: NMel Gorman <mgorman@techsingularity.net> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 10月, 2015 1 次提交
-
-
由 Daeho Jeong 提交于
If a EXT4 filesystem utilizes JBD2 journaling and an error occurs, the journaling will be aborted first and the error number will be recorded into JBD2 superblock and, finally, the system will enter into the panic state in "errors=panic" option. But, in the rare case, this sequence is little twisted like the below figure and it will happen that the system enters into panic state, which means the system reset in mobile environment, before completion of recording an error in the journal superblock. In this case, e2fsck cannot recognize that the filesystem failure occurred in the previous run and the corruption wouldn't be fixed. Task A Task B ext4_handle_error() -> jbd2_journal_abort() -> __journal_abort_soft() -> __jbd2_journal_abort_hard() | -> journal->j_flags |= JBD2_ABORT; | | __ext4_abort() | -> jbd2_journal_abort() | | -> __journal_abort_soft() | | -> if (journal->j_flags & JBD2_ABORT) | | return; | -> panic() | -> jbd2_journal_update_sb_errno() Tested-by: NHobin Woo <hobin.woo@samsung.com> Signed-off-by: NDaeho Jeong <daeho.jeong@samsung.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 18 10月, 2015 3 次提交
-
-
由 Jan Kara 提交于
Unlike comments and expectation of callers journal_clean_one_cp_list() returned 1 not only if it freed the transaction but also if it freed some buffers in the transaction. That could make __jbd2_journal_clean_checkpoint_list() skip processing t_checkpoint_io_list and continue with processing the next transaction. This is mostly a cosmetic issue since the only result is we can sometimes free less memory than we could. But it's still worth fixing. Fix journal_clean_one_cp_list() to return 1 only if the transaction was really freed. Fixes: 50849db3Signed-off-by: NJan Kara <jack@suse.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
-
由 Darrick J. Wong 提交于
Create separate predicate functions to test/set/clear feature flags, thereby replacing the wordy old macros. Furthermore, clean out the places where we open-coded feature tests. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Darrick J. Wong 提交于
Instead of overloading EIO for CRC errors and corrupt structures, return the same error codes that XFS returns for the same issues. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 15 10月, 2015 1 次提交
-
-
由 Darrick J. Wong 提交于
Change the journal's checksum functions to gate on whether or not the crc32c driver is loaded, and gate the loading on the superblock bits. This prevents a journal crash if someone loads a journal in no-csum mode and then randomizes the superblock, thus flipping on the feature bits. Tested-By: NNikolay Borisov <kernel@kyup.com> Reported-by: NNikolay Borisov <kernel@kyup.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 04 8月, 2015 1 次提交
-
-
由 Lukas Czerner 提交于
Currently there is no limitation on number of reserved credits we can ask for. If we ask for more reserved credits than 1/2 of maximum transaction size, or if total number of credits exceeds the maximum transaction size per operation (which is currently only possible with the former) we will spin forever in start_this_handle(). Fix this by adding this limitation at the start of start_this_handle(). This patch also removes the credit limitation 1/2 of maximum transaction size, since we really only want to limit the number of reserved credits. There is not much point to limit the credits if there is still space in the journal. This accidentally also fixes the online resize, where due to the limitation of the journal credits we're unable to grow file systems with 1k block size and size between 16M and 32M. It has been partially fixed by 2c869b26, but not entirely. Thanks Jan Kara for helping me getting the correct fix. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 29 7月, 2015 1 次提交
-
-
由 Jan Kara 提交于
Commit 6f6a6fda "jbd2: fix ocfs2 corrupt when updating journal superblock fails" changed jbd2_cleanup_journal_tail() to return EIO when the journal is aborted. That makes logic in jbd2_log_do_checkpoint() bail out which is fine, except that jbd2_journal_destroy() expects jbd2_log_do_checkpoint() to always make a progress in cleaning the journal. Without it jbd2_journal_destroy() just loops in an infinite loop. Fix jbd2_journal_destroy() to cleanup journal checkpoint lists of jbd2_log_do_checkpoint() fails with error. Reported-by: NEryu Guan <guaneryu@gmail.com> Tested-by: NEryu Guan <guaneryu@gmail.com> Fixes: 6f6a6fdaSigned-off-by: NJan Kara <jack@suse.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 23 7月, 2015 1 次提交
-
-
由 Daeho Jeong 提交于
When an error condition is detected, an error status should be recorded into superblocks of EXT4 or JBD2. However, the write request is submitted now without REQ_FUA flag, even in "barrier=1" mode, which is followed by panic() function in "errors=panic" mode. On mobile devices which make whole system reset as soon as kernel panic occurs, this write request containing an error flag will disappear just from storage cache without written to the physical cells. Therefore, when next start, even forever, the error flag cannot be shown in both superblocks, and e2fsck cannot fix the filesystem problems automatically, unless e2fsck is executed in force checking mode. [ Changed use test_opt(sb, BARRIER) of checking the journal flags -- TYT ] Signed-off-by: NDaeho Jeong <daeho.jeong@samsung.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 13 7月, 2015 1 次提交
-
-
由 Jan Kara 提交于
It is often the case that we mark buffer as having dirty metadata when the buffer is already in that state (frequent for bitmaps, inode table blocks, superblock). Thus it is unnecessary to contend on grabbing journal head reference and bh_state lock. Avoid that by checking whether any modification to the buffer is needed before grabbing any locks or references. [ Note: this is a fixed version of commit 2143c196, which was reverted in ebeaa8dd due to a false positive triggering of an assertion check. -- Ted ] Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 28 6月, 2015 1 次提交
-
-
由 Linus Torvalds 提交于
This reverts commit 2143c196. This commit seems to be the cause of the following jbd2 assertion failure: ------------[ cut here ]------------ kernel BUG at fs/jbd2/transaction.c:1325! invalid opcode: 0000 [#1] SMP Modules linked in: bnep bluetooth fuse ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 ... CPU: 7 PID: 5509 Comm: gcc Not tainted 4.1.0-10944-g2a298679 #1 Hardware name: /DH87RL, BIOS RLH8710H.86A.0327.2014.0924.1645 09/24/2014 task: ffff8803bf866040 ti: ffff880308528000 task.ti: ffff880308528000 RIP: jbd2_journal_dirty_metadata+0x237/0x290 Call Trace: __ext4_handle_dirty_metadata+0x43/0x1f0 ext4_handle_dirty_dirent_node+0xde/0x160 ? jbd2_journal_get_write_access+0x36/0x50 ext4_delete_entry+0x112/0x160 ? __ext4_journal_start_sb+0x52/0xb0 ext4_unlink+0xfa/0x260 vfs_unlink+0xec/0x190 do_unlinkat+0x24a/0x270 SyS_unlink+0x11/0x20 entry_SYSCALL_64_fastpath+0x12/0x6a ---[ end trace ae033ebde8d080b4 ]--- which is not easily reproducible (I've seen it just once, and then Ted was able to reproduce it once). Revert it while Ted and Jan try to figure out what is wrong. Cc: Jan Kara <jack@suse.cz> Acked-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 6月, 2015 1 次提交
-
-
由 Rasmus Villemoes 提交于
In one case, we eliminate a local variable; in the other a strlen() call and some .text. Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 6月, 2015 1 次提交
-
-
由 Jan Kara 提交于
It is often the case that we mark buffer as having dirty metadata when the buffer is already in that state (frequent for bitmaps, inode table blocks, superblock). Thus it is unnecessary to contend on grabbing journal head reference and bh_state lock. Avoid that by checking whether any modification to the buffer is needed before grabbing any locks or references. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 16 6月, 2015 2 次提交
-
-
由 Michal Hocko 提交于
insert_revoke_hash does an open coded endless allocation loop if journal_oom_retry is true. It doesn't implement any allocation fallback strategy between the retries, though. The memory allocator doesn't know about the never fail requirement so it cannot potentially help to move on with the allocation (e.g. use memory reserves). Get rid of the retry loop and use __GFP_NOFAIL instead. We will lose the debugging message but I am not sure it is anyhow helpful. Do the same for journal_alloc_journal_head which is doing a similar thing. Signed-off-by: NMichal Hocko <mhocko@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Joseph Qi 提交于
If updating journal superblock fails after journal data has been flushed, the error is omitted and this will mislead the caller as a normal case. In ocfs2, the checkpoint will be treated successfully and the other node can get the lock to update. Since the sb_start is still pointing to the old log block, it will rewrite the journal data during journal recovery by the other node. Thus the new updates will be overwritten and ocfs2 corrupts. So in above case we have to return the error, and ocfs2_commit_cache will take care of the error and prevent the other node to do update first. And only after recovering journal it can do the new updates. The issue discussion mail can be found at: https://oss.oracle.com/pipermail/ocfs2-devel/2015-June/010856.html http://comments.gmane.org/gmane.comp.file-systems.ext4/48841 [ Fixed bug in patch which allowed a non-negative error return from jbd2_cleanup_journal_tail() to leak out of jbd2_fjournal_flush(); this was causing xfstests ext4/306 to fail. -- Ted ] Reported-by: NYiwen Jiang <jiangyiwen@huawei.com> Signed-off-by: NJoseph Qi <joseph.qi@huawei.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Tested-by: NYiwen Jiang <jiangyiwen@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: stable@vger.kernel.org
-
- 15 6月, 2015 1 次提交
-
-
由 Dmitry Monakhov 提交于
jbd2_cleanup_journal_tail() can be invoked by jbd2__journal_start() So allocations should be done with GFP_NOFS [Full stack trace snipped from 3.10-rh7] [<ffffffff815c4bd4>] dump_stack+0x19/0x1b [<ffffffff8105dba1>] warn_slowpath_common+0x61/0x80 [<ffffffff8105dcca>] warn_slowpath_null+0x1a/0x20 [<ffffffff815c2142>] slab_pre_alloc_hook.isra.31.part.32+0x15/0x17 [<ffffffff8119c045>] kmem_cache_alloc+0x55/0x210 [<ffffffff811477f5>] ? mempool_alloc_slab+0x15/0x20 [<ffffffff811477f5>] mempool_alloc_slab+0x15/0x20 [<ffffffff81147939>] mempool_alloc+0x69/0x170 [<ffffffff815cb69e>] ? _raw_spin_unlock_irq+0xe/0x20 [<ffffffff8109160d>] ? finish_task_switch+0x5d/0x150 [<ffffffff811f1a8e>] bio_alloc_bioset+0x1be/0x2e0 [<ffffffff8127ee49>] blkdev_issue_flush+0x99/0x120 [<ffffffffa019a733>] jbd2_cleanup_journal_tail+0x93/0xa0 [jbd2] -->GFP_KERNEL [<ffffffffa019aca1>] jbd2_log_do_checkpoint+0x221/0x4a0 [jbd2] [<ffffffffa019afc7>] __jbd2_log_wait_for_space+0xa7/0x1e0 [jbd2] [<ffffffffa01952d8>] start_this_handle+0x2d8/0x550 [jbd2] [<ffffffff811b02a9>] ? __memcg_kmem_put_cache+0x29/0x30 [<ffffffff8119c120>] ? kmem_cache_alloc+0x130/0x210 [<ffffffffa019573a>] jbd2__journal_start+0xba/0x190 [jbd2] [<ffffffff811532ce>] ? lru_cache_add+0xe/0x10 [<ffffffffa01c9549>] ? ext4_da_write_begin+0xf9/0x330 [ext4] [<ffffffffa01f2c77>] __ext4_journal_start_sb+0x77/0x160 [ext4] [<ffffffffa01c9549>] ext4_da_write_begin+0xf9/0x330 [ext4] [<ffffffff811446ec>] generic_file_buffered_write_iter+0x10c/0x270 [<ffffffff81146918>] __generic_file_write_iter+0x178/0x390 [<ffffffff81146c6b>] __generic_file_aio_write+0x8b/0xb0 [<ffffffff81146ced>] generic_file_aio_write+0x5d/0xc0 [<ffffffffa01bf289>] ext4_file_write+0xa9/0x450 [ext4] [<ffffffff811c31d9>] ? pipe_read+0x379/0x4f0 [<ffffffff811b93f0>] do_sync_write+0x90/0xe0 [<ffffffff811b9b6d>] vfs_write+0xbd/0x1e0 [<ffffffff811ba5b8>] SyS_write+0x58/0xb0 [<ffffffff815d4799>] system_call_fastpath+0x16/0x1b Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 09 6月, 2015 4 次提交
-
-
由 Jan Kara 提交于
jbd2_journal_get_write_access() and jbd2_journal_get_create_access() are frequently called for buffers that are already part of the running transaction - most frequently it is the case for bitmaps, inode table blocks, and superblock. Since in such cases we have nothing to do, it is unfortunate we still grab reference to journal head, lock the bh, lock bh_state only to find out there's nothing to do. Improving this is a bit subtle though since until we find out journal head is attached to the running transaction, it can disappear from under us because checkpointing / commit decided it's no longer needed. We deal with this by protecting journal_head slab with RCU. We still have to be careful about journal head being freed & reallocated within slab and about exposing journal head in consistent state (in particular b_modified and b_frozen_data must be in correct state before we allow user to touch the buffer). Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Jan Kara 提交于
Check for the simple case of unjournaled buffer first, handle it and bail out. This allows us to remove one if and unindent the difficult case by one tab. The result is easier to read. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Jan Kara 提交于
We were acquiring bh_state_lock when allocation of buffer failed in do_get_write_access() only to be able to jump to a label that releases the lock and does all other checks that don't make sense for this error path. Just jump into the right label instead. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Jan Kara 提交于
needs_copy is set only in one place in do_get_write_access(), just move the frozen buffer copying into that place and factor it out to a separate function to make do_get_write_access() slightly more readable. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 08 6月, 2015 1 次提交
-
-
由 Michal Hocko 提交于
This basically reverts 47def826 (jbd2: Remove __GFP_NOFAIL from jbd2 layer). The deprecation of __GFP_NOFAIL was a bad choice because it led to open coding the endless loop around the allocator rather than removing the dependency on the non failing allocation. So the deprecation was a clear failure and the reality tells us that __GFP_NOFAIL is not even close to go away. It is still true that __GFP_NOFAIL allocations are generally discouraged and new uses should be evaluated and an alternative (pre-allocations or reservations) should be considered but it doesn't make any sense to lie the allocator about the requirements. Allocator can take steps to help making a progress if it knows the requirements. Signed-off-by: NMichal Hocko <mhocko@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Acked-by: NDavid Rientjes <rientjes@google.com>
-
- 15 5月, 2015 2 次提交
-
-
由 Darrick J. Wong 提交于
The journal revoke block recovery code does not check r_count for sanity, which means that an evil value of r_count could result in the kernel reading off the end of the revoke table and into whatever garbage lies beyond. This could crash the kernel, so fix that. However, in testing this fix, I discovered that the code to write out the revoke tables also was not correctly checking to see if the block was full -- the current offset check is fine so long as the revoke table space size is a multiple of the record size, but this is not true when either journal_csum_v[23] are set. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz> Cc: stable@vger.kernel.org
-
由 Lukas Czerner 提交于
Currently when journal restart fails, we'll have the h_transaction of the handle set to NULL to indicate that the handle has been effectively aborted. We handle this situation quietly in the jbd2_journal_stop() and just free the handle and exit because everything else has been done before we attempted (and failed) to restart the journal. Unfortunately there are a number of problems with that approach introduced with commit 41a5b913 "jbd2: invalidate handle if jbd2_journal_restart() fails" First of all in ext4 jbd2_journal_stop() will be called through __ext4_journal_stop() where we would try to get a hold of the superblock by dereferencing h_transaction which in this case would lead to NULL pointer dereference and crash. In addition we're going to free the handle regardless of the refcount which is bad as well, because others up the call chain will still reference the handle so we might potentially reference already freed memory. Moreover it's expected that we'll get aborted handle as well as detached handle in some of the journalling function as the error propagates up the stack, so it's unnecessary to call WARN_ON every time we get detached handle. And finally we might leak some memory by forgetting to free reserved handle in jbd2_journal_stop() in the case where handle was detached from the transaction (h_transaction is NULL). Fix the NULL pointer dereference in __ext4_journal_stop() by just calling jbd2_journal_stop() quietly as suggested by Jan Kara. Also fix the potential memory leak in jbd2_journal_stop() and use proper handle refcounting before we attempt to free it to avoid use-after-free issues. And finally remove all WARN_ON(!transaction) from the code so that we do not get random traces when something goes wrong because when journal restart fails we will get to some of those functions. Cc: stable@vger.kernel.org Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz>
-
- 20 1月, 2015 1 次提交
-
-
由 Darrick J. Wong 提交于
We should complain in dmesg when journal recovery fails on account of the descriptor block being corrupt, so that the diagnostic data can be recovered. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-