提交 · 6fa7aa50b2c48400bbd045daf3a2498882eb0596 · openeuler / Kernel

14 1月, 2017 1 次提交

fs/jbd2, locking/mutex, sched/wait: Use mutex_lock_io() for journal->j_checkpoint_mutex · 6fa7aa50

由 Tejun Heo 提交于 10月 28, 2016

When an ext4 fs is bogged down by a lot of metadata IOs (in the
reported case, it was deletion of millions of files, but any massive
amount of journal writes would do), after the journal is filled up,
tasks which try to access the filesystem and aren't currently
performing the journal writes end up waiting in
__jbd2_log_wait_for_space() for journal->j_checkpoint_mutex.

Because those mutex sleeps aren't marked as iowait, this condition can
lead to misleadingly low iowait and /proc/stat:procs_blocked.  While
iowait propagation is far from strict, this condition can be triggered
fairly easily and annotating these sleeps correctly helps initial
diagnosis quite a bit.

Use the new mutex_lock_io() for journal->j_checkpoint_mutex so that
these sleeps are properly marked as iowait.
Reported-by: NMingbo Wan <mingbo@fb.com>
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-team@fb.com
Link: http://lkml.kernel.org/r/1477673892-28940-5-git-send-email-tj@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

6fa7aa50

25 12月, 2016 1 次提交

Replace <asm/uaccess.h> with <linux/uaccess.h> globally · 7c0f6ba6

由 Linus Torvalds 提交于 12月 24, 2016

This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.
Requested-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7c0f6ba6

01 11月, 2016 1 次提交

block,fs: use REQ_* flags directly · 70fd7614

由 Christoph Hellwig 提交于 11月 01, 2016

Remove the WRITE_* and READ_SYNC wrappers, and just use the flags
directly.  Where applicable this also drops usage of the
bio_set_op_attrs wrapper.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

70fd7614

13 10月, 2016 1 次提交

jbd2: fix incorrect unlock on j_list_lock · 559cce69

由 Taesoo Kim 提交于 10月 12, 2016

When 'jh->b_transaction == transaction' (asserted by below)

  J_ASSERT_JH(jh, (jh->b_transaction == transaction || ...

'journal->j_list_lock' will be incorrectly unlocked, since
the the lock is aquired only at the end of if / else-if
statements (missing the else case).
Signed-off-by: NTaesoo Kim <tsgatesv@gmail.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
Fixes: 6e4862a5
Cc: stable@vger.kernel.org # 3.14+

559cce69

12 10月, 2016 1 次提交

fs: use mapping_set_error instead of opencoded set_bit · 5114a97a

由 Michal Hocko 提交于 10月 11, 2016

The mapping_set_error() helper sets the correct AS_ flag for the mapping
so there is no reason to open code it.  Use the helper directly.

[akpm@linux-foundation.org: be honest about conversion from -ENXIO to -EIO]
Link: http://lkml.kernel.org/r/20160912111608.2588-2-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5114a97a

22 9月, 2016 1 次提交

jbd2: fix lockdep annotation in add_transaction_credits() · e03a9976

由 Jan Kara 提交于 9月 22, 2016

Thomas has reported a lockdep splat hitting in
add_transaction_credits(). The problem is that that function calls
jbd2_might_wait_for_commit() while holding j_state_lock which is wrong
(we do not really wait for transaction commit while holding that lock).

Fix the problem by moving jbd2_might_wait_for_commit() into places where
we are ready to wait for transaction commit and thus j_state_lock is
unlocked.

Cc: stable@vger.kernel.org
Fixes: 1eaa566dReported-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

e03a9976

16 9月, 2016 1 次提交

jbd2: move more common code into journal_init_common() · f0c9fd54

由 Geliang Tang 提交于 9月 15, 2016

There are some repetitive code in jbd2_journal_init_dev() and
jbd2_journal_init_inode(). So this patch moves the common code into
journal_init_common() helper to simplify the code. And fix the coding
style warnings reported by checkpatch.pl by the way.
Signed-off-by: NGeliang Tang <geliangtang@gmail.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>

f0c9fd54

30 6月, 2016 4 次提交

jbd2: make journal y2038 safe · abcfb5d9

由 Arnd Bergmann 提交于 6月 30, 2016

The jbd2 journal stores the commit time in 64-bit seconds and 32-bit
nanoseconds, which avoids an overflow in 2038, but it gets the numbers
from current_kernel_time(), which uses 'long' seconds on 32-bit
architectures.

This simply changes the code to call current_kernel_time64() so
we use 64-bit seconds consistently.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org

abcfb5d9

jbd2: track more dependencies on transaction commit · 1eaa566d

由 Jan Kara 提交于 6月 30, 2016

So far we were tracking only dependency on transaction commit due to
starting a new handle (which may require commit to start a new
transaction). Now add tracking also for other cases where we wait for
transaction commit. This way lockdep can catch deadlocks e. g. because we
call jbd2_journal_stop() for a synchronous handle with some locks held
which rank below transaction start.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

1eaa566d

jbd2: move lockdep tracking to journal_s · ab714aff

由 Jan Kara 提交于 6月 30, 2016

Currently lockdep map is tracked in each journal handle. To be able to
expand lockdep support to cover also other cases where we depend on
transaction commit and where handle is not available, move lockdep map
into struct journal_s. Since this makes the lockdep map shared for all
handles, we have to use rwsem_acquire_read() for acquisitions now.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

ab714aff

jbd2: move lockdep instrumentation for jbd2 handles · 7a4b188f

由 Jan Kara 提交于 6月 30, 2016

The transaction the handle references is free to commit once we've
decremented t_updates counter. Move the lockdep instrumentation to that
place. Currently it was a bit later which did not really matter but
subsequent improvements to lockdep instrumentation would cause false
positives with it.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

7a4b188f

25 6月, 2016 1 次提交

jbd2: get rid of superfluous __GFP_REPEAT · f2db1971

由 Michal Hocko 提交于 6月 24, 2016

jbd2_alloc is explicit about its allocation preferences wrt.  the
allocation size.  Sub page allocations go to the slab allocator and
larger are using either the page allocator or vmalloc.  This is all good
but the logic is unnecessarily complex.

1) as per Ted, the vmalloc fallback is a left-over:

 : jbd2_alloc is only passed in the bh->b_size, which can't be PAGE_SIZE, so
 : the code path that calls vmalloc() should never get called.  When we
 : conveted jbd2_alloc() to suppor sub-page size allocations in commit
 : d2eecb03, there was an assumption that it could be called with a size
 : greater than PAGE_SIZE, but that's certaily not true today.

Moreover vmalloc allocation might even lead to a deadlock because the
callers expect GFP_NOFS context while vmalloc is GFP_KERNEL.

2) __GFP_REPEAT for requests <= PAGE_ALLOC_COSTLY_ORDER is ignored
   since the flag was introduced.

Let's simplify the code flow and use the slab allocator for sub-page
requests and the page allocator for others.  Even though order > 0 is
not currently used as per above leave that option open.

Link: http://lkml.kernel.org/r/1464599699-30131-18-git-send-email-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f2db1971

08 6月, 2016 3 次提交

block, drivers, fs: rename REQ_FLUSH to REQ_PREFLUSH · 28a8f0d3

由 Mike Christie 提交于 6月 05, 2016

To avoid confusion between REQ_OP_FLUSH, which is handled by
request_fn drivers, and upper layers requesting the block layer
perform a flush sequence along with possibly a WRITE, this patch
renames REQ_FLUSH to REQ_PREFLUSH.
Signed-off-by: NMike Christie <mchristi@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

28a8f0d3

fs: have ll_rw_block users pass in op and flags separately · dfec8a14

由 Mike Christie 提交于 6月 05, 2016

This has ll_rw_block users pass in the operation and flags separately,
so ll_rw_block can setup the bio op and bi_rw flags on the bio that
is submitted.
Signed-off-by: NMike Christie <mchristi@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

dfec8a14

fs: have submit_bh users pass in op and flags separately · 2a222ca9

由 Mike Christie 提交于 6月 05, 2016

This has submit_bh users pass in the operation and flags separately,
so submit_bh_wbc can setup the bio op and bi_rw flags on the bio that
is submitted.
Signed-off-by: NMike Christie <mchristi@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

2a222ca9

24 4月, 2016 1 次提交

jbd2: add support for avoiding data writes during transaction commits · 41617e1a

由 Jan Kara 提交于 4月 24, 2016

Currently when filesystem needs to make sure data is on permanent
storage before committing a transaction it adds inode to transaction's
inode list. During transaction commit, jbd2 writes back all dirty
buffers that have allocated underlying blocks and waits for the IO to
finish. However when doing writeback for delayed allocated data, we
allocate blocks and immediately submit the data. Thus asking jbd2 to
write dirty pages just unnecessarily adds more work to jbd2 possibly
writing back other redirtied blocks.

Add support to jbd2 to allow filesystem to ask jbd2 to only wait for
outstanding data writes before committing a transaction and thus avoid
unnecessary writes.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

41617e1a

18 4月, 2016 1 次提交

Doc: treewide : Fix typos in DocBook/filesystem.xml · bd7ced98

由 Masanari Iida 提交于 2月 02, 2016

This patch fix spelling typos found in DocBook/filesystem.xml.
It is because the file was generated from comments in code,
I have to fix the comments in codes, instead of xml file.
Signed-off-by: NMasanari Iida <standby24x7@gmail.com>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

bd7ced98

05 4月, 2016 1 次提交

mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf

由 Kirill A. Shutemov 提交于 4月 01, 2016

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

09cbfeaf

14 3月, 2016 1 次提交

jbd2: do not fail journal because of frozen_buffer allocation failure · 490c1b44

由 Michal Hocko 提交于 3月 13, 2016

Journal transaction might fail prematurely because the frozen_buffer
is allocated by GFP_NOFS request:
[   72.440013] do_get_write_access: OOM for frozen_buffer
[   72.440014] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
[   72.440015] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4735: Out of memory
(...snipped....)
[   72.495559] do_get_write_access: OOM for frozen_buffer
[   72.495560] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
[   72.496839] do_get_write_access: OOM for frozen_buffer
[   72.496841] EXT4-fs: ext4_reserve_inode_write:4729: aborting transaction: Out of memory in __ext4_journal_get_write_access
[   72.505766] Aborting journal on device sda1-8.
[   72.505851] EXT4-fs (sda1): Remounting filesystem read-only

This wasn't a problem until "mm: page_alloc: do not lock up GFP_NOFS
allocations upon OOM" because small GPF_NOFS allocations never failed.
This allocation seems essential for the journal and GFP_NOFS is too
restrictive to the memory allocator so let's use __GFP_NOFAIL here to
emulate the previous behavior.
Signed-off-by: NMichal Hocko <mhocko@suse.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

490c1b44

10 3月, 2016 1 次提交

jbd2: fix FS corruption possibility in jbd2_journal_destroy() on umount path · c0a2ad9b

由 OGAWA Hirofumi 提交于 3月 09, 2016

On umount path, jbd2_journal_destroy() writes latest transaction ID
(->j_tail_sequence) to be used at next mount.

The bug is that ->j_tail_sequence is not holding latest transaction ID
in some cases. So, at next mount, there is chance to conflict with
remaining (not overwritten yet) transactions.

	mount (id=10)
	write transaction (id=11)
	write transaction (id=12)
	umount (id=10) <= the bug doesn't write latest ID

	mount (id=10)
	write transaction (id=11)
	crash

	mount
	[recovery process]
		transaction (id=11)
		transaction (id=12) <= valid transaction ID, but old commit
                                       must not replay

Like above, this bug become the cause of recovery failure, or FS
corruption.

So why ->j_tail_sequence doesn't point latest ID?

Because if checkpoint transactions was reclaimed by memory pressure
(i.e. bdev_try_to_free_page()), then ->j_tail_sequence is not updated.
(And another case is, __jbd2_journal_clean_checkpoint_list() is called
with empty transaction.)

So in above cases, ->j_tail_sequence is not pointing latest
transaction ID at umount path. Plus, REQ_FLUSH for checkpoint is not
done too.

So, to fix this problem with minimum changes, this patch updates
->j_tail_sequence, and issue REQ_FLUSH.  (With more complex changes,
some optimizations would be possible to avoid unnecessary REQ_FLUSH
for example though.)

BTW,

	journal->j_tail_sequence =
		++journal->j_transaction_sequence;

Increment of ->j_transaction_sequence seems to be unnecessary, but
ext3 does this.
Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

c0a2ad9b

23 2月, 2016 4 次提交

jbd2: save some atomic ops in __JI_COMMIT_RUNNING handling · cb0d9d47

由 Jan Kara 提交于 2月 22, 2016

Currently we used atomic bit operations to manipulate
__JI_COMMIT_RUNNING bit. However this is unnecessary as i_flags are
always written and read under j_list_lock. So just change the operations
to standard bit operations.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

cb0d9d47

jbd2: unify revoke and tag block checksum handling · 1101cd4d

由 Jan Kara 提交于 2月 22, 2016

Revoke and tag descriptor blocks are just different kinds of descriptor
blocks and thus have checksum in the same place. Unify computation and
checking of checksums for these.
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

1101cd4d

jbd2: factor out common descriptor block initialization · 32ab6715

由 Jan Kara 提交于 2月 22, 2016

Descriptor block header is initialized in several places. Factor out the
common code into jbd2_journal_get_descriptor_buffer().
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

32ab6715

jbd2: remove unnecessary arguments of jbd2_journal_write_revoke_records · 9bcf976c

由 Jan Kara 提交于 2月 22, 2016

jbd2_journal_write_revoke_records() takes journal pointer and write_op,
although journal can be obtained from the passed transaction and
write_op is always WRITE_SYNC. Remove these superfluous arguments.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

9bcf976c

07 1月, 2016 1 次提交

fs: use block_device name vsprintf helper · a1c6f057

由 Dmitry Monakhov 提交于 4月 13, 2015

Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

a1c6f057

05 12月, 2015 1 次提交

jbd2: fix null committed data return in undo_access · 087ffd4e

由 Junxiao Bi 提交于 12月 04, 2015

introduced jbd2_write_access_granted() to improve write|undo_access
speed, but missed to check the status of b_committed_data which caused
a kernel panic on ocfs2.

[ 6538.405938] ------------[ cut here ]------------
[ 6538.406686] kernel BUG at fs/ocfs2/suballoc.c:2400!
[ 6538.406686] invalid opcode: 0000 [#1] SMP
[ 6538.406686] Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront xen_netfront xen_fbfront parport_pc parport pcspkr i2c_piix4 acpi_cpufreq ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix cirrus ttm drm_kms_helper drm fb_sys_fops sysimgblt sysfillrect i2c_core syscopyarea dm_mirror dm_region_hash dm_log dm_mod
[ 6538.406686] CPU: 1 PID: 16265 Comm: mmap_truncate Not tainted 4.3.0 #1
[ 6538.406686] Hardware name: Xen HVM domU, BIOS 4.3.1OVM 05/14/2014
[ 6538.406686] task: ffff88007c2bab00 ti: ffff880075b78000 task.ti: ffff880075b78000
[ 6538.406686] RIP: 0010:[<ffffffffa06a286b>]  [<ffffffffa06a286b>] ocfs2_block_group_clear_bits+0x23b/0x250 [ocfs2]
[ 6538.406686] RSP: 0018:ffff880075b7b7f8  EFLAGS: 00010246
[ 6538.406686] RAX: ffff8800760c5b40 RBX: ffff88006c06a000 RCX: ffffffffa06e6df0
[ 6538.406686] RDX: 0000000000000000 RSI: ffff88007a6f6ea0 RDI: ffff88007a760430
[ 6538.406686] RBP: ffff880075b7b878 R08: 0000000000000002 R09: 0000000000000001
[ 6538.406686] R10: ffffffffa06769be R11: 0000000000000000 R12: 0000000000000001
[ 6538.406686] R13: ffffffffa06a1750 R14: 0000000000000001 R15: ffff88007a6f6ea0
[ 6538.406686] FS:  00007f17fde30720(0000) GS:ffff88007f040000(0000) knlGS:0000000000000000
[ 6538.406686] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6538.406686] CR2: 0000000000601730 CR3: 000000007aea0000 CR4: 00000000000406e0
[ 6538.406686] Stack:
[ 6538.406686]  ffff88007c2bb5b0 ffff880075b7b8e0 ffff88007a7604b0 ffff88006c640800
[ 6538.406686]  ffff88007a7604b0 ffff880075d77390 0000000075b7b878 ffffffffa06a309d
[ 6538.406686]  ffff880075d752d8 ffff880075b7b990 ffff880075b7b898 0000000000000000
[ 6538.406686] Call Trace:
[ 6538.406686]  [<ffffffffa06a309d>] ? ocfs2_read_group_descriptor+0x6d/0xa0 [ocfs2]
[ 6538.406686]  [<ffffffffa06a3654>] _ocfs2_free_suballoc_bits+0xe4/0x320 [ocfs2]
[ 6538.406686]  [<ffffffffa06a1750>] ? ocfs2_put_slot+0xf0/0xf0 [ocfs2]
[ 6538.406686]  [<ffffffffa06a397e>] _ocfs2_free_clusters+0xee/0x210 [ocfs2]
[ 6538.406686]  [<ffffffffa06a1750>] ? ocfs2_put_slot+0xf0/0xf0 [ocfs2]
[ 6538.406686]  [<ffffffffa06a1750>] ? ocfs2_put_slot+0xf0/0xf0 [ocfs2]
[ 6538.406686]  [<ffffffffa0682d50>] ? ocfs2_extend_trans+0x50/0x1a0 [ocfs2]
[ 6538.406686]  [<ffffffffa06a3ad5>] ocfs2_free_clusters+0x15/0x20 [ocfs2]
[ 6538.406686]  [<ffffffffa065072c>] ocfs2_replay_truncate_records+0xfc/0x290 [ocfs2]
[ 6538.406686]  [<ffffffffa06843ac>] ? ocfs2_start_trans+0xec/0x1d0 [ocfs2]
[ 6538.406686]  [<ffffffffa0654600>] __ocfs2_flush_truncate_log+0x140/0x2d0 [ocfs2]
[ 6538.406686]  [<ffffffffa0654394>] ? ocfs2_reserve_blocks_for_rec_trunc.clone.0+0x44/0x170 [ocfs2]
[ 6538.406686]  [<ffffffffa065acd4>] ocfs2_remove_btree_range+0x374/0x630 [ocfs2]
[ 6538.406686]  [<ffffffffa017486b>] ? jbd2_journal_stop+0x25b/0x470 [jbd2]
[ 6538.406686]  [<ffffffffa065d5b5>] ocfs2_commit_truncate+0x305/0x670 [ocfs2]
[ 6538.406686]  [<ffffffffa0683430>] ? ocfs2_journal_access_eb+0x20/0x20 [ocfs2]
[ 6538.406686]  [<ffffffffa067adb7>] ocfs2_truncate_file+0x297/0x380 [ocfs2]
[ 6538.406686]  [<ffffffffa01759e4>] ? jbd2_journal_begin_ordered_truncate+0x64/0xc0 [jbd2]
[ 6538.406686]  [<ffffffffa067c7a2>] ocfs2_setattr+0x572/0x860 [ocfs2]
[ 6538.406686]  [<ffffffff810e4a3f>] ? current_fs_time+0x3f/0x50
[ 6538.406686]  [<ffffffff812124b7>] notify_change+0x1d7/0x340
[ 6538.406686]  [<ffffffff8121abf9>] ? generic_getxattr+0x79/0x80
[ 6538.406686]  [<ffffffff811f5876>] do_truncate+0x66/0x90
[ 6538.406686]  [<ffffffff81120e30>] ? __audit_syscall_entry+0xb0/0x110
[ 6538.406686]  [<ffffffff811f5bb3>] do_sys_ftruncate.clone.0+0xf3/0x120
[ 6538.406686]  [<ffffffff811f5bee>] SyS_ftruncate+0xe/0x10
[ 6538.406686]  [<ffffffff816aa2ae>] entry_SYSCALL_64_fastpath+0x12/0x71
[ 6538.406686] Code: 28 48 81 ee b0 04 00 00 48 8b 92 50 fb ff ff 48 8b 80 b0 03 00 00 48 39 90 88 00 00 00 0f 84 30 fe ff ff 0f 0b eb fe 0f 0b eb fe <0f> 0b 0f 1f 00 eb fb 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00
[ 6538.406686] RIP  [<ffffffffa06a286b>] ocfs2_block_group_clear_bits+0x23b/0x250 [ocfs2]
[ 6538.406686]  RSP <ffff880075b7b7f8>
[ 6538.691128] ---[ end trace 31cd7011d6770d7e ]---
[ 6538.694492] Kernel panic - not syncing: Fatal exception
[ 6538.695484] Kernel Offset: disabled

Fixes: de92c8ca("jbd2: speedup jbd2_journal_get_[write|undo]_access()")
Cc: <stable@vger.kernel.org>
Signed-off-by: NJunxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

087ffd4e

25 11月, 2015 1 次提交

jbd2: Fix unreclaimed pages after truncate in data=journal mode · bc23f0c8

由 Jan Kara 提交于 11月 24, 2015

Ted and Namjae have reported that truncated pages don't get timely
reclaimed after being truncated in data=journal mode. The following test
triggers the issue easily:

for (i = 0; i < 1000; i++) {
	pwrite(fd, buf, 1024*1024, 0);
	fsync(fd);
	fsync(fd);
	ftruncate(fd, 0);
}

The reason is that journal_unmap_buffer() finds that truncated buffers
are not journalled (jh->b_transaction == NULL), they are part of
checkpoint list of a transaction (jh->b_cp_transaction != NULL) and have
been already written out (!buffer_dirty(bh)). We clean such buffers but
we leave them in the checkpoint list. Since checkpoint transaction holds
a reference to the journal head, these buffers cannot be released until
the checkpoint transaction is cleaned up. And at that point we don't
call release_buffer_page() anymore so pages detached from mapping are
lingering in the system waiting for reclaim to find them and free them.

Fix the problem by removing buffers from transaction checkpoint lists
when journal_unmap_buffer() finds out they don't have to be there
anymore.
Reported-and-tested-by: NNamjae Jeon <namjae.jeon@samsung.com>
Fixes: de1b7941Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

bc23f0c8

07 11月, 2015 1 次提交

mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep... · d0164adc

由 Mel Gorman 提交于 11月 06, 2015

mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd

__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts.  They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve".  __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".

Over time, callers had a requirement to not block when fallback options
were available.  Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.

This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative.  High priority users continue to use
__GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.

This patch then converts a number of sites

o __GFP_ATOMIC is used by callers that are high priority and have memory
  pools for those requests. GFP_ATOMIC uses this flag.

o Callers that have a limited mempool to guarantee forward progress clear
  __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
  into this category where kswapd will still be woken but atomic reserves
  are not used as there is a one-entry mempool to guarantee progress.

o Callers that are checking if they are non-blocking should use the
  helper gfpflags_allow_blocking() where possible. This is because
  checking for __GFP_WAIT as was done historically now can trigger false
  positives. Some exceptions like dm-crypt.c exist where the code intent
  is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
  flag manipulations.

o Callers that built their own GFP flags instead of starting with GFP_KERNEL
  and friends now also need to specify __GFP_KSWAPD_RECLAIM.

The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.

The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL.  They may
now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.
Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Acked-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d0164adc

19 10月, 2015 1 次提交

ext4, jbd2: ensure entering into panic after recording an error in superblock · 4327ba52

由 Daeho Jeong 提交于 10月 18, 2015

If a EXT4 filesystem utilizes JBD2 journaling and an error occurs, the
journaling will be aborted first and the error number will be recorded
into JBD2 superblock and, finally, the system will enter into the
panic state in "errors=panic" option.  But, in the rare case, this
sequence is little twisted like the below figure and it will happen
that the system enters into panic state, which means the system reset
in mobile environment, before completion of recording an error in the
journal superblock. In this case, e2fsck cannot recognize that the
filesystem failure occurred in the previous run and the corruption
wouldn't be fixed.

Task A                        Task B
ext4_handle_error()
-> jbd2_journal_abort()
  -> __journal_abort_soft()
    -> __jbd2_journal_abort_hard()
    | -> journal->j_flags |= JBD2_ABORT;
    |
    |                         __ext4_abort()
    |                         -> jbd2_journal_abort()
    |                         | -> __journal_abort_soft()
    |                         |   -> if (journal->j_flags & JBD2_ABORT)
    |                         |           return;
    |                         -> panic()
    |
    -> jbd2_journal_update_sb_errno()
Tested-by: NHobin Woo <hobin.woo@samsung.com>
Signed-off-by: NDaeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

4327ba52

18 10月, 2015 3 次提交

jbd2: fix checkpoint list cleanup · 33d14975

由 Jan Kara 提交于 10月 17, 2015

Unlike comments and expectation of callers journal_clean_one_cp_list()
returned 1 not only if it freed the transaction but also if it freed
some buffers in the transaction. That could make
__jbd2_journal_clean_checkpoint_list() skip processing
t_checkpoint_io_list and continue with processing the next transaction.
This is mostly a cosmetic issue since the only result is we can
sometimes free less memory than we could. But it's still worth fixing.
Fix journal_clean_one_cp_list() to return 1 only if the transaction was
really freed.

Fixes: 50849db3Signed-off-by: NJan Kara <jack@suse.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

33d14975

jbd2: clean up feature test macros with predicate functions · 56316a0d

由 Darrick J. Wong 提交于 10月 17, 2015

Create separate predicate functions to test/set/clear feature flags,
thereby replacing the wordy old macros.  Furthermore, clean out the
places where we open-coded feature tests.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

56316a0d

ext4: call out CRC and corruption errors with specific error codes · 6a797d27

由 Darrick J. Wong 提交于 10月 17, 2015

Instead of overloading EIO for CRC errors and corrupt structures,
return the same error codes that XFS returns for the same issues.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

6a797d27

15 10月, 2015 1 次提交

jbd2: gate checksum calculations on crc driver presence, not sb flags · 8595798c

由 Darrick J. Wong 提交于 10月 15, 2015

Change the journal's checksum functions to gate on whether or not the
crc32c driver is loaded, and gate the loading on the superblock bits.
This prevents a journal crash if someone loads a journal in no-csum
mode and then randomizes the superblock, thus flipping on the feature
bits.
Tested-By: NNikolay Borisov <kernel@kyup.com>
Reported-by: NNikolay Borisov <kernel@kyup.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

8595798c

04 8月, 2015 1 次提交

jbd2: limit number of reserved credits · 6d3ec14d

由 Lukas Czerner 提交于 8月 04, 2015

Currently there is no limitation on number of reserved credits we can
ask for. If we ask for more reserved credits than 1/2 of maximum
transaction size, or if total number of credits exceeds the maximum
transaction size per operation (which is currently only possible with
the former) we will spin forever in start_this_handle().

Fix this by adding this limitation at the start of start_this_handle().

This patch also removes the credit limitation 1/2 of maximum transaction
size, since we really only want to limit the number of reserved credits.
There is not much point to limit the credits if there is still space in
the journal.

This accidentally also fixes the online resize, where due to the
limitation of the journal credits we're unable to grow file systems with
1k block size and size between 16M and 32M. It has been partially fixed
by 2c869b26, but not entirely.

Thanks Jan Kara for helping me getting the correct fix.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

6d3ec14d

29 7月, 2015 1 次提交

jbd2: avoid infinite loop when destroying aborted journal · 841df7df

由 Jan Kara 提交于 7月 28, 2015

Commit 6f6a6fda "jbd2: fix ocfs2 corrupt when updating journal
superblock fails" changed jbd2_cleanup_journal_tail() to return EIO
when the journal is aborted. That makes logic in
jbd2_log_do_checkpoint() bail out which is fine, except that
jbd2_journal_destroy() expects jbd2_log_do_checkpoint() to always make
a progress in cleaning the journal. Without it jbd2_journal_destroy()
just loops in an infinite loop.

Fix jbd2_journal_destroy() to cleanup journal checkpoint lists of
jbd2_log_do_checkpoint() fails with error.
Reported-by: NEryu Guan <guaneryu@gmail.com>
Tested-by: NEryu Guan <guaneryu@gmail.com>
Fixes: 6f6a6fdaSigned-off-by: NJan Kara <jack@suse.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

841df7df

23 7月, 2015 1 次提交

ext4, jbd2: add REQ_FUA flag when recording an error in the superblock · 564bc402

由 Daeho Jeong 提交于 7月 23, 2015

When an error condition is detected, an error status should be recorded into
superblocks of EXT4 or JBD2. However, the write request is submitted now
without REQ_FUA flag, even in "barrier=1" mode, which is followed by
panic() function in "errors=panic" mode. On mobile devices which make
whole system reset as soon as kernel panic occurs, this write request
containing an error flag will disappear just from storage cache without
written to the physical cells. Therefore, when next start, even forever,
the error flag cannot be shown in both superblocks, and e2fsck cannot fix
the filesystem problems automatically, unless e2fsck is executed in
force checking mode.

[ Changed use test_opt(sb, BARRIER) of checking the journal flags -- TYT ]
Signed-off-by: NDaeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

564bc402

13 7月, 2015 1 次提交

jbd2: speedup jbd2_journal_dirty_metadata() · 6e06ae88

由 Jan Kara 提交于 7月 12, 2015

It is often the case that we mark buffer as having dirty metadata when
the buffer is already in that state (frequent for bitmaps, inode table
blocks, superblock). Thus it is unnecessary to contend on grabbing
journal head reference and bh_state lock. Avoid that by checking whether
any modification to the buffer is needed before grabbing any locks or
references.

[ Note: this is a fixed version of commit 2143c196, which was
  reverted in ebeaa8dd due to a false positive triggering of an
  assertion check. -- Ted ]
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

6e06ae88

28 6月, 2015 1 次提交

Revert "jbd2: speedup jbd2_journal_dirty_metadata()" · ebeaa8dd

由 Linus Torvalds 提交于 6月 27, 2015

This reverts commit 2143c196.

This commit seems to be the cause of the following jbd2 assertion
failure:

   ------------[ cut here ]------------
   kernel BUG at fs/jbd2/transaction.c:1325!
   invalid opcode: 0000 [#1] SMP
   Modules linked in: bnep bluetooth fuse ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 ...
   CPU: 7 PID: 5509 Comm: gcc Not tainted 4.1.0-10944-g2a298679 #1
   Hardware name:                  /DH87RL, BIOS RLH8710H.86A.0327.2014.0924.1645 09/24/2014
   task: ffff8803bf866040 ti: ffff880308528000 task.ti: ffff880308528000
   RIP: jbd2_journal_dirty_metadata+0x237/0x290
   Call Trace:
     __ext4_handle_dirty_metadata+0x43/0x1f0
     ext4_handle_dirty_dirent_node+0xde/0x160
     ? jbd2_journal_get_write_access+0x36/0x50
     ext4_delete_entry+0x112/0x160
     ? __ext4_journal_start_sb+0x52/0xb0
     ext4_unlink+0xfa/0x260
     vfs_unlink+0xec/0x190
     do_unlinkat+0x24a/0x270
     SyS_unlink+0x11/0x20
     entry_SYSCALL_64_fastpath+0x12/0x6a
   ---[ end trace ae033ebde8d080b4 ]---

which is not easily reproducible (I've seen it just once, and then Ted
was able to reproduce it once).  Revert it while Ted and Jan try to
figure out what is wrong.

Cc: Jan Kara <jack@suse.cz>
Acked-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ebeaa8dd

26 6月, 2015 1 次提交

fs/jbd2/journal.c: use strreplace() · 81ae394b

由 Rasmus Villemoes 提交于 6月 25, 2015

In one case, we eliminate a local variable; in the other a strlen()
call and some .text.
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

81ae394b

21 6月, 2015 1 次提交

jbd2: speedup jbd2_journal_dirty_metadata() · 2143c196

由 Jan Kara 提交于 6月 20, 2015

It is often the case that we mark buffer as having dirty metadata when
the buffer is already in that state (frequent for bitmaps, inode table
blocks, superblock). Thus it is unnecessary to contend on grabbing
journal head reference and bh_state lock. Avoid that by checking whether
any modification to the buffer is needed before grabbing any locks or
references.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

2143c196

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功