提交 · 598c7d7abc832e35677b851f6afb93141c09993b · openanolis / cloud-kernel

11 7月, 2016 1 次提交

ext4 crypto: migrate into vfs's crypto engine · a7550b30

由 Jaegeuk Kim 提交于 7月 10, 2016

This patch removes the most parts of internal crypto codes.
And then, it modifies and adds some ext4-specific crypt codes to use the generic
facility.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

a7550b30

06 7月, 2016 3 次提交

ext4: fix project quota accounting without quota limits enabled · 079788d0

由 Wang Shilong 提交于 7月 05, 2016

We should always transfer quota accounting, regardless of whether
quota limits are enabled.

Steps to reproduce:
  # mkfs.ext4 /dev/sda4 -O quota,project
  # mount /dev/sda4 /mnt/test
  # cp /bin/bash /mnt/test
  # chattr -p 123 /mnt/test/bash
  # quota -v -P 123
Signed-off-by: NWang Shilong <wshilong@ddn.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

079788d0

ext4: validate s_reserved_gdt_blocks on mount · 5b9554dc

由 Theodore Ts'o 提交于 7月 05, 2016

If s_reserved_gdt_blocks is extremely large, it's possible for
ext4_init_block_bitmap(), which is called when ext4 sets up an
uninitialized block bitmap, to corrupt random kernel memory.  Add the
same checks which e2fsck has --- it must never be larger than
blocksize / sizeof(__u32) --- and then add a backup check in
ext4_init_block_bitmap() in case the superblock gets modified after
the file system is mounted.
Reported-by: NVegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

5b9554dc

ext4: remove unused page_idx · de9e9181

由 yalin wang 提交于 7月 05, 2016

Signed-off-by: Nyalin wang <yalin.wang2010@gmail.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.com>

de9e9181

04 7月, 2016 5 次提交

ext4: don't call ext4_should_journal_data() on the journal inode · 6a7fd522

由 Vegard Nossum 提交于 7月 04, 2016

If ext4_fill_super() fails early, it's possible for ext4_evict_inode()
to call ext4_should_journal_data() before superblock options and flags
are fully set up.  In that case, the iput() on the journal inode can
end up causing a BUG().

Work around this problem by reordering the tests so we only call
ext4_should_journal_data() after we know it's not the journal inode.

Fixes: 2d859db3 ("ext4: fix data corruption in inodes with journalled data")
Fixes: 2b405bfa ("ext4: fix data=journal fast mount/umount hang")
Cc: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>

6a7fd522

ext4: Fix WARN_ON_ONCE in ext4_commit_super() · 4743f839

由 Pranay Kr. Srivastava 提交于 7月 04, 2016

If there are racing calls to ext4_commit_super() it's possible for
another writeback of the superblock to result in the buffer being
marked with an error after we check if the buffer is marked as having
a write error and the buffer up-to-date flag is set again.  If that
happens mark_buffer_dirty() can end up throwing a WARN_ON_ONCE.

Fix this by moving this check to write before we call
write_buffer_dirty(), and keeping the buffer locked during this whole
sequence.
Signed-off-by: NPranay Kr. Srivastava <pranjas@gmail.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

4743f839

ext4: fix deadlock during page writeback · 646caa9c

由 Jan Kara 提交于 7月 04, 2016

Commit 06bd3c36 (ext4: fix data exposure after a crash) uncovered a
deadlock in ext4_writepages() which was previously much harder to hit.
After this commit xfstest generic/130 reproduces the deadlock on small
filesystems.

The problem happens when ext4_do_update_inode() sets LARGE_FILE feature
and marks current inode handle as synchronous. That subsequently results
in ext4_journal_stop() called from ext4_writepages() to block waiting for
transaction commit while still holding page locks, reference to io_end,
and some prepared bio in mpd structure each of which can possibly block
transaction commit from completing and thus results in deadlock.

Fix the problem by releasing page locks, io_end reference, and
submitting prepared bio before calling ext4_journal_stop().

[ Changed to defer the call to ext4_journal_stop() only if the handle
  is synchronous.  --tytso ]
Reported-and-tested-by: NEryu Guan <eguan@redhat.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
CC: stable@vger.kernel.org
Signed-off-by: NJan Kara <jack@suse.cz>

646caa9c

ext4: correct error value of function verifying dx checksum · fa964540

由 Daeho Jeong 提交于 7月 03, 2016

ext4_dx_csum_verify() returns the success return value in two checksum
verification failure cases. We need to set the return values to zero
as failure like ext4_dirent_csum_verify() returning zero when failing
to find a checksum dirent at the tail.
Signed-off-by: NDaeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>

fa964540

ext4: avoid modifying checksum fields directly during checksum verification · b47820ed

由 Daeho Jeong 提交于 7月 03, 2016

We temporally change checksum fields in buffers of some types of
metadata into '0' for verifying the checksum values. By doing this
without locking the buffer, some metadata's checksums, which are
being committed or written back to the storage, could be damaged.
In our test, several metadata blocks were found with damaged metadata
checksum value during recovery process. When we only verify the
checksum value, we have to avoid modifying checksum fields directly.
Signed-off-by: NDaeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: NYoungjin Gil <youngjin.gil@samsung.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>

b47820ed

30 6月, 2016 1 次提交

ext4: check for extents that wrap around · f70749ca

由 Vegard Nossum 提交于 6月 30, 2016

An extent with lblock = 4294967295 and len = 1 will pass the
ext4_valid_extent() test:

	ext4_lblk_t last = lblock + len - 1;

	if (len == 0 || lblock > last)
		return 0;

since last = 4294967295 + 1 - 1 = 4294967295. This would later trigger
the BUG_ON(es->es_lblk + es->es_len < es->es_lblk) in ext4_es_end().

We can simplify it by removing the - 1 altogether and changing the test
to use lblock + len <= lblock, since now if len = 0, then lblock + 0 ==
lblock and it fails, and if len > 0 then lblock + len > lblock in order
to pass (i.e. it doesn't overflow).

Fixes: 5946d089 ("ext4: check for overlapping extents in ext4_valid_extent_entries()")
Fixes: 2f974865 ("ext4: check for zero length extent explicitly")
Cc: Eryu Guan <guaneryu@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPhil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

f70749ca

27 6月, 2016 2 次提交

ext4: respect the nobarrier mount option in nojournal mode · 78d96251

由 Theodore Ts'o 提交于 6月 26, 2016

Also, if we are going to issue the barrier, we should do this after we
write out the parent directories if necessary.
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>

78d96251

ext4: optimize ext4_should_retry_alloc() to improve ENOSPC performance · d08854f5

由 Theodore Ts'o 提交于 6月 26, 2016

If there are no pending blocks to be released after a commit, forcing
a journal commit has no hope of helping.  It's possible that a commit
had just completed, so if there are now free blocks available for
allocation, it's worth retrying the commit.
Reported-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

d08854f5

28 5月, 2016 1 次提交

switch xattr_handler->set() to passing dentry and inode separately · 59301226

由 Al Viro 提交于 5月 27, 2016

preparation for similar switch in ->setxattr() (see the next commit for
rationale).
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

59301226

21 5月, 2016 1 次提交

lib/uuid.c: move generate_random_uuid() to uuid.c · 8da4b8c4

由 Andy Shevchenko 提交于 5月 20, 2016

Let's gather the UUID related functions under one hood.
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: NMatt Fleming <matt@codeblueprint.co.uk>
Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8da4b8c4

17 5月, 2016 2 次提交

ext4: Add alignment check for DAX mount · 87eefeb4

由 Toshi Kani 提交于 5月 10, 2016

When a partition is not aligned by 4KB, mount -o dax succeeds,
but any read/write access to the filesystem fails, except for
metadata update.

Call bdev_dax_supported() to perform proper precondition checks
which includes this partition alignment check.
Reported-by: NMicah Parrish <micah.parrish@hpe.com>
Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>

87eefeb4

dax: Remove complete_unwritten argument · 02fbd139

由 Jan Kara 提交于 5月 11, 2016

Fault handlers currently take complete_unwritten argument to convert
unwritten extents after PTEs are updated. However no filesystem uses
this anymore as the code is racy. Remove the unused argument.
Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>

02fbd139

13 5月, 2016 5 次提交

ext4: pre-zero allocated blocks for DAX IO · 12735f88

由 Jan Kara 提交于 5月 13, 2016

Currently ext4 treats DAX IO the same way as direct IO. I.e., it
allocates unwritten extents before IO is done and converts unwritten
extents afterwards. However this way DAX IO can race with page fault to
the same area:

ext4_ext_direct_IO()				dax_fault()
  dax_io()
    get_block() - allocates unwritten extent
    copy_from_iter_pmem()
						  get_block() - converts
						    unwritten block to
						    written and zeroes it
						    out
  ext4_convert_unwritten_extents()

So data written with DAX IO gets lost. Similarly dax_new_buf() called
from dax_io() can overwrite data that has been already written to the
block via mmap.

Fix the problem by using pre-zeroed blocks for DAX IO the same way as we
use them for DAX mmap. The downside of this solution is that every
allocating write writes each block twice (once zeros, once data). Fixing
the race with locking is possible as well however we would need to
lock-out faults for the whole range written to by DAX IO. And that is
not easy to do without locking-out faults for the whole file which seems
too aggressive.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

12735f88

ext4: refactor direct IO code · 914f82a3

由 Jan Kara 提交于 5月 13, 2016

Currently ext4 direct IO handling is split between ext4_ext_direct_IO()
and ext4_ind_direct_IO(). However the extent based function calls into
the indirect based one for some cases and for example it is not able to
handle file extending. Previously it was not also properly handling
retries in case of ENOSPC errors. With DAX things would get even more
contrieved so just refactor the direct IO code and instead of indirect /
extent split do the split to read vs writes.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

914f82a3

ext4: fix race in transient ENOSPC detection · dbc427ce

由 Jan Kara 提交于 5月 13, 2016

When there are blocks to free in the running transaction, block
allocator can return ENOSPC although the filesystem has some blocks to
free. We use ext4_should_retry_alloc() to force commit of the current
transaction and return whether anything was committed so that it makes
sense to retry the allocation. However the transaction may get committed
after block allocation fails but before we call
ext4_should_retry_alloc(). So ext4_should_retry_alloc() returns false
because there is nothing to commit and we wrongly return ENOSPC.

Fix the race by unconditionally returning 1 from ext4_should_retry_alloc()
when we tried to commit a transaction. This should not add any
unnecessary retries since we had a transaction running a while ago when
trying to allocate blocks and we want to retry the allocation once that
transaction has committed anyway.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

dbc427ce

ext4: handle transient ENOSPC properly for DAX · 7cb476f8

由 Jan Kara 提交于 5月 13, 2016

ext4_dax_get_blocks() was accidentally omitted fixing get blocks
handlers to properly handle transient ENOSPC errors. Fix it now to use
ext4_get_blocks_trans() helper which takes care of these errors.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

7cb476f8

ext4: switch to ->iterate_shared() · ae05327a

由 Al Viro 提交于 5月 12, 2016

Note that we need relax_dir() equivalent for directories
locked shared.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

ae05327a

06 5月, 2016 4 次提交

ext4: remove unmeetable inconsisteny check from ext4_find_extent() · 816cd71b

由 Nicolai Stange 提交于 5月 05, 2016

ext4_find_extent(), stripped down to the parts relevant to this patch,
reads as

  ppos = 0;
  i = depth;
  while (i) {
    --i;
    ++ppos;
    if (unlikely(ppos > depth)) {
      ...
      ret = -EFSCORRUPTED;
      goto err;
    }
  }

Due to the loop's bounds, the condition ppos > depth can never be met.

Remove this dead code.
Signed-off-by: NNicolai Stange <nicstange@gmail.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

816cd71b

ext4: remove unnecessary bio get/put · 32157de2

由 Jens Axboe 提交于 5月 05, 2016

ext4_io_submit() used to check for EOPNOTSUPP after bio submission,
which is why it had to get an extra reference to the bio before
submitting it. But since we no longer touch the bio after submission,
get rid of the redundant get/put of the bio. If we do get the extra
reference, we enter the slower path of having to flag this bio as now
having external references.
Signed-off-by: NJens Axboe <axboe@fb.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

32157de2

ext4: silence UBSAN in ext4_mb_init() · 935244cd

由 Nicolai Stange 提交于 5月 05, 2016

Currently, in ext4_mb_init(), there's a loop like the following:

  do {
    ...
    offset += 1 << (sb->s_blocksize_bits - i);
    i++;
  } while (i <= sb->s_blocksize_bits + 1);

Note that the updated offset is used in the loop's next iteration only.

However, at the last iteration, that is at i == sb->s_blocksize_bits + 1,
the shift count becomes equal to (unsigned)-1 > 31 (c.f. C99 6.5.7(3))
and UBSAN reports

  UBSAN: Undefined behaviour in fs/ext4/mballoc.c:2621:15
  shift exponent 4294967295 is too large for 32-bit type 'int'
  [...]
  Call Trace:
   [<ffffffff818c4d25>] dump_stack+0xbc/0x117
   [<ffffffff818c4c69>] ? _atomic_dec_and_lock+0x169/0x169
   [<ffffffff819411ab>] ubsan_epilogue+0xd/0x4e
   [<ffffffff81941cac>] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254
   [<ffffffff81941ab1>] ? __ubsan_handle_load_invalid_value+0x158/0x158
   [<ffffffff814b6dc1>] ? kmem_cache_alloc+0x101/0x390
   [<ffffffff816fc13b>] ? ext4_mb_init+0x13b/0xfd0
   [<ffffffff814293c7>] ? create_cache+0x57/0x1f0
   [<ffffffff8142948a>] ? create_cache+0x11a/0x1f0
   [<ffffffff821c2168>] ? mutex_lock+0x38/0x60
   [<ffffffff821c23ab>] ? mutex_unlock+0x1b/0x50
   [<ffffffff814c26ab>] ? put_online_mems+0x5b/0xc0
   [<ffffffff81429677>] ? kmem_cache_create+0x117/0x2c0
   [<ffffffff816fcc49>] ext4_mb_init+0xc49/0xfd0
   [...]

Observe that the mentioned shift exponent, 4294967295, equals (unsigned)-1.

Unless compilers start to do some fancy transformations (which at least
GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the
such calculated value of offset is never used again.

Silence UBSAN by introducing another variable, offset_incr, holding the
next increment to apply to offset and adjust that one by right shifting it
by one position per loop iteration.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161

Cc: stable@vger.kernel.org
Signed-off-by: NNicolai Stange <nicstange@gmail.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

935244cd

ext4: address UBSAN warning in mb_find_order_for_block() · b5cb316c

由 Nicolai Stange 提交于 5月 05, 2016

Currently, in mb_find_order_for_block(), there's a loop like the following:

  while (order <= e4b->bd_blkbits + 1) {
    ...
    bb += 1 << (e4b->bd_blkbits - order);
  }

Note that the updated bb is used in the loop's next iteration only.

However, at the last iteration, that is at order == e4b->bd_blkbits + 1,
the shift count becomes negative (c.f. C99 6.5.7(3)) and UBSAN reports

  UBSAN: Undefined behaviour in fs/ext4/mballoc.c:1281:11
  shift exponent -1 is negative
  [...]
  Call Trace:
   [<ffffffff818c4d35>] dump_stack+0xbc/0x117
   [<ffffffff818c4c79>] ? _atomic_dec_and_lock+0x169/0x169
   [<ffffffff819411bb>] ubsan_epilogue+0xd/0x4e
   [<ffffffff81941cbc>] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254
   [<ffffffff81941ac1>] ? __ubsan_handle_load_invalid_value+0x158/0x158
   [<ffffffff816e93a0>] ? ext4_mb_generate_from_pa+0x590/0x590
   [<ffffffff816502c8>] ? ext4_read_block_bitmap_nowait+0x598/0xe80
   [<ffffffff816e7b7e>] mb_find_order_for_block+0x1ce/0x240
   [...]

Unless compilers start to do some fancy transformations (which at least
GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the
such calculated value of bb is never used again.

Silence UBSAN by introducing another variable, bb_incr, holding the next
increment to apply to bb and adjust that one by right shifting it by one
position per loop iteration.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161

Cc: stable@vger.kernel.org
Signed-off-by: NNicolai Stange <nicstange@gmail.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

b5cb316c

05 5月, 2016 2 次提交

ext4: fix oops on corrupted filesystem · 74177f55

由 Jan Kara 提交于 5月 05, 2016

When filesystem is corrupted in the right way, it can happen
ext4_mark_iloc_dirty() in ext4_orphan_add() returns error and we
subsequently remove inode from the in-memory orphan list. However this
deletion is done with list_del(&EXT4_I(inode)->i_orphan) and thus we
leave i_orphan list_head with a stale content. Later we can look at this
content causing list corruption, oops, or other issues. The reported
trace looked like:

WARNING: CPU: 0 PID: 46 at lib/list_debug.c:53 __list_del_entry+0x6b/0x100()
list_del corruption, 0000000061c1d6e0->next is LIST_POISON1
0000000000100100)
CPU: 0 PID: 46 Comm: ext4.exe Not tainted 4.1.0-rc4+ #250
Stack:
 60462947 62219960 602ede24 62219960
 602ede24 603ca293 622198f0 602f02eb
 62219950 6002c12c 62219900 601b4d6b
Call Trace:
 [<6005769c>] ? vprintk_emit+0x2dc/0x5c0
 [<602ede24>] ? printk+0x0/0x94
 [<600190bc>] show_stack+0xdc/0x1a0
 [<602ede24>] ? printk+0x0/0x94
 [<602ede24>] ? printk+0x0/0x94
 [<602f02eb>] dump_stack+0x2a/0x2c
 [<6002c12c>] warn_slowpath_common+0x9c/0xf0
 [<601b4d6b>] ? __list_del_entry+0x6b/0x100
 [<6002c254>] warn_slowpath_fmt+0x94/0xa0
 [<602f4d09>] ? __mutex_lock_slowpath+0x239/0x3a0
 [<6002c1c0>] ? warn_slowpath_fmt+0x0/0xa0
 [<60023ebf>] ? set_signals+0x3f/0x50
 [<600a205a>] ? kmem_cache_free+0x10a/0x180
 [<602f4e88>] ? mutex_lock+0x18/0x30
 [<601b4d6b>] __list_del_entry+0x6b/0x100
 [<601177ec>] ext4_orphan_del+0x22c/0x2f0
 [<6012f27c>] ? __ext4_journal_start_sb+0x2c/0xa0
 [<6010b973>] ? ext4_truncate+0x383/0x390
 [<6010bc8b>] ext4_write_begin+0x30b/0x4b0
 [<6001bb50>] ? copy_from_user+0x0/0xb0
 [<601aa840>] ? iov_iter_fault_in_readable+0xa0/0xc0
 [<60072c4f>] generic_perform_write+0xaf/0x1e0
 [<600c4166>] ? file_update_time+0x46/0x110
 [<60072f0f>] __generic_file_write_iter+0x18f/0x1b0
 [<6010030f>] ext4_file_write_iter+0x15f/0x470
 [<60094e10>] ? unlink_file_vma+0x0/0x70
 [<6009b180>] ? unlink_anon_vmas+0x0/0x260
 [<6008f169>] ? free_pgtables+0xb9/0x100
 [<600a6030>] __vfs_write+0xb0/0x130
 [<600a61d5>] vfs_write+0xa5/0x170
 [<600a63d6>] SyS_write+0x56/0xe0
 [<6029fcb0>] ? __libc_waitpid+0x0/0xa0
 [<6001b698>] handle_syscall+0x68/0x90
 [<6002633d>] userspace+0x4fd/0x600
 [<6002274f>] ? save_registers+0x1f/0x40
 [<60028bd7>] ? arch_prctl+0x177/0x1b0
 [<60017bd5>] fork_handler+0x85/0x90

Fix the problem by using list_del_init() as we always should with
i_orphan list.

CC: stable@vger.kernel.org
Reported-by: NVegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

74177f55

ext4: fix check of dqget() return value in ext4_ioctl_setproject() · ff0bc084

由 Seth Forshee 提交于 5月 05, 2016

A failed call to dqget() returns an ERR_PTR() and not null. Fix
the check in ext4_ioctl_setproject() to handle this correctly.

Fixes: 9b7365fc ("ext4: add FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support")
Cc: stable@vger.kernel.org # v4.5
Signed-off-by: NSeth Forshee <seth.forshee@canonical.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>

ff0bc084

02 5月, 2016 3 次提交

fs: simplify the generic_write_sync prototype · e2592217

由 Christoph Hellwig 提交于 4月 07, 2016

The kiocb already has the new position, so use that.  The only interesting
case is AIO, where we currently don't bother updating ki_pos.  We're about
to free the kiocb after we're done, so we might as well update it to make
everyone's life simpler.

While we're at it also return the bytes written argument passed in if
we were successful so that the boilerplate error switch code in the
callers can go away.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

e2592217

fs: add IOCB_SYNC and IOCB_DSYNC · dde0c2e7

由 Christoph Hellwig 提交于 4月 07, 2016

This will allow us to do per-I/O sync file writes, as required by a lot
of fileservers or storage targets.

XXX: Will need a few additional audits for O_DSYNC
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

dde0c2e7

direct-io: eliminate the offset argument to ->direct_IO · c8b8e32d

由 Christoph Hellwig 提交于 4月 07, 2016

Including blkdev_direct_IO and dax_do_io.  It has to be ki_pos to actually
work, so eliminate the superflous argument.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

c8b8e32d

30 4月, 2016 2 次提交

ext4: clean up error handling when orphan list is corrupted · 7827a7f6

由 Theodore Ts'o 提交于 4月 30, 2016

Instead of just printing warning messages, if the orphan list is
corrupted, declare the file system is corrupted.  If there are any
reserved inodes in the orphaned inode list, declare the file system
corrupted and stop right away to avoid doing more potential damage to
the file system.

Cc: stable@vger.kernel.org
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

7827a7f6

ext4: fix hang when processing corrupted orphaned inode list · c9eb13a9

由 Theodore Ts'o 提交于 4月 30, 2016

If the orphaned inode list contains inode #5, ext4_iget() returns a
bad inode (since the bootloader inode should never be referenced
directly).  Because of the bad inode, we end up processing the inode
repeatedly and this hangs the machine.

This can be reproduced via:

   mke2fs -t ext4 /tmp/foo.img 100
   debugfs -w -R "ssv last_orphan 5" /tmp/foo.img
   mount -o loop /tmp/foo.img /mnt

(But don't do this if you are using an unpatched kernel if you care
about the system staying functional.  :-)

This bug was found by the port of American Fuzzy Lop into the kernel
to find file system problems[1].  (Since it *only* happens if inode #5
shows up on the orphan list --- 3, 7, 8, etc. won't do it, it's not
surprising that AFL needed two hours before it found it.)

[1] http://events.linuxfoundation.org/sites/events/files/slides/AFL%20filesystem%20fuzzing%2C%20Vault%202016_0.pdf

Cc: stable@vger.kernel.org
Reported by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

c9eb13a9

27 4月, 2016 1 次提交

ext4: remove trailing \n from ext4_warning/ext4_error calls · 8d2ae1cb

由 Jakub Wilk 提交于 4月 27, 2016

Messages passed to ext4_warning() or ext4_error() don't need trailing
newlines, because these function add the newlines themselves.
Signed-off-by: NJakub Wilk <jwilk@jwilk.net>

8d2ae1cb

26 4月, 2016 3 次提交

ext4: fix races between changing inode journal mode and ext4_writepages · c8585c6f

由 Daeho Jeong 提交于 4月 25, 2016

In ext4, there is a race condition between changing inode journal mode
and ext4_writepages(). While ext4_writepages() is executed on a
non-journalled mode inode, the inode's journal mode could be enabled
by ioctl() and then, some pages dirtied after switching the journal
mode will be still exposed to ext4_writepages() in non-journaled mode.
To resolve this problem, we use fs-wide per-cpu rw semaphore by Jan
Kara's suggestion because we don't want to waste ext4_inode_info's
space for this extra rare case.
Signed-off-by: NDaeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>

c8585c6f

ext4: handle unwritten or delalloc buffers before enabling data journaling · 4c546592

由 Daeho Jeong 提交于 4月 25, 2016

We already allocate delalloc blocks before changing the inode mode into
"per-file data journal" mode to prevent delalloc blocks from remaining
not allocated, but another issue concerned with "BH_Unwritten" status
still exists. For example, by fallocate(), several buffers' status
change into "BH_Unwritten", but these buffers cannot be processed by
ext4_alloc_da_blocks(). So, they still remain in unwritten status after
per-file data journaling is enabled and they cannot be changed into
written status any more and, if they are journaled and eventually
checkpointed, these unwritten buffer will cause a kernel panic by the
below BUG_ON() function of submit_bh_wbc() when they are submitted
during checkpointing.

static int submit_bh_wbc(int rw, struct buffer_head *bh,...
{
        ...
        BUG_ON(buffer_unwritten(bh));

Moreover, when "dioread_nolock" option is enabled, the status of a
buffer is changed into "BH_Unwritten" after write_begin() completes and
the "BH_Unwritten" status will be cleared after I/O is done. Therefore,
if a buffer's status is changed into unwrutten but the buffer's I/O is
not submitted and completed, it can cause the same problem after
enabling per-file data journaling. You can easily generate this bug by
executing the following command.

./kvm-xfstests -C 10000 -m nodelalloc,dioread_nolock generic/269

To resolve these problems and define a boundary between the previous
mode and per-file data journaling mode, we need to flush and wait all
the I/O of buffers of a file before enabling per-file data journaling
of the file.
Signed-off-by: NDaeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>

4c546592

ext4: fix jbd2 handle extension in ext4_ext_truncate_extend_restart() · 7b808191

由 Theodore Ts'o 提交于 4月 25, 2016

The function jbd2_journal_extend() takes as its argument the number of
new credits to be added to the handle. We weren't taking into account
the currently unused handle credits; worse, we would try to extend the
handle by N credits when it had N credits available.

In the case where jbd2_journal_extend() fails because the transaction
is too large, when jbd2_journal_restart() gets called, the N credits
owned by the handle gets returned to the transaction, and the
transaction commit is asynchronously requested, and then
start_this_handle() will be able to successfully attach the handle to
the current transaction since the required credits are now available.

This is mostly harmless, but since ext4_ext_truncate_extend_restart()
returns EAGAIN, the truncate machinery will once again try to call
ext4_ext_truncate_extend_restart(), which will do the above sequence
over and over again until the transaction has committed.

This was found while I was debugging a lockup in caused by running
xfstests generic/074 in the data=journal case. I'm still not sure why
we ended up looping forever, which suggests there may still be another
bug hiding in the transaction accounting machinery, but this commit
prevents us from looping in the first place.
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

7b808191

24 4月, 2016 4 次提交

ext4: do not ask jbd2 to write data for delalloc buffers · ee0876bc

由 Jan Kara 提交于 4月 24, 2016

Currently we ask jbd2 to write all dirty allocated buffers before
committing a transaction when doing writeback of delay allocated blocks.
However this is unnecessary since we move all pages to writeback state
before dropping a transaction handle and then submit all the necessary
IO. We still need the transaction commit to wait for all the outstanding
writeback before flushing disk caches during transaction commit to avoid
data exposure issues though. Use the new jbd2 capability and ask it to
only wait for outstanding writeback during transaction commit when
writing back data in ext4_writepages().
Tested-by: N"HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

ee0876bc

jbd2: add support for avoiding data writes during transaction commits · 41617e1a

由 Jan Kara 提交于 4月 24, 2016

Currently when filesystem needs to make sure data is on permanent
storage before committing a transaction it adds inode to transaction's
inode list. During transaction commit, jbd2 writes back all dirty
buffers that have allocated underlying blocks and waits for the IO to
finish. However when doing writeback for delayed allocated data, we
allocate blocks and immediately submit the data. Thus asking jbd2 to
write dirty pages just unnecessarily adds more work to jbd2 possibly
writing back other redirtied blocks.

Add support to jbd2 to allow filesystem to ask jbd2 to only wait for
outstanding data writes before committing a transaction and thus avoid
unnecessary writes.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

41617e1a

ext4: remove EXT4_STATE_ORDERED_MODE · 3957ef53

由 Jan Kara 提交于 4月 24, 2016

This flag is just duplicating what ext4_should_order_data() tells you
and is used in a single place. Furthermore it doesn't reflect changes to
inode data journalling flag so it may be possibly misleading. Just
remove it.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

3957ef53

ext4: fix data exposure after a crash · 06bd3c36

由 Jan Kara 提交于 4月 24, 2016

Huang has reported that in his powerfail testing he is seeing stale
block contents in some of recently allocated blocks although he mounts
ext4 in data=ordered mode. After some investigation I have found out
that indeed when delayed allocation is used, we don't add inode to
transaction's list of inodes needing flushing before commit. Originally
we were doing that but commit f3b59291 removed the logic with a
flawed argument that it is not needed.

The problem is that although for delayed allocated blocks we write their
contents immediately after allocating them, there is no guarantee that
the IO scheduler or device doesn't reorder things and thus transaction
allocating blocks and attaching them to inode can reach stable storage
before actual block contents. Actually whenever we attach freshly
allocated blocks to inode using a written extent, we should add inode to
transaction's ordered inode list to make sure we properly wait for block
contents to be written before committing the transaction. So that is
what we do in this patch. This also handles other cases where stale data
exposure was possible - like filling hole via mmap in
data=ordered,nodelalloc mode.

The only exception to the above rule are extending direct IO writes where
blkdev_direct_IO() waits for IO to complete before increasing i_size and
thus stale data exposure is not possible. For now we don't complicate
the code with optimizing this special case since the overhead is pretty
low. In case this is observed to be a performance problem we can always
handle it using a special flag to ext4_map_blocks().

CC: stable@vger.kernel.org
Fixes: f3b59291Reported-by: N"HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com>
Tested-by: N"HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

06bd3c36

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功