Commits · 0db1ff222d40f1601c961f0edb86d10426992595 · openeuler / Kernel

05 Feb, 2017 6 commits

ext4: add shutdown bit and check for it · 0db1ff22

Committed by Theodore Ts'o on Feb 05, 2017

Add a shutdown bit that will cause ext4 processing to fail immediately
with EIO.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
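
The idea of a shutdown bit can be illustrated with a small userspace sketch. This is not the ext4 code: the structure, the bit number, and the function names below are hypothetical stand-ins chosen for illustration. It only shows the pattern of testing one bit in a flags word at the top of an operation and failing with EIO when it is set.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bit number inside the flags word (illustration only). */
enum { DEMO_FLAGS_SHUTDOWN = 1 };

/* Hypothetical stand-in for the per-filesystem in-memory state. */
struct demo_sb_info {
    unsigned long flags;
};

/* Returns true once the shutdown bit has been set. */
bool demo_forced_shutdown(const struct demo_sb_info *sbi)
{
    return (sbi->flags >> DEMO_FLAGS_SHUTDOWN) & 1UL;
}

/* An operation checks the bit first and fails immediately with -EIO. */
int demo_write(struct demo_sb_info *sbi, int *work_done)
{
    if (demo_forced_shutdown(sbi))
        return -EIO;
    (*work_done)++;   /* stands in for the real work */
    return 0;
}
```

Once the bit is set, every later call returns -EIO without doing any work, which is the "fail immediately" behaviour the message describes.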

0db1ff22

ext4: rename s_resize_flags to s_ext4_flags · 9549a168

Committed by Theodore Ts'o on Feb 05, 2017

We are currently using one bit in s_resize_flags; rename it in order
to allow more of the bits in that unsigned long for other purposes.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

9549a168

ext4: return EROFS if device is r/o and journal replay is needed · 4753d8a2

Committed by Theodore Ts'o on Feb 05, 2017

If the file system requires journal recovery, and the device is
read-only, return EROFS to the mount system call.  This allows xfstests
generic/050 to pass.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

4753d8a2

ext4: preserve the needs_recovery flag when the journal is aborted · 97abd7d4

Committed by Theodore Ts'o on Feb 04, 2017

If the journal is aborted, the needs_recovery feature flag should not
be removed.  Otherwise, the journal might not get replayed and
this could lead to more data getting lost.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

97abd7d4

jbd2: don't leak modified metadata buffers on an aborted journal · e112666b

Committed by Theodore Ts'o on Feb 04, 2017

If the journal has been aborted, we shouldn't mark the underlying
buffer head as dirty, since that will cause the metadata block to get
modified.  And if the journal has been aborted, we shouldn't allow
this since it will almost certainly lead to a corrupted file system.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

e112666b

ext4: fix inline data error paths · eb5efbcb

Committed by Theodore Ts'o on Feb 04, 2017

The write_end() function must always unlock the page and drop its ref
count, even on an error.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

eb5efbcb

03 Feb, 2017 1 commit

ext4: move halfmd4 into hash.c directly · 1c83a9aa

Committed by Jason A. Donenfeld on Feb 02, 2017

The "half md4" transform should not be used by any new code. And
fortunately, it's only used now by ext4. Since ext4 supports several
hashing methods, at some point it might be desirable to move to
something like SipHash. As an intermediate step, remove half md4 from
cryptohash.h and lib, and make it just a local function in ext4's
hash.c. There's precedent for doing this; the other function ext4 can use
for its hashes -- TEA -- is also implemented in the same place. Also, by
being a local function, this might allow gcc to perform some additional
optimizations.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

1c83a9aa

02 Feb, 2017 2 commits

ext4: fix use-after-iput when fscrypt contexts are inconsistent · dd01b690

Committed by Eric Biggers on Feb 01, 2017

In the case where the child's encryption context was inconsistent with
its parent directory, we were using inode->i_sb and inode->i_ino after
the inode had already been iput().  Fix this by doing the iput() in the
correct places.

Note: only ext4 had this bug, not f2fs and ubifs.

Fixes: d9cdc903 ("ext4 crypto: enforce context consistency")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

dd01b690

jbd2: fix use after free in kjournald2() · dbfcef6b

Committed by Sahitya Tummala on Feb 01, 2017

Below is the synchronization issue between the unmount and kjournald2
contexts, which results in a use-after-free in kjournald2().
Fix this issue by using journal->j_state_lock to synchronize the
wait_event() done in journal_kill_thread() and the wake_up() done
in kjournald2().

TASK 1:
umount cmd:
   |--jbd2_journal_destroy() {
       |--journal_kill_thread() {
            write_lock(&journal->j_state_lock);
	    journal->j_flags |= JBD2_UNMOUNT;
	    ...
	    write_unlock(&journal->j_state_lock);
	    wake_up(&journal->j_wait_commit);	   TASK 2 wakes up here:
	    					   kjournald2() {
						     ...
						     checks JBD2_UNMOUNT flag and calls goto end-loop;
						     ...
						     end_loop:
						       write_unlock(&journal->j_state_lock);
						       journal->j_task = NULL; --> If this thread gets
						       pre-empted here, then TASK 1 wait_event will
						       exit even before this thread is completely
						       done.
	    wait_event(journal->j_wait_done_commit, journal->j_task == NULL);
	    ...
	    write_lock(&journal->j_state_lock);
	    write_unlock(&journal->j_state_lock);
	  }
       |--kfree(journal);
     }
}
						       wake_up(&journal->j_wait_done_commit); --> this step
						       now results into use after free issue.
						   }
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
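
The race above can be mirrored in userspace with POSIX threads. The following is only an analogy, not the jbd2 code: a mutex plays the role of j_state_lock, one condition variable stands in for both wait queues, and all names are hypothetical. The point it illustrates is that the exiting thread clears its "alive" marker and issues the wake-up while still holding the lock, so the waiter cannot proceed to free the structure until the thread has made its last access to it.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the journal structure. */
struct demo_journal {
    pthread_mutex_t state_lock;  /* plays the role of j_state_lock */
    pthread_cond_t  cond;        /* stands in for both wait queues */
    int unmount;                 /* analogue of the JBD2_UNMOUNT flag */
    int task_alive;              /* analogue of j_task != NULL */
};

/* Analogue of the commit thread's exit path. */
static void *demo_commit_thread(void *arg)
{
    struct demo_journal *j = arg;

    pthread_mutex_lock(&j->state_lock);
    while (!j->unmount)
        pthread_cond_wait(&j->cond, &j->state_lock);
    /* Clear the marker and wake the waiter while still holding the lock. */
    j->task_alive = 0;
    pthread_cond_broadcast(&j->cond);
    pthread_mutex_unlock(&j->state_lock);
    return NULL;                 /* j is not touched after the unlock */
}

/* Analogue of the unmount path: stop the thread, then free the journal. */
int demo_journal_run_and_destroy(void)
{
    struct demo_journal *j = calloc(1, sizeof(*j));
    pthread_t tid;

    if (!j)
        return -1;
    pthread_mutex_init(&j->state_lock, NULL);
    pthread_cond_init(&j->cond, NULL);
    j->task_alive = 1;
    if (pthread_create(&tid, NULL, demo_commit_thread, j) != 0) {
        pthread_cond_destroy(&j->cond);
        pthread_mutex_destroy(&j->state_lock);
        free(j);
        return -1;
    }

    pthread_mutex_lock(&j->state_lock);
    j->unmount = 1;
    pthread_cond_broadcast(&j->cond);
    /* The waiter can only leave this loop after re-taking the lock. */
    while (j->task_alive)
        pthread_cond_wait(&j->cond, &j->state_lock);
    pthread_mutex_unlock(&j->state_lock);

    pthread_join(tid, NULL);     /* reap the thread's resources */
    pthread_cond_destroy(&j->cond);
    pthread_mutex_destroy(&j->state_lock);
    free(j);
    return 0;
}
```

Calling demo_journal_run_and_destroy() starts the thread, requests shutdown, and frees the structure. The pthread_join() is there only to release thread resources in this userspace sketch; the ordering guarantee comes from the lock.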

dbfcef6b

28 Jan, 2017 2 commits

ext4: fix data corruption in data=journal mode · 3b136499

Committed by Jan Kara on Jan 27, 2017

ext4_journalled_write_end() did not properly handle all the cases when
generic_perform_write() did not copy all the data into the target page,
and could mark buffers with uninitialized contents as uptodate and dirty,
leading to possible data corruption (which would be quickly fixed by
generic_perform_write() retrying the write, but still). Fix the problem
by carefully handling the case when the page that is written to is not
uptodate.

CC: stable@vger.kernel.org
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

3b136499

ext4: trim allocation requests to group size · cd648b8a

Committed by Jan Kara on Jan 27, 2017

If filesystem groups are artificially small (using parameter -g to
mkfs.ext4), ext4_mb_normalize_request() can result in a request that is
larger than a block group. Trim the request size to not confuse
allocation code.
Reported-by: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
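
As a minimal sketch of the trimming step, assuming the request size and the group size are both counted in blocks, the clamp could look like the following. The function name and parameters are hypothetical and not taken from the patch.

```c
#include <stdint.h>

/* Clamp a normalized request so it never exceeds one block group. */
uint32_t demo_trim_request(uint32_t request_blocks, uint32_t blocks_per_group)
{
    if (request_blocks > blocks_per_group)
        return blocks_per_group;
    return request_blocks;
}
```

With a group of 256 blocks, a normalized request of 4096 blocks is cut down to 256, while a request of 100 blocks passes through unchanged.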

cd648b8a

23 Jan, 2017 2 commits

ext4: replace BUG_ON with WARN_ON in mb_find_extent() · 43c73221

Committed by Theodore Ts'o on Jan 22, 2017

The last BUG_ON in mb_find_extent() is apparently triggering in some
rare cases.  Most of the time it indicates a bug in the buddy bitmap
algorithms, but there are some weird cases where it can trigger when
buddy bitmap is still in memory, but the block bitmap has to be read
from disk, and there is disk or memory corruption such that the block
bitmap and the buddy bitmap are out of sync.

Google-Bug-Id: #33702157
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

43c73221

ext4: propagate error values from ext4_inline_data_truncate() · 01daf945

Committed by Theodore Ts'o on Jan 22, 2017

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

01daf945

12 Jan, 2017 3 commits

ext4: avoid calling ext4_mark_inode_dirty() under unneeded semaphores · b907f2d5

Committed by Theodore Ts'o on Jan 11, 2017

There is no need to call ext4_mark_inode_dirty while holding xattr_sem
or i_data_sem, so where it's easy to avoid it, move it out from the
critical region.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

b907f2d5

ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea() · c755e251

Committed by Theodore Ts'o on Jan 11, 2017

The xattr_sem deadlock problems fixed in commit 2e81a4ee: "ext4:
avoid deadlock when expanding inode size" didn't include the use of
xattr_sem in fs/ext4/inline.c.  With the addition of project quota
which added a new extra inode field, this exposed deadlocks in the
inline_data code similar to the ones fixed by 2e81a4ee.

The deadlock can be reproduced via:

   dmesg -n 7
   mke2fs -t ext4 -O inline_data -Fq -I 256 /dev/vdc 32768
   mount -t ext4 -o debug_want_extra_isize=24 /dev/vdc /vdc
   mkdir /vdc/a
   umount /vdc
   mount -t ext4 /dev/vdc /vdc
   echo foo > /vdc/a/foo

and looks like this:

[   11.158815] 
[   11.160276] =============================================
[   11.161960] [ INFO: possible recursive locking detected ]
[   11.161960] 4.10.0-rc3-00015-g011b30a8a3cf #160 Tainted: G        W      
[   11.161960] ---------------------------------------------
[   11.161960] bash/2519 is trying to acquire lock:
[   11.161960]  (&ei->xattr_sem){++++..}, at: [<c1225a4b>] ext4_expand_extra_isize_ea+0x3d/0x4cd
[   11.161960] 
[   11.161960] but task is already holding lock:
[   11.161960]  (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152
[   11.161960] 
[   11.161960] other info that might help us debug this:
[   11.161960]  Possible unsafe locking scenario:
[   11.161960] 
[   11.161960]        CPU0
[   11.161960]        ----
[   11.161960]   lock(&ei->xattr_sem);
[   11.161960]   lock(&ei->xattr_sem);
[   11.161960] 
[   11.161960]  *** DEADLOCK ***
[   11.161960] 
[   11.161960]  May be due to missing lock nesting notation
[   11.161960] 
[   11.161960] 4 locks held by bash/2519:
[   11.161960]  #0:  (sb_writers#3){.+.+.+}, at: [<c11a2414>] mnt_want_write+0x1e/0x3e
[   11.161960]  #1:  (&type->i_mutex_dir_key){++++++}, at: [<c119508b>] path_openat+0x338/0x67a
[   11.161960]  #2:  (jbd2_handle){++++..}, at: [<c123314a>] start_this_handle+0x582/0x622
[   11.161960]  #3:  (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152
[   11.161960] 
[   11.161960] stack backtrace:
[   11.161960] CPU: 0 PID: 2519 Comm: bash Tainted: G        W       4.10.0-rc3-00015-g011b30a8a3cf #160
[   11.161960] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014
[   11.161960] Call Trace:
[   11.161960]  dump_stack+0x72/0xa3
[   11.161960]  __lock_acquire+0xb7c/0xcb9
[   11.161960]  ? kvm_clock_read+0x1f/0x29
[   11.161960]  ? __lock_is_held+0x36/0x66
[   11.161960]  ? __lock_is_held+0x36/0x66
[   11.161960]  lock_acquire+0x106/0x18a
[   11.161960]  ? ext4_expand_extra_isize_ea+0x3d/0x4cd
[   11.161960]  down_write+0x39/0x72
[   11.161960]  ? ext4_expand_extra_isize_ea+0x3d/0x4cd
[   11.161960]  ext4_expand_extra_isize_ea+0x3d/0x4cd
[   11.161960]  ? _raw_read_unlock+0x22/0x2c
[   11.161960]  ? jbd2_journal_extend+0x1e2/0x262
[   11.161960]  ? __ext4_journal_get_write_access+0x3d/0x60
[   11.161960]  ext4_mark_inode_dirty+0x17d/0x26d
[   11.161960]  ? ext4_add_dirent_to_inline.isra.12+0xa5/0xb2
[   11.161960]  ext4_add_dirent_to_inline.isra.12+0xa5/0xb2
[   11.161960]  ext4_try_add_inline_entry+0x69/0x152
[   11.161960]  ext4_add_entry+0xa3/0x848
[   11.161960]  ? __brelse+0x14/0x2f
[   11.161960]  ? _raw_spin_unlock_irqrestore+0x44/0x4f
[   11.161960]  ext4_add_nondir+0x17/0x5b
[   11.161960]  ext4_create+0xcf/0x133
[   11.161960]  ? ext4_mknod+0x12f/0x12f
[   11.161960]  lookup_open+0x39e/0x3fb
[   11.161960]  ? __wake_up+0x1a/0x40
[   11.161960]  ? lock_acquire+0x11e/0x18a
[   11.161960]  path_openat+0x35c/0x67a
[   11.161960]  ? sched_clock_cpu+0xd7/0xf2
[   11.161960]  do_filp_open+0x36/0x7c
[   11.161960]  ? _raw_spin_unlock+0x22/0x2c
[   11.161960]  ? __alloc_fd+0x169/0x173
[   11.161960]  do_sys_open+0x59/0xcc
[   11.161960]  SyS_open+0x1d/0x1f
[   11.161960]  do_int80_syscall_32+0x4f/0x61
[   11.161960]  entry_INT80_32+0x2f/0x2f
[   11.161960] EIP: 0xb76ad469
[   11.161960] EFLAGS: 00000286 CPU: 0
[   11.161960] EAX: ffffffda EBX: 08168ac8 ECX: 00008241 EDX: 000001b6
[   11.161960] ESI: b75e46bc EDI: b7755000 EBP: bfbdb108 ESP: bfbdafc0
[   11.161960]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b

Cc: stable@vger.kernel.org # 3.10 (requires 2e81a4ee as a prereq)
Reported-by: George Spelvin <linux@sciencehorizons.net>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

c755e251

ext4: add debug_want_extra_isize mount option · 670e9875

Committed by Theodore Ts'o on Jan 11, 2017

In order to test the inode extra isize expansion code, it is useful to
be able to easily create file systems that have inodes with extra
isize values smaller than the current desired value.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

670e9875

09 Jan, 2017 2 commits

ext4: do not polute the extents cache while shifting extents · 03e916fa

Committed by Roman Pen on Jan 08, 2017

Inside the ext4_ext_shift_extents() function, ext4_find_extent() is
called without the EXT4_EX_NOCACHE flag, which should be set to prevent
cache population.

This leads to outdated offsets in the extents tree and wrong blocks
afterwards.

This patch fixes the problem by providing the EXT4_EX_NOCACHE flag for
each ext4_find_extent() call inside the ext4_ext_shift_extents() function.

Fixes: 331573fe
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: stable@vger.kernel.org

03e916fa

ext4: Include forgotten start block on fallocate insert range · 2a9b8cba

Committed by Roman Pen on Jan 08, 2017

While doing 'insert range', the start block should also be shifted right.
The bug can be easily reproduced by the following test:

    ptr = malloc(4096);
    assert(ptr);

    fd = open("./ext4.file", O_CREAT | O_TRUNC | O_RDWR, 0600);
    assert(fd >= 0);

    rc = fallocate(fd, 0, 0, 8192);
    assert(rc == 0);
    for (i = 0; i < 2048; i++)
            *((unsigned short *)ptr + i) = 0xbeef;
    rc = pwrite(fd, ptr, 4096, 0);
    assert(rc == 4096);
    rc = pwrite(fd, ptr, 4096, 4096);
    assert(rc == 4096);

    for (block = 2; block < 1000; block++) {
            rc = fallocate(fd, FALLOC_FL_INSERT_RANGE, 4096, 4096);
            assert(rc == 0);

            for (i = 0; i < 2048; i++)
                    *((unsigned short *)ptr + i) = block;

            rc = pwrite(fd, ptr, 4096, 4096);
            assert(rc == 4096);
    }

Because the start block is not included in the range, the hole appears at
the wrong offset (just after the desired offset) and the following
pwrite() overwrites an already existing block, leaving the hole untouched.

Simple way to verify wrong behaviour is to check zeroed blocks after
the test:

   $ hexdump ./ext4.file | grep '0000 0000'

The root cause of the bug is a wrong range (start, stop], where start
should be inclusive, i.e. [start, stop].

This patch fixes the problem by including start in the range.  But in
order not to break left shift (range collapse), stop points to the
beginning of a block, not to the end.

The other non-obvious change is an iterator validity check in the
main loop.  Because the iterator is unsigned, the following corner case
should be considered with care: when inserting a block at offset 0, the
stop variable overflows and never becomes less than start, which is 0.
To handle this special case, the iterator is set to NULL to indicate
that the end of the loop has been reached.

Fixes: 331573fe
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: stable@vger.kernel.org

2a9b8cba

08 Jan, 2017 2 commits

fscrypt: make fscrypt_operations.key_prefix a string · a5d431ef

Committed by Eric Biggers on Jan 05, 2017

There was an unnecessary amount of complexity around requesting the
filesystem-specific key prefix.  It was unclear why; perhaps it was
envisioned that different instances of the same filesystem type could
use different key prefixes, or that key prefixes could be binary.
However, neither of those things were implemented or really make sense
at all.  So simplify the code by making key_prefix a const char *.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

a5d431ef

ext4: don't allow encrypted operations without keys · 173b8439

Committed by Theodore Ts'o on Dec 28, 2016

While we allow deletes without the key, the following should not be
permitted:

# cd /vdc/encrypted-dir-without-key
# ls -l
total 4
-rw-r--r-- 1 root root   0 Dec 27 22:35 6,LKNRJsp209FbXoSvJWzB
-rw-r--r-- 1 root root 286 Dec 27 22:35 uRJ5vJh9gE7vcomYMqTAyD
# mv uRJ5vJh9gE7vcomYMqTAyD  6,LKNRJsp209FbXoSvJWzB
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

173b8439

04 Jan, 2017 4 commits

xfs: fix max_retries _show and _store functions · ff97f239

Committed by Carlos Maiolino on Jan 03, 2017

The max_retries _show and _store functions should test against
cfg->max_retries, not cfg->retry_timeout.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

ff97f239

xfs: fix crash and data corruption due to removal of busy COW extents · a1b7a4de

Committed by Christoph Hellwig on Jan 03, 2017

There is a race window between write_cache_pages calling
clear_page_dirty_for_io and XFS calling set_page_writeback, in which
the mapping for an inode is tagged neither as dirty, nor as writeback.

If the COW shrinker hits in exactly that window, we'll remove the delayed
COW extents while writepages is still trying to write them back, which in
release kernels will manifest as corruption of the bmap btree, and in debug
kernels will trip the ASSERT about not calling xfs_bmapi_write with the
COWFORK flag for holes. A complex customer load manages to hit this
window fairly reliably, probably by always having COW writeback in flight
while the COW shrinker runs.

This patch adds another check for having the I_DIRTY_PAGES flag set,
which is still set during this race window. While this fixes the problem
I'm still not overly happy about the way the COW shrinker works as it
still seems a bit fragile.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

a1b7a4de

xfs: use the actual AG length when reserving blocks · 20e73b00

Committed by Darrick J. Wong on Jan 03, 2017

We need to use the actual AG length when making per-AG reservations,
since we could otherwise end up reserving more blocks out of the last
AG than there are actual blocks.
Complained-about-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
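
To make the "actual AG length" point concrete, here is a hedged sketch of the arithmetic, assuming every allocation group except the last has the nominal size and the last one holds whatever remains. The function and parameter names are hypothetical and not taken from the XFS source.

```c
#include <stdint.h>

/*
 * Length in blocks of allocation group agno, given the total number of
 * data blocks, the nominal blocks per AG, and the number of AGs.
 * Assumes agno < agcount.
 */
uint64_t demo_ag_length(uint64_t dblocks, uint32_t agblocks,
                        uint32_t agcount, uint32_t agno)
{
    if (agno + 1 < agcount)
        return agblocks;                       /* full-sized AG */
    return dblocks - (uint64_t)agno * agblocks; /* last, possibly short, AG */
}
```

With 1000 data blocks split into 4 AGs of nominally 300 blocks, the first three AGs are 300 blocks long and the last is only 100, so a reservation sized from the nominal 300 would exceed what the last AG holds.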

20e73b00

xfs: fix double-cleanup when CUI recovery fails · 7a21272b

Committed by Darrick J. Wong on Jan 03, 2017

Dan Carpenter reported a double-free of rcur if _defer_finish fails
while we're recovering CUI items.  Fix the error recovery to prevent
this.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

7a21272b

03 Jan, 2017 2 commits

fscrypt: make test_dummy_encryption require a keyring key · 5bbdcbbb

Committed by Theodore Ts'o on Jan 02, 2017

Currently, the test_dummy_encryption ext4 mount option, which exists
only to test encrypted I/O paths with xfstests, overrides all
per-inode encryption keys with a fixed key.

This change minimizes test_dummy_encryption-specific code path changes
by supplying a fake context for directories which are not encrypted
for use when creating new directories, files, or symlinks.  This
allows us to properly exercise the keyring lookup, derivation, and
context inheritance code paths.

Before mounting a file system using test_dummy_encryption, userspace
must execute the following shell commands:

    mode='\x00\x00\x00\x00'
    raw="$(printf ""\\\\x%02x"" $(seq 0 63))"
    if lscpu | grep "Byte Order" | grep -q Little ; then
        size='\x40\x00\x00\x00'
    else
        size='\x00\x00\x00\x40'
    fi
    key="${mode}${raw}${size}"
    keyctl new_session
    echo -n -e "${key}" | keyctl padd logon fscrypt:4242424242424242 @s
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
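
The shell commands above assemble a 72-byte payload: a 4-byte mode of zero, 64 raw key bytes with values 0 through 63, and a 4-byte size of 0x40 in the machine's native byte order. As an illustration only, the same bytes could be built in C as sketched below; the helper name and the constant are hypothetical, and the layout is inferred purely from the script.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_RAW_KEY_BYTES 64   /* number of raw key bytes in the script */

/*
 * Fill out[] with the payload the shell script pipes to keyctl:
 * mode (4 bytes), raw key (64 bytes), size (4 bytes).
 * Returns the payload length in bytes.
 */
size_t demo_build_payload(uint8_t out[72])
{
    uint32_t mode = 0;
    uint32_t size = DEMO_RAW_KEY_BYTES;
    size_t i;

    memcpy(out, &mode, sizeof(mode));
    for (i = 0; i < DEMO_RAW_KEY_BYTES; i++)
        out[4 + i] = (uint8_t)i;
    /* memcpy of a native integer yields native byte order, which is
     * what the lscpu check in the script selects by hand. */
    memcpy(out + 4 + DEMO_RAW_KEY_BYTES, &size, sizeof(size));
    return 4 + DEMO_RAW_KEY_BYTES + 4;
}
```

Copying the integers with memcpy avoids the explicit little-endian versus big-endian branch, since the compiler's native representation already matches the running machine.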

5bbdcbbb

clean_bdev_aliases: Prevent cleaning blocks that are not in block range · 6c006a9d

Committed by Chandan Rajendra on Dec 25, 2016

The first block to be cleaned may start at a non-zero page offset. In
such a scenario clean_bdev_aliases() will end up cleaning blocks that
do not fall in the range of blocks to be cleaned. This commit fixes the
issue by skipping blocks that do not fall in the valid block range.
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
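
A simplified model of the problem, assuming a page holds a fixed number of consecutive blocks and the scan starts at the first block of the page containing the range start, is sketched below. The array of flags and all names are hypothetical; the real function walks buffer heads attached to pages.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Mark blocks in [block, block + len) as cleaned. The scan begins at the
 * first block of the page that contains 'block', so blocks that share the
 * page but precede the range must be skipped. Returns the number cleaned.
 */
size_t demo_clean_range(uint64_t block, uint64_t len,
                        unsigned blocks_per_page,
                        uint8_t *cleaned, size_t nblocks)
{
    uint64_t end = block + len;
    uint64_t first = block - (block % blocks_per_page);
    size_t count = 0;

    for (uint64_t b = first; b < nblocks; b++) {
        if (b < block)
            continue;   /* same page, but before the requested range */
        if (b >= end)
            break;      /* past the requested range */
        cleaned[b] = 1;
        count++;
    }
    return count;
}
```

With 4 blocks per page and a request for blocks 5 through 7, the scan starts at block 4 but leaves it untouched, cleaning exactly three blocks.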

6c006a9d

02 Jan, 2017 1 commit

fscrypt: factor out bio specific functions · 58ae7468

Committed by Richard Weinberger on Dec 19, 2016

That way we can get rid of the direct dependency on CONFIG_BLOCK.

Fixes: d475a507 ("ubifs: Add skeleton for fscrypto")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Gstir <david@sigma-star.at>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

58ae7468

01 Jan, 2017 5 commits

fscrypt: pass up error codes from ->get_context() · efee590e