提交 · 2ba3e6e8afc9b6188b471f27cf2b5e3cf34e7af2 · openanolis / cloud-kernel

15 2月, 2017 2 次提交

ext4: fix fencepost in s_first_meta_bg validation · 2ba3e6e8

由 Theodore Ts'o 提交于 2月 15, 2017

It is OK for s_first_meta_bg to be equal to the number of block group
descriptor blocks.  (It rarely happens, but it shouldn't cause any
problems.)

https://bugzilla.kernel.org/show_bug.cgi?id=194567

Fixes: 3a4b77cdSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

2ba3e6e8

ext4: don't BUG when truncating encrypted inodes on the orphan list · 0d06863f

由 Theodore Ts'o 提交于 2月 14, 2017

Fix a BUG when the kernel tries to mount a file system constructed as
follows:

echo foo > foo.txt
mke2fs -Fq -t ext4 -O encrypt foo.img 100
debugfs -w foo.img << EOF
write foo.txt a
set_inode_field a i_flags 0x80800
set_super_value s_last_orphan 12
quit
EOF

root@kvm-xfstests:~# mount -o loop foo.img /mnt
[  160.238770] ------------[ cut here ]------------
[  160.240106] kernel BUG at /usr/projects/linux/ext4/fs/ext4/inode.c:3874!
[  160.240106] invalid opcode: 0000 [#1] SMP
[  160.240106] Modules linked in:
[  160.240106] CPU: 0 PID: 2547 Comm: mount Tainted: G        W       4.10.0-rc3-00034-gcdd33b941b67 #227
[  160.240106] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014
[  160.240106] task: f4518000 task.stack: f47b6000
[  160.240106] EIP: ext4_block_zero_page_range+0x1a7/0x2b4
[  160.240106] EFLAGS: 00010246 CPU: 0
[  160.240106] EAX: 00000001 EBX: f7be4b50 ECX: f47b7dc0 EDX: 00000007
[  160.240106] ESI: f43b05a8 EDI: f43babec EBP: f47b7dd0 ESP: f47b7dac
[  160.240106]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  160.240106] CR0: 80050033 CR2: bfd85b08 CR3: 34a00680 CR4: 000006f0
[  160.240106] Call Trace:
[  160.240106]  ext4_truncate+0x1e9/0x3e5
[  160.240106]  ext4_fill_super+0x286f/0x2b1e
[  160.240106]  ? set_blocksize+0x2e/0x7e
[  160.240106]  mount_bdev+0x114/0x15f
[  160.240106]  ext4_mount+0x15/0x17
[  160.240106]  ? ext4_calculate_overhead+0x39d/0x39d
[  160.240106]  mount_fs+0x58/0x115
[  160.240106]  vfs_kern_mount+0x4b/0xae
[  160.240106]  do_mount+0x671/0x8c3
[  160.240106]  ? _copy_from_user+0x70/0x83
[  160.240106]  ? strndup_user+0x31/0x46
[  160.240106]  SyS_mount+0x57/0x7b
[  160.240106]  do_int80_syscall_32+0x4f/0x61
[  160.240106]  entry_INT80_32+0x2f/0x2f
[  160.240106] EIP: 0xb76b919e
[  160.240106] EFLAGS: 00000246 CPU: 0
[  160.240106] EAX: ffffffda EBX: 08053838 ECX: 08052188 EDX: 080537e8
[  160.240106] ESI: c0ed0000 EDI: 00000000 EBP: 080537e8 ESP: bfa13660
[  160.240106]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[  160.240106] Code: 59 8b 00 a8 01 0f 84 09 01 00 00 8b 07 66 25 00 f0 66 3d 00 80 75 61 89 f8 e8 3e e2 ff ff 84 c0 74 56 83 bf 48 02 00 00 00 75 02 <0f> 0b 81 7d e8 00 10 00 00 74 02 0f 0b 8b 43 04 8b 53 08 31 c9
[  160.240106] EIP: ext4_block_zero_page_range+0x1a7/0x2b4 SS:ESP: 0068:f47b7dac
[  160.317241] ---[ end trace d6a773a375c810a5 ]---

The problem is that when the kernel tries to truncate an inode in
ext4_truncate(), it tries to clear any on-disk data beyond i_size.
Without the encryption key, it can't do that, and so it triggers a
BUG.

E2fsck does *not* provide this service, and in practice most file
systems have their orphan list processed by e2fsck, so to avoid
crashing, this patch skips this step if we don't have access to the
encryption key (which is the case when processing the orphan list; in
all other cases, we will have the encryption key, or the kernel
wouldn't have allowed the file to be opened).

An open question is whether the fact that e2fsck isn't clearing the
bytes beyond i_size causing problems --- and if we've lived with it
not doing it for so long, can we drop this from the kernel replay of
the orphan list in all cases (not just when we don't have the key for
encrypted inodes).

Addresses-Google-Bug: #35209576
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

0d06863f

10 2月, 2017 2 次提交

ext4: do not use stripe_width if it is not set · 5469d7c3

由 Jan Kara 提交于 2月 10, 2017

Avoid using stripe_width for sbi->s_stripe value if it is not actually
set. It prevents using the stride for sbi->s_stripe.
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

5469d7c3

ext4: fix stripe-unaligned allocations · d9b22cf9

由 Jan Kara 提交于 2月 10, 2017

When a filesystem is created using:

	mkfs.ext4 -b 4096 -E stride=512 <dev>

and we try to allocate 64MB extent, we will end up directly in
ext4_mb_complex_scan_group(). This is because the request is detected
as power-of-two allocation (so we start in ext4_mb_regular_allocator()
with ac_criteria == 0) however the check before
ext4_mb_simple_scan_group() refuses the direct buddy scan because the
allocation request is too large. Since cr == 0, the check whether we
should use ext4_mb_scan_aligned() fails as well and we fall back to
ext4_mb_complex_scan_group().

Fix the problem by checking for upper limit on power-of-two requests
directly when detecting them.
Reported-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

d9b22cf9

09 2月, 2017 2 次提交

dax: assert that i_rwsem is held exclusive for writes · 168316db

由 Christoph Hellwig 提交于 2月 08, 2017

Make sure all callers follow the same locking protocol, given that DAX
transparantly replaced the normal buffered I/O path.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>

168316db

ext4: fix DAX write locking · ff5462e3

由 Christoph Hellwig 提交于 2月 08, 2017

Unlike O_DIRECT DAX is not an optional opt-in feature selected by the
application, so we'll have to provide the traditional synchronіzation
of overlapping writes as we do for buffered writes.

This was broken historically for DAX, but got fixed for ext2 and XFS
as part of the iomap conversion.  Fix up ext4 as well.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Reviewed-by: NJan Kara <jack@suse.cz>

ff5462e3

06 2月, 2017 1 次提交

ext4: add EXT4_IOC_GOINGDOWN ioctl · 783d9485

由 Theodore Ts'o 提交于 2月 05, 2017

This ioctl is modeled after the xfs's XFS_IOC_GOINGDOWN ioctl.  (In
fact, it uses the same code points.)
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

783d9485

05 2月, 2017 6 次提交

ext4: add shutdown bit and check for it · 0db1ff22

由 Theodore Ts'o 提交于 2月 05, 2017

Add a shutdown bit that will cause ext4 processing to fail immediately
with EIO.
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

0db1ff22

ext4: rename s_resize_flags to s_ext4_flags · 9549a168

由 Theodore Ts'o 提交于 2月 05, 2017

We are currently using one bit in s_resize_flags; rename it in order
to allow more of the bits in that unsigned long for other purposes.
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

9549a168

ext4: return EROFS if device is r/o and journal replay is needed · 4753d8a2

由 Theodore Ts'o 提交于 2月 05, 2017

If the file system requires journal recovery, and the device is
read-ony, return EROFS to the mount system call.  This allows xfstests
generic/050 to pass.
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

4753d8a2

ext4: preserve the needs_recovery flag when the journal is aborted · 97abd7d4

由 Theodore Ts'o 提交于 2月 04, 2017

If the journal is aborted, the needs_recovery feature flag should not
be removed.  Otherwise, it's the journal might not get replayed and
this could lead to more data getting lost.
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

97abd7d4

jbd2: don't leak modified metadata buffers on an aborted journal · e112666b

由 Theodore Ts'o 提交于 2月 04, 2017

If the journal has been aborted, we shouldn't mark the underlying
buffer head as dirty, since that will cause the metadata block to get
modified.  And if the journal has been aborted, we shouldn't allow
this since it will almost certainly lead to a corrupted file system.
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

e112666b

ext4: fix inline data error paths · eb5efbcb

由 Theodore Ts'o 提交于 2月 04, 2017

The write_end() function must always unlock the page and drop its ref
count, even on an error.
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

eb5efbcb

03 2月, 2017 1 次提交

ext4: move halfmd4 into hash.c directly · 1c83a9aa

由 Jason A. Donenfeld 提交于 2月 02, 2017

The "half md4" transform should not be used by any new code. And
fortunately, it's only used now by ext4. Since ext4 supports several
hashing methods, at some point it might be desirable to move to
something like SipHash. As an intermediate step, remove half md4 from
cryptohash.h and lib, and make it just a local function in ext4's
hash.c. There's precedent for doing this; the other function ext can use
for its hashes -- TEA -- is also implemented in the same place. Also, by
being a local function, this might allow gcc to perform some additional
optimizations.
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

1c83a9aa

02 2月, 2017 2 次提交

ext4: fix use-after-iput when fscrypt contexts are inconsistent · dd01b690

由 Eric Biggers 提交于 2月 01, 2017

In the case where the child's encryption context was inconsistent with
its parent directory, we were using inode->i_sb and inode->i_ino after
the inode had already been iput().  Fix this by doing the iput() in the
correct places.

Note: only ext4 had this bug, not f2fs and ubifs.

Fixes: d9cdc903 ("ext4 crypto: enforce context consistency")
Cc: stable@vger.kernel.org
Signed-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

dd01b690

jbd2: fix use after free in kjournald2() · dbfcef6b

由 Sahitya Tummala 提交于 2月 01, 2017

Below is the synchronization issue between unmount and kjournald2
contexts, which results into use after free issue in kjournald2().
Fix this issue by using journal->j_state_lock to synchronize the
wait_event() done in journal_kill_thread() and the wake_up() done
in kjournald2().

TASK 1:
umount cmd:
   |--jbd2_journal_destroy() {
       |--journal_kill_thread() {
            write_lock(&journal->j_state_lock);
	    journal->j_flags |= JBD2_UNMOUNT;
	    ...
	    write_unlock(&journal->j_state_lock);
	    wake_up(&journal->j_wait_commit);	   TASK 2 wakes up here:
	    					   kjournald2() {
						     ...
						     checks JBD2_UNMOUNT flag and calls goto end-loop;
						     ...
						     end_loop:
						       write_unlock(&journal->j_state_lock);
						       journal->j_task = NULL; --> If this thread gets
						       pre-empted here, then TASK 1 wait_event will
						       exit even before this thread is completely
						       done.
	    wait_event(journal->j_wait_done_commit, journal->j_task == NULL);
	    ...
	    write_lock(&journal->j_state_lock);
	    write_unlock(&journal->j_state_lock);
	  }
       |--kfree(journal);
     }
}
						       wake_up(&journal->j_wait_done_commit); --> this step
						       now results into use after free issue.
						   }
Signed-off-by: NSahitya Tummala <stummala@codeaurora.org>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

dbfcef6b

28 1月, 2017 2 次提交

ext4: fix data corruption in data=journal mode · 3b136499

由 Jan Kara 提交于 1月 27, 2017

ext4_journalled_write_end() did not propely handle all the cases when
generic_perform_write() did not copy all the data into the target page
and could mark buffers with uninitialized contents as uptodate and dirty
leading to possible data corruption (which would be quickly fixed by
generic_perform_write() retrying the write but still). Fix the problem
by carefully handling the case when the page that is written to is not
uptodate.

CC: stable@vger.kernel.org
Reported-by: NAl Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

3b136499

ext4: trim allocation requests to group size · cd648b8a

由 Jan Kara 提交于 1月 27, 2017

If filesystem groups are artifically small (using parameter -g to
mkfs.ext4), ext4_mb_normalize_request() can result in a request that is
larger than a block group. Trim the request size to not confuse
allocation code.
Reported-by: N"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

cd648b8a

23 1月, 2017 2 次提交

ext4: replace BUG_ON with WARN_ON in mb_find_extent() · 43c73221

由 Theodore Ts'o 提交于 1月 22, 2017

The last BUG_ON in mb_find_extent() is apparently triggering in some
rare cases.  Most of the time it indicates a bug in the buddy bitmap
algorithms, but there are some weird cases where it can trigger when
buddy bitmap is still in memory, but the block bitmap has to be read
from disk, and there is disk or memory corruption such that the block
bitmap and the buddy bitmap are out of sync.

Google-Bug-Id: #33702157
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

43c73221

T
ext4: propagate error values from ext4_inline_data_truncate() · 01daf945
由 Theodore Ts'o 提交于 1月 22, 2017
```
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
```
01daf945

12 1月, 2017 3 次提交

ext4: avoid calling ext4_mark_inode_dirty() under unneeded semaphores · b907f2d5

由 Theodore Ts'o 提交于 1月 11, 2017

There is no need to call ext4_mark_inode_dirty while holding xattr_sem
or i_data_sem, so where it's easy to avoid it, move it out from the
critical region.
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

b907f2d5

ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea() · c755e251

由 Theodore Ts'o 提交于 1月 11, 2017

The xattr_sem deadlock problems fixed in commit 2e81a4ee: "ext4:
avoid deadlock when expanding inode size" didn't include the use of
xattr_sem in fs/ext4/inline.c.  With the addition of project quota
which added a new extra inode field, this exposed deadlocks in the
inline_data code similar to the ones fixed by 2e81a4ee.

The deadlock can be reproduced via:

   dmesg -n 7
   mke2fs -t ext4 -O inline_data -Fq -I 256 /dev/vdc 32768
   mount -t ext4 -o debug_want_extra_isize=24 /dev/vdc /vdc
   mkdir /vdc/a
   umount /vdc
   mount -t ext4 /dev/vdc /vdc
   echo foo > /vdc/a/foo

and looks like this:

[   11.158815] 
[   11.160276] =============================================
[   11.161960] [ INFO: possible recursive locking detected ]
[   11.161960] 4.10.0-rc3-00015-g011b30a8a3cf #160 Tainted: G        W      
[   11.161960] ---------------------------------------------
[   11.161960] bash/2519 is trying to acquire lock:
[   11.161960]  (&ei->xattr_sem){++++..}, at: [<c1225a4b>] ext4_expand_extra_isize_ea+0x3d/0x4cd
[   11.161960] 
[   11.161960] but task is already holding lock:
[   11.161960]  (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152
[   11.161960] 
[   11.161960] other info that might help us debug this:
[   11.161960]  Possible unsafe locking scenario:
[   11.161960] 
[   11.161960]        CPU0
[   11.161960]        ----
[   11.161960]   lock(&ei->xattr_sem);
[   11.161960]   lock(&ei->xattr_sem);
[   11.161960] 
[   11.161960]  *** DEADLOCK ***
[   11.161960] 
[   11.161960]  May be due to missing lock nesting notation
[   11.161960] 
[   11.161960] 4 locks held by bash/2519:
[   11.161960]  #0:  (sb_writers#3){.+.+.+}, at: [<c11a2414>] mnt_want_write+0x1e/0x3e
[   11.161960]  #1:  (&type->i_mutex_dir_key){++++++}, at: [<c119508b>] path_openat+0x338/0x67a
[   11.161960]  #2:  (jbd2_handle){++++..}, at: [<c123314a>] start_this_handle+0x582/0x622
[   11.161960]  #3:  (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152
[   11.161960] 
[   11.161960] stack backtrace:
[   11.161960] CPU: 0 PID: 2519 Comm: bash Tainted: G        W       4.10.0-rc3-00015-g011b30a8a3cf #160
[   11.161960] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014
[   11.161960] Call Trace:
[   11.161960]  dump_stack+0x72/0xa3
[   11.161960]  __lock_acquire+0xb7c/0xcb9
[   11.161960]  ? kvm_clock_read+0x1f/0x29
[   11.161960]  ? __lock_is_held+0x36/0x66
[   11.161960]  ? __lock_is_held+0x36/0x66
[   11.161960]  lock_acquire+0x106/0x18a
[   11.161960]  ? ext4_expand_extra_isize_ea+0x3d/0x4cd
[   11.161960]  down_write+0x39/0x72
[   11.161960]  ? ext4_expand_extra_isize_ea+0x3d/0x4cd
[   11.161960]  ext4_expand_extra_isize_ea+0x3d/0x4cd
[   11.161960]  ? _raw_read_unlock+0x22/0x2c
[   11.161960]  ? jbd2_journal_extend+0x1e2/0x262
[   11.161960]  ? __ext4_journal_get_write_access+0x3d/0x60
[   11.161960]  ext4_mark_inode_dirty+0x17d/0x26d
[   11.161960]  ? ext4_add_dirent_to_inline.isra.12+0xa5/0xb2
[   11.161960]  ext4_add_dirent_to_inline.isra.12+0xa5/0xb2
[   11.161960]  ext4_try_add_inline_entry+0x69/0x152
[   11.161960]  ext4_add_entry+0xa3/0x848
[   11.161960]  ? __brelse+0x14/0x2f
[   11.161960]  ? _raw_spin_unlock_irqrestore+0x44/0x4f
[   11.161960]  ext4_add_nondir+0x17/0x5b
[   11.161960]  ext4_create+0xcf/0x133
[   11.161960]  ? ext4_mknod+0x12f/0x12f
[   11.161960]  lookup_open+0x39e/0x3fb
[   11.161960]  ? __wake_up+0x1a/0x40
[   11.161960]  ? lock_acquire+0x11e/0x18a
[   11.161960]  path_openat+0x35c/0x67a
[   11.161960]  ? sched_clock_cpu+0xd7/0xf2
[   11.161960]  do_filp_open+0x36/0x7c
[   11.161960]  ? _raw_spin_unlock+0x22/0x2c
[   11.161960]  ? __alloc_fd+0x169/0x173
[   11.161960]  do_sys_open+0x59/0xcc
[   11.161960]  SyS_open+0x1d/0x1f
[   11.161960]  do_int80_syscall_32+0x4f/0x61
[   11.161960]  entry_INT80_32+0x2f/0x2f
[   11.161960] EIP: 0xb76ad469
[   11.161960] EFLAGS: 00000286 CPU: 0
[   11.161960] EAX: ffffffda EBX: 08168ac8 ECX: 00008241 EDX: 000001b6
[   11.161960] ESI: b75e46bc EDI: b7755000 EBP: bfbdb108 ESP: bfbdafc0
[   11.161960]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b

Cc: stable@vger.kernel.org # 3.10 (requires 2e81a4ee as a prereq)
Reported-by: NGeorge Spelvin <linux@sciencehorizons.net>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

c755e251

ext4: add debug_want_extra_isize mount option · 670e9875

由 Theodore Ts'o 提交于 1月 11, 2017

In order to test the inode extra isize expansion code, it is useful to
be able to easily create file systems that have inodes with extra
isize values smaller than the current desired value.
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

670e9875

09 1月, 2017 2 次提交

ext4: do not polute the extents cache while shifting extents · 03e916fa

由 Roman Pen 提交于 1月 08, 2017

Inside ext4_ext_shift_extents() function ext4_find_extent() is called
without EXT4_EX_NOCACHE flag, which should prevent cache population.

This leads to oudated offsets in the extents tree and wrong blocks
afterwards.

Patch fixes the problem providing EXT4_EX_NOCACHE flag for each
ext4_find_extents() call inside ext4_ext_shift_extents function.

Fixes: 331573feSigned-off-by: NRoman Pen <roman.penyaev@profitbricks.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: stable@vger.kernel.org

03e916fa

ext4: Include forgotten start block on fallocate insert range · 2a9b8cba

由 Roman Pen 提交于 1月 08, 2017

While doing 'insert range' start block should be also shifted right.
The bug can be easily reproduced by the following test:

    ptr = malloc(4096);
    assert(ptr);

    fd = open("./ext4.file", O_CREAT | O_TRUNC | O_RDWR, 0600);
    assert(fd >= 0);

    rc = fallocate(fd, 0, 0, 8192);
    assert(rc == 0);
    for (i = 0; i < 2048; i++)
            *((unsigned short *)ptr + i) = 0xbeef;
    rc = pwrite(fd, ptr, 4096, 0);
    assert(rc == 4096);
    rc = pwrite(fd, ptr, 4096, 4096);
    assert(rc == 4096);

    for (block = 2; block < 1000; block++) {
            rc = fallocate(fd, FALLOC_FL_INSERT_RANGE, 4096, 4096);
            assert(rc == 0);

            for (i = 0; i < 2048; i++)
                    *((unsigned short *)ptr + i) = block;

            rc = pwrite(fd, ptr, 4096, 4096);
            assert(rc == 4096);
    }

Because start block is not included in the range the hole appears at
the wrong offset (just after the desired offset) and the following
pwrite() overwrites already existent block, keeping hole untouched.

Simple way to verify wrong behaviour is to check zeroed blocks after
the test:

   $ hexdump ./ext4.file | grep '0000 0000'

The root cause of the bug is a wrong range (start, stop], where start
should be inclusive, i.e. [start, stop].

This patch fixes the problem by including start into the range.  But
not to break left shift (range collapse) stop points to the beginning
of the a block, not to the end.

The other not obvious change is an iterator check on validness in a
main loop.  Because iterator is unsigned the following corner case
should be considered with care: insert a block at 0 offset, when stop
variables overflows and never becomes less than start, which is 0.
To handle this special case iterator is set to NULL to indicate that
end of the loop is reached.

Fixes: 331573feSigned-off-by: NRoman Pen <roman.penyaev@profitbricks.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: stable@vger.kernel.org

2a9b8cba

08 1月, 2017 2 次提交

fscrypt: make fscrypt_operations.key_prefix a string · a5d431ef

由 Eric Biggers 提交于 1月 05, 2017

There was an unnecessary amount of complexity around requesting the
filesystem-specific key prefix.  It was unclear why; perhaps it was
envisioned that different instances of the same filesystem type could
use different key prefixes, or that key prefixes could be binary.
However, neither of those things were implemented or really make sense
at all.  So simplify the code by making key_prefix a const char *.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Reviewed-by: NRichard Weinberger <richard@nod.at>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

a5d431ef

ext4: don't allow encrypted operations without keys · 173b8439

由 Theodore Ts'o 提交于 12月 28, 2016

While we allow deletes without the key, the following should not be
permitted:

# cd /vdc/encrypted-dir-without-key
# ls -l
total 4
-rw-r--r-- 1 root root   0 Dec 27 22:35 6,LKNRJsp209FbXoSvJWzB
-rw-r--r-- 1 root root 286 Dec 27 22:35 uRJ5vJh9gE7vcomYMqTAyD
# mv uRJ5vJh9gE7vcomYMqTAyD  6,LKNRJsp209FbXoSvJWzB
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

173b8439

04 1月, 2017 4 次提交

xfs: fix max_retries _show and _store functions · ff97f239

由 Carlos Maiolino 提交于 1月 03, 2017

max_retries _show and _store functions should test against cfg->max_retries,
not cfg->retry_timeout
Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

ff97f239

xfs: fix crash and data corruption due to removal of busy COW extents · a1b7a4de

由 Christoph Hellwig 提交于 1月 03, 2017

There is a race window between write_cache_pages calling
clear_page_dirty_for_io and XFS calling set_page_writeback, in which
the mapping for an inode is tagged neither as dirty, nor as writeback.

If the COW shrinker hits in exactly that window we'll remove the delayed
COW extents and writepages trying to write it back, which in release
kernels will manifest as corruption of the bmap btree, and in debug
kernels will trip the ASSERT about now calling xfs_bmapi_write with the
COWFORK flag for holes. A complex customer load manages to hit this
window fairly reliably, probably by always having COW writeback in flight
while the cow shrinker runs.

This patch adds another check for having the I_DIRTY_PAGES flag set,
which is still set during this race window. While this fixes the problem
I'm still not overly happy about the way the COW shrinker works as it
still seems a bit fragile.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

a1b7a4de

xfs: use the actual AG length when reserving blocks · 20e73b00

由 Darrick J. Wong 提交于 1月 03, 2017

We need to use the actual AG length when making per-AG reservations,
since we could otherwise end up reserving more blocks out of the last
AG than there are actual blocks.
Complained-about-by: NBrian Foster <bfoster@redhat.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

20e73b00

xfs: fix double-cleanup when CUI recovery fails · 7a21272b

由 Darrick J. Wong 提交于 1月 03, 2017

Dan Carpenter reported a double-free of rcur if _defer_finish fails
while we're recovering CUI items.  Fix the error recovery to prevent
this.
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

7a21272b

03 1月, 2017 2 次提交

fscrypt: make test_dummy_encryption require a keyring key · 5bbdcbbb

由 Theodore Ts'o 提交于 1月 02, 2017

Currently, the test_dummy_encryption ext4 mount option, which exists
only to test encrypted I/O paths with xfstests, overrides all
per-inode encryption keys with a fixed key.

This change minimizes test_dummy_encryption-specific code path changes
by supplying a fake context for directories which are not encrypted
for use when creating new directories, files, or symlinks.  This
allows us to properly exercise the keyring lookup, derivation, and
context inheritance code paths.

Before mounting a file system using test_dummy_encryption, userspace
must execute the following shell commands:

    mode='\x00\x00\x00\x00'
    raw="$(printf ""\\\\x%02x"" $(seq 0 63))"
    if lscpu | grep "Byte Order" | grep -q Little ; then
        size='\x40\x00\x00\x00'
    else
        size='\x00\x00\x00\x40'
    fi
    key="${mode}${raw}${size}"
    keyctl new_session
    echo -n -e "${key}" | keyctl padd logon fscrypt:4242424242424242 @s
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

5bbdcbbb

clean_bdev_aliases: Prevent cleaning blocks that are not in block range · 6c006a9d

由 Chandan Rajendra 提交于 12月 25, 2016

The first block to be cleaned may start at a non-zero page offset. In
such a scenario clean_bdev_aliases() will end up cleaning blocks that
do not fall in the range of blocks to be cleaned. This commit fixes the
issue by skipping blocks that do not fall in valid block range.
Signed-off-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NEryu Guan <eguan@redhat.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

6c006a9d

02 1月, 2017 1 次提交

fscrypt: factor out bio specific functions · 58ae7468

由 Richard Weinberger 提交于 12月 19, 2016

That way we can get rid of the direct dependency on CONFIG_BLOCK.

Fixes: d475a507 ("ubifs: Add skeleton for fscrypto")
Reported-by: NArnd Bergmann <arnd@arndb.de>
Reported-by: NRandy Dunlap <rdunlap@infradead.org>
Reviewed-by: NEric Biggers <ebiggers@google.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDavid Gstir <david@sigma-star.at>
Signed-off-by: NRichard Weinberger <richard@nod.at>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

58ae7468

01 1月, 2017 5 次提交

fscrypt: pass up error codes from ->get_context() · efee590e