Commits · 6a8098a447206f8d2ce3954271657b6d79911d51 · openeuler / raspberrypi-kernel

15 May, 2015 · 6 commits

ext4: fix an ext3 collapse range regression in xfstests · b9576fc3

Committed by Theodore Ts'o on May 15, 2015

The xfstests test suite assumes that an attempt to collapse range on
the range (0, 1) will return EOPNOTSUPP if the file system does not
support collapse range.  Commit 280227a7: "ext4: move check under
lock scope to close a race" broke this, which caused xfstests to
fail when testing file systems that did not have the extents
feature enabled.

Reported-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

b9576fc3

ext4: check for zero length extent explicitly · 2f974865

Committed by Eryu Guan on May 14, 2015

The following commit introduced a bug when checking for zero length extents:

5946d089 ext4: check for overlapping extents in ext4_valid_extent_entries()

A zero length extent could pass the check if lblock is zero.

Add the explicit check for zero length back.

Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

2f974865
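
The zero length case described in 2f974865 can be illustrated with a small user-space sketch. It assumes the range check computes the last block as `lblock + len - 1` in unsigned 32-bit arithmetic; the function names are hypothetical and this is not the kernel code itself.

```c
#include <stdint.h>

/* Hypothetical stand-in for a range check without an explicit length test. */
static int sketch_range_ok_no_len_check(uint32_t lblock, uint32_t len)
{
    /* With lblock == 0 and len == 0 this wraps around to UINT32_MAX. */
    uint32_t last = lblock + len - 1;

    return lblock <= last;
}

/* Same check with the explicit zero length test added back. */
static int sketch_range_ok(uint32_t lblock, uint32_t len)
{
    uint32_t last = lblock + len - 1;

    if (len == 0)
        return 0;
    return lblock <= last;
}
```

Under that assumption, a zero length extent starting at any non-zero block is already rejected because `last` ends up below `lblock`; only the `lblock == 0` case slips through, which matches the commit description.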

ext4: fix NULL pointer dereference when journal restart fails · 9d506594

Committed by Lukas Czerner on May 14, 2015

Currently when journal restart fails, we'll have the h_transaction of
the handle set to NULL to indicate that the handle has been effectively
aborted. We handle this situation quietly in the jbd2_journal_stop() and just
free the handle and exit because everything else has been done before we
attempted (and failed) to restart the journal.

Unfortunately there are a number of problems with that approach
introduced with commit

41a5b913 "jbd2: invalidate handle if jbd2_journal_restart()
fails"

First of all in ext4 jbd2_journal_stop() will be called through
__ext4_journal_stop() where we would try to get a hold of the superblock
by dereferencing h_transaction which in this case would lead to NULL
pointer dereference and crash.

In addition we're going to free the handle regardless of the refcount
which is bad as well, because others up the call chain will still
reference the handle so we might potentially reference already freed
memory.

Moreover it's expected that we'll get an aborted handle as well as a detached
handle in some of the journalling functions as the error propagates up
the stack, so it's unnecessary to call WARN_ON every time we get a
detached handle.

And finally we might leak some memory by forgetting to free reserved
handle in jbd2_journal_stop() in the case where handle was detached from
the transaction (h_transaction is NULL).

Fix the NULL pointer dereference in __ext4_journal_stop() by just
calling jbd2_journal_stop() quietly as suggested by Jan Kara. Also fix
the potential memory leak in jbd2_journal_stop() and use proper
handle refcounting before we attempt to free it to avoid use-after-free
issues.

And finally remove all WARN_ON(!transaction) from the code so that we do
not get random traces when something goes wrong because when journal
restart fails we will get to some of those functions.

Cc: stable@vger.kernel.org
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

9d506594
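
The refcounting and reserved-handle points in 9d506594 might look roughly like the following user-space sketch. The struct and function names are hypothetical; only the field names `h_transaction`, `h_ref` and `h_rsv_handle` are borrowed from jbd2, and the real `jbd2_journal_stop()` does considerably more.

```c
#include <stdlib.h>

/* Hypothetical, heavily reduced model of a journal handle. */
struct sketch_handle {
    void *h_transaction;                /* NULL once the handle is detached */
    int h_ref;                          /* number of users up the call chain */
    struct sketch_handle *h_rsv_handle; /* optional reserved handle */
};

/*
 * Stop a handle that is already detached (h_transaction == NULL).
 * Returns 1 if the handle was freed, 0 if other references remain.
 * Nothing here dereferences h_transaction.
 */
static int sketch_stop_detached(struct sketch_handle *handle)
{
    if (--handle->h_ref > 0)
        return 0;                 /* still referenced: must not free yet */

    free(handle->h_rsv_handle);   /* avoid leaking the reserved handle; free(NULL) is a no-op */
    free(handle);
    return 1;
}
```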

ext4: remove unused function prototype from ext4.h · 92c82639

Committed by Theodore Ts'o on May 14, 2015

The ext4_extent_tree_init() function hasn't been in the ext4 code for
a long time, except as an unused function prototype in ext4.h.

Google-Bug-Id: 4530137
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

92c82639

ext4: don't save the error information if the block device is read-only · 1b46617b

Committed by Theodore Ts'o on May 14, 2015

Google-Bug-Id: 20939131
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

1b46617b

ext4: fix lazytime optimization · 8f4d8558

Committed by Theodore Ts'o on May 14, 2015

We had a fencepost error in the lazytime optimization which meant that
the timestamp would get written to the wrong inode.

Cc: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

8f4d8558

03 May, 2015 · 3 commits

ext4: fix growing of tiny filesystems · 2c869b26

Committed by Jan Kara on May 02, 2015

The estimate of necessary transaction credits in ext4_flex_group_add()
is too pessimistic. It reserves credit for sb, resize inode, and resize
inode dindirect block for each group added in a flex group although they
are always the same block and thus it is enough to account them only
once. Also the number of modified GDT blocks is overestimated since we
fit EXT4_DESC_PER_BLOCK(sb) descriptors in one block.

Make the estimation more precise. That reduces the number of requested
credits enough that we can grow a 20 MB filesystem (which has a 1 MB
journal, 79 reserved GDT blocks, and flex group size 16 by default).

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>

2c869b26
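
The difference between the two estimates in 2c869b26 can be sketched as plain arithmetic. The formulas below are one reading of the commit description, not a copy of the patch: the function names are hypothetical, and the extra GDT block allowed for descriptors straddling a block boundary is an assumption.

```c
/* Pessimistic estimate: sb, resize inode, resize inode dindirect block
 * and one GDT block charged for every added group. */
static unsigned sketch_credits_per_group(unsigned groups, unsigned reserved_gdb)
{
    return groups * 4 + reserved_gdb;
}

/* Tighter estimate: the three shared blocks are counted once, and GDT
 * blocks are counted by how many descriptors fit in one block. */
static unsigned sketch_credits_shared(unsigned groups, unsigned reserved_gdb,
                                      unsigned desc_per_block)
{
    unsigned credits = 3; /* sb, resize inode, resize inode dindirect block */

    /* Assumption: allow one extra block in case the new descriptors
     * straddle a GDT block boundary. */
    credits += 1 + (groups + desc_per_block - 1) / desc_per_block;
    credits += reserved_gdb;
    return credits;
}
```

With 16 groups, 79 reserved GDT blocks and an assumed 32 descriptors per block, the first function gives 143 and the second gives 84.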

ext4: move check under lock scope to close a race. · 280227a7

Committed by Davide Italiano on May 02, 2015

fallocate() checks that the file is extent-based and returns
EOPNOTSUPP if it is not. Other tasks can convert between
indirect and extent formats, so it is only safe to check after grabbing
the inode mutex.

Signed-off-by: Davide Italiano <dccitaliano@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

280227a7

ext4: fix data corruption caused by unwritten and delayed extents · d2dc317d

Committed by Lukas Czerner on May 02, 2015

Currently it is possible to lose a whole file system block worth of data
when we hit a specific interaction between unwritten and delayed extents
in the extent status tree.

The problem is that when we insert a delayed extent into the extent status
tree, the only way to get rid of it is to write out the delayed buffer.
However, there is a limitation in the extent status tree implementation:
when inserting an unwritten extent, should there be even a single
delayed block, the whole unwritten extent would be marked as delayed.

At this point, there is no way to get rid of the delayed extents,
because there are no delayed buffers to write out. So when we write
into said unwritten extent we will convert it to written, but it still
remains delayed.

When we try to write into that block later, ext4_da_map_blocks() will set
the buffer new and delayed and map it to an invalid block, which causes
the rest of the block to be zeroed, losing already written data.

For now we can fix this by simply not allowing the delayed status to be set
on a written extent in the extent status tree. Also add WARN_ON() to make
sure that we notice if this happens in the future.

This problem can be easily reproduced by running the following xfs_io commands:

xfs_io -f -c "pwrite -S 0xaa 4096 2048" \
          -c "falloc 0 131072" \
          -c "pwrite -S 0xbb 65536 2048" \
          -c "fsync" /mnt/test/fff

echo 3 > /proc/sys/vm/drop_caches
xfs_io -c "pwrite -S 0xdd 67584 2048" /mnt/test/fff

In theory this can also be reproduced at random by running fsx,
but that is not very reliable, though on machines with a bigger page size
(like ppc) it can be seen more often (especially with xfstest generic/127).

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

d2dc317d

02 May, 2015 · 4 commits

ext4 crypto: remove duplicated encryption mode definitions · 9402bdca

Committed by Chanho Park on May 02, 2015

This patch removes duplicated encryption modes which were already in
ext4.h. They were duplicated from commit 3edc18d8 and commit f542fb.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Michael Halcrow <mhalcrow@google.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Chanho Park <chanho61.park@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

9402bdca

ext4 crypto: do not select from EXT4_FS_ENCRYPTION · fb63e548

Committed by Herbert Xu on May 02, 2015

This patch adds a tristate EXT4_ENCRYPTION to do the selections
for EXT4_FS_ENCRYPTION because selecting from a bool causes all
the selected options to be built-in, even if EXT4 itself is a
module.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

fb63e548

ext4 crypto: add padding to filenames before encrypting · a44cd7a0

Committed by Theodore Ts'o on May 01, 2015

This obscures the length of the filenames, to decrease the amount of
information leakage. By default, we pad the filenames to the next 4-byte
boundary. This costs nothing, since the directory entries are
aligned to 4-byte boundaries anyway. Filenames can also be padded to
8, 16, or 32 bytes, which will consume more directory space.

Change-Id: Ibb7a0fb76d2c48e2061240a709358ff40b14f322
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

a44cd7a0
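
The padding rule in a44cd7a0 amounts to rounding a length up to a power-of-two boundary. A minimal sketch, assuming the padding value is one of 4, 8, 16 or 32 as listed in the commit text; the helper name is hypothetical and any minimum-size or maximum-length handling in the kernel is not modelled here.

```c
#include <stddef.h>

/* Round len up to the next multiple of pad. pad must be a power of two. */
static size_t sketch_padded_name_len(size_t len, size_t pad)
{
    return (len + pad - 1) & ~(pad - 1);
}
```

For example, a 5-byte name becomes 8 bytes with 4-byte padding and 32 bytes with 32-byte padding, so larger padding hides more of the length at the cost of directory space.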

ext4 crypto: simplify and speed up filename encryption · 5de0b4d0

Committed by Theodore Ts'o on May 01, 2015

Avoid using SHA-1 when calculating the user-visible filename when the
encryption key is available, and avoid decrypting lots of filenames
when searching for a directory entry in a directory block.

Change-Id: If4655f144784978ba0305b597bfa1c8d7bb69e63
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

5de0b4d0

25 April, 2015 · 1 commit

direct-io: only inc/dec inode->i_dio_count for file systems · fe0f07d0

Committed by Jens Axboe on April 15, 2015

do_blockdev_direct_IO() increments and decrements the inode
->i_dio_count for each IO operation. It does this to protect against
truncate of a file. Block devices don't need this sort of protection.

For a capable multiqueue setup, this atomic int is the only shared
state between applications accessing the device for O_DIRECT, and it
presents a scaling wall for that. In my testing, as much as 30% of
system time is spent incrementing and decrementing this value. A mixed
read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with
better latencies too. Before:

clat percentiles (usec):
 |  1.00th=[   33],  5.00th=[   34], 10.00th=[   34], 20.00th=[   34],
 | 30.00th=[   34], 40.00th=[   34], 50.00th=[   35], 60.00th=[   35],
 | 70.00th=[   35], 80.00th=[   35], 90.00th=[   37], 95.00th=[   80],
 | 99.00th=[   98], 99.50th=[  151], 99.90th=[  155], 99.95th=[  155],
 | 99.99th=[  165]

After:

clat percentiles (usec):
 |  1.00th=[   95],  5.00th=[  108], 10.00th=[  129], 20.00th=[  149],
 | 30.00th=[  155], 40.00th=[  161], 50.00th=[  167], 60.00th=[  171],
 | 70.00th=[  177], 80.00th=[  185], 90.00th=[  201], 95.00th=[  270],
 | 99.00th=[  390], 99.50th=[  398], 99.90th=[  418], 99.95th=[  422],
 | 99.99th=[  438]

In other setups, Robert Elliott reported seeing good performance
improvements:

https://lkml.org/lkml/2015/4/3/557

The more applications accessing the device, the worse it gets.

Add a new direct-io flag, DIO_SKIP_DIO_COUNT, which tells
do_blockdev_direct_IO() that it need not worry about incrementing
or decrementing the inode i_dio_count for this caller.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Elliott, Robert (Server Storage) <elliott@hp.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fe0f07d0
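
The effect of the flag introduced in fe0f07d0 can be sketched in user space with C11 atomics. Everything prefixed `sketch_` or `SKETCH_` is hypothetical, including the flag value; only the idea of skipping the shared counter for callers that pass the flag comes from the commit message.

```c
#include <stdatomic.h>

#define SKETCH_DIO_SKIP_DIO_COUNT 0x08 /* hypothetical bit value */

/* Hypothetical reduced inode holding only the shared counter. */
struct sketch_inode {
    atomic_int i_dio_count;
};

/* Called at the start of a direct I/O; skips the shared atomic when asked. */
static void sketch_dio_begin(struct sketch_inode *inode, int flags)
{
    if (!(flags & SKETCH_DIO_SKIP_DIO_COUNT))
        atomic_fetch_add(&inode->i_dio_count, 1);
}

/* Called on completion; must mirror the decision made in sketch_dio_begin(). */
static void sketch_dio_end(struct sketch_inode *inode, int flags)
{
    if (!(flags & SKETCH_DIO_SKIP_DIO_COUNT))
        atomic_fetch_sub(&inode->i_dio_count, 1);
}
```

A block device caller would pass the flag and never touch the counter, while a file system caller passes no flag and keeps the truncate protection.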

16 April, 2015 · 5 commits

ext4 crypto: enable encryption feature flag · 6ddb2447

Committed by Theodore Ts'o on April 16, 2015

Also add the test dummy encryption mode flag so we can more easily
test the encryption patches using xfstests.

Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

6ddb2447

ext4 crypto: add symlink encryption · f348c252

Committed by Theodore Ts'o on April 16, 2015

Signed-off-by: Uday Savagaonkar <savagaon@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

f348c252

dax: unify ext2/4_{dax,}_file_operations · be64f884

Committed by Boaz Harrosh on April 15, 2015

The original dax patchset split the ext2/4_file_operations because of the
two NULL splice_read/splice_write in the dax case.

In the vfs if splice_read/splice_write are NULL we then call
default_splice_read/write.

What we do here is make generic_file_splice_read aware of IS_DAX() so the
original ext2/4_file_operations can be used as is.

For write it appears that iter_file_splice_write is just fine.  It uses
the regular f_op->write(file,..) or new_sync_write(file, ...).

Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

be64f884

dax: use pfn_mkwrite to update c/mtime + freeze protection · 0e3b210c

Committed by Boaz Harrosh on April 15, 2015

From: Yigal Korman <yigal@plexistor.com>

[v1]
Without this patch, c/mtime is not updated correctly when mmap'ed page is
first read from and then written to.

A new xfstest is submitted for testing this (generic/080)

[v2]
Jan Kara has pointed out that if we add the
sb_start/end_pagefault pair in the new pfn_mkwrite, we
are then fixing another bug, where a user could start
writing to the page while the filesystem is frozen.

Signed-off-by: Yigal Korman <yigal@plexistor.com>
Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0e3b210c

VFS: normal filesystems (and lustre): d_inode() annotations · 2b0143b5

Committed by David Howells on March 17, 2015

that's the bulk of filesystem drivers dealing with inodes of their own

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

2b0143b5

12 April, 2015 · 21 commits

ext4 crypto: enable filename encryption · 44614711

Committed by Michael Halcrow on April 12, 2015

Signed-off-by: Uday Savagaonkar <savagaon@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

44614711

ext4 crypto: filename encryption modifications · 1f3862b5

Committed by Michael Halcrow on April 12, 2015

Modifies htree_dirblock_to_tree, dx_make_map, ext4_match, search_dir,
and ext4_find_dest_de to support fname crypto.  The filename encryption
feature is not yet enabled in this patch.

Signed-off-by: Uday Savagaonkar <savagaon@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

1f3862b5

ext4 crypto: partial update to namei.c for fname crypto · b3098486

Committed by Michael Halcrow on April 12, 2015

Modifies dx_show_leaf and dx_probe to support fname encryption.
Filename encryption is not yet enabled.

Signed-off-by: Uday Savagaonkar <savagaon@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

b3098486

ext4 crypto: insert encrypted filenames into a leaf directory block · 4bdfc873

Committed by Michael Halcrow on April 12, 2015

Signed-off-by: Uday Savagaonkar <savagaon@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

4bdfc873

ext4 crypto: teach ext4_htree_store_dirent() to store decrypted filenames · 2f61830a

Committed by Theodore Ts'o on April 12, 2015

For encrypted directories, we need to pass in a separate parameter for
the decrypted filename, since the directory entry contains the
encrypted filename.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

2f61830a

ext4 crypto: filename encryption facilities · d5d0e8c7

Committed by Michael Halcrow on April 12, 2015

Signed-off-by: Uday Savagaonkar <savagaon@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

d5d0e8c7

ext4 crypto: implement the ext4 decryption read path · c9c7429c

Committed by Michael Halcrow on April 12, 2015

Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

c9c7429c

ext4 crypto: implement the ext4 encryption write path · 2058f83a

Committed by Michael Halcrow on April 12, 2015

Pulls block_write_begin() into fs/ext4/inode.c because it might need
to do a low-level read of the existing data, in which case we need to
decrypt it.

Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

2058f83a

ext4 crypto: inherit encryption policies on inode and directory create · dde680ce

Committed by Michael Halcrow on April 12, 2015

Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

dde680ce

ext4 crypto: enforce context consistency · d9cdc903

Committed by Theodore Ts'o on April 12, 2015

Enforce the following inheritance policy:

1) An unencrypted directory may contain encrypted or unencrypted files
or directories.

2) All files or directories in a directory must be protected using the
same key as their containing directory.

As a result, assuming the following setup:

mke2fs -t ext4 -Fq -O encrypt /dev/vdc
mount -t ext4 /dev/vdc /vdc
mkdir /vdc/a /vdc/b /vdc/c
echo foo | e4crypt add_key /vdc/a
echo bar | e4crypt add_key /vdc/b
for i in a b c ; do cp /etc/motd /vdc/$i/motd-$i ; done

Then we will see the following results:

cd /vdc
mv a b			# will fail; /vdc/a and /vdc/b have different keys
mv b/motd-b a		# will fail, see above
ln a/motd-a b		# will fail, see above
mv c a	    		# will fail; all inodes in an encrypted directory
   	  		#	must be encrypted
ln c/motd-c b		# will fail, see above
mv a/motd-a c		# will succeed
mv c/motd-a a		# will succeed

Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

d9cdc903
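
The inheritance policy in d9cdc903 can be expressed as a small predicate. This is a sketch under stated assumptions: the struct, the string key identifier and the function name are all hypothetical, and rule 2 is read as applying to encrypted directories, which is how the example results in the commit message behave.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical encryption context: whether the inode is encrypted and,
 * if so, an identifier for its key. */
struct sketch_ctx {
    bool encrypted;
    char key_id[8];
};

/* May an inode with context child be placed in a directory with context dir? */
static bool sketch_may_place(const struct sketch_ctx *dir,
                             const struct sketch_ctx *child)
{
    if (!dir->encrypted)
        return true;   /* rule 1: an unencrypted directory accepts anything */
    if (!child->encrypted)
        return false;  /* an encrypted directory holds only encrypted inodes */
    return strcmp(dir->key_id, child->key_id) == 0; /* rule 2: same key */
}
```

With contexts modelled on /vdc/a (key "foo"), /vdc/b (key "bar") and the unencrypted /vdc/c, the predicate rejects moving a into b and c into a, and accepts moving a/motd-a into c, in line with the listed results.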

ext4 crypto: add encryption key management facilities · 88bd6ccd

Committed by Michael Halcrow on April 12, 2015

Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Ildar Muslukhov <muslukhovi@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

88bd6ccd

ext4 crypto: add ext4 encryption facilities · b30ab0e0

Committed by Michael Halcrow on April 12, 2015

On encrypt, we will re-assign the buffer_heads to point to a bounce
page rather than the control_page (which is the original page to write
that contains the plaintext). The block I/O occurs against the bounce
page.  On write completion, we re-assign the buffer_heads to the
original plaintext page.

On decrypt, we will attach a read completion callback to the bio
struct. This read completion will decrypt the read contents in-place
prior to setting the page up-to-date.

The current encryption mode, AES-256-XTS, lacks cryptographic
integrity. AES-256-GCM is planned, but we will need to devise a
mechanism for handling the integrity data.

Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Ildar Muslukhov <ildarm@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

b30ab0e0

mirror O_APPEND and O_DIRECT into iocb->ki_flags · 2ba48ce5

Committed by Al Viro on April 09, 2015

... avoiding write_iter/fcntl races.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

2ba48ce5

switch generic_write_checks() to iocb and iter · 3309dd04

Committed by Al Viro on April 09, 2015

... returning -E... upon error and amount of data left in iter after
(possible) truncation upon success.  Note that the normal case gives
a non-zero (positive) return value, so any tests for != 0 _must_ be
updated.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Conflicts:
	fs/ext4/file.c

3309dd04
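
The return-value convention described in 3309dd04 changes how callers must test the result. The sketch below uses a hypothetical stand-in, not the kernel function: it returns a negative errno on error, otherwise the number of bytes left after truncating the request to a size limit.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a write check with the new convention. */
static long sketch_write_checks(size_t pos, size_t count, size_t limit)
{
    if (count == 0)
        return 0;              /* nothing to write */
    if (pos >= limit)
        return -EFBIG;         /* error: negative errno */
    if (count > limit - pos)
        count = limit - pos;   /* truncate to the limit */
    return (long)count;        /* success: positive byte count */
}

/* Caller written for the new convention. */
static long sketch_write(size_t pos, size_t count, size_t limit)
{
    long ret = sketch_write_checks(pos, count, limit);

    /* "if (ret) return ret;" would now bail out on every successful check,
     * so only errors and the empty case return early. */
    if (ret <= 0)
        return ret;

    /* A real caller would write ret bytes here; the sketch reports them. */
    return ret;
}
```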

ext4_file_write_iter: move generic_write_checks() up · e768d7ff

Committed by Al Viro on April 07, 2015

simpler that way...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e768d7ff

generic_write_checks(): drop isblk argument · 0fa6b005

Committed by Al Viro on April 04, 2015

all remaining callers are passing 0; some just obscure that fact.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

0fa6b005

lift generic_write_checks() into callers of __generic_file_write_iter() · 5f380c7f

Committed by Al Viro on April 07, 2015

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5f380c7f

direct_IO: remove rw from a_ops->direct_IO() · 22c6186e

Committed by Omar Sandoval on March 16, 2015

Now that no one is using rw, remove it completely.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

22c6186e

direct_IO: use iov_iter_rw() instead of rw everywhere · 6f673763

Committed by Omar Sandoval on March 16, 2015

The rw parameter to direct_IO is redundant with iov_iter->type, and
treated slightly differently just about everywhere it's used: some users
do rw & WRITE, and others do rw == WRITE where they should be doing a
bitwise check. Simplify this with the new iov_iter_rw() helper, which
always returns either READ or WRITE.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6f673763
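
The difference between the equality test and the bitwise test mentioned in 6f673763 shows up as soon as the value carries any extra bit. A minimal sketch with hypothetical constants: the real helper reads the direction from the iov_iter, whereas this model uses a plain int.

```c
#define SKETCH_READ  0
#define SKETCH_WRITE 1
#define SKETCH_SYNC  0x10 /* hypothetical extra flag carried alongside the direction */

/* Equality test: misses a write when any other bit is set. */
static int sketch_is_write_eq(int rw)
{
    return rw == SKETCH_WRITE;
}

/* Bitwise test: looks at the direction bit only. */
static int sketch_is_write_mask(int rw)
{
    return (rw & SKETCH_WRITE) != 0;
}

/* Helper in the spirit of the commit: always yields exactly READ or WRITE. */
static int sketch_iter_rw(int rw)
{
    return rw & SKETCH_WRITE;
}
```

Because the helper normalises the value, callers can compare its result with `==` safely.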

Remove rw from dax_{do_,}io() · a95cd631

Committed by Omar Sandoval on March 16, 2015

And use iov_iter_rw() instead.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a95cd631

Remove rw from {,__,do_}blockdev_direct_IO() · 17f8c842

Committed by Omar Sandoval on March 16, 2015

Most filesystems call through to these at some point, so we'll start
here.

Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

17f8c842