- 15 5月, 2015 6 次提交
-
-
由 Theodore Ts'o 提交于
The xfstests test suite assumes that an attempt to collapse range on the range (0, 1) will return EOPNOTSUPP if the file system does not support collapse range. Commit 280227a7: "ext4: move check under lock scope to close a race" broke this, and this caused xfstests to fail when run when testing file systems that did not have the extents feature enabled. Reported-by: NEric Whitney <enwlinux@gmail.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Eryu Guan 提交于
The following commit introduced a bug when checking for zero length extent 5946d089 ext4: check for overlapping extents in ext4_valid_extent_entries() Zero length extent could pass the check if lblock is zero. Adding the explicit check for zero length back. Signed-off-by: NEryu Guan <guaneryu@gmail.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
-
由 Lukas Czerner 提交于
Currently when journal restart fails, we'll have the h_transaction of the handle set to NULL to indicate that the handle has been effectively aborted. We handle this situation quietly in the jbd2_journal_stop() and just free the handle and exit because everything else has been done before we attempted (and failed) to restart the journal. Unfortunately there are a number of problems with that approach introduced with commit 41a5b913 "jbd2: invalidate handle if jbd2_journal_restart() fails" First of all in ext4 jbd2_journal_stop() will be called through __ext4_journal_stop() where we would try to get a hold of the superblock by dereferencing h_transaction which in this case would lead to NULL pointer dereference and crash. In addition we're going to free the handle regardless of the refcount which is bad as well, because others up the call chain will still reference the handle so we might potentially reference already freed memory. Moreover it's expected that we'll get aborted handle as well as detached handle in some of the journalling function as the error propagates up the stack, so it's unnecessary to call WARN_ON every time we get detached handle. And finally we might leak some memory by forgetting to free reserved handle in jbd2_journal_stop() in the case where handle was detached from the transaction (h_transaction is NULL). Fix the NULL pointer dereference in __ext4_journal_stop() by just calling jbd2_journal_stop() quietly as suggested by Jan Kara. Also fix the potential memory leak in jbd2_journal_stop() and use proper handle refcounting before we attempt to free it to avoid use-after-free issues. And finally remove all WARN_ON(!transaction) from the code so that we do not get random traces when something goes wrong because when journal restart fails we will get to some of those functions. Cc: stable@vger.kernel.org Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz>
-
由 Theodore Ts'o 提交于
The ext4_extent_tree_init() function hasn't been in the ext4 code for a long time ago, except in an unused function prototype in ext4.h Google-Bug-Id: 4530137 Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Google-Bug-Id: 20939131 Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
We had a fencepost error in the lazytime optimization which means that timestamp would get written to the wrong inode. Cc: stable@vger.kernel.org Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 03 5月, 2015 3 次提交
-
-
由 Jan Kara 提交于
The estimate of necessary transaction credits in ext4_flex_group_add() is too pessimistic. It reserves credit for sb, resize inode, and resize inode dindirect block for each group added in a flex group although they are always the same block and thus it is enough to account them only once. Also the number of modified GDT block is overestimated since we fit EXT4_DESC_PER_BLOCK(sb) descriptors in one block. Make the estimation more precise. That reduces number of requested credits enough that we can grow 20 MB filesystem (which has 1 MB journal, 79 reserved GDT blocks, and flex group size 16 by default). Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Reviewed-by: NEric Sandeen <sandeen@redhat.com>
-
由 Davide Italiano 提交于
fallocate() checks that the file is extent-based and returns EOPNOTSUPP in case is not. Other tasks can convert from and to indirect and extent so it's safe to check only after grabbing the inode mutex. Signed-off-by: NDavide Italiano <dccitaliano@gmail.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
-
由 Lukas Czerner 提交于
Currently it is possible to lose whole file system block worth of data when we hit the specific interaction with unwritten and delayed extents in status extent tree. The problem is that when we insert delayed extent into extent status tree the only way to get rid of it is when we write out delayed buffer. However there is a limitation in the extent status tree implementation so that when inserting unwritten extent should there be even a single delayed block the whole unwritten extent would be marked as delayed. At this point, there is no way to get rid of the delayed extents, because there are no delayed buffers to write out. So when a we write into said unwritten extent we will convert it to written, but it still remains delayed. When we try to write into that block later ext4_da_map_blocks() will set the buffer new and delayed and map it to invalid block which causes the rest of the block to be zeroed loosing already written data. For now we can fix this by simply not allowing to set delayed status on written extent in the extent status tree. Also add WARN_ON() to make sure that we notice if this happens in the future. This problem can be easily reproduced by running the following xfs_io. xfs_io -f -c "pwrite -S 0xaa 4096 2048" \ -c "falloc 0 131072" \ -c "pwrite -S 0xbb 65536 2048" \ -c "fsync" /mnt/test/fff echo 3 > /proc/sys/vm/drop_caches xfs_io -c "pwrite -S 0xdd 67584 2048" /mnt/test/fff This can be theoretically also reproduced by at random by running fsx, but it's not very reliable, though on machines with bigger page size (like ppc) this can be seen more often (especially xfstest generic/127) Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 02 5月, 2015 4 次提交
-
-
由 Chanho Park 提交于
This patch removes duplicated encryption modes which were already in ext4.h. They were duplicated from commit 3edc18d8 and commit f542fb. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Michael Halcrow <mhalcrow@google.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: NChanho Park <chanho61.park@samsung.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Herbert Xu 提交于
This patch adds a tristate EXT4_ENCRYPTION to do the selections for EXT4_FS_ENCRYPTION because selecting from a bool causes all the selected options to be built-in, even if EXT4 itself is a module. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
This obscures the length of the filenames, to decrease the amount of information leakage. By default, we pad the filenames to the next 4 byte boundaries. This costs nothing, since the directory entries are aligned to 4 byte boundaries anyway. Filenames can also be padded to 8, 16, or 32 bytes, which will consume more directory space. Change-Id: Ibb7a0fb76d2c48e2061240a709358ff40b14f322 Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Avoid using SHA-1 when calculating the user-visible filename when the encryption key is available, and avoid decrypting lots of filenames when searching for a directory entry in a directory block. Change-Id: If4655f144784978ba0305b597bfa1c8d7bb69e63 Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
- 25 4月, 2015 1 次提交
-
-
由 Jens Axboe 提交于
do_blockdev_direct_IO() increments and decrements the inode ->i_dio_count for each IO operation. It does this to protect against truncate of a file. Block devices don't need this sort of protection. For a capable multiqueue setup, this atomic int is the only shared state between applications accessing the device for O_DIRECT, and it presents a scaling wall for that. In my testing, as much as 30% of system time is spent incrementing and decrementing this value. A mixed read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with better latencies too. Before: clat percentiles (usec): | 1.00th=[ 33], 5.00th=[ 34], 10.00th=[ 34], 20.00th=[ 34], | 30.00th=[ 34], 40.00th=[ 34], 50.00th=[ 35], 60.00th=[ 35], | 70.00th=[ 35], 80.00th=[ 35], 90.00th=[ 37], 95.00th=[ 80], | 99.00th=[ 98], 99.50th=[ 151], 99.90th=[ 155], 99.95th=[ 155], | 99.99th=[ 165] After: clat percentiles (usec): | 1.00th=[ 95], 5.00th=[ 108], 10.00th=[ 129], 20.00th=[ 149], | 30.00th=[ 155], 40.00th=[ 161], 50.00th=[ 167], 60.00th=[ 171], | 70.00th=[ 177], 80.00th=[ 185], 90.00th=[ 201], 95.00th=[ 270], | 99.00th=[ 390], 99.50th=[ 398], 99.90th=[ 418], 99.95th=[ 422], | 99.99th=[ 438] In other setups, Robert Elliott reported seeing good performance improvements: https://lkml.org/lkml/2015/4/3/557 The more applications accessing the device, the worse it gets. Add a new direct-io flags, DIO_SKIP_DIO_COUNT, which tells do_blockdev_direct_IO() that it need not worry about incrementing or decrementing the inode i_dio_count for this caller. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Elliott, Robert (Server Storage) <elliott@hp.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NJens Axboe <axboe@fb.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 16 4月, 2015 5 次提交
-
-
由 Theodore Ts'o 提交于
Also add the test dummy encryption mode flag so we can more easily test the encryption patches using xfstests. Signed-off-by: NMichael Halcrow <mhalcrow@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Signed-off-by: NUday Savagaonkar <savagaon@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Boaz Harrosh 提交于
The original dax patchset split the ext2/4_file_operations because of the two NULL splice_read/splice_write in the dax case. In the vfs if splice_read/splice_write are NULL we then call default_splice_read/write. What we do here is make generic_file_splice_read aware of IS_DAX() so the original ext2/4_file_operations can be used as is. For write it appears that iter_file_splice_write is just fine. It uses the regular f_op->write(file,..) or new_sync_write(file, ...). Signed-off-by: NBoaz Harrosh <boaz@plexistor.com> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Dave Chinner <dchinner@redhat.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Boaz Harrosh 提交于
From: Yigal Korman <yigal@plexistor.com> [v1] Without this patch, c/mtime is not updated correctly when mmap'ed page is first read from and then written to. A new xfstest is submitted for testing this (generic/080) [v2] Jan Kara has pointed out that if we add the sb_start/end_pagefault pair in the new pfn_mkwrite we are then fixing another bug where: A user could start writing to the page while filesystem is frozen. Signed-off-by: NYigal Korman <yigal@plexistor.com> Signed-off-by: NBoaz Harrosh <boaz@plexistor.com> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Howells 提交于
that's the bulk of filesystem drivers dealing with inodes of their own Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 12 4月, 2015 21 次提交
-
-
由 Michael Halcrow 提交于
Signed-off-by: NUday Savagaonkar <savagaon@google.com> Signed-off-by: NIldar Muslukhov <ildarm@google.com> Signed-off-by: NMichael Halcrow <mhalcrow@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Michael Halcrow 提交于
Modifies htree_dirblock_to_tree, dx_make_map, ext4_match search_dir, and ext4_find_dest_de to support fname crypto. Filename encryption feature is not yet enabled at this patch. Signed-off-by: NUday Savagaonkar <savagaon@google.com> Signed-off-by: NIldar Muslukhov <ildarm@google.com> Signed-off-by: NMichael Halcrow <mhalcrow@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Michael Halcrow 提交于
Modifies dx_show_leaf and dx_probe to support fname encryption. Filename encryption not yet enabled. Signed-off-by: NUday Savagaonkar <savagaon@google.com> Signed-off-by: NIldar Muslukhov <ildarm@google.com> Signed-off-by: NMichael Halcrow <mhalcrow@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Michael Halcrow 提交于
Signed-off-by: NUday Savagaonkar <savagaon@google.com> Signed-off-by: NIldar Muslukhov <ildarm@google.com> Signed-off-by: NMichael Halcrow <mhalcrow@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
For encrypted directories, we need to pass in a separate parameter for the decrypted filename, since the directory entry contains the encrypted filename. Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Michael Halcrow 提交于
Signed-off-by: NUday Savagaonkar <savagaon@google.com> Signed-off-by: NIldar Muslukhov <ildarm@google.com> Signed-off-by: NMichael Halcrow <mhalcrow@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Michael Halcrow 提交于
Signed-off-by: NMichael Halcrow <mhalcrow@google.com> Signed-off-by: NIldar Muslukhov <ildarm@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Michael Halcrow 提交于
Pulls block_write_begin() into fs/ext4/inode.c because it might need to do a low-level read of the existing data, in which case we need to decrypt it. Signed-off-by: NMichael Halcrow <mhalcrow@google.com> Signed-off-by: NIldar Muslukhov <ildarm@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Michael Halcrow 提交于
Signed-off-by: NMichael Halcrow <mhalcrow@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Enforce the following inheritance policy: 1) An unencrypted directory may contain encrypted or unencrypted files or directories. 2) All files or directories in a directory must be protected using the same key as their containing directory. As a result, assuming the following setup: mke2fs -t ext4 -Fq -O encrypt /dev/vdc mount -t ext4 /dev/vdc /vdc mkdir /vdc/a /vdc/b /vdc/c echo foo | e4crypt add_key /vdc/a echo bar | e4crypt add_key /vdc/b for i in a b c ; do cp /etc/motd /vdc/$i/motd-$i ; done Then we will see the following results: cd /vdc mv a b # will fail; /vdc/a and /vdc/b have different keys mv b/motd-b a # will fail, see above ln a/motd-a b # will fail, see above mv c a # will fail; all inodes in an encrypted directory # must be encrypted ln c/motd-c b # will fail, see above mv a/motd-a c # will succeed mv c/motd-a a # will succeed Signed-off-by: NMichael Halcrow <mhalcrow@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Michael Halcrow 提交于
Signed-off-by: NMichael Halcrow <mhalcrow@google.com> Signed-off-by: NIldar Muslukhov <muslukhovi@gmail.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Michael Halcrow 提交于
On encrypt, we will re-assign the buffer_heads to point to a bounce page rather than the control_page (which is the original page to write that contains the plaintext). The block I/O occurs against the bounce page. On write completion, we re-assign the buffer_heads to the original plaintext page. On decrypt, we will attach a read completion callback to the bio struct. This read completion will decrypt the read contents in-place prior to setting the page up-to-date. The current encryption mode, AES-256-XTS, lacks cryptographic integrity. AES-256-GCM is in-plan, but we will need to devise a mechanism for handling the integrity data. Signed-off-by: NMichael Halcrow <mhalcrow@google.com> Signed-off-by: NIldar Muslukhov <ildarm@google.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
-
由 Al Viro 提交于
... avoiding write_iter/fcntl races. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
... returning -E... upon error and amount of data left in iter after (possible) truncation upon success. Note, that normal case gives a non-zero (positive) return value, so any tests for != 0 _must_ be updated. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Conflicts: fs/ext4/file.c
-
由 Al Viro 提交于
simpler that way... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
all remaining callers are passing 0; some just obscure that fact. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Omar Sandoval 提交于
Now that no one is using rw, remove it completely. Signed-off-by: NOmar Sandoval <osandov@osandov.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Omar Sandoval 提交于
The rw parameter to direct_IO is redundant with iov_iter->type, and treated slightly differently just about everywhere it's used: some users do rw & WRITE, and others do rw == WRITE where they should be doing a bitwise check. Simplify this with the new iov_iter_rw() helper, which always returns either READ or WRITE. Signed-off-by: NOmar Sandoval <osandov@osandov.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Omar Sandoval 提交于
And use iov_iter_rw() instead. Signed-off-by: NOmar Sandoval <osandov@osandov.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Omar Sandoval 提交于
Most filesystems call through to these at some point, so we'll start here. Signed-off-by: NOmar Sandoval <osandov@osandov.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-