Commits · 994903663c516e3274728a03653411c6b3c81bab · openanolis / cloud-kernel

21 Feb, 2019 2 commits

ext4: fix reserved cluster accounting at delayed write time · 99490366

Committed by Eric Whitney on Oct 01, 2018

commit 0b02f4c0d6d9e2c611dfbdd4317193e9dca740e6 upstream.

The code in ext4_da_map_blocks sometimes reserves space for more
delayed allocated clusters than it should, resulting in premature
ENOSPC, exceeded quota, and inaccurate free space reporting.

Fix this by checking for written and unwritten blocks shared in the
same cluster with the newly delayed allocated block.  A cluster
reservation should not be made for a cluster for which physical space
has already been allocated.

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>

ext4: generalize extents status tree search functions · 667f9459

Committed by Eric Whitney on Oct 01, 2018

commit ad431025aecda85d3ebef5e4a3aca5c1c681d0c7 upstream.

Ext4 contains a few functions that are used to search for delayed
extents or blocks in the extents status tree.  Rather than duplicate
code to add new functions to search for extents with different status
values, such as written or a combination of delayed and unwritten,
generalize the existing code to search for caller-specified extents
status values.  Also, move this code into extents_status.c where it
is better associated with the data structures it operates upon, and
where it can be more readily used to implement new extents status tree
functions that might want a broader scope for i_es_lock.

Three missing static specifiers in RFC version of patch reported and
fixed by Fengguang Wu <fengguang.wu@intel.com>.

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>

17 Jan, 2019 2 commits

ext4: fix special inode number checks in __ext4_iget() · 5dc41af3

Committed by Theodore Ts'o on Dec 31, 2018

commit 191ce17876c9367819c4b0a25b503c0f6d9054d8 upstream.

The check for special (reserved) inode numbers in __ext4_iget()
was broken by commit 8a363970d1dc: ("ext4: avoid declaring fs
inconsistent due to invalid file handles").  This was caused by a
botched reversal of the sense of the flag now known as
EXT4_IGET_SPECIAL (when it was previously named EXT4_IGET_NORMAL).
Fix the logic appropriately.

Fixes: 8a363970d1dc ("ext4: avoid declaring fs inconsistent...")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: make sure enough credits are reserved for dioread_nolock writes · 7c2ea25e

Committed by Theodore Ts'o on Dec 24, 2018

commit 812c0cab2c0dfad977605dbadf9148490ca5d93f upstream.

There are enough credits reserved for most dioread_nolock writes;
however, if the extent tree is sufficiently deep, and/or quota is
enabled, the code was not allowing for all eventualities when
reserving journal credits for the unwritten extent conversion.

This problem can be seen using xfstests ext4/034:

   WARNING: CPU: 1 PID: 257 at fs/ext4/ext4_jbd2.c:271 __ext4_handle_dirty_metadata+0x10c/0x180
   Workqueue: ext4-rsv-conversion ext4_end_io_rsv_work
   RIP: 0010:__ext4_handle_dirty_metadata+0x10c/0x180
   	...
   EXT4-fs: ext4_free_blocks:4938: aborting transaction: error 28 in __ext4_handle_dirty_metadata
   EXT4: jbd2_journal_dirty_metadata failed: handle type 11 started at line 4921, credits 4/0, errcode -28
   EXT4-fs error (device dm-1) in ext4_free_blocks:4950: error 28

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

10 Jan, 2019 2 commits

ext4: check for shutdown and r/o file system in ext4_write_inode() · 0cb4f655

Committed by Theodore Ts'o on Dec 19, 2018

commit 18f2c4fcebf2582f96cbd5f2238f4f354a0e4847 upstream.

If the file system has been shut down or is read-only, then
ext4_write_inode() needs to bail out early.

Also use jbd2_complete_transaction() instead of ext4_force_commit() so
we only force a commit if it is needed.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ext4: avoid declaring fs inconsistent due to invalid file handles · 26366388

Committed by Theodore Ts'o on Dec 19, 2018

commit 8a363970d1dc38c4ec4ad575c862f776f468d057 upstream.

If we receive a file handle, either from NFS or open_by_handle_at(2),
and it points at an inode which has not been initialized, and the file
system has metadata checksums enabled, we shouldn't try to get the
inode, discover the checksum is invalid, and then declare the file
system as being inconsistent.

This can be reproduced by creating a test file system via "mke2fs -t
ext4 -O metadata_csum /tmp/foo.img 8M", mounting it, cd'ing into that
directory, and then running the following program.

#define _GNU_SOURCE
#include <fcntl.h>

struct handle {
	struct file_handle fh;
	unsigned char fid[MAX_HANDLE_SZ];
};

int main(int argc, char **argv)
{
	struct handle h = {{8, 1 }, { 12, }};

	open_by_handle_at(AT_FDCWD, &h.fh, O_RDONLY);
	return 0;
}

Google-Bug-Id: 120690101
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

21 Nov, 2018 1 commit

ext4: avoid buffer leak on shutdown in ext4_mark_iloc_dirty() · d65b7d33

Committed by Vasily Averin on Nov 06, 2018

commit a6758309a005060b8297a538a457c88699cb2520 upstream.

ext4_mark_iloc_dirty() callers expect that it releases iloc->bh
even if it returns an error.

Fixes: 0db1ff22 ("ext4: add shutdown bit and check for it")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org # 4.11
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

16 Sep, 2018 2 commits

ext4, dax: set ext4_dax_aops for dax files · cce6c9f7

Committed by Toshi Kani on Sep 15, 2018

Sync syscall to DAX file needs to flush processor cache, but it
currently does not flush to existing DAX files.  This is because
'ext4_da_aops' is set to address_space_operations of existing DAX
files, instead of 'ext4_dax_aops', since S_DAX flag is set after
ext4_set_aops() in the open path.

  New file
  --------
  lookup_open
    ext4_create
      __ext4_new_inode
        ext4_set_inode_flags   // Set S_DAX flag
      ext4_set_aops            // Set aops to ext4_dax_aops

  Existing file
  -------------
  lookup_open
    ext4_lookup
      ext4_iget
        ext4_set_aops          // Set aops to ext4_da_aops
        ext4_set_inode_flags   // Set S_DAX flag

Change ext4_iget() to initialize i_flags before ext4_set_aops().

Fixes: 5f0663bb ("ext4, dax: introduce ext4_dax_aops")
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Suggested-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org

ext4, dax: add ext4_bmap to ext4_dax_aops · 94dbb631

Committed by Toshi Kani on Sep 15, 2018

Ext4 mount path calls .bmap to the journal inode. This currently
works for the DAX mount case because ext4_iget() always set
'ext4_da_aops' to any regular files.

In preparation to fix ext4_iget() to set 'ext4_dax_aops' for ext4
DAX files, add ext4_bmap() to 'ext4_dax_aops', since bmap works for
DAX inodes.

Fixes: 5f0663bb ("ext4, dax: introduce ext4_dax_aops")
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Suggested-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org

12 Sep, 2018 1 commit

ext4: close race between direct IO and ext4_break_layouts() · b1f38217

Committed by Ross Zwisler on Sep 11, 2018

If the refcount of a page is lowered between the time that it is returned
by dax_layout_busy_page() and when the refcount is again checked in
ext4_break_layouts() => ___wait_var_event(), the waiting function
ext4_wait_dax_page() will never be called.  This means that
ext4_break_layouts() will still have 'retry' set to false, so we'll stop
looping and never check the refcount of other pages in this inode.

Instead, always continue looping as long as dax_layout_busy_page() gives us
a page which it found with an elevated refcount.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

02 Sep, 2018 1 commit

ext4: avoid arithemetic overflow that can trigger a BUG · bcd8e91f

Committed by Theodore Ts'o on Sep 01, 2018

A maliciously crafted file system can cause an overflow when the
result of a 64-bit calculation is stored into a 32-bit length
parameter.
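
A minimal sketch of the class of bug described above, not the actual ext4 change: a 64-bit result is silently truncated when it is narrowed to a 32-bit length. The function names and values are hypothetical and only illustrate why a range check is needed before narrowing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical illustration: narrow a 64-bit block count to 32 bits
 * without checking, as an unguarded assignment would. */
static uint32_t len_unchecked(uint64_t end, uint64_t start)
{
	return (uint32_t)(end - start);	/* high bits are silently dropped */
}

/* Checked variant: report failure instead of truncating. */
static bool len_checked(uint64_t end, uint64_t start, uint32_t *len)
{
	uint64_t diff;

	if (end < start)
		return false;
	diff = end - start;
	if (diff > UINT32_MAX)
		return false;
	*len = (uint32_t)diff;
	return true;
}
```

With a difference of 0x100000005, the unchecked helper yields 5, while the checked helper refuses the value.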

https://bugzilla.kernel.org/show_bug.cgi?id=200623

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Wen Xu <wen.xu@gatech.edu>
Cc: stable@vger.kernel.org

18 Aug, 2018 1 commit

ext4: readpages() should submit IO as read-ahead · ac22b46a

Committed by Jens Axboe on Aug 17, 2018

a_ops->readpages() is only ever used for read-ahead.  Ensure that we
pass this information down to the block layer.

Link: http://lkml.kernel.org/r/20180621010725.17813-5-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Chris Mason <clm@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

02 Aug, 2018 1 commit

ext4: improve code readability in ext4_iget() · bc716523

Committed by Liu Song on Aug 02, 2018

Merge the duplicated complex conditions to improve code readability.

Signed-off-by: Liu Song <liu.song11@zte.com.cn>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>

30 Jul, 2018 2 commits

ext4: handle layout changes to pinned DAX mappings · 430657b6

Committed by Ross Zwisler on Jul 29, 2018

Follow the lead of xfs_break_dax_layouts() and add synchronization between
operations in ext4 which remove blocks from an inode (hole punch, truncate
down, etc.) and pages which are pinned due to DAX DMA operations.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>

ext4: use ktime_get_real_seconds for i_dtime · 5ffff834

Committed by Arnd Bergmann on Jul 29, 2018

We only care about the low 32 bits of i_dtime as explained in commit
b5f51573 ("ext4: avoid Y2038 overflow in recently_deleted()"), so
the use of get_seconds() is correct here, but that function is getting
removed in the process of the y2038 fixes, so let's use the modern
ktime_get_real_seconds() here.

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

10 Jul, 2018 1 commit

ext4: fix inline data updates with checksums enabled · 362eca70

Committed by Theodore Ts'o on Jul 10, 2018

The inline data code was updating the raw inode directly; this is
problematic since if metadata checksums are enabled,
ext4_mark_inode_dirty() must be called to update the inode's checksum.
In addition, the jbd2 layer requires that get_write_access() be called
before the metadata buffer is modified. Fix both of these problems.

https://bugzilla.kernel.org/show_bug.cgi?id=200443

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

17 Jun, 2018 1 commit

ext4: add more inode number paranoia checks · c37e9e01

Committed by Theodore Ts'o on Jun 17, 2018

If there is a directory entry pointing to a system inode (such as a
journal inode), complain and declare the file system to be corrupted.

Also, if the superblock's first inode number field is too small,
refuse to mount the file system.
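
The two checks described above can be sketched roughly as follows. This is a simplified userspace model, not the kernel code: the helper names are hypothetical, and the constants mirror ext4's root inode number (2) and the first non-reserved inode number of the original on-disk layout (11).

```c
#include <stdbool.h>
#include <stdint.h>

/* Constants mirroring ext4's on-disk layout; helper names are hypothetical. */
#define ROOT_INO		2u	/* root directory inode */
#define GOOD_OLD_FIRST_INO	11u	/* first non-reserved inode, original layout */

/* Mount-time check: a first-inode field below the reserved range is bogus. */
static bool first_ino_plausible(uint32_t s_first_ino)
{
	return s_first_ino >= GOOD_OLD_FIRST_INO;
}

/* Lookup-time check: a directory entry may point at the root inode or at a
 * regular (non-reserved) inode within the inode count, nothing else. */
static bool dirent_ino_plausible(uint32_t ino, uint32_t s_first_ino,
				 uint32_t inodes_count)
{
	if (ino == ROOT_INO)
		return true;
	return ino >= s_first_ino && ino <= inodes_count;
}
```

Under this model, a directory entry naming inode 8 (the journal inode in ext4) would be rejected, while inode 12 on a 100-inode file system would be accepted.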

This addresses CVE-2018-10882.

https://bugzilla.kernel.org/show_bug.cgi?id=200069

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org

16 Jun, 2018 1 commit

ext4: include the illegal physical block in the bad map ext4_error msg · bdbd6ce0

Committed by Theodore Ts'o on Jun 15, 2018

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org

23 May, 2018 1 commit

ext4: bubble errors from ext4_find_inline_data_nolock() up to ext4_iget() · eb9b5f01

Committed by Theodore Ts'o on May 22, 2018

If ext4_find_inline_data_nolock() returns an error it needs to get
reflected up to ext4_iget().  In order to fix this,
ext4_iget_extra_inode() needs to return an error (and not return
void).

This is related to "ext4: do not allow external inodes for inline
data" (which fixes CVE-2018-11412) in that in the errors=continue
case, it would be useful for userspace to receive an error
indicating that the file system is corrupted.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Cc: stable@kernel.org

14 May, 2018 2 commits

ext4: update mtime in ext4_punch_hole even if no blocks are released · eee597ac

Committed by Lukas Czerner on May 13, 2018

Currently in ext4_punch_hole we're going to skip the mtime update if
there are no actual blocks to release. However we've actually modified
the file by zeroing the partial block so the mtime should be updated.

Moreover the sync and datasync handling is skipped as well, which is
also wrong. Fix it.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Joe Habermann <joe.habermann@quantum.com>
Cc: <stable@vger.kernel.org>

ext4: add verifier check for symlink with append/immutable flags · 6390d33b

Committed by Luis R. Rodriguez on May 13, 2018

The Linux VFS provides no way to set the append or immutable
attributes on symlinks. If a symlink is found with either flag set,
inform the user, since the filesystem must be corrupted.

Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

10 May, 2018 1 commit

ext4: use raw i_version value for ea_inode · e254d1af

Committed by Eryu Guan on May 10, 2018

Currently, creating large xattr (e.g. 2k) in ea_inode would cause
ea_inode refcount corruption, e.g.

  Pass 4: Checking reference counts
  Extended attribute inode 13 ref count is 0, should be 1. Fix? no

This is because we save the lower 32 bits of the refcount in
inode->i_version and store it in raw_inode->i_disk_version on disk.
But since commit ee73f9a5 ("ext4: convert to new i_version
API"), we load/store modified i_disk_version from/to disk instead of
raw value, which causes on-disk ea_inode refcount corruption.

Fix it by loading/storing raw i_version/i_disk_version, because it's
a self-managed value in this case.

Fixes: ee73f9a5 ("ext4: convert to new i_version API")
Cc: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

31 Mar, 2018 1 commit

ext4, dax: introduce ext4_dax_aops · 5f0663bb

Committed by Dan Williams on Dec 21, 2017

In preparation for the dax implementation to start associating dax pages
to inodes via page->mapping, we need to provide a 'struct
address_space_operations' instance for dax. Otherwise, direct-I/O
triggers incorrect page cache assumptions and warnings.

Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: linux-ext4@vger.kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

30 Mar, 2018 1 commit

ext4: fail ext4_iget for root directory if unallocated · 8e4b5eae

Committed by Theodore Ts'o on Mar 29, 2018

If the root directory has an i_links_count of zero, then when the file
system is mounted and ext4_fill_super() notices the problem and tries
to call iput() on the root directory in the error return path,
ext4_evict_inode() will try to free the inode on disk before all of
the file system structures are set up, and this will result in an OOPS
caused by a NULL pointer dereference.

This issue has been assigned CVE-2018-1092.

https://bugzilla.kernel.org/show_bug.cgi?id=199179
https://bugzilla.redhat.com/show_bug.cgi?id=1560777

Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

28 Mar, 2018 1 commit

fs: move I_DIRTY_INODE to fs.h · 0e11f644

Committed by Christoph Hellwig on Feb 21, 2018

And use it in a few more places rather than opencoding the values.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

26 Mar, 2018 1 commit

ext4: use generic_writepages instead of __writepage/write_cache_pages · 043d20d1

Committed by Goldwyn Rodrigues on Mar 26, 2018

Code cleanup. Instead of writing an internal static function, use the
available generic_writepages().

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

22 Mar, 2018 4 commits

ext4: remove EXT4_STATE_DIOREAD_LOCK flag · 1d39834f

Committed by Nikolay Borisov on Mar 22, 2018

Commit 16c54688 ("ext4: Allow parallel DIO reads") reworked the way
locking happens around parallel dio reads. This resulted in obviating
the need for EXT4_STATE_DIOREAD_LOCK flag and accompanying logic.
Currently this amounts to dead code so let's remove it. No functional
changes.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

ext4: fix offset overflow on 32-bit archs in ext4_iomap_begin() · fe23cb65

Committed by Jiri Slaby on Mar 22, 2018

ext4_iomap_begin() has a bug where offset returned in the iomap
structure will be truncated to unsigned long size. On 64-bit
architectures this is fine but on 32-bit architectures obviously not.
Not many places actually use the offset stored in the iomap structure
but one of visible failures is in SEEK_HOLE / SEEK_DATA implementation.
If we create a file like:

dd if=/dev/urandom of=file bs=1k seek=8m count=1

then

lseek64("file", 0x100000000ULL, SEEK_DATA)

wrongly returns 0x100000000 on unfixed kernel while it should return
0x200000000. Avoid the overflow by proper type cast.
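
The truncation can be illustrated with a small userspace sketch. It is not the ext4 code: the helper names are hypothetical, a 32-bit integer stands in for `unsigned long` on a 32-bit architecture, and a 4 KiB block size (a shift of 12) is assumed for the example.

```c
#include <stdint.h>

/* Hypothetical model: the block number is held in a 32-bit type, standing in
 * for unsigned long on a 32-bit architecture. */
static uint64_t offset_truncated(uint32_t first_block, unsigned blkbits)
{
	/* The shift is done in 32 bits; bits shifted past bit 31 are lost. */
	return first_block << blkbits;
}

static uint64_t offset_widened(uint32_t first_block, unsigned blkbits)
{
	/* Widen before shifting so the full 64-bit byte offset is kept. */
	return (uint64_t)first_block << blkbits;
}
```

For block 0x200000 with 4 KiB blocks, the widened form gives the byte offset 0x200000000, whereas the 32-bit shift wraps to 0.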

Fixes: 545052e9 ("ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA")
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org # v4.15

ext4: update i_disksize if direct write past ondisk size · 45d8ec4d

Committed by Eryu Guan on Mar 22, 2018

Currently in ext4 direct write path, we update i_disksize only when
new eof is greater than i_size, and don't update it even when new
eof is greater than i_disksize but less than i_size. This doesn't
work well with delalloc buffer write, which updates i_size and
i_disksize only when delalloc blocks are resolved (at writeback
time), the i_disksize from direct write can be lost if a previous
buffer write succeeded at write time but failed at writeback time,
then results in corrupted ondisk inode size.

Consider this case, first buffer write 4k data to a new file at
offset 16k with delayed allocation, then direct write 4k data to the
same file at offset 4k before delalloc blocks are resolved, which
doesn't update i_disksize because it writes within i_size(20k), but
the extent tree metadata has been committed in journal. Then
writeback of the delalloc blocks fails (due to device error etc.),
and i_size/i_disksize from buffer write can't be written to disk
(still zero). A subsequent umount/mount cycle recovers journal and
writes extent tree metadata from direct write to disk, but with
i_disksize being zero.

Fix it by updating i_disksize too in direct write path when new eof
is greater than i_disksize but less than i_size, so i_disksize is
always consistent with direct write.
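
The difference between the old and new update rules can be modeled in userspace as below. This is only a sketch of the logic as the message describes it, with a hypothetical struct and helper names; it omits locking and everything else the real direct write path does.

```c
#include <stdint.h>

/* Hypothetical userspace model of the two size fields. */
struct model_inode {
	uint64_t i_size;	/* in-memory file size */
	uint64_t i_disksize;	/* size that will be recorded on disk */
};

/* Old rule as described: only act when the write extends i_size. */
static void dio_update_old(struct model_inode *ino, uint64_t new_eof)
{
	if (new_eof > ino->i_size) {
		ino->i_size = new_eof;
		ino->i_disksize = new_eof;
	}
}

/* Fixed rule as described: also advance i_disksize when the write ends
 * past i_disksize but still within i_size. */
static void dio_update_new(struct model_inode *ino, uint64_t new_eof)
{
	if (new_eof > ino->i_size)
		ino->i_size = new_eof;
	if (new_eof > ino->i_disksize)
		ino->i_disksize = new_eof;
}
```

In the scenario from the message (i_size of 20k from the pending buffered write, i_disksize still zero, direct write ending at 8k), the old rule leaves i_disksize at zero and the new rule raises it to 8k.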

This fixes occasional i_size corruption in fstests generic/475.

Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: protect i_disksize update by i_data_sem in direct write path · 73fdad00

Committed by Eryu Guan on Mar 22, 2018

i_disksize update should be protected by i_data_sem, by either taking
the lock explicitly or by using ext4_update_i_disksize() helper. But the
i_disksize updates in ext4_direct_IO_write() are not protected at all,
which may be racing with i_disksize updates in writeback path in
delalloc buffer write path.

This is found by code inspection, and I didn't hit any i_disksize
corruption due to this bug. Thanks to Jan Kara for catching this bug and
suggesting the fix!

Reported-by: Jan Kara <jack@suse.cz>
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

29 Jan, 2018 2 commits

ext4: convert to new i_version API · ee73f9a5

Committed by Jeff Layton on Jan 09, 2018

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Theodore Ts'o <tytso@mit.edu>

fs: new API for handling inode->i_version · ae5e165d

Committed by Jeff Layton on Jan 29, 2018

Add a documentation blob that explains what the i_version field is, how
it is expected to work, and how it is currently implemented by various
filesystems.

We already have inode_inc_iversion. Add several other functions for
manipulating and accessing the i_version counter. For now, the
implementation is trivial and basically works the way that all of the
open-coded i_version accesses work today.

Future patches will convert existing users of i_version to use the new
API, and then convert the backend implementation to do things more
efficiently.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>

10 Jan, 2018 1 commit

ext4: fix a race in the ext4 shutdown path · abbc3f93

Committed by Harshad Shirwadkar on Jan 10, 2018

This patch fixes a race between the shutdown path and bio completion
handling. In the ext4 direct io path with async io, after submitting a
bio to the block layer, if journal starting fails,
ext4_direct_IO_write() would bail out pretending that the IO
failed. The caller would have had no way of knowing whether or not the
IO was successfully submitted. So instead, we return -EIOCBQUEUED in
this case. Now, the caller knows that the IO was submitted.  The bio
completion handler takes care of the error.

Tested: Ran the shutdown xfstest test 461 in loop for over 2 hours across
4 machines resulting in over 400 runs. Verified that the race didn't
occur. Usually the race was seen in about 20-30 iterations.

Signed-off-by: Harshad Shirwadkar <harshads@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

04 Dec, 2017 1 commit

ext4: support fast symlinks from ext3 file systems · fc82228a

Committed by Andi Kleen on Dec 03, 2017

407cd7fb (ext4: change fast symlink test to not rely on i_blocks)
broke ~10 years old ext3 file systems created by 2.6.17. Any ELF
executable fails because the /lib/ld-linux.so.2 fast symlink
cannot be read anymore.

The patch assumed fast symlinks were created in a specific way,
but that's not true on these really old file systems.

The new behavior is apparently needed only with the large EA inode
feature.

Revert to the old behavior if the large EA inode feature is not set.

This makes my old VM boot again.

Fixes: 407cd7fb (ext4: change fast symlink test to not rely on i_blocks)
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Cc: stable@vger.kernel.org

28 Nov, 2017 1 commit

Rename superblock flags (MS_xyz -> SB_xyz) · 1751e8a6

Committed by Linus Torvalds on Nov 27, 2017

This is a pure automated search-and-replace of the internal kernel
superblock flags.

The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.

Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.

The script to do this was:

    # places to look in; re security/*: it generally should *not* be
    # touched (that stuff parses mount(2) arguments directly), but
    # there are two places where we really deal with superblock flags.
    FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
            include/linux/fs.h include/uapi/linux/bfs_fs.h \
            security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
    # the list of MS_... constants
    SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
          DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
          POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
          I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
          ACTIVE NOUSER"

    SED_PROG=
    for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done

    # we want files that contain at least one of MS_...,
    # with fs/namespace.c and fs/pnode.c excluded.
    L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')

    for f in $L; do sed -i $f $SED_PROG; done

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

16 Nov, 2017 3 commits

mm, pagevec: remove cold parameter for pagevecs · 86679820

Committed by Mel Gorman on Nov 15, 2017

Every pagevec_init user claims the pages being released are hot even in
cases where it is unlikely the pages are hot.  As no one cares about the
hotness of pages being released to the allocator, just ditch the
parameter.

No performance impact is expected as the overhead is marginal.  The
parameter is removed simply because it is a bit stupid to have a useless
parameter copied everywhere.

Link: http://lkml.kernel.org/r/20171018075952.10627-6-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: remove nr_pages argument from pagevec_lookup_{,range}_tag() · 67fd707f

Committed by Jan Kara on Nov 15, 2017

All users of pagevec_lookup() and pagevec_lookup_range() now pass
PAGEVEC_SIZE as a desired number of pages. Just drop the argument.

Link: http://lkml.kernel.org/r/20171009151359.31984-15-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ext4: use pagevec_lookup_range_tag() · dc7f3e86

Committed by Jan Kara on Nov 15, 2017

We want only pages from given range in ext4_writepages().  Use
pagevec_lookup_range_tag() instead of pagevec_lookup_tag() and remove
unnecessary code.

Link: http://lkml.kernel.org/r/20171009151359.31984-5-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

14 Nov, 2017 1 commit

fs, dax: unify IOMAP_F_DIRTY read vs write handling policy in the dax core · aaa422c4

Committed by Dan Williams on Nov 13, 2017

While reviewing whether MAP_SYNC should strengthen its current guarantee
of syncing writes from the initiating process to also include
third-party readers observing dirty metadata, Dave pointed out that the
check of IOMAP_WRITE is misplaced.

The policy of what to do with IOMAP_F_DIRTY should be separated from the
generic filesystem mechanism of reporting dirty metadata. Move this
policy to the fs-dax core to simplify the per-filesystem iomap handlers,
and further centralize code that implements the MAP_SYNC policy. This
otherwise should not change behavior, it just makes it easier to change
behavior in the future.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

03 Nov, 2017 1 commit

ext4: Support for synchronous DAX faults · b8a6176c

Committed by Jan Kara on Nov 01, 2017

We return IOMAP_F_DIRTY flag from ext4_iomap_begin() when asked to
prepare blocks for writing and the inode has some uncommitted metadata
changes. In the fault handler ext4_dax_fault() we then detect this case
(through VM_FAULT_NEEDDSYNC return value) and call helper
dax_finish_sync_fault() to flush metadata changes and insert page table
entry. Note that this will also dirty corresponding radix tree entry
which is what we want - fsync(2) will still provide data integrity
guarantees for applications not using userspace flushing. And
applications using userspace flushing can avoid calling fsync(2) and
thus avoid the performance overhead.

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

openanolis / cloud-kernel synced successfully more than 1 year ago