Commits · fde6af4729b005dc9dc936b0ed9f1b27b5b2d0f4 · openanolis / cloud-kernel

06 Aug, 2017 · 14 commits

ext4: fix copy paste error in ext4_swap_extents() · 4e562013

Committed by Maninder Singh on Aug 06, 2017

This bug was found by a static code checker tool for copy-paste
problems.

Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Vaneet Narang <v.narang@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

4e562013

ext4: fix overflow caused by missing cast in ext4_resize_fs() · aec51758

Committed by Jerry Lee on Aug 06, 2017

On a 32-bit platform, the value of n_blocks_count may be wrong when
the file system is resized to more than 2^32 blocks.  This may
corrupt the superblock with a zero block count.

Fixes: 1c6bd717
Signed-off-by: Jerry Lee <jerrylee@qnap.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org # 3.7+

aec51758
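
The overflow fixed in ext4_resize_fs() can be reproduced in userspace. The following is a minimal sketch, not the kernel code: the function names and the fixed-width uint32_t operands are assumptions that stand in for the 32-bit group count and blocks-per-group values seen on a 32-bit platform.

```c
#include <stdint.h>

/*
 * Illustrative only: names and types are assumptions, not the kernel's.
 */

/* Buggy shape: the product is computed in 32 bits and wraps modulo 2^32. */
uint64_t blocks_count_no_cast(uint32_t n_group, uint32_t blocks_per_group,
                              uint32_t first_data_block)
{
    uint32_t wrapped = n_group * blocks_per_group; /* 32-bit arithmetic */

    return (uint64_t)wrapped + first_data_block;
}

/* Fixed shape: widen one operand first so the multiply happens in 64 bits. */
uint64_t blocks_count_with_cast(uint32_t n_group, uint32_t blocks_per_group,
                                uint32_t first_data_block)
{
    return (uint64_t)n_group * blocks_per_group + first_data_block;
}
```

With 131072 groups of 32768 blocks each (exactly 2^32 blocks), the uncast version yields 0, which matches the zero block count described in the commit message, while the cast version yields 4294967296.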

ext4, project: expand inode extra size if possible · c03b45b8

Committed by Miao Xie on Aug 06, 2017

When upgrading from the old format, the first attempt to set a project
id on an old file returns EOVERFLOW, but once that file has been
dirtied (by touch, etc.), changing the project id is allowed. This
might be confusing for users, so try to expand @i_extra_isize here too.

Reported-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

c03b45b8

ext4: cleanup ext4_expand_extra_isize_ea() · b640b2c5

Committed by Miao Xie on Aug 06, 2017

Clean up some goto statements to make ext4_expand_extra_isize_ea() clearer.

Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Wang Shilong <wshilong@ddn.com>

b640b2c5

ext4: restructure ext4_expand_extra_isize · cf0a5e81

Committed by Miao Xie on Aug 06, 2017

Currently, ext4_expand_extra_isize() only tries to expand the extra
isize; if someone is holding the xattr lock or some check fails, it
gives up. So rename it to ext4_try_to_expand_extra_isize().

Besides that, clean up unnecessary checks and move some related checks
into it.

Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Wang Shilong <wshilong@ddn.com>

cf0a5e81

ext4: fix forgotten xattr lock protection in ext4_expand_extra_isize · 3b10fdc6

Committed by Miao Xie on Aug 06, 2017

We should avoid contention between the i_extra_isize update and
the inline data insertion, so move the xattr trylock in front of the
i_extra_isize update.

Signed-off-by: Miao Xie <miaoxie@huawei.com>
Reviewed-by: Wang Shilong <wshilong@ddn.com>

3b10fdc6

ext4: make xattr inode reads faster · 9699d4f9

Committed by Tahsin Erdogan on Aug 06, 2017

ext4_xattr_inode_read() currently reads each block sequentially,
waiting for each I/O operation to complete before moving on to the next
block. This prevents request merging in the block layer.

Add an ext4_bread_batch() function that starts reads for all blocks
then optionally waits for them to complete. Similar logic is used
in ext4_find_entry(), so update that code to use the new function.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

9699d4f9

ext4: inplace xattr block update fails to deduplicate blocks · ec000220

Committed by Tahsin Erdogan on Aug 05, 2017

When an xattr block has a single reference, the block is updated in place
and reinserted into the cache. Later, a cache lookup is performed
to see whether an existing block has the same contents. This cache
lookup will most of the time return the just-inserted entry, so
deduplication is not achieved.

Running the following test script will produce two xattr blocks which
can be observed in "File ACL: " line of debugfs output:

  mke2fs -b 1024 -I 128 -F -O extent /dev/sdb 1G
  mount /dev/sdb /mnt/sdb

  touch /mnt/sdb/{x,y}

  setfattr -n user.1 -v aaa /mnt/sdb/x
  setfattr -n user.2 -v bbb /mnt/sdb/x

  setfattr -n user.1 -v aaa /mnt/sdb/y
  setfattr -n user.2 -v bbb /mnt/sdb/y

  debugfs -R 'stat x' /dev/sdb | cat
  debugfs -R 'stat y' /dev/sdb | cat

This patch defers the reinsertion to the cache so that we can locate
other blocks with the same contents.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>

ec000220

ext4: remove unused mode parameter · 77a2e84d

Committed by Tahsin Erdogan on Aug 05, 2017

ext4_alloc_file_blocks() does not use its mode parameter. Remove it.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

77a2e84d

ext4: fix warning about stack corruption · 2df2c340

Committed by Arnd Bergmann on Aug 05, 2017

After commit 62d1034f53e3 ("fortify: use WARN instead of BUG for now"),
we get a warning about possible stack overflow from a memcpy that
was not strictly bounded to the size of the local variable:

inlined from 'ext4_mb_seq_groups_show' at fs/ext4/mballoc.c:2322:2:
include/linux/string.h:309:9: error: '__builtin_memcpy': writing between 161 and 1116 bytes into a region of size 160 overflows the destination [-Werror=stringop-overflow=]

We actually had a bug here that would have been found by the warning,
but it was already fixed last year in commit 30a9d7af ("ext4: fix
stack memory corruption with 64k block size").

This replaces the fixed-length structure on the stack with a variable-length
structure, using the correct upper bound that tells the compiler that
everything is really fine here. I also change the loop count to check
for the same upper bound for consistency, but the existing code is
already correct here.

Note that while clang won't allow certain kinds of variable-length arrays
in structures, this particular instance is fine, as the array is at the
end of the structure, and the size is strictly bounded.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

2df2c340

ext4: fix dir_nlink behaviour · c7414892

Committed by Andreas Dilger on Aug 05, 2017

The dir_nlink feature has been enabled by default for new ext4
filesystems since e2fsprogs-1.41 in 2008, and was automatically
enabled by the kernel for older ext4 filesystems since the
dir_nlink feature was added with ext4 in kernel 2.6.28+ when
the subdirectory count exceeded EXT4_LINK_MAX-1.

Automatically adding file system features such as dir_nlink is
generally frowned upon, since it could cause the file system to not be
mountable on older kernels, thus preventing the administrator from
rolling back to an older kernel if necessary.

In this case, the administrator might also want to disable the feature
because glibc's fts_read() function does not correctly optimize
directory traversal for directories that use an st_nlink value of 1 to
indicate that the number of links in the directory is not tracked by
the file system, and could fail to traverse the full directory
hierarchy. Fortunately, in the past ten years very few users have
complained about incomplete file system traversal by glibc's
fts_read().

This commit also changes ext4_inc_count() to allow i_nlinks to reach
the full EXT4_LINK_MAX links on the parent directory (including "."
and "..") before changing i_links_count to be 1.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196405
Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

c7414892
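
The link-count rule described in the dir_nlink commit can be sketched as a small pure function. This is an illustration under stated assumptions, not the kernel's ext4_inc_count(): the function name is hypothetical, the limit of 65000 is assumed to be the value of EXT4_LINK_MAX, and treating a count of 1 as "already untracked, leave it alone" is an assumption of the sketch.

```c
#include <stdint.h>

/* Assumed value of EXT4_LINK_MAX; hypothetical name for this sketch. */
#define SKETCH_LINK_MAX 65000u

/*
 * Return the directory link count after one more subdirectory is added.
 * A count of 1 means "links are not tracked by the file system".
 */
uint32_t sketch_dir_nlink_after_inc(uint32_t nlink)
{
    if (nlink == 1)
        return 1;               /* already untracked: stays at 1 */

    nlink++;
    if (nlink > SKETCH_LINK_MAX)
        return 1;               /* limit exceeded: switch to untracked */

    return nlink;               /* the full limit itself is still allowed */
}
```

Under these assumptions a directory may reach exactly 65000 links, and only the next increment flips the count to 1.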

ext4: silence array overflow warning · 381cebfe

Committed by Dan Carpenter on Aug 05, 2017

I get a static checker warning:

    fs/ext4/ext4.h:3091 ext4_set_de_type()
    error: buffer overflow 'ext4_type_by_mode' 15 <= 15

It seems unlikely that we would hit this read overflow in real life, but
it's also simple enough to make the array 16 bytes instead of 15.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

381cebfe
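
The "15 <= 15" warning above comes down to simple index arithmetic. The sketch below assumes that the table is indexed by the file-type bits of the mode shifted down by 12, and that the file-type mask is 0170000 (the usual S_IFMT value on Linux); all names are hypothetical and chosen for illustration.

```c
/* Assumed mask and shift; hypothetical names for this sketch. */
#define SKETCH_FMT_MASK  0170000u   /* file-type bits of a mode */
#define SKETCH_FMT_SHIFT 12         /* moves those bits into the low nibble */

/* Largest index any mode value can produce. */
unsigned int sketch_max_type_index(void)
{
    return SKETCH_FMT_MASK >> SKETCH_FMT_SHIFT;
}

/* A table indexed this way needs (largest index + 1) slots. */
unsigned int sketch_required_slots(void)
{
    return sketch_max_type_index() + 1;
}

/* Nonzero when the index derived from mode fits a table of table_slots. */
int sketch_index_in_bounds(unsigned int mode, unsigned int table_slots)
{
    return ((mode & SKETCH_FMT_MASK) >> SKETCH_FMT_SHIFT) < table_slots;
}
```

A mode with all file-type bits set maps to index 15, which is out of bounds for a 15-slot table and in bounds for a 16-slot one.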

ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize · fcf5ea10

Committed by Jan Kara on Aug 05, 2017

ext4_find_unwritten_pgoff() does not properly handle a situation when
starting index is in the middle of a page and blocksize < pagesize. The
following command shows the bug on filesystem with 1k blocksize:

  xfs_io -f -c "falloc 0 4k" \
            -c "pwrite 1k 1k" \
            -c "pwrite 3k 1k" \
            -c "seek -a -r 0" foo

In this example, neither lseek(fd, 1024, SEEK_HOLE) nor lseek(fd, 2048,
SEEK_DATA) will return the correct result.

Fix the problem by ignoring buffers in a page that lie before the
starting offset.

Reported-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
CC: stable@vger.kernel.org # 3.8+

fcf5ea10

ext4: release discard bio after sending discard commands · e4510577

Committed by Daeho Jeong on Aug 05, 2017

We've changed discard command handling to run in parallel.
But in that change, I forgot to decrease the usage count of the bio
that was used to send the discard request. I'm sorry about that.

Fixes: a0154344 ("ext4: send parallel discards on commit completions")
Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

e4510577

31 Jul, 2017 · 6 commits

ext4: convert swap_inode_data() over to use swap() on most of the fields · 9c5d58fb

Committed by Jeff Layton on Jul 31, 2017

For some odd reason, swap_inode_data() forces a byte-by-byte copy of
each field. A plain old swap() on most of these fields would be more
efficient. We do need to retain the memswap of i_data, however, as that
field is an array.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>

9c5d58fb

ext4: error should be cleared if ea_inode isn't added to the cache · 191eac33

Committed by Emoly Liu on Jul 31, 2017

For Lustre, if an ea_inode fails hash validation but passes the parent
inode and generation checks, it is not added to the cache, and the
"-EFSCORRUPTED" error should be cleared; otherwise it will cause
"Structure needs cleaning" when running the getfattr command.

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9723

Cc: stable@vger.kernel.org
Fixes: dec214d0
Signed-off-by: Emoly Liu <emoly.liu@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: tahsin@google.com

191eac33

ext4: Don't clear SGID when inheriting ACLs · a3bb2d55

Committed by Jan Kara on Jul 30, 2017

When a new directory 'DIR1' is created in a directory 'DIR0' with the SGID
bit set, DIR1 is expected to have the SGID bit set (and owning group equal
to the owning group of 'DIR0'). However, when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in the SGID bit
on 'DIR1' being cleared if the user is not a member of the owning group.

Fix the problem by moving posix_acl_update_mode() out of
__ext4_set_acl() into ext4_set_acl(). That way the function will not be
called when inheriting ACLs which is what we want as it prevents SGID
bit clearing and the mode has been properly set by posix_acl_create()
anyway.

Fixes: 07393101
CC: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>

a3bb2d55

ext4: preserve i_mode if __ext4_set_acl() fails · 397e4341

Committed by Ernesto A. Fernández on Jul 30, 2017

When changing a file's acl mask, __ext4_set_acl() will first set the group
bits of i_mode to the value of the mask, and only then set the actual
extended attribute representing the new acl.

If the second part fails (due to lack of space, for example) and the file
had no acl attribute to begin with, the system will from now on assume
that the mask permission bits are actual group permission bits, potentially
granting access to the wrong users.

Prevent this by only changing the inode mode after the acl has been set.

Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

397e4341
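
The ordering fix in the __ext4_set_acl() commit follows a general pattern: perform the step that can fail first, and touch in-memory state only after it succeeds. The sketch below is a userspace illustration of that pattern; the struct, the function names, and the space_left flag that simulates an out-of-space failure are all hypothetical.

```c
#include <errno.h>

/* Hypothetical stand-in for the parts of an inode this sketch needs. */
struct sketch_inode {
    unsigned int mode;   /* permission bits */
    int has_acl;         /* nonzero once the acl attribute is stored */
};

/* Hypothetical attribute store; fails with -ENOSPC when space_left is 0. */
int sketch_store_acl(struct sketch_inode *inode, int space_left)
{
    if (!space_left)
        return -ENOSPC;
    inode->has_acl = 1;
    return 0;
}

/* Store the attribute first; update the mode only if that succeeded. */
int sketch_set_acl(struct sketch_inode *inode, unsigned int new_mode,
                   int space_left)
{
    int err = sketch_store_acl(inode, space_left);

    if (err)
        return err;          /* mode is left untouched on failure */

    inode->mode = new_mode;
    return 0;
}
```

If the store fails, the caller sees the error and the mode keeps its previous value, so mask bits are never mistaken for group permission bits.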

ext4: remove unused metadata accounting variables · a627b0a7

Committed by Eric Whitney on Jul 30, 2017

Two variables in ext4_inode_info, i_reserved_meta_blocks and
i_allocated_meta_blocks, are unused.  Removing them saves a little
memory per in-memory inode and cleans up clutter in several tracepoints.
Adjust tracepoint output from ext4_alloc_da_blocks() for consistency
and fix a typo and whitespace near these changes.

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

a627b0a7

ext4: correct comment references to ext4_ext_direct_IO() · 1e21196c

Committed by Eric Whitney on Jul 30, 2017

Commit 914f82a3 "ext4: refactor direct IO code" deleted
ext4_ext_direct_IO(), but references to that function remain in
comments.  Update them to refer to ext4_direct_IO_write().

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Jan Kara <jack@suse.cz>

1e21196c

07 Jul, 2017 · 1 commit

ext4: fix spelling mistake: "prellocated" -> "preallocated" · ff950156

Committed by Colin Ian King on Jul 06, 2017

Trivial fix to a spelling mistake in an mb_debug debug message.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ff950156

06 Jul, 2017 · 3 commits

ext4: use errseq_t based error handling for reporting data writeback errors · 6acec592

Committed by Jeff Layton on Jul 06, 2017

Add a call to filemap_report_wb_err at the end of ext4_sync_file. This
will ensure that we check and advance the errseq_t in the file, which
allows us to track and report errors on all open fds when they occur.

Signed-off-by: Jeff Layton <jlayton@redhat.com>

6acec592

ext4: fix __ext4_new_inode() journal credits calculation · af65207c

Committed by Tahsin Erdogan on Jul 06, 2017

The ea_inode feature allows creating extended attributes that are up to
64k in size. Update __ext4_new_inode() to pick increased credit limits.

To avoid overallocating journal credits, update
__ext4_xattr_set_credits() to make a distinction between xattr create
vs update. This helps __ext4_new_inode() because all attributes are
known to be new, so we can save credits that are normally needed to
delete old values.

Also, have fscrypt specify its maximum context size so that we don't
end up allocating credits for 64k size.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

af65207c

ext4: skip ext4_init_security() and encryption on ea_inodes · ad47f953

Committed by Tahsin Erdogan on Jul 06, 2017

Extended attribute inodes are internal to ext4. Adding encryption/security
related attributes on them would mean dealing with nested calls into ea code.
Since they have no direct exposure to user mode, just avoid creating ea
entries for them.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ad47f953

04 Jul, 2017 · 1 commit

ext4: change fast symlink test to not rely on i_blocks · 407cd7fb

Committed by Tahsin Erdogan on Jul 04, 2017

ext4_inode_info->i_data is the storage area for 4 types of data:

  a) Extents data
  b) Inline data
  c) Block map
  d) Fast symlink data (symlink length < 60)

Extents data case is positively identified by EXT4_INODE_EXTENTS flag.
Inline data case is also obvious because of EXT4_INODE_INLINE_DATA
flag.

Distinguishing c) and d) however requires additional logic. This
currently relies on i_blocks count. After subtracting external xattr
block from i_blocks, if it is greater than 0 then we know that some
data blocks exist, so there must be a block map.

This logic got broken after ea_inode feature was added. That feature
charges the data blocks of external xattr inodes to the referencing
inode and so adds them to the i_blocks. To fix this, we could subtract
ea_inode blocks by iterating through all xattr entries and then check
whether remaining i_blocks count is zero. Besides being complicated,
this won't change the fact that the current way of distinguishing
between c) and d) is fragile.

The alternative solution is to test whether i_size is less than 60 to
determine fast symlink case. ext4_symlink() uses the same test to decide
whether to store the symlink in i_data. There is one caveat to address
before this can work though.

If an inode's i_nlink is zero during eviction, its i_size is set to
zero and its data is truncated. If the system crashes before the inode is
removed from the orphan list, orphan cleanup on the next boot may find the
inode with zero i_size. So, a symlink that had its data stored in a block
may now appear to be a fast symlink. The solution used in this patch is to
treat i_size = 0 as a non-fast symlink case. A zero sized symlink is not
legal, so the only time this can happen is the mentioned scenario. This is
also logically correct because an i_size = 0 symlink has no data stored in
i_data.

Suggested-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>

407cd7fb
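
The size-based fast symlink test described in the commit above reduces to two comparisons. The following is a minimal sketch with a hypothetical function name; it assumes the caller has already established that the inode is a symlink, and it takes the 60-byte limit directly from the commit message.

```c
#include <stdint.h>

/* Symlink targets shorter than this fit in i_data (per the commit text). */
#define SKETCH_FAST_SYMLINK_LIMIT 60u

/*
 * Nonzero when a symlink of the given size is treated as a fast symlink.
 * A size of 0 is deliberately excluded: it can only come from an orphan
 * whose data was truncated before a crash, and it has no data in i_data.
 */
int sketch_is_fast_symlink(uint64_t i_size)
{
    return i_size != 0 && i_size < SKETCH_FAST_SYMLINK_LIMIT;
}
```

Sizes 1 through 59 count as fast symlinks; 0 and anything from 60 upward do not.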

28 Jun, 2017 · 1 commit

ext4: add support for passing in write hints for buffered writes · 0127251c

Committed by Jens Axboe on Jun 27, 2017

Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0127251c

24 Jun, 2017 · 3 commits

fscrypt: make ->dummy_context() return bool · c250b7dd

Committed by Eric Biggers on Jun 22, 2017

This makes it consistent with ->is_encrypted(), ->empty_dir(), and
fscrypt_dummy_context_enabled().

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

c250b7dd

ext4: require key for truncate(2) of encrypted file · 63136858

Committed by Eric Biggers on Jun 23, 2017

Currently, filesystems allow truncate(2) on an encrypted file without
the encryption key. However, it's impossible to correctly handle the
case where the size being truncated to is not a multiple of the
filesystem block size, because that would require decrypting the final
block, zeroing the part beyond i_size, then encrypting the block.

As other modifications to encrypted file contents are prohibited without
the key, just prohibit truncate(2) as well, making it fail with ENOKEY.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

63136858

ext4: don't bother checking for encryption key in ->mmap() · 66e0aaad

Committed by Eric Biggers on Jun 23, 2017

Since only an open file can be mmap'ed, and we only allow open()ing an
encrypted file when its key is available, there is no need to check for
the key again before permitting each mmap().

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

66e0aaad

23 Jun, 2017 · 7 commits

ext4: check return value of kstrtoull correctly in reserved_clusters_store · 1ea1516f

Committed by Chao Yu on Jun 23, 2017

kstrtoull() returns 0 on success. However, reserved_clusters_store()
returns -EINVAL when kstrtoull() returns 0, so updating the
reserved_clusters value through sysfs fails.

Fixes: 76d33bca
Cc: stable@vger.kernel.org # 4.4
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

1ea1516f
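
The inverted return-value check from the reserved_clusters_store() commit is easy to reproduce outside the kernel. In the sketch below, sketch_parse_ull() is a hypothetical userspace stand-in that follows the same convention as kstrtoull() (0 on success, negative errno on failure); the two store functions are illustrative and are not the kernel's sysfs handler.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in: returns 0 on success, -EINVAL on bad input. */
int sketch_parse_ull(const char *s, unsigned long long *out)
{
    char *end;
    unsigned long long v;

    errno = 0;
    v = strtoull(s, &end, 0);
    if (errno != 0 || end == s || *end != '\0')
        return -EINVAL;
    *out = v;
    return 0;
}

/* Buggy shape: treats the success value 0 as a failure. */
int sketch_store_buggy(const char *buf, unsigned long long *reserved)
{
    unsigned long long val = 0;

    if (!sketch_parse_ull(buf, &val))
        return -EINVAL;      /* taken on every successful parse */
    *reserved = val;
    return 0;
}

/* Fixed shape: a nonzero return is the error and is passed through. */
int sketch_store_fixed(const char *buf, unsigned long long *reserved)
{
    unsigned long long val = 0;
    int ret = sketch_parse_ull(buf, &val);

    if (ret)
        return ret;
    *reserved = val;
    return 0;
}
```

With valid input such as "4096", the buggy version rejects the write and leaves the stored value unchanged, while the fixed version stores 4096.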

ext4: fix off-by-one fsmap error on 1k block filesystems · 4a495624

Committed by Darrick J. Wong on Jun 23, 2017

For 1k-block filesystems, the filesystem starts at block 1, not block 0.
This fact is recorded in s_first_data_block, so use that to bump up the
start_fsb before we start querying the filesystem for its space map.
Without this, ext4/026 fails on 1k block ext4 because various functions
(notably ext4_get_group_no_and_offset) don't know what to do with an
fsblock that is "before" the start of the filesystem and return garbage
results (blockgroup 2^32-1, etc.) that confuse fsmap.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

4a495624
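
The "blockgroup 2^32-1" result mentioned in the fsmap commit is what unsigned arithmetic produces when a block number below s_first_data_block is converted to a group number. The sketch below is a simplified model with hypothetical names, not the kernel's ext4_get_group_no_and_offset(); the value of 8192 blocks per group used in the note is an assumption for a 1k-block filesystem.

```c
#include <stdint.h>

/*
 * Simplified group lookup: subtract the first data block, then divide by
 * the group size and truncate to a 32-bit group number. If fsblock is
 * smaller than first_data_block, the subtraction wraps around.
 */
uint32_t sketch_group_of(uint64_t fsblock, uint32_t first_data_block,
                         uint32_t blocks_per_group)
{
    uint64_t rel = fsblock - first_data_block;

    return (uint32_t)(rel / blocks_per_group);
}

/* The fix in spirit: never start a query before the first data block. */
uint64_t sketch_clamp_start(uint64_t start_fsb, uint32_t first_data_block)
{
    return start_fsb < first_data_block ? first_data_block : start_fsb;
}
```

With a first data block of 1 and 8192 blocks per group, looking up block 0 yields group 2^32-1, while clamping the start block to 1 first yields group 0.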

ext4: return EFSBADCRC if a bad checksum error is found in ext4_find_entry() · bdddf342

Committed by Theodore Ts'o on Jun 23, 2017

Previously, a directory block with a bad checksum was skipped; we
should be returning EFSBADCRC (aka EBADMSG).

Signed-off-by: Theodore Ts'o <tytso@mit.edu>

bdddf342

ext4: return EIO on read error in ext4_find_entry · 6febe6f2

Committed by Khazhismel Kumykov on Jun 23, 2017

Previously, a read error would be ignored and we would eventually return
NULL from ext4_find_entry, which signals "no such file or directory". We
should be returning EIO.

Signed-off-by: Khazhismel Kumykov <khazhy@google.com>

6febe6f2

ext4: forbid encrypting root directory · 9ce0151a

Committed by Eric Biggers on Jun 23, 2017

Currently it's possible to encrypt all files and directories on an ext4
filesystem by deleting everything, including lost+found, then setting an
encryption policy on the root directory. However, this is incompatible
with e2fsck because e2fsck expects to find, create, and/or write to
lost+found and does not have access to any encryption keys. Especially
problematic is that if e2fsck can't find lost+found, it will create it
without regard for whether the root directory is encrypted. This is
wrong for obvious reasons, and it causes a later run of e2fsck to
consider the lost+found directory entry to be corrupted.

Encrypting the root directory may also be of limited use because it is
the "all-or-nothing" use case, for which dm-crypt can be used instead.
(By design, encryption policies are inherited and cannot be overridden;
so the root directory having an encryption policy implies that all files
and directories on the filesystem have that same encryption policy.)

In any case, encrypting the root directory is broken currently and must
not be allowed; so start returning an error if userspace requests it.
For now only do this in ext4, because f2fs and ubifs do not appear to
have the lost+found requirement. We could move it into
fscrypt_ioctl_set_policy() later if desired, though.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>

9ce0151a

ext4: send parallel discards on commit completions · a0154344

Committed by Daeho Jeong on Jun 22, 2017

Currently, when an ext4 filesystem is mounted with the '-o discard'
option, we have to issue all the discard commands for the blocks to be
deallocated and wait for them to complete during the commit completion
phase. Because this procedure might involve many sequential rounds of
issuing a discard command and waiting for it, the delay can be very
long, up to 17.0s in our test, and it results in long commit delays and
degraded fsync() performance.

To reduce this delay, instead of adding a callback for each extent and
handling all of them sequentially in the commit phase, we add a separate
list of extents to free to the superblock and process this list at once
after the transaction commits, so that all the discard commands can be
issued in parallel, as the XFS filesystem does.

With this change, discard command handling performs better: the
worst-case delay of a single commit dropped from 17.0s to 4.8s.

Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Tested-by: Hobin Woo <hobin.woo@samsung.com>
Tested-by: Kitae Lee <kitae87.lee@samsung.com>
Reviewed-by: Jan Kara <jack@suse.cz>

a0154344

ext4: avoid unnecessary stalls in ext4_evict_inode() · 3abb1a0f

Committed by Jan Kara on Jun 22, 2017

These days inode reclaim calls evict_inode() only when it has no pages
in the mapping.  In that case it is not necessary to wait for transaction
commit in ext4_evict_inode() as there can be no pages waiting to be
committed.  So avoid unnecessary transaction waiting in that case.

We still have to keep the check for the case where ext4_evict_inode()
gets called from other paths (e.g. umount) where inode still can have
some page cache pages.

Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

3abb1a0f

22 Jun, 2017 · 4 commits

ext4: add nombcache mount option · cdb7ee4c

Committed by Tahsin Erdogan on Jun 22, 2017

The main purpose of mb cache is to achieve deduplication in
extended attributes. In use cases where opportunity for deduplication
is unlikely, it only adds overhead.

Add a mount option to explicitly turn off mb cache.

Suggested-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

cdb7ee4c

ext4: strong binding of xattr inode references · b9fc761e

Committed by Tahsin Erdogan on Jun 22, 2017

To verify that an xattr entry is not pointing to the wrong xattr inode,
we currently check that the target inode has the EXT4_EA_INODE_FL flag set
and also that the entry size matches the target inode size.

For stronger validation, also incorporate a crc32c hash of the value into
the e_hash field. This is done regardless of whether the entry lives in
the inode body or external attribute block.

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

b9fc761e

ext4: eliminate xattr entry e_hash recalculation for removes · daf83281

Committed by Tahsin Erdogan on Jun 22, 2017

When an extended attribute block is modified, ext4_xattr_hash_entry()
recalculates e_hash for the entry that is pointed to by s->here. This is
unnecessary if the modification is to remove an entry.

Currently, if the removed entry is the last one and there are other
entries remaining, hash calculation targets the just-erased entry, which
has been filled with zeroes, and effectively does nothing.  If the removed
entry is not the last one and there are more entries, it recalculates
the hash on the next entry, which is totally unnecessary.

Fix these by moving the decision on when to recalculate the hash to
ext4_xattr_set_entry().

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

daf83281

ext4: reserve space for xattr entries/names · 9c6e7853

Committed by Tahsin Erdogan on Jun 22, 2017

New ea_inode feature allows putting large xattr values into external
inodes.  struct ext4_xattr_entry and the attribute name however have to
remain in the inode extra space or external attribute block.  Once that
space is exhausted, no further entries can be added.  Some of that space
could also be used by values that fit in there at the time of addition.

So, a single xattr entry whose value barely fits in the external block
could prevent further entries being added.

To mitigate the problem, this patch introduces a notion of reserved
space in the external attribute block that cannot be used by value data.
This reserve is enforced when ea_inode feature is enabled.  The amount
of reserve is arbitrarily chosen to be min(block_size/8, 1024).  The
table below shows how much space is reserved for each block size and the
guaranteed minimum number of entries that can be placed in the external
attribute block.

block size     reserved bytes  entries (name length = 16)
 1k            128              3
 2k            256              7
 4k            512             15
 8k            1024            31
16k            1024            31
32k            1024            31
64k            1024            31

Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

9c6e7853
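
The reserve table in the commit above can be recomputed from the stated formula min(block_size/8, 1024). The entry counts are reproduced here under two assumptions that are not spelled out in the commit message: each entry costs a 16-byte header plus its 16-byte name, and 4 bytes of the reserve go to the end-of-entries terminator. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed per-entry costs used to reproduce the table (see lead-in). */
#define SKETCH_ENTRY_HEADER 16u  /* assumed size of the entry header */
#define SKETCH_NAME_LEN     16u  /* name length used by the table */
#define SKETCH_TERMINATOR    4u  /* assumed end-of-entries marker */

/* Reserved bytes for a block size: min(block_size / 8, 1024). */
uint32_t sketch_block_reserve(uint32_t block_size)
{
    uint32_t reserve = block_size / 8;

    return reserve < 1024u ? reserve : 1024u;
}

/* Guaranteed minimum number of entries that fit in the reserve. */
uint32_t sketch_min_entries(uint32_t block_size)
{
    return (sketch_block_reserve(block_size) - SKETCH_TERMINATOR) /
           (SKETCH_ENTRY_HEADER + SKETCH_NAME_LEN);
}
```

Under these assumptions the functions give 128 bytes and 3 entries for 1k blocks, 512 bytes and 15 entries for 4k blocks, and 1024 bytes and 31 entries for every block size from 8k upward, in agreement with the table.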

openanolis / cloud-kernel · synced successfully more than 1 year ago
