Commits · 3fd164629d25b04f291a79a013dcc7ce1a301269 · openeuler / Kernel

Feb 23, 2016 · 4 commits

ext4: shortcut setting of xattr to the same value · 3fd16462

Committed by Jan Kara on Feb 22, 2016

When someone tried to set xattr to the same value (i.e., not changing
anything) we did all the work of removing original xattr, possibly
breaking references to shared xattr block, inserting new xattr, and
merging xattr blocks again. Since this is not a rare operation and it
is relatively cheap for us to detect this case, check for this and
shortcut xattr setting in that case.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
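
The following is a minimal userspace C sketch of the early-exit check. It assumes a hypothetical in-memory `struct xattr_entry` and does not reproduce ext4's real xattr code, which works on on-disk entries. The idea is only that comparing the length and then the bytes is far cheaper than the remove/insert/merge sequence it avoids.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory view of an existing xattr (not ext4's on-disk format). */
struct xattr_entry {
    const char *name;
    const void *value;
    size_t value_len;
};

/*
 * Return true when the new value is byte-for-byte identical to the stored
 * one, so the caller can return success without removing the old xattr,
 * inserting the new one, and re-merging xattr blocks.
 */
static bool xattr_value_unchanged(const struct xattr_entry *old,
                                  const void *new_value, size_t new_len)
{
    if (old == NULL)                 /* attribute does not exist yet */
        return false;
    if (old->value_len != new_len)   /* cheap length check first */
        return false;
    return new_len == 0 || memcmp(old->value, new_value, new_len) == 0;
}
```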

ext4: kill ext4_mballoc_ready · 2335d05f

Committed by Andreas Gruenbacher on Feb 22, 2016

This variable, introduced in commit 9c191f70, is unnecessary: it is set
once the module has been initialized correctly, and ext4_fill_super
cannot run unless the module has been initialized correctly.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

mbcache2: rename to mbcache · 7a2508e1

Committed by Jan Kara on Feb 22, 2016

Since old mbcache code is gone, let's rename new code to mbcache since
number 2 is now meaningless. This is just a mechanical replacement.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: convert to mbcache2 · 82939d79

Committed by Jan Kara on Feb 22, 2016

The conversion is generally straightforward. The only tricky part is
that the xattr block corresponding to a found mbcache entry can get freed
before we get the buffer lock for that block. So we have to check whether
the entry is still valid after getting the buffer lock.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
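
The "re-check after locking" rule can be modelled in userspace C. This is a hypothetical sketch. A pthread mutex stands in for the buffer lock. A `freed` flag plus the expected block number stand in for whatever validity test ext4 actually performs.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a cached xattr block and its buffer lock. */
struct xblock {
    pthread_mutex_t lock;    /* plays the role of the buffer lock */
    bool freed;              /* set by whoever frees the block */
    unsigned long blocknr;
};

/*
 * Lock a block found through a cache lookup, then re-check that it is still
 * valid: it may have been freed between the lookup and taking the lock.
 * Returns true with the lock held, or false with the lock dropped.
 */
static bool lock_and_revalidate(struct xblock *b, unsigned long expected_nr)
{
    pthread_mutex_lock(&b->lock);
    if (b->freed || b->blocknr != expected_nr) {
        pthread_mutex_unlock(&b->lock);
        return false;        /* stale entry: caller skips it and looks again */
    }
    return true;
}
```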

Feb 22, 2016 · 2 commits

ext4: iterate over buffer heads correctly in move_extent_per_page() · 87f9a031

Committed by Eryu Guan on Feb 21, 2016

In commit bcff2488 ("ext4: don't read blocks from disk after extents
being swapped") bh is not updated correctly in the for loop and wrong
data has been written to disk. generic/324 catches this on sub-page
block size ext4.

Fixes: bcff2488 ("ext4: don't read blocks from disk after extents being swapped")
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
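
The buffers attached to a page form a ring linked through `b_this_page`, so any loop over them has to advance `bh` explicitly. Below is a hypothetical userspace sketch of a loop that advances correctly. It uses a trimmed struct and an invented `map_buffer_range()` helper, and it assumes an `offset` for where the range starts within the page. It illustrates the iteration pattern, not the actual patch.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down buffer head: the page's buffers form a ring. */
struct buffer_head {
    struct buffer_head *b_this_page;  /* next buffer on the same page */
    unsigned long b_blocknr;
    int mapped;
};

/*
 * Map `count` buffers starting `offset` buffers into the page. `bh` must
 * move to the next buffer on every iteration; otherwise every pass touches
 * the same buffer and the wrong data ends up associated with the page.
 */
static void map_buffer_range(struct buffer_head *head, unsigned int offset,
                             unsigned int count, unsigned long start_block)
{
    struct buffer_head *bh = head;
    unsigned int i;

    /* Skip the buffers in front of the range. */
    for (i = 0; i < offset; i++)
        bh = bh->b_this_page;

    /* Map each buffer in the range, advancing bh each time round. */
    for (i = 0; i < count; i++, bh = bh->b_this_page) {
        bh->b_blocknr = start_block + i;
        bh->mapped = 1;
    }
}
```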

ext4: make sure to revoke all the freeable blocks in ext4_free_blocks · f96c450d

Committed by Daeho Jeong on Feb 21, 2016

Currently, ext4_free_blocks() doesn't revoke data blocks of a per-file data
journalled inode, which can cause file data inconsistency problems.
Even though data blocks of a per-file data journalled inode are already
forgotten by jbd2_journal_invalidatepage() before ext4_free_blocks()
is invoked, we still need to revoke the data blocks here.
Moreover, some metadata blocks that are not found by
sb_find_get_block() still need to be revoked, but this is also
missing here.
Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>

Feb 19, 2016 · 2 commits

ext4: fix crashes in dioread_nolock mode · 74dae427

Committed by Jan Kara on Feb 19, 2016

Competing overwrite DIO in dioread_nolock mode will just overwrite
pointer to io_end in the inode. This may result in data corruption or
extent conversion happening from IO completion interrupt because we
don't properly set buffer_defer_completion() when unlocked DIO races
with locked DIO to unwritten extent.

Since unlocked DIO doesn't need io_end for anything, just avoid
allocating it and corrupting pointer from inode for locked DIO.
A cleaner fix would be to avoid these games with io_end pointer from the
inode but that requires more intrusive changes so we leave that for
later.

Cc: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: fix bh->b_state corruption · ed8ad838

Committed by Jan Kara on Feb 19, 2016

ext4 can update bh->b_state non-atomically in _ext4_get_block() and
ext4_da_get_block_prep(). Usually this is fine since bh is just a
temporary storage for mapping information on stack but in some cases it
can be fully living bh attached to a page. In such case non-atomic
update of bh->b_state can race with an atomic update which then gets
lost. Usually when we are mapping bh and thus updating bh->b_state
non-atomically, nobody else touches the bh and so things work out fine
but there is one case to especially worry about: ext4_finish_bio() uses
BH_Uptodate_Lock on the first bh in the page to synchronize handling of
PageWriteback state. So when blocksize < pagesize, we can be atomically
modifying bh->b_state of a buffer that actually isn't under IO and thus
can race e.g. with delalloc trying to map that buffer. The result is
that we can mistakenly set / clear BH_Uptodate_Lock bit resulting in the
corruption of PageWriteback state or missed unlock of BH_Uptodate_Lock.

Fix the problem by always updating bh->b_state bits atomically.

CC: stable@vger.kernel.org
Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
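
The commit states only that b_state bits are now always updated atomically. One common way to replace a group of bits without clobbering concurrent updates to the others is a compare-and-swap loop. The C11 sketch below shows that technique. The masks `MAP_FLAGS_MASK` and `LOCK_BIT` and the helper `update_map_bits()` are hypothetical and do not come from ext4's code.

```c
#include <stdatomic.h>

/* Hypothetical masks: which bits describe the block mapping. */
#define MAP_FLAGS_MASK 0x0fUL   /* e.g. mapped / new / unwritten / boundary */
#define LOCK_BIT       0x100UL  /* a bit owned by someone else, e.g. I/O completion */

/*
 * Replace only the mapping bits of *state with `flags`, leaving every other
 * bit intact even if another thread flips one concurrently. If *state
 * changed between the load and the exchange, the CAS fails, refreshes
 * old_state with the current value, and the loop recomputes new_state.
 */
static void update_map_bits(_Atomic unsigned long *state, unsigned long flags)
{
    unsigned long old_state = atomic_load(state);
    unsigned long new_state;

    flags &= MAP_FLAGS_MASK;
    do {
        new_state = (old_state & ~MAP_FLAGS_MASK) | flags;
    } while (!atomic_compare_exchange_weak(state, &old_state, new_state));
}
```

A plain `state = (state & ~mask) | flags` would instead write back a stale copy of the non-mapping bits, which is the lost-update problem the commit describes.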

Feb 16, 2016 · 1 commit

ext4: fix memleak in ext4_readdir() · c906f38e

Committed by Kirill Tkhai on Feb 16, 2016

When ext4_bread() fails, fname_crypto_str remains
allocated after return. Fix that.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
CC: Dmitry Monakhov <dmonakhov@virtuozzo.com>
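
The usual kernel idiom for avoiding this kind of leak is a single cleanup label that every error path jumps to. The following is a hypothetical C sketch. `name_buf` plays the role of `fname_crypto_str`, and `read_dir_block()` is an invented stand-in for a failing `ext4_bread()`. The `freed` out-parameter exists only so the behaviour can be checked.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a block read that fails at block `fail_at`. */
static int read_dir_block(int blk, int fail_at)
{
    return blk == fail_at ? -EIO : 0;
}

/*
 * Allocate a scratch buffer, walk the blocks, and free the buffer on every
 * exit path by routing errors through one cleanup label.
 */
static int walk_dir(int nblocks, int fail_at, int *freed)
{
    int err = 0;
    char *name_buf = malloc(256);

    if (!name_buf)
        return -ENOMEM;

    for (int blk = 0; blk < nblocks; blk++) {
        err = read_dir_block(blk, fail_at);
        if (err)
            goto out;        /* a bare "return err" here would leak name_buf */
    }
out:
    free(name_buf);
    *freed = 1;
    return err;
}
```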

Feb 12, 2016 · 6 commits

ext4: remove unused parameter "newblock" in convert_initialized_extent() · 56263b4c

Committed by Eryu Guan on Feb 12, 2016

The "newblock" parameter is not used in convert_initialized_extent(),
remove it.
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: don't read blocks from disk after extents being swapped · bcff2488

Committed by Eryu Guan on Feb 12, 2016

I noticed that ext4/307 fails occasionally on a ppc64 host, reporting an md5
checksum mismatch after moving data from the original file to the donor file.

The reason is that move_extent_per_page() calls __block_write_begin()
and block_commit_write() to write saved data from original inode blocks
to donor inode blocks, but __block_write_begin() not only maps buffer
heads but also reads block content from disk if the size is not block
size aligned. At this point the physical block number in the mapped buffer
head points to the donor file, not the original file, and that
results in reading the wrong data into the page, which gets written to disk
in the following block_commit_write() call.

This also can be reproduced by the following script on 1k block size ext4
on x86_64 host:

    mnt=/mnt/ext4
    donorfile=$mnt/donor
    testfile=$mnt/testfile
    e4compact=~/xfstests/src/e4compact

    rm -f $donorfile $testfile

    # reserve space for donor file, written by 0xaa and sync to disk to
    # avoid EBUSY on EXT4_IOC_MOVE_EXT
    xfs_io -fc "pwrite -S 0xaa 0 1m" -c "fsync" $donorfile

    # create test file written by 0xbb
    xfs_io -fc "pwrite -S 0xbb 0 1023" -c "fsync" $testfile

    # compute initial md5sum
    md5sum $testfile | tee md5sum.txt
    # drop cache, force e4compact to read data from disk
    echo 3 > /proc/sys/vm/drop_caches

    # test defrag
    echo "$testfile" | $e4compact -i -v -f $donorfile
    # check md5sum
    md5sum -c md5sum.txt

Fix it by creating & mapping buffer heads only but not reading blocks
from disk, because all the data in page is guaranteed to be up-to-date
in mext_page_mkuptodate().

Cc: stable@vger.kernel.org
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: fix potential integer overflow · 46901760

Committed by Insu Yun on Feb 12, 2016

Since sizeof(ext_new_group_data) > sizeof(ext_new_flex_group_data),
an overflow check computed with the smaller size can let an integer
overflow through. Therefore, the integer overflow sanity check needs
to be fixed.

Cc: stable@vger.kernel.org
Signed-off-by: Insu Yun <wuninsu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
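
The general rule behind this fix is that an overflow guard for `count * size` must divide by the same `size` that is later multiplied. The hypothetical C sketch below uses invented structs of different sizes. The real types involved are `ext4_new_group_data` and `ext4_new_flex_group_data`, whose layouts are not reproduced here.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical element types of different sizes. */
struct small_rec { unsigned int a; };
struct big_rec   { unsigned long long a, b, c, d; };

/*
 * True if count * elem_size fits in an unsigned int. The bound must use the
 * size that is actually multiplied: a count that passes the check for the
 * small struct can still overflow when multiplied by the big one.
 */
static bool mul_fits_uint(size_t count, size_t elem_size)
{
    return elem_size == 0 || count <= UINT_MAX / elem_size;
}
```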

ext4: add a line break for proc mb_groups display · 802cf1f9

Committed by Huaitong Han on Feb 12, 2016

This patch adds a line break for proc mb_groups display.
Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>

ext4: ioctl: fix erroneous return value · fdde368e

Committed by Anton Protopopov on Feb 11, 2016

The ext4_ioctl_setflags() function which is used in the ioctls
EXT4_IOC_SETFLAGS and EXT4_IOC_FSSETXATTR may return the positive value
EPERM instead of -EPERM in case of error. This bug was introduced by a
recent commit 9b7365fc.

The following program can be used to illustrate the wrong behavior:

    #include <sys/types.h>
    #include <sys/ioctl.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <err.h>

    #define FS_IOC_GETFLAGS _IOR('f', 1, long)
    #define FS_IOC_SETFLAGS _IOW('f', 2, long)
    #define FS_IMMUTABLE_FL 0x00000010

    int main(void)
    {
        int fd;
        long flags;

        fd = open("file", O_RDWR|O_CREAT, 0600);
        if (fd < 0)
            err(1, "open");

        if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0)
            err(1, "ioctl: FS_IOC_GETFLAGS");

        flags |= FS_IMMUTABLE_FL;

        if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0)
            err(1, "ioctl: FS_IOC_SETFLAGS");

        warnx("ioctl returned no error");

        return 0;
    }

Running it gives the following result:

    $ strace -e ioctl ./test
    ioctl(3, FS_IOC_GETFLAGS, 0x7ffdbd8bfd38) = 0
    ioctl(3, FS_IOC_SETFLAGS, 0x7ffdbd8bfd38) = 1
    test: ioctl returned no error
    +++ exited with 0 +++

Running the program on a kernel with the bug fixed gives the proper result:

    $ strace -e ioctl ./test
    ioctl(3, FS_IOC_GETFLAGS, 0x7ffdd2768258) = 0
    ioctl(3, FS_IOC_SETFLAGS, 0x7ffdd2768258) = -1 EPERM (Operation not permitted)
    test: ioctl: FS_IOC_SETFLAGS: Operation not permitted
    +++ exited with 1 +++

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: fix scheduling in atomic on group checksum failure · 05145bd7

Committed by Jan Kara on Feb 11, 2016

When block group checksum is wrong, we call ext4_error() while holding
group spinlock from ext4_init_block_bitmap() or
ext4_init_inode_bitmap() which results in scheduling while in atomic.
Fix the issue by calling ext4_error() later after dropping the spinlock.

CC: stable@vger.kernel.org
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
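
The following is a userspace C sketch of the "note under the lock, report after unlocking" pattern. A pthread mutex stands in for the group spinlock. Because a userspace mutex does not forbid sleeping, this shows only the ordering of operations. `report_error()` is a hypothetical stand-in for `ext4_error()`.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical group descriptor guarded by a lock. */
struct group {
    pthread_mutex_t lock;    /* stands in for the group spinlock */
    unsigned int checksum;
    unsigned int expected;
};

static int errors_reported;

/* Stand-in for ext4_error(): may sleep, so it must not run under the lock. */
static void report_error(unsigned int grp)
{
    fprintf(stderr, "group %u: checksum mismatch\n", grp);
    errors_reported++;
}

/* Only record the failure while locked; report it after unlocking. */
static bool init_bitmap(struct group *g, unsigned int grp)
{
    bool bad;

    pthread_mutex_lock(&g->lock);
    bad = g->checksum != g->expected;
    if (!bad) {
        /* ... initialize the bitmap while still holding the lock ... */
    }
    pthread_mutex_unlock(&g->lock);

    if (bad)
        report_error(grp);   /* no lock held any more */
    return !bad;
}
```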

Feb 08, 2016 · 2 commits

ext4 crypto: move context consistency check to ext4_file_open() · ff978b09

Committed by Theodore Ts'o on Feb 08, 2016

In the case where the per-file key for the directory is cached, but
root does not have access to the key needed to derive the per-file key
for the files in the directory, we allow the lookup to succeed, so
that lstat(2) and unlink(2) can succeed. However, if a program tries
to open the file, it will get an ENOKEY error.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4 crypto: revalidate dentry after adding or removing the key · 28b4c263

Committed by Theodore Ts'o on Feb 07, 2016

Add a validation check for dentries for encrypted directory to make
sure we're not caching stale data after a key has been added or removed.

Also check to make sure that status of the encryption key is updated
when readdir(2) is executed.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Jan 23, 2016 · 2 commits

ext4: call dax_pfn_mkwrite() for DAX fsync/msync · d5be7a03

Committed by Ross Zwisler on Jan 22, 2016

To properly support the new DAX fsync/msync infrastructure filesystems
need to call dax_pfn_mkwrite() so that DAX can track when user pages are
dirtied.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.com>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

wrappers for ->i_mutex access · 5955102c

Committed by Al Viro on Jan 22, 2016

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
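
The following is a userspace analogue of the wrapper idea, using a pthread mutex in place of the kernel's `struct mutex`. The function names mirror `inode_lock()`, `inode_unlock()` and `inode_trylock()`, but these bodies are illustrative only. Because callers go through the wrappers, changing the lock type later (for example to a reader/writer lock) only requires editing these helpers.

```c
#include <pthread.h>

/* Hypothetical userspace inode with a plain mutex field. */
struct inode {
    pthread_mutex_t i_mutex;
};

static inline void inode_lock(struct inode *inode)
{
    pthread_mutex_lock(&inode->i_mutex);
}

static inline void inode_unlock(struct inode *inode)
{
    pthread_mutex_unlock(&inode->i_mutex);
}

/* Returns 1 on success and 0 if the lock is already held, like mutex_trylock(). */
static inline int inode_trylock(struct inode *inode)
{
    return pthread_mutex_trylock(&inode->i_mutex) == 0;
}
```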

Jan 15, 2016 · 1 commit

kmemcg: account certain kmem allocations to memcg · 5d097056

Committed by Vladimir Davydov on Jan 14, 2016

Mark those kmem allocations that are known to be easily triggered from
userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
memcg.  For the list, see below:

 - threadinfo
 - task_struct
 - task_delay_info
 - pid
 - cred
 - mm_struct
 - vm_area_struct and vm_region (nommu)
 - anon_vma and anon_vma_chain
 - signal_struct
 - sighand_struct
 - fs_struct
 - files_struct
 - fdtable and fdtable->full_fds_bits
 - dentry and external_name
 - inode for all filesystems. This is the most tedious part, because
   most filesystems overwrite the alloc_inode method.

The list is far from complete, so feel free to add more objects.
Nevertheless, it should be close to "account everything" approach and
keep most workloads within bounds.  Malevolent users will be able to
breach the limit, but this was possible even with the former "account
everything" approach (simply because it did not account everything in
fact).

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Jan 09, 2016 · 4 commits

ext4: add FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support · 9b7365fc

Committed by Li Xi on Jan 08, 2016

This patch adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR ioctl interface
support for ext4. The interface is kept consistent with
XFS_IOC_FSSETXATTR/XFS_IOC_FSGETXATTR.
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Jan Kara <jack@suse.cz>

ext4: add project quota support · 689c958c

Committed by Li Xi on Jan 08, 2016

This patch adds mount options for enabling/disabling project quota
accounting and enforcement. A new specific inode is also used for
project quota accounting.

[ Includes fix from Dan Carpenter to correct error checking from dqget(). ]
Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Jan Kara <jack@suse.cz>

ext4: adds project ID support · 040cb378

Committed by Li Xi on Jan 08, 2016

Signed-off-by: Li Xi <lixi@ddn.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Jan Kara <jack@suse.cz>

ext4 crypto: simplify interfaces to directory entry insert functions · 56a04915

Committed by Theodore Ts'o on Jan 08, 2016

A number of functions, including ext4_add_dx_entry, make_indexed_dir,
etc., are being passed a dentry even though the only thing they use is
the containing parent. We can shrink the code size slightly by making
this replacement. This will also be useful in cases where we don't
have a dentry as the argument to the directory entry insert functions.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Jan 07, 2016 · 1 commit

fs: use block_device name vsprintf helper · a1c6f057

Committed by Dmitry Monakhov on Apr 13, 2015

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Dec 31, 2015 · 1 commit

switch ->get_link() to delayed_call, kill ->put_link() · fceef393

Committed by Al Viro on Dec 29, 2015

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Dec 14, 2015 · 1 commit

xattr handlers: Simplify list operation · 764a5c6b

Committed by Andreas Gruenbacher on Dec 02, 2015

Change the list operation to only return whether or not an attribute
should be listed.  Copying the attribute names into the buffer is moved
to the callers.

Since the result only depends on the dentry and not on the attribute
name, we do not pass the attribute name to list operations.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
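
The following is a hypothetical C sketch of the new split of responsibilities. The handler's `list` callback only returns a yes/no answer, and the caller does the copying. A `privileged` flag stands in for whatever the handler derives from the dentry, and `emit_name()` is an invented helper, not a VFS function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handler: ->list only answers "should this name be listed?". */
struct xattr_handler {
    bool (*list)(bool privileged);
};

static bool list_always(bool privileged)        { (void)privileged; return true; }
static bool list_if_privileged(bool privileged) { return privileged; }

/*
 * Caller side: append "name\0" to buf if the handler allows it. Returns the
 * new used length, or (size_t)-1 if the buffer is too small. Assumes
 * used <= size.
 */
static size_t emit_name(const struct xattr_handler *h, bool privileged,
                        const char *name, char *buf, size_t used, size_t size)
{
    size_t len = strlen(name) + 1;   /* names are NUL-separated */

    if (!h->list(privileged))
        return used;                 /* hidden: nothing copied */
    if (len > size - used)
        return (size_t)-1;
    memcpy(buf + used, name, len);
    return used + len;
}
```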

Dec 10, 2015 · 1 commit

ext4 crypto: add missing locking for keyring_key access · db7730e3

Committed by Theodore Ts'o on Dec 10, 2015

Cc: stable@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Dec 09, 2015 · 2 commits

replace ->follow_link() with new method that could stay in RCU mode · 6b255391

Committed by Al Viro on Nov 17, 2015

new method: ->get_link(); replacement of ->follow_link().  The differences
are:
	* inode and dentry are passed separately
	* might be called both in RCU and non-RCU mode;
the former is indicated by passing it a NULL dentry.
	* when called that way it isn't allowed to block
and should return ERR_PTR(-ECHILD) if it needs to be called
in non-RCU mode.

It's a flagday change - the old method is gone, all in-tree instances
converted.  Conversion isn't hard; said that, so far very few instances
do not immediately bail out when called in RCU mode.  That'll change
in the next commits.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
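
The following is a userspace C sketch of the RCU-mode contract described above. It re-creates minimal `ERR_PTR()`, `IS_ERR()` and `PTR_ERR()` helpers modelled on the kernel's. It uses a hypothetical `sketch_get_link()` whose inode either has the symlink body cached or would need a blocking read. It does not reproduce the real `->get_link()` signature.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct dentry;                       /* opaque here */

/* Hypothetical inode: body either cached in memory or needing a blocking read. */
struct inode {
    const char *cached_body;
};

/*
 * A NULL dentry means RCU mode: the method must not block. If the body is
 * not already in memory, return ERR_PTR(-ECHILD) so the caller retries in
 * non-RCU mode.
 */
static const char *sketch_get_link(struct dentry *dentry, struct inode *inode)
{
    if (inode->cached_body)
        return inode->cached_body;   /* fine in either mode */
    if (!dentry)
        return ERR_PTR(-ECHILD);     /* would have to block: bail out */
    return "/slow/path/target";      /* pretend we read it from disk */
}
```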

don't put symlink bodies in pagecache into highmem · 21fc61c7

Committed by Al Viro on Nov 17, 2015

kmap() in page_follow_link_light() needed to go: allowing an arbitrary
number of kmaps to be held for a long time is a great way to deadlock
the system.

A new helper, inode_nohighmem(inode), needs to be used for pagecache
symlink inodes; this is done for all in-tree cases. page_follow_link_light()
is instrumented to yell about anything missed.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Dec 08, 2015 · 9 commits

ext4: use pre-zeroed blocks for DAX page faults · ba5843f5

Committed by Jan Kara on Dec 07, 2015

Make DAX fault path use pre-zeroed blocks to avoid races with extent
conversion and zeroing when two page faults to the same block happen.
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: implement allocation of pre-zeroed blocks · c86d8db3

Committed by Jan Kara on Dec 07, 2015

DAX page fault path needs to get blocks that are pre-zeroed to avoid
races when two concurrent page faults happen in the same block of a
file. Implement support for this in ext4_map_blocks().
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: provide ext4_issue_zeroout() · 53085fac

Committed by Jan Kara on Dec 07, 2015

Create a new function ext4_issue_zeroout() to zero out a contiguous (both
logically and physically) part of inode data. We will need to issue
zeroout when the extent structure is not readily available, and this
function will allow us to do it without making up fake extent structures.
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: get rid of EXT4_GET_BLOCKS_NO_LOCK flag · 2dcba478

Committed by Jan Kara on Dec 07, 2015

When dioread_nolock mode is enabled, we grab i_data_sem in
ext4_ext_direct_IO() and therefore we need to instruct _ext4_get_block()
not to grab i_data_sem again using EXT4_GET_BLOCKS_NO_LOCK. However
holding i_data_sem over overwrite direct IO isn't needed these days. We
have exclusion against truncate / hole punching because we increase
i_dio_count under i_mutex in ext4_ext_direct_IO() so once
ext4_file_write_iter() verifies blocks are allocated & written, they are
guaranteed to stay so during the whole direct IO even after we drop
i_mutex.

So we can just remove this locking abuse and the no longer necessary
EXT4_GET_BLOCKS_NO_LOCK flag.
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: document lock ordering · e74031fd

Committed by Jan Kara on Dec 07, 2015

We have enough locks that it's probably worth documenting the lock
ordering rules we have in ext4.
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: fix races of writeback with punch hole and zero range · 01127848

Committed by Jan Kara on Dec 07, 2015

When doing delayed allocation, update of on-disk inode size is postponed
until IO submission time. However hole punch or zero range fallocate
calls can end up discarding the tail page cache page and thus on-disk
inode size would never be properly updated.

Make sure the on-disk inode size is updated before truncating page
cache.
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: fix races between buffered IO and collapse / insert range · 32ebffd3

Committed by Jan Kara on Dec 07, 2015

Current code implementing FALLOC_FL_COLLAPSE_RANGE and
FALLOC_FL_INSERT_RANGE is prone to races with buffered writes and page
faults. If buffered write or write via mmap manages to squeeze between
filemap_write_and_wait_range() and truncate_pagecache() in the fallocate
implementations, the written data is simply discarded by
truncate_pagecache() although it should have been shifted.

Fix the problem by moving filemap_write_and_wait_range() call inside
i_mutex and i_mmap_sem. That way we are protected against races with
both buffered writes and page faults.
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: move unlocked dio protection from ext4_alloc_file_blocks() · 17048e8a

Committed by Jan Kara on Dec 07, 2015

Currently ext4_alloc_file_blocks() handles protection against
unlocked DIO. However, we now need to call it sometimes under i_mmap_sem
and sometimes not, and DIO protection ranks above i_mmap_sem (although,
strictly speaking, this cannot currently create any deadlocks). Also,
ext4_zero_range() was actually getting and releasing unlocked DIO
protection twice in some cases. Luckily it didn't introduce any real bug,
but it was a land mine waiting to be stepped on. So move DIO protection
out of ext4_alloc_file_blocks() into the two callsites.
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: fix races between page faults and hole punching · ea3d7209

Committed by Jan Kara on Dec 07, 2015

Currently, page faults and hole punching are completely unsynchronized.
This can result in page fault faulting in a page into a range that we
are punching after truncate_pagecache_range() has been called and thus
we can end up with a page mapped to disk blocks that will be shortly
freed. Filesystem corruption will shortly follow. Note that the same
race is avoided for truncate by checking page fault offset against
i_size but there isn't similar mechanism available for punching holes.

Fix the problem by creating new rw semaphore i_mmap_sem in inode and
grab it for writing over truncate, hole punching, and other functions
removing blocks from extent tree and for read over page faults. We
cannot easily use i_data_sem for this since that ranks below transaction
start and we need something ranking above it so that it can be held over
the whole truncate / hole punching operation. Also remove various
workarounds we had in the code to reduce race window when page fault
could have created pages with stale mapping information.
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Dec 07, 2015 · 1 commit

vfs: Distinguish between full xattr names and proper prefixes · 98e9cb57

Committed by Andreas Gruenbacher on Dec 02, 2015

Add an additional "name" field to struct xattr_handler.  When the name
is set, the handler matches attributes with exactly that name.  When the
prefix is set instead, the handler matches attributes with the given
prefix and with a non-empty suffix.

This patch should avoid bugs like the one fixed in commit c361016a in
the future.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
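
The following is a hypothetical C sketch of the matching rule. A handler with `name` set matches only that exact attribute. A handler with `prefix` set matches only when a non-empty suffix follows the prefix. This is a simplified illustration, not the VFS's actual lookup code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical trimmed handler: exactly one of name / prefix is set. */
struct xattr_handler {
    const char *name;    /* match this attribute name exactly */
    const char *prefix;  /* or match prefix + non-empty suffix */
};

/*
 * Return the suffix after the handler's prefix ("" for an exact-name
 * match), or NULL when the handler does not apply to `attr`.
 */
static const char *xattr_match(const struct xattr_handler *h, const char *attr)
{
    if (h->name)
        return strcmp(attr, h->name) == 0 ? "" : NULL;

    size_t n = strlen(h->prefix);
    if (strncmp(attr, h->prefix, n) != 0 || attr[n] == '\0')
        return NULL;     /* wrong prefix, or prefix with an empty suffix */
    return attr + n;
}
```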
